Note: I extracted this from an in-house database engine. It's a work in progress, but I wanted to share and improve it in a separate repo. Feedback and contributions are very welcome!
`logfile` is a concurrent, append-only log file for Go, optimized for SSDs and high throughput.
Most log file implementations flush after every write. This one batches writes into a single IO + fsync, which makes a huge difference on SSDs.
The library exposes two entry points so you can pick the right tradeoff between throughput and per-record durability:

- `Write` blocks until the appended record is durable. Many concurrent `Write` calls naturally batch into one fsync (group commit). Use this when each record must be on disk before the caller continues and you have enough concurrent writers to amortize fsync latency.
- `WriteAsync` returns as soon as the record is buffered. A single writer can keep submitting records while the background flusher fsyncs the previous batch. Call `Flush` at a checkpoint to wait for durability. This is the high-throughput path for a small number of writers.

Internally, a single background flusher drains the buffer using a ping-pong pattern, so new appends can be staged while the previous batch is being fsynced. Once the flush completes, all waiting writers are notified and the next batch can be flushed.
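
The flusher described above can be sketched with the standard library alone. This is a minimal illustration of group commit with two swapped buffers, not the library's actual implementation: `groupLog`, `batch`, `newGroupLog`, and `runDemo` are hypothetical names, and details such as alignment, checksums, and shutdown handling are left out.

```go
package main

import (
	"fmt"
	"os"
	"sync"
	"sync/atomic"
)

// batch tracks the writers waiting on one staged buffer.
type batch struct {
	done chan struct{} // closed once the batch has been written and fsynced
	err  error         // flush result; valid after done is closed
}

// groupLog is a hypothetical, simplified group-commit log.
type groupLog struct {
	mu      sync.Mutex
	f       *os.File
	staging []byte        // buffer that receives new appends
	cur     *batch        // waiters for the staging buffer
	kick    chan struct{} // wakes the flusher; capacity 1
	flushes atomic.Int64  // number of write+fsync rounds issued
}

func newGroupLog(f *os.File) *groupLog {
	l := &groupLog{f: f, cur: &batch{done: make(chan struct{})}, kick: make(chan struct{}, 1)}
	go l.flusher()
	return l
}

// Write stages rec and blocks until the batch containing it is on disk.
func (l *groupLog) Write(rec []byte) error {
	l.mu.Lock()
	l.staging = append(l.staging, rec...)
	b := l.cur
	l.mu.Unlock()

	select {
	case l.kick <- struct{}{}:
	default: // a wake-up is already pending
	}
	<-b.done
	return b.err
}

// flusher runs in one goroutine and owns the second buffer of the pair.
func (l *groupLog) flusher() {
	var flushing []byte
	for range l.kick {
		l.mu.Lock()
		// Swap buffers: writers keep appending to the empty one while
		// this goroutine writes and fsyncs the full one.
		flushing, l.staging = l.staging, flushing[:0]
		b := l.cur
		l.cur = &batch{done: make(chan struct{})}
		l.mu.Unlock()

		if len(flushing) > 0 {
			if _, b.err = l.f.Write(flushing); b.err == nil {
				b.err = l.f.Sync()
			}
			l.flushes.Add(1)
		}
		close(b.done) // release every writer in this batch at once
	}
}

// runDemo starts n concurrent writers of 16-byte records and reports how
// many flushes were issued and how many bytes reached the file.
func runDemo(n int) (flushes, size int64, err error) {
	f, err := os.CreateTemp("", "grouplog-*.log")
	if err != nil {
		return 0, 0, err
	}
	defer os.Remove(f.Name())
	defer f.Close()
	l := newGroupLog(f)

	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = l.Write([]byte("0123456789abcdef"))
		}()
	}
	wg.Wait()
	close(l.kick) // no more writers: let the flusher exit

	st, err := f.Stat()
	if err != nil {
		return 0, 0, err
	}
	return l.flushes.Load(), st.Size(), nil
}

func main() {
	flushes, size, err := runDemo(64)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("64 writes, %d flushes, %d bytes on disk\n", flushes, size)
}
```

The number of flushes varies from run to run, but it can never exceed the number of writes, because every flush carries at least one record. The more writers overlap with an in-flight fsync, the fewer flushes are needed.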

- Group commit: concurrent `Write` calls are batched into one write + fsync
- Pipelined async path: a single writer can pipeline records via `WriteAsync` for very high throughput
- Optimized for SSDs: minimizes write amplification with 4KB-aligned writes
- Zero heap allocations per write
- Safe: no torn writes. When `Write` returns `nil`, the data is on disk.
- CRC32 checksums for data integrity (optional)
- Small API surface: two write methods plus `Flush` and `Close`

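
To make the checksum and alignment features concrete, here is a rough sketch of how a record could be framed with a CRC32 and how a batch could be padded to a 4KB boundary. The on-disk layout shown (4-byte length, 4-byte CRC32, payload, little-endian) and the names `appendRecord`, `readRecord`, `padToBlock`, and `alignUp` are assumptions made for illustration; the library's real format may differ.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc32"
)

const blockSize = 4096 // assumed SSD-friendly write granularity

var errCorrupt = errors.New("record is truncated or fails its checksum")

// alignUp rounds n up to the next multiple of blockSize.
func alignUp(n int) int {
	return (n + blockSize - 1) &^ (blockSize - 1)
}

// appendRecord appends [len uint32][crc32 uint32][payload] to dst.
// Reusing dst across calls avoids a fresh allocation per record.
func appendRecord(dst, payload []byte) []byte {
	var hdr [8]byte
	binary.LittleEndian.PutUint32(hdr[0:4], uint32(len(payload)))
	binary.LittleEndian.PutUint32(hdr[4:8], crc32.ChecksumIEEE(payload))
	dst = append(dst, hdr[:]...)
	return append(dst, payload...)
}

// readRecord decodes one record from the front of src and returns the
// payload plus the remaining bytes.
func readRecord(src []byte) (payload, rest []byte, err error) {
	if len(src) < 8 {
		return nil, nil, errCorrupt
	}
	n := binary.LittleEndian.Uint32(src[0:4])
	sum := binary.LittleEndian.Uint32(src[4:8])
	if uint64(n) > uint64(len(src)-8) {
		return nil, nil, errCorrupt
	}
	payload = src[8 : 8+int(n)]
	if crc32.ChecksumIEEE(payload) != sum {
		return nil, nil, errCorrupt
	}
	return payload, src[8+int(n):], nil
}

// padToBlock zero-pads buf so its length is a multiple of blockSize.
func padToBlock(buf []byte) []byte {
	return append(buf, make([]byte, alignUp(len(buf))-len(buf))...)
}

func main() {
	buf := appendRecord(nil, []byte("hello world"))
	fmt.Println(len(buf), len(padToBlock(buf))) // prints "19 4096"

	payload, _, err := readRecord(buf)
	fmt.Printf("%q %v\n", payload, err) // prints "hello world" <nil>

	buf[10] ^= 0xFF // flip bits in the payload to simulate corruption
	_, _, err = readRecord(buf)
	fmt.Println(err) // prints the errCorrupt message
}
```

A reader that hits a checksum failure at the tail of the file can treat it as the end of the valid log, which is one common way to detect a torn final write.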
```sh
go get github.com/alialaee/logfile
```
See benchmark/README.md.
Real-disk throughput on an M4 MacBook Air with 900-byte records:

| Mode | Throughput |
|---|---|
| Single writer, `Write` (fsync per call) | 0.02 MB/s |
| Single writer, `WriteAsync` + `Flush` | 411 MB/s |
| 10 goroutines, `WriteAsync` | 406 MB/s |
| 10 goroutines, `Write` | 1.16 MB/s |
| 500 goroutines, `Write` | 58.7 MB/s |

`Write` throughput is bounded by records_per_fsync / fsync_latency, where the batch size (records_per_fsync) is at most the number of concurrent writers. If you need high throughput from a small number of writers, use `WriteAsync`.
Cross-library comparison (20 concurrent writers, record sizes 512 bytes to 1 MB):
```
=== Logfile Write (This library)
Total bytes written: 477 MB
Time taken: 576.585416ms
Throughput: 828.85 MB/s

=== Direct File Write+Flush ===
Total bytes written: 119 MB
Time taken: 1.022756167s
Throughput: 116.82 MB/s

=== Tidwall WAL Write ===
Total bytes written: 119 MB
Time taken: 1.117973042s
Throughput: 106.87 MB/s

=== Hashicorp Raft Write ===
Total bytes written: 119 MB
Time taken: 1.197531459s
Throughput: 99.77 MB/s

=== Pebble Record Write ===
Total bytes written: 477 MB
Time taken: 678.86125ms
Throughput: 703.98 MB/s
```
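Basic usage with the blocking `Write` path:

```go
// Errors are ignored here for brevity; check them in real code.
f, _ := os.OpenFile("my.log", os.O_CREATE|os.O_RDWR, 0644)
lf, _ := logfile.New(f, 0, 1024*1024, true) // 1MB buffer, CRC enabled

// Write returns only after the record is durable.
offset, _ := lf.Write(context.Background(), []byte("hello world"))

// Read back
reader := logfile.NewReader(f, 0)
data, _ := reader.ReadNext(nil)
fmt.Printf("offset=%v data=%q\n", offset, data)

lf.Close()
f.Close()
```

For the pipelined path, submit records with `WriteAsync` and call `Flush` at a checkpoint:

```go
// WriteAsync returns once the record is buffered, not when it is on disk.
for _, rec := range records {
	_, _ = lf.WriteAsync(context.Background(), rec)
}

// Block until everything submitted above is on disk.
_ = lf.Flush(context.Background())
```

License: MIT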