1. Priority
Medium-High - no data loss, but a single task can fill a reactor's local disk via an unbounded journal-append spool and hard-fail every shard on that reactor (collateral outage), including the data plane's L1 stats rollup. Recovered per incident by a reactor restart; latent on every data plane.
2. Scope and prevalence
The reactor (flowctl-go) spools journal appends to /mnt/local before persisting fragments to the fragment store. When appends arrive faster than they persist, or a single append/checkpoint is very large, the local spool grows with no cap and no backpressure. Theoretical reach: every data plane. Observed prevalence: a private AWS plane during a large Postgres backfill; the same class of reactor disk-fill has been seen before on another source-postgres backfill, so this is a recurring pattern, not a one-off.
3. Trigger (testable)
A high-volume capture (here source-amazon-rds-postgres) running a large backfill emits a checkpoint per backfill chunk, and each checkpoint is written to the reactor's gazette append buffer on /mnt/local. With a large backfill_chunk_size and large documents, a single checkpoint reaches tens of GB. Combined with high sustained append volume into a few hot collections, the spool grows until /mnt/local is exhausted, at which point every shard on the reactor fails with disk-full.
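As a rough back-of-envelope sketch of how a single checkpoint can reach tens of GB: the snippet below assumes backfill_chunk_size counts rows per chunk (an assumption; this report does not state its unit) and that the whole chunk is buffered locally before it persists. The row count and document size are hypothetical illustration values, not measurements from this incident.

```python
def estimate_checkpoint_bytes(rows_per_chunk: int, avg_doc_bytes: int) -> int:
    """Estimate the bytes one backfill-chunk checkpoint spools to local disk.

    Assumes the whole chunk is buffered before it is persisted, and ignores
    framing and encoding overhead.
    """
    return rows_per_chunk * avg_doc_bytes


# Hypothetical inputs, not values taken from the affected capture.
rows_per_chunk = 1_000_000
avg_doc_bytes = 64 * 1024  # 64 KiB documents

estimate = estimate_checkpoint_bytes(rows_per_chunk, avg_doc_bytes)
print(f"~{estimate / 1e9:.1f} GB per checkpoint")
```

Under these assumptions the estimate scales linearly with both inputs, which is consistent with lowering backfill_chunk_size shrinking each checkpoint.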
4. Root cause (confirmed)
Confirmed - no cap or backpressure on the local append/fragment spool. lsof +L1 showed flowctl-go (the reactor process) holding ~124 GB of deleted-but-open spool files: hundreds of /mnt/local/reactor/gazette-append* (~70 MB each) plus one anonymous #<inode> buffer of 71.6 GB (a single backfill-chunk checkpoint). The files were unlinked but held open, so du could not see them while the blocks stayed allocated.
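The unlinked-but-held-open behaviour can be reproduced in isolation. The following is a minimal sketch on a POSIX system, using a temporary directory and a hypothetical file name rather than the reactor's real spool path: after the unlink, a directory walk (which is what du does) finds nothing, while the open descriptor still reports the full size.

```python
import os
import tempfile


def unlinked_file_stats(size_bytes: int = 1 << 20) -> dict:
    """Write a file, unlink it while it is still open, and report what remains."""
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "spool-demo")  # hypothetical name
    with open(path, "wb") as f:
        f.write(b"\0" * size_bytes)
        f.flush()
        os.unlink(path)  # removes the directory entry; the descriptor stays open
        st = os.fstat(f.fileno())
        stats = {
            "visible_in_dir": os.listdir(workdir),  # what a du-style walk can see
            "link_count": st.st_nlink,              # no directory entry refers to it
            "size_via_fd": st.st_size,              # data is still there via the fd
        }
    os.rmdir(workdir)  # the directory is already empty
    return stats


print(unlinked_file_stats())
```

The space is returned to the filesystem only when the last descriptor closes, which matches the observation that restarting the reactor released it.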
5. Investigation steps / reproduction (testable)
- df -h /mnt/local: 97% used, 0 free at peak. du -x /mnt/local: ~1.3 GB. podman system df: <1 GB. The ~124 GB gap was unaccounted for.
- lsof +L1 | grep mnt/local: holder flowctl-go, entries gazette-append* plus a 71.6 GB anonymous deleted file.
- Every shard on the reactor failed disk-full. The L1 catalog-stats derivation failed:
  runTransactions: txnStartCommit: app.FinalizeTxn: h2 protocol error: error reading a body from connection
- sudo systemctl restart reactor.service released the held fds; /mnt/local recovered. It refilled on each subsequent large backfill chunk until the capture's backfill_chunk_size was lowered and the reactor disk was enlarged.
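Where lsof is not available on the host, the lsof +L1 check can be approximated for one process by reading /proc directly. This is a Linux-only sketch, not part of the reactor or its tooling; the function names and the default prefix are illustrative assumptions.

```python
import os
import stat


def deleted_open_files(pid: str = "self", prefix: str = "/mnt/local"):
    """List (target, size_bytes) for regular files a process holds open after unlink."""
    fd_dir = f"/proc/{pid}/fd"
    found = []
    for name in os.listdir(fd_dir):
        link = os.path.join(fd_dir, name)
        try:
            target = os.readlink(link)  # path as the kernel reports it
            st = os.stat(link)          # follows the fd to the open file itself
        except OSError:
            continue  # fd closed while scanning, or access not permitted
        if not stat.S_ISREG(st.st_mode):
            continue
        unlinked = st.st_nlink == 0 or target.endswith(" (deleted)")
        if unlinked and target.startswith(prefix):
            found.append((target, st.st_size))
    return found


def total_held_bytes(entries) -> int:
    """Sum the sizes of the entries returned by deleted_open_files."""
    return sum(size for _, size in entries)
```

For example, `total_held_bytes(deleted_open_files(pid="1234"))` would total the held bytes for a hypothetical PID 1234; inspecting another user's process generally requires root.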
6. Possible fixes
- Operational (used here): restart the reactor to release held spool; lower the capture's backfill_chunk_size (smaller checkpoints); enlarge reactor /mnt/local.
- Platform: cap the local append/fragment spool and apply backpressure (stall or slow appends) before /mnt/local is exhausted, instead of spooling unbounded and hard-failing every shard on the reactor.
- Investigate why a single 71.6 GB checkpoint buffer never persisted to the fragment store (was persistence keeping up with the append rate?).
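To make the platform option concrete, here is a minimal sketch of a byte-budgeted spool that stalls appenders instead of growing without bound. It is an illustration of the cap-plus-backpressure idea only, not how Gazette or the reactor implements spooling; the class and method names are hypothetical.

```python
import threading
from typing import Optional


class BoundedSpool:
    """Byte budget for a local spool: reservations block once the cap is reached."""

    def __init__(self, cap_bytes: int):
        self.cap_bytes = cap_bytes
        self.used_bytes = 0
        self._cond = threading.Condition()

    def reserve(self, n: int, timeout: Optional[float] = None) -> bool:
        """Block until n bytes fit under the cap; return False on timeout."""
        if n > self.cap_bytes:
            raise ValueError("append is larger than the whole spool cap")
        with self._cond:
            ok = self._cond.wait_for(
                lambda: self.used_bytes + n <= self.cap_bytes, timeout
            )
            if ok:
                self.used_bytes += n
            return ok

    def release(self, n: int) -> None:
        """Return n bytes to the budget once they are persisted remotely."""
        with self._cond:
            self.used_bytes -= n
            self._cond.notify_all()


spool = BoundedSpool(cap_bytes=100)
spool.reserve(70)                       # fits under the cap
stalled = spool.reserve(50, timeout=0.01)  # would exceed the cap, so it times out
spool.release(70)                       # a fragment finished persisting
resumed = spool.reserve(50)             # budget is available again
print(stalled, resumed)
```

In this sketch an append larger than the cap is rejected outright; a real design would more likely need to split or stream such an append, which bears on the open question about the single 71.6 GB checkpoint buffer.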
7. References