Context
Follow-up from #71 / PR #80. PR #80 cut idle and peak-write memory (buffer right-sizing + a workers knob), but the full multi-phase benchmark still peaks at ~32 MB RssAnon, and that peak comes from the concurrent query/store and query/search phases, not peak-write.
Evidence (16-core, ReleaseFast, tuned build from #80)
| Phase | peak RssAnon |
| --- | --- |
| idle | 11 MB |
| `--only-peak` write | 11.7 MB |
| full suite (`-e 5000 -w 8 --rate 0`, all phases) | 32 MB |
Connection churn plateaus (no leak, per #64); this is transient working set under an 8-worker concurrent query hammer, not retention.
Where to look
- `streamQueryResults` (`src/handler.zig`) allocates a `[65536]u8` message buffer per stream call; with the full handler pool running concurrently, that adds up to a few MB of stack at once. It could be right-sized or pooled.
- `query_cache.zig`: bounded to 64 entries, but each entry dupes up to `limit` event JSONs (`query_limit_max` is 5000), so a few large entries can total several MB. Consider capping total cache bytes rather than entry count.
- The query path itself already uses a lazy LMDB iterator (good: it does not materialize full result sets), so the win is in the per-query/per-handler buffers and the cache sizing, not the scan.
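
To make the "cap total cache bytes" idea concrete, here is a minimal sketch of a cache that evicts by byte budget while keeping an entry-count ceiling. Everything in it is an assumption for illustration: `ByteBudgetCache`, the `u64` key, the single `payload` slice per entry, and FIFO eviction are hypothetical and do not reflect the actual layout or eviction policy of `query_cache.zig`. It also does not deduplicate keys.

```zig
const std = @import("std");

/// Hypothetical cache bounded by total payload bytes as well as entry count.
/// FIFO eviction keeps the sketch short; the real cache may use another policy.
pub fn ByteBudgetCache(comptime max_entries: usize) type {
    if (max_entries == 0) @compileError("max_entries must be > 0");
    return struct {
        const Self = @This();
        const Entry = struct { key: u64, payload: []u8 };

        allocator: std.mem.Allocator,
        max_bytes: usize,
        used_bytes: usize = 0,
        entries: [max_entries]Entry = undefined, // ring buffer of live entries
        head: usize = 0, // index of the oldest live entry
        len: usize = 0, // number of live entries

        pub fn init(allocator: std.mem.Allocator, max_bytes: usize) Self {
            return .{ .allocator = allocator, .max_bytes = max_bytes };
        }

        pub fn deinit(self: *Self) void {
            while (self.len > 0) self.evictOldest();
        }

        /// Stores a copy of `payload`. Returns false if the payload alone
        /// exceeds the byte budget and therefore can never be cached.
        pub fn put(self: *Self, key: u64, payload: []const u8) !bool {
            if (payload.len > self.max_bytes) return false;
            // Evict until both the entry ceiling and the byte budget allow the insert.
            while (self.len == max_entries or
                self.used_bytes + payload.len > self.max_bytes)
            {
                self.evictOldest();
            }
            const copy = try self.allocator.dupe(u8, payload);
            const slot = (self.head + self.len) % max_entries;
            self.entries[slot] = .{ .key = key, .payload = copy };
            self.len += 1;
            self.used_bytes += copy.len;
            return true;
        }

        /// Linear scan from oldest to newest; fine for a 64-entry cache.
        pub fn get(self: *const Self, key: u64) ?[]const u8 {
            var i: usize = 0;
            while (i < self.len) : (i += 1) {
                const e = self.entries[(self.head + i) % max_entries];
                if (e.key == key) return e.payload;
            }
            return null;
        }

        fn evictOldest(self: *Self) void {
            const victim = self.entries[self.head];
            self.used_bytes -= victim.payload.len;
            self.allocator.free(victim.payload);
            self.head = (self.head + 1) % max_entries;
            self.len -= 1;
        }
    };
}

test "evicts oldest entries to stay within the byte budget" {
    var cache = ByteBudgetCache(64).init(std.testing.allocator, 10);
    defer cache.deinit();

    try std.testing.expect(try cache.put(1, "aaaa")); // 4 bytes in use
    try std.testing.expect(try cache.put(2, "bbbb")); // 8 bytes in use
    try std.testing.expect(try cache.put(3, "cccc")); // evicts key 1 to fit
    try std.testing.expect(cache.get(1) == null);
    try std.testing.expectEqualStrings("bbbb", cache.get(2).?);
    try std.testing.expect(cache.used_bytes <= 10);
}
```

With a byte budget in place, the worst-case footprint of the cache is fixed by `max_bytes` regardless of how large individual `limit` values are, whereas a pure 64-entry bound lets a handful of 5000-event results dominate.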
Acceptance
Lower the concurrent-query working-set peak without regressing query throughput/latency. Validate with the full nostr-bench suite (not just --only-peak) and the #64 churn plateau.
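
For the validation step, a small helper that extracts `RssAnon` from the text of `/proc/<pid>/status` (a Linux field reported in kB) can make peak tracking repeatable. This is only a sketch: `parseRssAnonKb` is a hypothetical name, and how the numbers in the Evidence table were actually collected is not stated here. A sampler would read the status file periodically during the run and keep the maximum.

```zig
const std = @import("std");

/// Returns the RssAnon value in kB from the contents of /proc/<pid>/status,
/// or null if the field is missing or cannot be parsed.
pub fn parseRssAnonKb(status_text: []const u8) ?u64 {
    const prefix = "RssAnon:";
    var lines = std.mem.splitScalar(u8, status_text, '\n');
    while (lines.next()) |line| {
        if (!std.mem.startsWith(u8, line, prefix)) continue;
        // The remainder looks like "   11264 kB"; strip padding and the unit.
        var value = std.mem.trim(u8, line[prefix.len..], " \t");
        if (std.mem.endsWith(u8, value, "kB")) {
            value = std.mem.trim(u8, value[0 .. value.len - 2], " \t");
        }
        return std.fmt.parseInt(u64, value, 10) catch null;
    }
    return null;
}

test "parses RssAnon from a status snapshot" {
    // Sample text is made up for the test, not taken from a real run.
    const sample =
        \\VmRSS:     14000 kB
        \\RssAnon:   11264 kB
        \\RssFile:    2736 kB
    ;
    try std.testing.expectEqual(@as(?u64, 11264), parseRssAnonKb(sample));
    try std.testing.expectEqual(@as(?u64, null), parseRssAnonKb("VmRSS: 1 kB\n"));
}
```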