Optional in-process memory composition snapshot at RDB save time #9
Open
artikell wants to merge 4 commits into
8e9eb63 to
dae4e3f
Introduce a new boolean configuration "rdb-save-stat-memory" (default: no).
When enabled, the RDB save child process iterates each key in rdbSaveDb() and accumulates per-type key counts and approximate memory usage via objectComputeSize() (sample size 5, matching MEMORY USAGE). The result is sent to the parent process through the existing child_info_pipe using a new CHILD_INFO_TYPE_RDB_OBJECT_STATS message type. After the child exits, backgroundSaveDoneHandlerDisk() emits a single LL_NOTICE log line summarizing total keys/bytes and a per-type breakdown (string/list/set/zset/hash/stream/module).
This is useful for offline diagnostics of memory composition without paying the cost on every save.
Also add an integration test in tests/integration/rdb.tcl that verifies both the disabled and enabled paths via log inspection after BGSAVE.
Simplify the per-key memory accounting collected when rdb-save-stat-memory is enabled to track only the object type, dropping the encoding axis. The summary log line now reports one entry per type (e.g. "string=N/MB list=N/MB") instead of per (type, encoding).
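The per-type accounting described in these two commits can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the `DEMO_*` enum, `demoTypeStats`, `demoAccumulate`, and `demoFormat` are hypothetical names, and skipping types with zero keys is an assumption. Only the `string=N/MB` token shape is taken from the commit message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the seven object types; not Valkey's real enum. */
enum { DEMO_STRING, DEMO_LIST, DEMO_SET, DEMO_ZSET, DEMO_HASH,
       DEMO_STREAM, DEMO_MODULE, DEMO_TYPE_COUNT };

static const char *demo_type_names[DEMO_TYPE_COUNT] = {
    "string", "list", "set", "zset", "hash", "stream", "module"};

/* One key counter and one byte counter per type (no encoding axis). */
typedef struct {
    unsigned long long keys[DEMO_TYPE_COUNT];
    unsigned long long bytes[DEMO_TYPE_COUNT];
} demoTypeStats;

/* Record one key of the given type together with its measured size. */
static void demoAccumulate(demoTypeStats *s, int type, unsigned long long nbytes) {
    assert(s != NULL);
    if (type < 0 || type >= DEMO_TYPE_COUNT) return; /* ignore unknown types */
    s->keys[type]++;
    s->bytes[type] += nbytes;
}

/* Write "type=N/MB" tokens in enum order, separated by single spaces.
 * Types with zero keys are skipped (an assumption of this sketch).
 * Returns the number of characters written, excluding the NUL. */
static size_t demoFormat(const demoTypeStats *s, char *buf, size_t buflen) {
    size_t off = 0;
    assert(s != NULL && buf != NULL);
    if (buflen == 0) return 0;
    buf[0] = '\0';
    for (int t = 0; t < DEMO_TYPE_COUNT; t++) {
        if (s->keys[t] == 0) continue;
        int n = snprintf(buf + off, buflen - off, "%s%s=%llu/%lluB",
                         off ? " " : "", demo_type_names[t],
                         s->keys[t], s->bytes[t]);
        if (n < 0 || (size_t)n >= buflen - off) {
            buf[off] = '\0'; /* drop the partially written token */
            break;
        }
        off += (size_t)n;
    }
    return off;
}
```

Keeping the counters in fixed-size arrays indexed by type means the whole result is a plain value with no pointers, which is what makes it cheap to hand from the child to the parent.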
When `rdb-save-stat-memory` is enabled, sample 11 per-category memory
counters in the RDB save child process alongside the existing per-type
object stats, and emit a single `RDB memory breakdown: ...` log line in
the parent after the save completes.
The breakdown groups memory into three top-level categories:
- kv: user data, expiration kvstore, hash field-TTL kvstore
- user_meta: main keyspace kvstore overhead, misc (command tables and pubsub channels), client I/O buffers
- sys: command log entries, AOF buffer, global replication buffer, replication backlog (with rax index)
Sampling runs in the RDB child so values reflect the same COW snapshot
as the per-type stats. Only categories with an existing *MemUsage /
*AllocSize accessor are sampled to avoid introducing new public APIs;
subsystems that would require approximation (pubsub patterns dict,
tracking rax tables, latency events dict) are intentionally omitted.
The new fields are flattened into the existing `rdbObjectStats` struct
and travel through the existing `CHILD_INFO_TYPE_RDB_OBJECT_STATS`
child-info pipe message - no new struct, no new IPC type, no new
configuration knob.
Tests: add an integration case asserting the breakdown line appears
only when the option is enabled and that load-bearing categories
(kv.user, kv.expire, user_meta.kvstore, user_meta.misc) are non-zero
after a representative workload. Also fix the existing per-type stats
test to match the post `(type, encoding) -> type` log format.
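The "flatten into the existing struct and reuse the existing pipe message" idea from this commit can be illustrated with a small round trip over a POSIX pipe. This is a sketch under stated assumptions: `demoObjectStats`, `demoChildInfo`, `demoSendStats`, and `demoReceiveStats` are hypothetical names, the array sizes are illustrative, and the real `rdbObjectStats` layout and child-info framing may differ.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

#define DEMO_TYPES 7       /* illustrative: one slot per object type */
#define DEMO_CATEGORIES 10 /* illustrative: one slot per breakdown category */

/* Hypothetical tag standing in for CHILD_INFO_TYPE_RDB_OBJECT_STATS. */
enum { DEMO_INFO_TYPE_OBJECT_STATS = 1 };

/* Per-type counters and breakdown categories flattened into one
 * fixed-size, pointer-free struct so it can be copied byte for byte. */
typedef struct {
    unsigned long long keys[DEMO_TYPES];
    unsigned long long bytes[DEMO_TYPES];
    unsigned long long category_bytes[DEMO_CATEGORIES];
} demoObjectStats;

/* A single message type carries everything. */
typedef struct {
    int info_type;
    demoObjectStats stats;
} demoChildInfo;

/* Child side: send the whole message with one write(). The message is a
 * couple of hundred bytes on common platforms, below the POSIX minimum
 * PIPE_BUF of 512, so the write is atomic. Returns 0 on success. */
static int demoSendStats(int wfd, const demoObjectStats *stats) {
    demoChildInfo msg;
    assert(stats != NULL);
    memset(&msg, 0, sizeof(msg)); /* also zeroes padding bytes */
    msg.info_type = DEMO_INFO_TYPE_OBJECT_STATS;
    msg.stats = *stats;
    return write(wfd, &msg, sizeof(msg)) == (ssize_t)sizeof(msg) ? 0 : -1;
}

/* Parent side: read exactly one message and check its tag. */
static int demoReceiveStats(int rfd, demoObjectStats *out) {
    demoChildInfo msg;
    assert(out != NULL);
    if (read(rfd, &msg, sizeof(msg)) != (ssize_t)sizeof(msg)) return -1;
    if (msg.info_type != DEMO_INFO_TYPE_OBJECT_STATS) return -1;
    *out = msg.stats;
    return 0;
}
```

In a real server the writer would be the forked RDB child and the reader the parent; the sketch works equally well with both ends in one process, which keeps it easy to test.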
dae4e3f to
3e59fc0
Roll back the conf-file block describing `rdb-save-stat-memory` to match `origin/unstable`. The runtime option itself remains intact in `config.c`; only the documentation block in `valkey.conf` is reverted so the shipped config file stays in sync with upstream.
The problem/use-case that the feature addresses
When operating Valkey at scale we frequently need to answer the question:
"At the moment of the last full snapshot, how exactly was memory distributed across user data, server-side metadata, and operational buffers?"
Today the only way to get this answer is to run `MEMORY STATS` / `INFO memory` ad-hoc, which has two practical issues:
`BGSAVE` events, which is brittle and lossy.
For diagnostics on customer clusters and for offline cost-attribution analysis we want a single deterministic record per RDB.
We also want to keep this strictly opt-in and zero-cost when the operator does not enable it, so that production paths are not affected.
Description of the feature
Introduce a new boolean configuration `rdb-save-stat-memory` (default `no`). When enabled, the RDB save child process samples real memory usage during the same walk that serializes each key, and emits two `LL_NOTICE` log lines from the parent after the child exits. No new IPC type, no new struct, no impact on the on-disk RDB format, and zero overhead when the option is disabled.
Log line 1: per-type object stats
`rdbSaveDb()` accumulates the exact allocated footprint of each key (using the byte counter that `rdbSaveObject()` already maintains for COW dismissal) into per-type counters, then sends a single `CHILD_INFO_TYPE_RDB_OBJECT_STATS` message through the existing `child_info_pipe`. The parent prints:
Concrete example (after a small mixed workload):
Format rules:
- `<type>` is one of `string | list | set | zset | hash | stream | module` (the seven `OBJ_*` types).
- `OBJ_TYPE_*` enum order.
- `count` is the number of keys of that type that survived to be persisted.
- `bytes` is the exact allocated size (not sampled), captured during the COW snapshot; it includes the `robj` header plus the type-specific payload (sds bytes, listpack/quicklist nodes, hashtable entries, rax nodes, etc.).
- `total keys` / `mem_bytes` are the sums across all types, so externally `Σ count == total keys` and `Σ bytes == mem_bytes`.
Log line 2: per-category memory breakdown
In addition to the per-type line above, `rdbCollectMemBreakdown()` runs in the same RDB child at the end of `rdbSaveRio()` and fills 10 extra category counters (flattened into the same `rdbObjectStats` struct, no new struct, no new IPC type). The parent prints:
Concrete example:
Category mapping:
Only categories with an existing `*MemUsage` / `*AllocSize` accessor are sampled. Subsystems that would require approximation (pubsub patterns dict, tracking rax tables, latency events dict) are intentionally omitted to avoid introducing new public APIs and to keep numbers exact.
Alternatives you've considered
- `backgroundSaveDoneHandlerDisk`: simplest, but it reflects the state at save-completion time, not the state that was actually persisted. Rejected.
- `MEMORY STATS` on a cron: requires external scheduling, drifts from the snapshot, and pollutes the main thread. Rejected.
- `valkey-check-rdb` and replicas read it, but couples the on-disk format with internal memory-accounting choices. Rejected for now; the log line is sufficient for the diagnostic use case and keeps the on-disk format stable.
- `childInfoType`: would force every consumer of `child_info_pipe` to handle a second message per save. Rejected in favor of flattening into the existing `rdbObjectStats` payload.
- `*MemUsage` helpers for pubsub-patterns / tracking / latency: adds API surface and approximation logic that operators cannot calibrate. Deferred; those subsystems are deliberately not reported.
Additional information
- `src/rdb.c::rdbSaveDb` (uses `rio.processed_bytes` delta, zero extra walk)
- `src/rdb.c::rdbCollectMemBreakdown` (called from `rdbSaveRio`)
- `src/server.h::rdbObjectStats` (per-type counters + 10 flattened breakdown fields)
- `src/rdb.c::backgroundSaveDoneHandlerDisk`
- `kvstoreMemUsage`, `hashtableMemUsage`, `sdsAllocSize`, `raxAllocSize`, `getMemoryOverheadData`'s repl_buffer / repl_backlog split formula
- `tests/integration/rdb.tcl`:
  - `rdb-save-stat-memory logs per-type object stats after BGSAVE`: asserts the `RDB object stats:` line is gated by the option and that each populated `<type>=<count>/<bytes>B` token has `count > 0`.
  - `rdb-save-stat-memory logs memory breakdown after BGSAVE`: asserts the `RDB memory breakdown:` line is gated by the option and that load-bearing categories (`kv.user`, `kv.expire`, `user_meta.kvstore`, `user_meta.misc`) are non-zero after a representative workload.
- `no`; existing log output is unchanged when disabled. The two extra log lines only appear when explicitly enabled.
- `INFO persistence` (e.g. `rdb_last_object_stats_*` / `rdb_last_memory_breakdown_*`) for scraping, or are the log lines sufficient? Happy to follow up with a separate PR if there is interest.
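If the log lines end up being the scraping surface, a consumer could parse the per-type tokens and check the sum invariants (`Σ count == total keys`, `Σ bytes == mem_bytes`) itself. The sketch below assumes only the `<type>=<count>/<bytes>B` token shape quoted above; `demoSumTokens` is a hypothetical helper, and the rest of the log line (prefix and totals) is deliberately left out because its exact layout is not reproduced here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sum the "<type>=<count>/<bytes>B" tokens of a space-separated string.
 * Returns the number of tokens parsed, or -1 if the input is too long or
 * a token does not match the expected shape. Characters following the
 * 'B' suffix inside a token are not validated in this sketch. */
static int demoSumTokens(const char *tokens,
                         unsigned long long *keys,
                         unsigned long long *bytes) {
    char copy[512];
    int parsed = 0;

    assert(tokens != NULL && keys != NULL && bytes != NULL);
    if (strlen(tokens) >= sizeof(copy)) return -1;
    strcpy(copy, tokens); /* strtok modifies its input, so work on a copy */

    *keys = 0;
    *bytes = 0;
    for (char *tok = strtok(copy, " "); tok != NULL; tok = strtok(NULL, " ")) {
        char type[16];
        char suffix;
        unsigned long long c, b;
        /* lowercase type name, '=', count, '/', bytes, then the suffix */
        if (sscanf(tok, "%15[a-z]=%llu/%llu%c", type, &c, &b, &suffix) != 4 ||
            suffix != 'B') {
            return -1;
        }
        *keys += c;
        *bytes += b;
        parsed++;
    }
    return parsed;
}
```

A caller would compare the returned sums against the totals reported on the same line; the token string passed in the check is whatever follows the line's prefix, for instance a made-up input such as `string=2/120B hash=1/200B`.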