
Optional in-process memory composition snapshot at RDB save time #9

Open
artikell wants to merge 4 commits into unstable from feature/rdb-save-stat-memory

artikell commented May 22, 2026

The problem/use-case that the feature addresses

When operating Valkey at scale we frequently need to answer the question:
"At the moment of the last full snapshot, how exactly was memory distributed across user data, server-side metadata, and operational buffers?"

Today the only way to get this answer is to run MEMORY STATS / INFO memory ad-hoc, which has two practical issues:

  1. The numbers are sampled in the main thread at the time the operator (or tooling) issues the command. They drift from the dataset that was actually persisted to disk, and they include any memory churn that happened in the meantime (incoming writes, eviction, defrag, etc.).
  2. There is no built-in audit trail. Reconstructing the per-category memory composition of a historical RDB requires correlating monitoring time-series with BGSAVE events, which is brittle and lossy. For diagnostics on customer clusters and for offline cost-attribution analysis we want a single deterministic record per RDB.

We also want to keep this strictly opt-in and zero-cost when the operator does not enable it, so that production paths are not affected.

Description of the feature

Introduce a new boolean configuration rdb-save-stat-memory (default no). When enabled, the RDB save child process samples real memory usage during the same walk that serializes each key, and the parent emits two LL_NOTICE log lines after the child exits. The stats travel over the existing child_info_pipe in a single CHILD_INFO_TYPE_RDB_OBJECT_STATS message backed by one rdbObjectStats struct; the memory breakdown adds no further IPC type or struct. The on-disk RDB format is unchanged, and there is zero overhead when the option is disabled.

Log line 1 — per-type object stats

rdbSaveDb() accumulates the exact allocated footprint of each key (using the byte counter that rdbSaveObject() already maintains for COW dismissal) into per-type counters, then sends a single CHILD_INFO_TYPE_RDB_OBJECT_STATS message through the existing child_info_pipe. The parent prints:

[notice] RDB object stats: total keys=<N> mem_bytes=<B> | <type>=<count>/<bytes>B [<type>=<count>/<bytes>B ...]

Concrete example (after a small mixed workload):

[notice] RDB object stats: total keys=1234 mem_bytes=987654 | string=1000/450000B list=120/220000B set=20/80000B zset=14/57654B hash=80/180000B

Format rules:

  • <type> is one of string | list | set | zset | hash | stream | module (the seven OBJ_* types).
  • Empty buckets are skipped to keep the line compact; the order follows the numeric order of the OBJ_* type constants.
  • count is the number of keys of that type that survived to be persisted.
  • bytes is the exact allocated size (not sampled), captured during the COW snapshot — it includes the robj header plus the type-specific payload (sds bytes, listpack/quicklist nodes, hashtable entries, rax nodes, etc.).
  • total keys / mem_bytes are the sums across all types, so a consumer of the log can verify Σ count == total keys and Σ bytes == mem_bytes.
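
The format rules above are regular enough to parse mechanically. The following is a minimal sketch of a log consumer, not part of the PR: the function names `parse_object_stats` and `sums_match` are hypothetical, and the regular expressions assume the line layout is exactly as documented here. The sample line reuses the counts from the concrete example.

```python
import re

# Assumed layout, taken from the format described in this PR:
# "RDB object stats: total keys=<N> mem_bytes=<B> | <type>=<count>/<bytes>B ..."
_HEADER = re.compile(r"RDB object stats: total keys=(\d+) mem_bytes=(\d+) \|(.*)$")
_TOKEN = re.compile(r"(\w+)=(\d+)/(\d+)B")


def parse_object_stats(line):
    """Return (total_keys, mem_bytes, {type: (count, bytes)}), or None if no match."""
    m = _HEADER.search(line)
    if m is None:
        return None
    per_type = {t: (int(c), int(b)) for t, c, b in _TOKEN.findall(m.group(3))}
    return int(m.group(1)), int(m.group(2)), per_type


def sums_match(total_keys, mem_bytes, per_type):
    """Check that per-type counts and bytes add up to the reported totals."""
    return (sum(c for c, _ in per_type.values()) == total_keys
            and sum(b for _, b in per_type.values()) == mem_bytes)


sample = ("[notice] RDB object stats: total keys=1234 mem_bytes=987654 | "
          "string=1000/450000B list=120/220000B set=20/80000B "
          "zset=14/57654B hash=80/180000B")
total_keys, mem_bytes, per_type = parse_object_stats(sample)
print(per_type["string"], sums_match(total_keys, mem_bytes, per_type))
```

A failed `sums_match` on a real log would point at a truncated line or a format change rather than at a memory problem, so it is best treated as a parser sanity check.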

Log line 2 — per-category memory breakdown

In addition to the per-type line above, rdbCollectMemBreakdown() runs in the same RDB child at the end of rdbSaveRio() and fills 10 extra category counters (flattened into the same rdbObjectStats struct, no new struct, no new IPC type). The parent prints:

[notice] RDB memory breakdown: kv[user=<B>B expire=<B>B hash_meta=<B>B] user_meta[kvstore=<B>B misc=<B>B clients=<B>B] sys[cmdlog=<B>B aof_buf=<B>B repl_buf=<B>B repl_backlog=<B>B]

Concrete example:

[notice] RDB memory breakdown: kv[user=987654B expire=12288B hash_meta=0B] user_meta[kvstore=131072B misc=204800B clients=65536B] sys[cmdlog=0B aof_buf=0B repl_buf=0B repl_backlog=0B]
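
The grouped `name[field=<B>B ...]` layout maps naturally onto a nested dictionary. The sketch below is an illustration only: `parse_breakdown` is a hypothetical helper, and it assumes group and field names contain only word characters, as in the example line.

```python
import re

# Assumed layout: "<group>[<name>=<bytes>B <name>=<bytes>B ...]" repeated per group.
_GROUP = re.compile(r"(\w+)\[([^\]]*)\]")
_FIELD = re.compile(r"(\w+)=(\d+)B")
_MARKER = "RDB memory breakdown:"


def parse_breakdown(line):
    """Return {group: {category: bytes}}, or None if this is not a breakdown line."""
    pos = line.find(_MARKER)
    if pos < 0:
        return None
    rest = line[pos + len(_MARKER):]
    return {group: {name: int(val) for name, val in _FIELD.findall(body)}
            for group, body in _GROUP.findall(rest)}


sample = ("[notice] RDB memory breakdown: "
          "kv[user=987654B expire=12288B hash_meta=0B] "
          "user_meta[kvstore=131072B misc=204800B clients=65536B] "
          "sys[cmdlog=0B aof_buf=0B repl_buf=0B repl_backlog=0B]")
breakdown = parse_breakdown(sample)
print(breakdown["user_meta"]["misc"])
```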

Category mapping:

Key-value data memory
 ├─ kv.user        : robj/sds bytes (mirrors mem_bytes from line 1)
 ├─ kv.expire      : db->expires kvstore overhead
 └─ kv.hash_meta   : db->keys_with_volatile_items kvstore (hash field-TTL)

User metadata
 ├─ user_meta.kvstore : db->keys overhead (minus robj headers, no double-count vs kv.user)
 ├─ user_meta.misc    : commands + orig_commands + pubsub_channels
 └─ user_meta.clients : NORMAL + PUBSUB + PRIMARY client I/O buffers

System operational memory
 ├─ sys.cmdlog       : commandlog entry sds + dict overhead
 ├─ sys.aof_buf      : sdsAllocSize(server.aof_buf)
 ├─ sys.repl_buf     : repl_buffer_mem - repl_backlog_size
 └─ sys.repl_backlog : repl_backlog_size + rax index

Only categories with an existing *MemUsage / *AllocSize accessor are sampled. Subsystems that would require approximation (pubsub patterns dict, tracking rax tables, latency events dict) are intentionally omitted to avoid introducing new public APIs and to keep numbers exact.
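
For cost attribution, the ten categories can be rolled up into their three groups and cross-checked against the per-type line. This is a rough sketch with hypothetical helper names; the dictionary is typed in by hand from the concrete example above. Because some subsystems are deliberately omitted, the grand total here covers only the reported categories, not the whole process.

```python
def group_totals(breakdown):
    """Sum the category bytes inside each top-level group."""
    return {group: sum(cats.values()) for group, cats in breakdown.items()}


def group_shares(breakdown):
    """Fraction of the reported total per group (0.0 for all if the total is 0)."""
    totals = group_totals(breakdown)
    grand = sum(totals.values())
    return {g: (t / grand if grand else 0.0) for g, t in totals.items()}


def kv_user_matches(breakdown, mem_bytes):
    """kv.user is described as mirroring the byte total of the per-type line."""
    return breakdown["kv"]["user"] == mem_bytes


# Values copied from the concrete example lines in this description.
breakdown = {
    "kv": {"user": 987654, "expire": 12288, "hash_meta": 0},
    "user_meta": {"kvstore": 131072, "misc": 204800, "clients": 65536},
    "sys": {"cmdlog": 0, "aof_buf": 0, "repl_buf": 0, "repl_backlog": 0},
}
totals = group_totals(breakdown)
shares = group_shares(breakdown)
print(totals)
print(kv_user_matches(breakdown, 987654))
```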

Alternatives you've considered

  1. Sample in the parent inside backgroundSaveDoneHandlerDisk: simplest, but it reflects the state at save-completion time, not the state that was actually persisted. Rejected.
  2. Sample via MEMORY STATS on a cron: requires external scheduling, drifts from the snapshot, and pollutes the main thread. Rejected.
  3. Encode the breakdown into the RDB file itself (e.g. as an aux field): would let valkey-check-rdb and replicas read it, but couples the on-disk format with internal memory-accounting choices. Rejected for now — the log line is sufficient for the diagnostic use case and keeps the on-disk format stable.
  4. Add a new dedicated childInfoType: would force every consumer of child_info_pipe to handle a second message per save. Rejected in favor of flattening into the existing rdbObjectStats payload.
  5. Introduce new public *MemUsage helpers for pubsub-patterns / tracking / latency: adds API surface and approximation logic that operators cannot calibrate. Deferred — those subsystems are deliberately not reported.

Additional information

  • Code-pointers (current branch):
    • Per-key accumulation: src/rdb.c::rdbSaveDb (uses rio.processed_bytes delta — zero extra walk)
    • Breakdown sampling: src/rdb.c::rdbCollectMemBreakdown (called from rdbSaveRio)
    • Struct: src/server.h::rdbObjectStats (per-type counters + 10 flattened breakdown fields)
    • Parent log + valid-bit gating: src/rdb.c::backgroundSaveDoneHandlerDisk
    • Reused public APIs: kvstoreMemUsage, hashtableMemUsage, sdsAllocSize, raxAllocSize, getMemoryOverheadData's repl_buffer / repl_backlog split formula
  • Tests: tests/integration/rdb.tcl
    • rdb-save-stat-memory logs per-type object stats after BGSAVE: asserts the RDB object stats: line is gated by the option and that each populated <type>=<count>/<bytes>B token has count > 0.
    • rdb-save-stat-memory logs memory breakdown after BGSAVE: asserts the RDB memory breakdown: line is gated by the option and that load-bearing categories (kv.user, kv.expire, user_meta.kvstore, user_meta.misc) are non-zero after a representative workload.
  • Backwards compatibility: option defaults to no; existing log output is unchanged when disabled. The two extra log lines only appear when explicitly enabled.
  • Open question for maintainers: should this also be surfaced in INFO persistence (e.g. rdb_last_object_stats_* / rdb_last_memory_breakdown_*) for scraping, or are the log lines sufficient? Happy to follow up with a separate PR if there is interest.
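
The gating behavior that the integration tests assert can also be spot-checked offline against a captured server log. The sketch below is not the Tcl test itself; `check_log` is a hypothetical helper, and the sample log lines are made-up values that only follow the documented format.

```python
import re

_TOKEN = re.compile(r"(\w+)=(\d+)/(\d+)B")
STATS_MARK = "RDB object stats:"
BREAKDOWN_MARK = "RDB memory breakdown:"


def check_log(log_text, option_enabled):
    """Return a list of problems; an empty list means the log matches expectations."""
    lines = log_text.splitlines()
    stats = [ln for ln in lines if STATS_MARK in ln]
    breakdown = [ln for ln in lines if BREAKDOWN_MARK in ln]
    problems = []
    if not option_enabled:
        # With the option off, neither line should appear at all.
        if stats or breakdown:
            problems.append("stat lines logged while the option is disabled")
        return problems
    if not stats:
        problems.append("missing object stats line")
    if not breakdown:
        problems.append("missing memory breakdown line")
    for ln in stats:
        # Empty buckets are skipped, so every printed token needs count > 0.
        for type_name, count, _ in _TOKEN.findall(ln.split("|", 1)[-1]):
            if int(count) <= 0:
                problems.append(f"empty bucket printed for {type_name}")
    return problems


# Made-up log excerpt for illustration.
enabled_log = "\n".join([
    "[notice] RDB object stats: total keys=3 mem_bytes=300 | string=2/200B list=1/100B",
    "[notice] RDB memory breakdown: kv[user=300B expire=0B hash_meta=0B] "
    "user_meta[kvstore=64B misc=128B clients=0B] "
    "sys[cmdlog=0B aof_buf=0B repl_buf=0B repl_backlog=0B]",
])
print(check_log(enabled_log, option_enabled=True))
print(check_log(enabled_log, option_enabled=False))
```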

artikell force-pushed the feature/rdb-save-stat-memory branch from 8e9eb63 to dae4e3f on May 22, 2026 10:04
artikell added 3 commits May 22, 2026 18:23
Introduce a new boolean configuration "rdb-save-stat-memory" (default: no).
When enabled, the RDB save child process iterates each key in rdbSaveDb()
and accumulates per-type key counts and approximate memory usage via
objectComputeSize() (sample size 5, matching MEMORY USAGE). The result is
sent to the parent process through the existing child_info_pipe using a
new CHILD_INFO_TYPE_RDB_OBJECT_STATS message type.

After the child exits, backgroundSaveDoneHandlerDisk() emits a single
LL_NOTICE log line summarizing total keys/bytes and per-type breakdown
(string/list/set/zset/hash/stream/module). This is useful for offline
diagnostics of memory composition without paying the cost on every save.

Also add an integration test in tests/integration/rdb.tcl that verifies
both the disabled and enabled paths via log inspection after BGSAVE.

Simplify the per-key memory accounting collected when
rdb-save-stat-memory is enabled to track only the object type, dropping
the encoding axis. The summary log line now reports one entry per type
(e.g. "string=N/MB list=N/MB") instead of per (type, encoding).

When `rdb-save-stat-memory` is enabled, sample 10 per-category memory
counters in the RDB save child process alongside the existing per-type
object stats, and emit a single `RDB memory breakdown: ...` log line in
the parent after the save completes.

The breakdown groups memory into three top-level categories:
  - kv        : user data, expiration kvstore, hash field-TTL kvstore
  - user_meta : main keyspace kvstore overhead, misc (command tables
                and pubsub channels), client I/O buffers
  - sys       : command log entries, AOF buffer, global replication
                buffer, replication backlog (with rax index)

Sampling runs in the RDB child so values reflect the same COW snapshot
as the per-type stats. Only categories with an existing *MemUsage /
*AllocSize accessor are sampled to avoid introducing new public APIs;
subsystems that would require approximation (pubsub patterns dict,
tracking rax tables, latency events dict) are intentionally omitted.

The new fields are flattened into the existing `rdbObjectStats` struct
and travel through the existing `CHILD_INFO_TYPE_RDB_OBJECT_STATS`
child-info pipe message - no new struct, no new IPC type, no new
configuration knob.

Tests: add an integration case asserting the breakdown line appears
only when the option is enabled and that load-bearing categories
(kv.user, kv.expire, user_meta.kvstore, user_meta.misc) are non-zero
after a representative workload. Also fix the existing per-type stats
test to match the post `(type, encoding) -> type` log format.
artikell force-pushed the feature/rdb-save-stat-memory branch from dae4e3f to 3e59fc0 on May 28, 2026 11:52
artikell changed the title from "Add rdb-save-stat-memory option to log per-type object stats" to "RFE: Optional in-process memory composition snapshot at RDB save time" on May 28, 2026
artikell changed the title from "RFE: Optional in-process memory composition snapshot at RDB save time" to "Optional in-process memory composition snapshot at RDB save time" on May 28, 2026

Roll back the conf-file block describing `rdb-save-stat-memory` to
match `origin/unstable`. The runtime option itself remains intact in
`config.c`; only the documentation block in `valkey.conf` is reverted
so the shipped config file stays in sync with upstream.