Optional in-process memory composition snapshot at RDB save time by artikell · Pull Request #9 · artikell/valkey

artikell · 2026-05-22T08:41:11Z

The problem/use-case that the feature addresses

When operating Valkey at scale, we frequently need to answer the question:
"At the moment of the last full snapshot, how exactly was memory distributed across user data, server-side metadata, and operational buffers?"

Today the only way to get this answer is to run MEMORY STATS / INFO memory ad-hoc, which has two practical issues:

- The numbers are sampled in the main thread at the time the operator (or tooling) issues the command. They drift from the dataset that was actually persisted to disk, and they include any memory churn that happened in the meantime (incoming writes, eviction, defrag, etc.).
- There is no built-in audit trail. Reconstructing the per-category memory composition of a historical RDB requires correlating monitoring time-series with BGSAVE events, which is brittle and lossy. For diagnostics on customer clusters and for offline cost-attribution analysis we want a single deterministic record per RDB.

We also want to keep this strictly opt-in and zero-cost when the operator does not enable it, so that production paths are not affected.

Description of the feature

Introduce a new boolean configuration rdb-save-stat-memory (default no). When enabled, the RDB save child process measures each key's memory during the same walk that serializes it, and the parent emits two LL_NOTICE log lines after the child exits. Both lines are fed by a single message over the existing child_info_pipe (no new IPC channel), the on-disk RDB format is unchanged, and there is zero overhead when the option is disabled.

Log line 1 — per-type object stats

rdbSaveDb() accumulates the exact allocated footprint of each key (using the byte counter that rdbSaveObject() already maintains for COW dismissal) into per-type counters, then sends a single CHILD_INFO_TYPE_RDB_OBJECT_STATS message through the existing child_info_pipe. The parent prints:

[notice] RDB object stats: total keys=<N> mem_bytes=<B> | <type>=<count>/<bytes>B [<type>=<count>/<bytes>B ...]

Concrete example (after a small mixed workload):

[notice] RDB object stats: total keys=1234 mem_bytes=987654 | string=1000/450000B list=120/220000B set=20/80000B zset=14/57654B hash=80/180000B

Format rules:

- <type> is one of string | list | set | zset | hash | stream | module (the seven OBJ_* types).
- Empty buckets are skipped to keep the line compact; tokens appear in the numeric order of the OBJ_* type constants.
- count is the number of keys of that type that were actually persisted.
- bytes is the exact allocated size (not sampled), captured during the COW snapshot; it includes the robj header plus the type-specific payload (sds bytes, listpack/quicklist nodes, hashtable entries, rax nodes, etc.).
- total keys / mem_bytes are the sums across all types, so Σ count == total keys and Σ bytes == mem_bytes.
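A minimal C sketch of the accumulation and format rules above, assuming hypothetical names (`type_stats`, `stats_add_key`, `format_object_stats`) that are not taken from the PR. The per-key byte figure is simply passed in by the caller; in the PR it comes from the rio.processed_bytes delta inside rdbSaveDb(). Type indices follow the numeric order of Valkey's OBJ_* constants (string, list, set, zset, hash, module, stream).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical type indices, ordered like Valkey's OBJ_* constants
 * (OBJ_STRING=0 ... OBJ_HASH=4, OBJ_MODULE=5, OBJ_STREAM=6). */
enum { T_STRING, T_LIST, T_SET, T_ZSET, T_HASH, T_MODULE, T_STREAM, T_COUNT };

static const char *type_names[T_COUNT] = {
    "string", "list", "set", "zset", "hash", "module", "stream"};

/* Hypothetical per-type counters (not the PR's rdbObjectStats layout). */
typedef struct {
    unsigned long long count[T_COUNT];
    unsigned long long bytes[T_COUNT];
} type_stats;

static void stats_reset(type_stats *s) { memset(s, 0, sizeof *s); }

/* Record one persisted key. key_bytes stands in for the per-key figure
 * the PR derives from the rio.processed_bytes delta. */
static void stats_add_key(type_stats *s, int type, unsigned long long key_bytes) {
    assert(type >= 0 && type < T_COUNT);
    s->count[type]++;
    s->bytes[type] += key_bytes;
}

/* Render log line 1 without the logger's "[notice]" prefix. Totals are
 * summed from the buckets and empty buckets are skipped. Returns the
 * length written, or a value >= len if the output was truncated. */
static int format_object_stats(const type_stats *s, char *buf, size_t len) {
    unsigned long long keys = 0, bytes = 0;
    for (int t = 0; t < T_COUNT; t++) {
        keys += s->count[t];
        bytes += s->bytes[t];
    }
    int n = snprintf(buf, len, "RDB object stats: total keys=%llu mem_bytes=%llu |",
                     keys, bytes);
    for (int t = 0; t < T_COUNT && n >= 0 && (size_t)n < len; t++) {
        if (s->count[t] == 0) continue; /* keep the line compact */
        n += snprintf(buf + n, len - (size_t)n, " %s=%llu/%lluB",
                      type_names[t], s->count[t], s->bytes[t]);
    }
    return n;
}
```

Because the totals are computed by summing the buckets, Σ count == total keys and Σ bytes == mem_bytes hold by construction rather than by a separate check.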

Log line 2 — per-category memory breakdown

In addition to the per-type line above, rdbCollectMemBreakdown() runs in the same RDB child at the end of rdbSaveRio() and fills 10 extra category counters (flattened into the same rdbObjectStats struct, no new struct, no new IPC type). The parent prints:

[notice] RDB memory breakdown: kv[user=<B>B expire=<B>B hash_meta=<B>B] user_meta[kvstore=<B>B misc=<B>B clients=<B>B] sys[cmdlog=<B>B aof_buf=<B>B repl_buf=<B>B repl_backlog=<B>B]

Concrete example:

[notice] RDB memory breakdown: kv[user=987654B expire=12288B hash_meta=0B] user_meta[kvstore=131072B misc=204800B clients=65536B] sys[cmdlog=0B aof_buf=0B repl_buf=0B repl_backlog=0B]

Category mapping:

Key-value data memory
 ├─ kv.user        : robj/sds bytes (mirrors mem_bytes in log line 1)
 ├─ kv.expire      : db->expires kvstore overhead
 └─ kv.hash_meta   : db->keys_with_volatile_items kvstore (hash field-TTL)

User metadata
 ├─ user_meta.kvstore : db->keys overhead (minus robj headers, no double-count vs kv.user)
 ├─ user_meta.misc    : commands + orig_commands + pubsub_channels
 └─ user_meta.clients : NORMAL + PUBSUB + PRIMARY client I/O buffers

System operational memory
 ├─ sys.cmdlog       : commandlog entry sds + dict overhead
 ├─ sys.aof_buf      : sdsAllocSize(server.aof_buf)
 ├─ sys.repl_buf     : repl_buffer_mem - repl_backlog_size
 └─ sys.repl_backlog : repl_backlog_size + rax index

Only categories with an existing *MemUsage / *AllocSize accessor are sampled. Subsystems that would require approximation (pubsub patterns dict, tracking rax tables, latency events dict) are intentionally omitted to avoid introducing new public APIs and to keep numbers exact.
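The sketch below shows one way the ten categories could be held as flat counters and rendered into log line 2. The struct and function names are hypothetical, and `split_repl` is a simplified take on the repl_buffer / repl_backlog split that the PR says it reuses from getMemoryOverheadData(); it assumes the shared buffer size, configured backlog size, and rax index size have already been read from the server.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flat counters, one per category in the mapping above. */
typedef struct {
    unsigned long long kv_user, kv_expire, kv_hash_meta;
    unsigned long long um_kvstore, um_misc, um_clients;
    unsigned long long sys_cmdlog, sys_aof_buf, sys_repl_buf, sys_repl_backlog;
} mem_breakdown;

/* Simplified replication split: repl_buf = shared buffer minus backlog,
 * repl_backlog = backlog plus its rax index.
 * Assumption: if the shared buffer is not larger than the configured
 * backlog, all of it is attributed to the backlog and repl_buf is 0. */
static void split_repl(mem_breakdown *b, unsigned long long repl_buffer_mem,
                       unsigned long long backlog_size, unsigned long long rax_bytes) {
    if (repl_buffer_mem > backlog_size) {
        b->sys_repl_buf = repl_buffer_mem - backlog_size;
        b->sys_repl_backlog = backlog_size;
    } else {
        b->sys_repl_buf = 0;
        b->sys_repl_backlog = repl_buffer_mem;
    }
    b->sys_repl_backlog += rax_bytes;
}

/* Render log line 2 without the logger's "[notice]" prefix. */
static int format_breakdown(const mem_breakdown *b, char *buf, size_t len) {
    assert(b != NULL && buf != NULL);
    return snprintf(buf, len,
                    "RDB memory breakdown: kv[user=%lluB expire=%lluB hash_meta=%lluB] "
                    "user_meta[kvstore=%lluB misc=%lluB clients=%lluB] "
                    "sys[cmdlog=%lluB aof_buf=%lluB repl_buf=%lluB repl_backlog=%lluB]",
                    b->kv_user, b->kv_expire, b->kv_hash_meta,
                    b->um_kvstore, b->um_misc, b->um_clients,
                    b->sys_cmdlog, b->sys_aof_buf, b->sys_repl_buf, b->sys_repl_backlog);
}
```

Keeping every category as a plain counter in one struct is what allows the whole breakdown to ride along in the existing per-save message instead of needing its own IPC type.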

Alternatives you've considered

- Sample in the parent inside backgroundSaveDoneHandlerDisk: simplest, but it reflects the state at save-completion time, not the state that was actually persisted. Rejected.
- Sample via MEMORY STATS on a cron: requires external scheduling, drifts from the snapshot, and adds load to the main thread. Rejected.
- Encode the breakdown into the RDB file itself (e.g. as an aux field): would let valkey-check-rdb and replicas read it, but couples the on-disk format to internal memory-accounting choices. Rejected for now; the log line is sufficient for the diagnostic use case and keeps the on-disk format stable.
- Add a new dedicated childInfoType: would force every consumer of child_info_pipe to handle a second message per save. Rejected in favor of flattening into the existing rdbObjectStats payload.
- Introduce new public *MemUsage helpers for pubsub patterns / tracking / latency: adds API surface and approximation logic that operators cannot calibrate. Deferred; those subsystems are deliberately not reported.

Additional information

- Code pointers (current branch):
  - Per-key accumulation: src/rdb.c::rdbSaveDb (uses the rio.processed_bytes delta; no extra walk)
  - Breakdown sampling: src/rdb.c::rdbCollectMemBreakdown (called from rdbSaveRio)
  - Struct: src/server.h::rdbObjectStats (per-type counters + 10 flattened breakdown fields)
  - Parent log + valid-bit gating: src/rdb.c::backgroundSaveDoneHandlerDisk
  - Reused public APIs: kvstoreMemUsage, hashtableMemUsage, sdsAllocSize, raxAllocSize, and the repl_buffer / repl_backlog split formula from getMemoryOverheadData
- Tests (tests/integration/rdb.tcl):
  - `rdb-save-stat-memory logs per-type object stats after BGSAVE`: asserts that the `RDB object stats:` line is gated by the option and that each populated `<type>=<count>/<bytes>B` token has count > 0.
  - `rdb-save-stat-memory logs memory breakdown after BGSAVE`: asserts that the `RDB memory breakdown:` line is gated by the option and that the load-bearing categories (kv.user, kv.expire, user_meta.kvstore, user_meta.misc) are non-zero after a representative workload.
- Backwards compatibility: the option defaults to no, and existing log output is unchanged when it is disabled. The two extra log lines appear only when the option is explicitly enabled.
- Open question for maintainers: should this also be surfaced in INFO persistence (e.g. rdb_last_object_stats_* / rdb_last_memory_breakdown_*) for scraping, or are the log lines sufficient? Happy to follow up with a separate PR if there is interest.
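To illustrate the single-message design (one flattened payload over child_info_pipe, gated by a valid bit in the parent), here is a minimal POSIX sketch. The message tag, struct layout, and helper names are hypothetical stand-ins; Valkey's actual child-info framing is not reproduced.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical tag standing in for CHILD_INFO_TYPE_RDB_OBJECT_STATS. */
enum { MSG_RDB_OBJECT_STATS = 1 };

/* One flattened payload per save (hypothetical layout). */
typedef struct {
    int type;               /* message tag */
    int valid;              /* set by the child only if sampling completed */
    uint64_t total_keys;    /* per-type arrays elided for brevity */
    uint64_t mem_bytes;
    uint64_t breakdown[10]; /* the ten flattened category counters */
} stats_msg;

/* 512 is the POSIX minimum for PIPE_BUF; pipe writes this small are atomic. */
static_assert(sizeof(stats_msg) <= 512, "stats_msg must fit in one atomic pipe write");

/* Child side: a single fixed-size write. Returns 0 on success. */
static int send_stats(int fd, const stats_msg *m) {
    ssize_t w = write(fd, m, sizeof *m);
    return w == (ssize_t)sizeof *m ? 0 : -1;
}

/* Parent side: returns 1 only for a complete message with the expected
 * tag and the valid bit set; otherwise zeroes *out and returns 0, so the
 * caller simply skips logging. */
static int recv_stats(int fd, stats_msg *out) {
    ssize_t r = read(fd, out, sizeof *out);
    if (r != (ssize_t)sizeof *out || out->type != MSG_RDB_OBJECT_STATS || !out->valid) {
        memset(out, 0, sizeof *out);
        return 0;
    }
    return 1;
}
```

Keeping the payload within 512 bytes means each write() to the pipe is atomic, so it cannot be interleaved with another writer's data, and the valid bit lets the parent suppress both log lines if the child did not finish sampling.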

Introduce a new boolean configuration "rdb-save-stat-memory" (default: no). When enabled, the RDB save child process iterates each key in rdbSaveDb() and accumulates per-type key counts and approximate memory usage via objectComputeSize() (sample size 5, matching MEMORY USAGE). The result is sent to the parent process through the existing child_info_pipe using a new CHILD_INFO_TYPE_RDB_OBJECT_STATS message type. After the child exits, backgroundSaveDoneHandlerDisk() emits a single LL_NOTICE log line summarizing total keys/bytes and per-type breakdown (string/list/set/zset/hash/stream/module). This is useful for offline diagnostics of memory composition without paying the cost on every save. Also add an integration test in tests/integration/rdb.tcl that verifies both the disabled and enabled paths via log inspection after BGSAVE.

Simplify the per-key memory accounting collected when rdb-save-stat-memory is enabled to track only the object type, dropping the encoding axis. The summary log line now reports one entry per type (e.g. "string=N/MB list=N/MB") instead of per (type, encoding).

When `rdb-save-stat-memory` is enabled, sample 10 per-category memory counters in the RDB save child process alongside the existing per-type object stats, and emit a single `RDB memory breakdown: ...` log line in the parent after the save completes. The breakdown groups memory into three top-level categories:

- kv : user data, expiration kvstore, hash field-TTL kvstore
- user_meta : main keyspace kvstore overhead, misc (command tables and pubsub channels), client I/O buffers
- sys : command log entries, AOF buffer, global replication buffer, replication backlog (with rax index)

Sampling runs in the RDB child so values reflect the same COW snapshot as the per-type stats. Only categories with an existing *MemUsage / *AllocSize accessor are sampled to avoid introducing new public APIs; subsystems that would require approximation (pubsub patterns dict, tracking rax tables, latency events dict) are intentionally omitted.

The new fields are flattened into the existing `rdbObjectStats` struct and travel through the existing `CHILD_INFO_TYPE_RDB_OBJECT_STATS` child-info pipe message: no new struct, no new IPC type, no new configuration knob.

Tests: add an integration case asserting the breakdown line appears only when the option is enabled and that load-bearing categories (kv.user, kv.expire, user_meta.kvstore, user_meta.misc) are non-zero after a representative workload. Also fix the existing per-type stats test to match the post `(type, encoding) -> type` log format.

Roll back the conf-file block describing `rdb-save-stat-memory` to match `origin/unstable`. The runtime option itself remains intact in `config.c`; only the documentation block in `valkey.conf` is reverted so the shipped config file stays in sync with upstream.

github-actions Bot assigned artikell May 22, 2026

artikell force-pushed the feature/rdb-save-stat-memory branch from 8e9eb63 to dae4e3f on May 22, 2026 10:04

artikell added 3 commits May 22, 2026 18:23

artikell force-pushed the feature/rdb-save-stat-memory branch from dae4e3f to 3e59fc0 on May 28, 2026 11:52

artikell changed the title from "Add rdb-save-stat-memory option to log per-type object stats" to "RFE: Optional in-process memory composition snapshot at RDB save time" on May 28, 2026

artikell changed the title from "RFE: Optional in-process memory composition snapshot at RDB save time" to "Optional in-process memory composition snapshot at RDB save time" on May 28, 2026

artikell wants to merge 4 commits into unstable from feature/rdb-save-stat-memory
