Per-slot memory tracking in CLUSTER SLOT-STATS - no key cache by liorsve · Pull Request #10 · liorsve/valkey

liorsve · 2026-04-12T13:43:08Z

Commit history

The first two commits are squashed versions of prior tracking PRs that this work builds on:

Squashed hashtable + rax + stream + vset O(1) memory tracking: from PR #9, "Combined: hashtable + rax memory tracking with vset integration" (combined-hashtable-rax-tracking)
added logical size tracking: from PR #4, "Track logical quicklist memory incrementally via lpBytes + compressed size" (quicklist-logical-size-tracking)

Summary

Adds per-slot memory tracking to CLUSTER SLOT-STATS, reporting one new metric:

memory-logical-bytes: combined user data + container overhead (field-value pairs, set members, listpack bytes, stream entries, hashtable bucket arrays, rax node overhead, vset containers, quicklist node structs)

The metric supports ORDERBY for sorting slots by memory usage.

New function: `objectLogicalSize()`

O(1) function in object.c that reads incrementally-maintained tracked fields to compute logical size per type/encoding. Returns a single size_t combining data and overhead:

| Type | Encoding | What's included |
| --- | --- | --- |
| STRING | RAW/EMBSTR | sdsReqSize (header + content + null) |
| STRING | INT | 0 (value embedded in robj pointer) |
| LIST | QUICKLIST | tracked_data_bytes + sizeof(quicklist) + len * sizeof(quicklistNode) |
| LIST | LISTPACK | lpBytes |
| SET | HASHTABLE | hashtableTrackedDataBytes + hashtableMemUsage |
| SET | INTSET | intsetBlobLen |
| SET | LISTPACK | lpBytes |
| HASH | HASHTABLE | hashtableTrackedDataBytes + hashtableMemUsage + vsetLogicalSize |
| HASH | LISTPACK | lpBytes |
| ZSET | LISTPACK | lpBytes |
| ZSET | SKIPLIST | 0 (no O(1) tracking yet) |
| STREAM | STREAM | tracked_data_bytes + tracked_overhead + sizeof(stream) |
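To make the shape of the dispatch concrete, here is a minimal standalone sketch of an O(1) size lookup keyed on encoding. It is not the code in object.c: `sketchEncoding`, `sketchObject`, and `sketchLogicalSize` are hypothetical names, and the three counters are a simplified stand-in for the per-container tracked fields listed in the table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the server's encodings (not the real OBJ_ENCODING_* values). */
typedef enum {
    SK_ENC_RAW, SK_ENC_EMBSTR, SK_ENC_INT, SK_ENC_LISTPACK, SK_ENC_INTSET,
    SK_ENC_QUICKLIST, SK_ENC_HASHTABLE, SK_ENC_SKIPLIST, SK_ENC_STREAM
} sketchEncoding;

/* Hypothetical flattened view of the incrementally maintained counters. */
typedef struct {
    sketchEncoding encoding;
    size_t blob_bytes;         /* what sdsReqSize / lpBytes / intsetBlobLen would report */
    size_t tracked_data_bytes; /* updated on every insert and delete */
    size_t container_overhead; /* bucket arrays, rax nodes, node structs, ... */
} sketchObject;

/* O(1): reads counters only, never walks the container. */
size_t sketchLogicalSize(const sketchObject *o) {
    assert(o != NULL);
    switch (o->encoding) {
    case SK_ENC_INT:      /* value lives in the object pointer */
    case SK_ENC_SKIPLIST: /* no O(1) tracking yet */
        return 0;
    case SK_ENC_RAW:
    case SK_ENC_EMBSTR:
    case SK_ENC_LISTPACK:
    case SK_ENC_INTSET:
        return o->blob_bytes; /* single contiguous allocation */
    case SK_ENC_QUICKLIST:
    case SK_ENC_HASHTABLE:
    case SK_ENC_STREAM:
        return o->tracked_data_bytes + o->container_overhead;
    }
    return 0;
}
```

The point of the sketch is that every branch is a constant number of field reads, which is what allows the function to be called twice per write command without a measurable cost.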

Key modification paths and how each is tracked

All data and overhead modifications flow through write commands or explicit out-of-call mutation points. Each is covered by a before/after snapshot or a direct subtraction. The one exception is incremental rehashing during reads, which gets a dedicated lightweight check (see details below the table):

| Path | Where handled | How |
| --- | --- | --- |
| Normal commands (SET, HSET, SADD, RPUSH, XADD, DEL, etc.) | `call()` in server.c | Before/after key size snapshot via getKeysFromCommand around cmd->proc(c) |
| In-place mutations (HSET adding field, SREM, LPOP, XTRIM) | `call()` in server.c | Before/after key size snapshot via getKeysFromCommand around cmd->proc(c) |
| Key eviction (maxmemory pressure) | `performEvictions()` in evict.c | Subtract objectLogicalSize before dbGenericDelete |
| Key expiry (active expire cycle) | `deleteExpiredKeyAndPropagateWithDictIndex()` in db.c | Subtract objectLogicalSize before dbGenericDelete; skipped during lazy expiry (executing_command flag) to avoid double-count with call() hooks |
| Key expiry (lazy expire during command) | `call()` in server.c | Handled by before/after hooks; explicit expiry hook skips via executing_command flag to avoid double-count |
| Hash field expiry (partial -- some fields expire) | `dbReclaimExpiredFields()` in db.c | Before/after objectLogicalSize around hashTypeDeleteExpiredFields |
| Hash field expiry (all fields expire -- key deleted) | `dbReclaimExpiredFields()` in db.c | Subtract remaining objectLogicalSize before dbDelete |
| FLUSHALL / FLUSHDB | `signalFlushedDb()` in db.c | Zero memory_logical_bytes for all slots |
| RDB loading | `dbAddRDBLoad()` in db.c | Add objectLogicalSize to slot stats after insert |
| AOF RESP loading | `loadAppendOnlyFiles()` in aof.c | Full recount of all slots after AOF load completes (RESP commands bypass call()) |
| Slot ownership changes | `clusterSlotStatReset()` in cluster_slot_stats.c | memset zeros entire slotStat including new field |
| Incremental rehash during hash/set reads | `call()` in server.c | Lightweight before/after check on argv[1] for COMMAND_GROUP_HASH/SET when dirty unchanged (see below) |
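The three update styles in the table (before/after delta, direct subtraction, full reset) can be sketched in isolation as follows. This is an illustration under simplifying assumptions, not the code in this PR: the `sketch*` names and the flat per-slot array are hypothetical, and a missing key is represented by a size of 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_SLOTS 16384 /* number of cluster hash slots */

/* Hypothetical per-slot counter; signed so a delta can be negative. */
static int64_t sketch_slot_bytes[SKETCH_SLOTS];

/* Before/after style: snapshot the key's size on both sides of the command
 * and apply the signed difference. A key that does not exist counts as 0. */
void sketchApplyDelta(int slot, size_t size_before, size_t size_after) {
    assert(slot >= 0 && slot < SKETCH_SLOTS);
    sketch_slot_bytes[slot] += (int64_t)size_after - (int64_t)size_before;
}

/* Direct-subtraction style: used where a key is removed outside a command,
 * for example eviction or the active expire cycle. */
void sketchSubtract(int slot, size_t size) {
    assert(slot >= 0 && slot < SKETCH_SLOTS);
    sketch_slot_bytes[slot] -= (int64_t)size;
}

/* Reset style: a flush zeroes every slot at once. */
void sketchFlushAll(void) {
    memset(sketch_slot_bytes, 0, sizeof(sketch_slot_bytes));
}

int64_t sketchSlotBytes(int slot) {
    assert(slot >= 0 && slot < SKETCH_SLOTS);
    return sketch_slot_bytes[slot];
}
```

A path must use exactly one of these styles per mutation. Applying both a delta and a subtraction to the same deletion is the double-count that the executing_command flag is described as preventing.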

Incremental rehash overhead tracking

The one exception to the "all changes go through writes" rule: hashtable bucket overhead can change during read commands.

Hash and set values using hashtable encoding undergo incremental rehashing. Every findBucket() call -- including those from read commands like HGET and SISMEMBER -- migrates one bucket from the old table to the new table. This changes hashtableMemUsage() (part of objectLogicalSize) without any data modification and without going through the write tracking path.

Each rehash step can change overhead by sizeof(bucket) (64 bytes) as child buckets are freed. When rehashing completes, the old table is freed entirely -- overhead can drop by hundreds of bytes in a single step.

Solution: A lightweight check in call() after cmd->proc(c):

Gate: Only fires when dirty didn't change (write path already handled it), the command belongs to COMMAND_GROUP_HASH or COMMAND_GROUP_SET, and clusterSlotStatsEnabled.
Before cmd->proc: clusterSlotStatsSnapshotRehashOverhead(c) looks up argv[1], checks if it's hashtable-encoded AND mid-rehash. If not, returns 0 (skip). If yes, snapshots current objectLogicalSize.
After cmd->proc: clusterSlotStatsApplyRehashOverhead(c) re-reads objectLogicalSize and applies the delta to slot_stats.

Only argv[1] is checked because all hash/set read commands are single-key. The only multi-key command that touches a second hash/set value is SMOVE, which is a write command -- already covered by the full before/after write path.

Cost: For non-hash/set commands: zero (group check). For hash/set reads where the HT isn't rehashing: one dbFind + encoding check + hashtableIsRehashing -- all return early. The actual delta application only runs when overhead genuinely changed (rare, only during active rehashing).
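The gate and the early returns described above can be sketched as two small functions. This is a simplified model rather than the actual clusterSlotStatsSnapshotRehashOverhead / clusterSlotStatsApplyRehashOverhead code: `sketchGroup`, `sketchValue`, and both function names are hypothetical, and the key lookup is reduced to a pointer that is NULL when the key is missing.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command groups; only hash and set reads are of interest. */
typedef enum { SK_GROUP_OTHER, SK_GROUP_HASH, SK_GROUP_SET } sketchGroup;

/* Hypothetical view of the value stored at argv[1]. */
typedef struct {
    bool hashtable_encoded;
    bool rehashing;
    size_t logical_size;
} sketchValue;

/* Before the command: returns a snapshot, or 0 meaning "skip".
 * Checks are ordered cheapest first so most commands exit immediately. */
size_t sketchSnapshotRehash(bool stats_enabled, sketchGroup group, const sketchValue *v) {
    if (!stats_enabled) return 0;
    if (group != SK_GROUP_HASH && group != SK_GROUP_SET) return 0;
    if (v == NULL) return 0; /* key not found */
    if (!v->hashtable_encoded || !v->rehashing) return 0;
    return v->logical_size;
}

/* After the command: signed change to apply to the slot counter.
 * Returns 0 when nothing was snapshotted or when the write path
 * (dirty changed) has already accounted for the key. */
int64_t sketchRehashDelta(size_t before, bool dirty_changed, const sketchValue *v) {
    if (before == 0 || dirty_changed || v == NULL) return 0;
    return (int64_t)v->logical_size - (int64_t)before;
}
```

For example, a read that frees one 64-byte child bucket on a value whose snapshot was 1000 bytes yields a delta of -64, while the same read on a value that is not rehashing never takes a snapshot at all.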

Self-correction without this fix: if the check were absent, the drift would self-correct on the next successful write command to that key, which refreshes the before/after baseline.

Struct changes

slotStat in cluster_legacy.h gains one field:

int64_t memory_logical_bytes;

client in server.h gains one field for per-command state:

size_t slot_mem_before;

DEBUG SLOT-VERIFY-MEMORY

New debug subcommand that independently walks all keys in a slot using computeObjectExpectedSize() -- an O(n) walk that does NOT use objectLogicalSize or any tracking field, only pre-existing APIs (sdslen, sdsHdrSize, lpBytes, intsetBlobLen, hashtableMemUsage, raxComputeLogicalSize, etc.). Compares the walk result against slot_stats and returns OK or an error with mismatch details.
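The comparison at the heart of the verifier can be sketched as below. This is a toy model, not the debug command itself: `sketchKey` and `sketchVerifySlot` are hypothetical, and `expected_size` stands in for whatever the independent O(n) walk computes for each key.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key record: owning slot plus the size obtained by an
 * independent walk that does not read any tracking field. */
typedef struct {
    int slot;
    size_t expected_size;
} sketchKey;

/* Sums the walked sizes of all keys in `slot` and compares the total with
 * the tracked counter. Returns 0 on a match, otherwise the signed drift
 * (tracked minus walked), which is the detail a mismatch error would report. */
int64_t sketchVerifySlot(const sketchKey *keys, size_t nkeys, int slot, int64_t tracked) {
    int64_t walked = 0;
    for (size_t i = 0; i < nkeys; i++) {
        if (keys[i].slot == slot) walked += (int64_t)keys[i].expected_size;
    }
    return tracked - walked;
}
```

Keeping the walk independent of the tracked fields is what gives the check value: a bug in the incremental counters cannot cancel itself out in the expected total.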

Tcl integration tests

26 tests added to tests/unit/cluster/slot-stats.tcl:

String: SET, overwrite, integer encoding, DEL
Hash: in-place growth (50 fields with verification after each HSET), field expiry (partial and full key deletion)
Set: SADD 200 members then SREM 100, verifying decrease
List: RPUSH 100 items then LPOP 50, verifying decrease
Stream: XADD 50 entries across multiple rax nodes then XTRIM, verifying decrease
Cross-slot independence: two keys in different slots, DEL one, other unchanged
Same-slot accumulation: two keys with hash tags in same slot
Mixed types in same slot: string + hash + set with hash tags
FLUSHALL: resets all slot memory to zero
Key expiry: SET with PX, wait, verify memory drops to zero
Lazy expiry + write: SET with PX, disable active expiry, overwrite after expiry -- verifies no double-counting
MULTI/EXEC: transaction with string + 50-field hash + 50-member set + 50-item list
Eviction: fill slot, set tight maxmemory with allkeys-lru, verify after eviction
Hash field expiry (partial): HSETEX with TTL, verify memory decreases after fields expire
Hash field expiry (empty key): HSETEX where all fields expire, verify key deleted and memory is zero
RDB reload: create mixed types, DEBUG RELOAD, verify slot stats match
AOF reload: create keys, BGREWRITEAOF, add more keys, DEBUG LOADAOF -- verifies post-load recount
Mismatch detection: CONFIG RESETSTAT corrupts stats, DEBUG SLOT-VERIFY-MEMORY catches it
ORDERBY memory-logical-bytes: two keys with different sizes, verify descending order
Hash rehash overhead repro: 100 HSET + 50 HDEL + 100 more HSET with verify after each operation
Hash+set rehash overhead repro: 300 rounds of interleaved hash/set writes AND reads (HGET, SISMEMBER) with verify after each
Random fuzzer: 300 rounds x 200 random operations across 4 types and 2 slots with verify after each operation

liorsve added 2 commits April 9, 2026 13:20

Squashed hashtable + rax + stream + vset O(1) memory tracking (combin…

85c0f0c

…ed branch) Signed-off-by: Lior Sventitzky <liorsve@amazon.com>

added logical size tracking, added new correctness tests and comparis…

14448a2

…on with objectComputeSize test Signed-off-by: Lior Sventitzky <liorsve@amazon.com>

github-actions Bot assigned liorsve Apr 12, 2026

liorsve force-pushed the per-slot-memory-aggregation branch from 018d38d to 93895d3 Compare April 12, 2026 16:09

liorsve changed the title ~~Per-slot memory tracking in CLUSTER SLOT-STATS~~ Per-slot memory tracking in CLUSTER SLOT-STATS - no key cache Apr 13, 2026

liorsve force-pushed the per-slot-memory-aggregation branch 2 times, most recently from 5d8716d to 09a3a14 Compare April 13, 2026 08:45

liorsve mentioned this pull request Apr 13, 2026

Per-slot memory tracking via per-key cache + signalModifiedKey #11

Draft

liorsve force-pushed the per-slot-memory-aggregation branch 3 times, most recently from a41b6d7 to f3873f2 Compare April 13, 2026 14:10

call() based per slot memory stat & disabled quicklist tests failing …

f1193de

…on purpose Signed-off-by: Lior Sventitzky <liorsve@amazon.com>

liorsve force-pushed the per-slot-memory-aggregation branch from f3873f2 to f1193de Compare April 13, 2026 14:20

liorsve added 3 commits April 14, 2026 08:37

fix rehash bug

60026dd

Signed-off-by: Lior Sventitzky <liorsve@amazon.com>

unified data and overhead stats to 1

494b5e7

Signed-off-by: Lior Sventitzky <liorsve@amazon.com>

removed client struct field

51210e5

Signed-off-by: Lior Sventitzky <liorsve@amazon.com>

liorsve wants to merge 6 commits into unstable from per-slot-memory-aggregation
