Per-slot memory tracking in CLUSTER SLOT-STATS - no key cache#10
Draft
liorsve wants to merge 6 commits into
…ed branch) Signed-off-by: Lior Sventitzky <liorsve@amazon.com>
…on with objectComputeSize test Signed-off-by: Lior Sventitzky <liorsve@amazon.com>
…on purpose Signed-off-by: Lior Sventitzky <liorsve@amazon.com>
Signed-off-by: Lior Sventitzky <liorsve@amazon.com>
Signed-off-by: Lior Sventitzky <liorsve@amazon.com>
Signed-off-by: Lior Sventitzky <liorsve@amazon.com>
Commit history
The first two commits are squashed versions of prior tracking PRs that this work builds on:
- Squashed hashtable + rax + stream + vset O(1) memory tracking, from PR "Combined: hashtable + rax memory tracking with vset integration" #9 (`combined-hashtable-rax-tracking`)
- added logical size tracking, from PR "Track logical quicklist memory incrementally via lpBytes + compressed size" #4 (`quicklist-logical-size-tracking`)

Summary
Adds per-slot memory tracking to `CLUSTER SLOT-STATS`, reporting one new metric:
- `memory-logical-bytes`: combined user data + container overhead (field-value pairs, set members, listpack bytes, stream entries, hashtable bucket arrays, rax node overhead, vset containers, quicklist node structs)

The metric supports `ORDERBY` for sorting slots by memory usage.

New function: objectLogicalSize()
O(1) function in `object.c` that reads incrementally maintained tracked fields to compute logical size per type/encoding. Returns a single `size_t` that combines data and overhead.

Key modification paths and how each is tracked
All data and overhead modifications flow through write commands or explicit out-of-call mutation points. Each is covered by a before/after snapshot or a direct subtraction. The one exception is incremental rehashing during reads, which gets a dedicated lightweight check (see details below the table):
- `call()` in server.c
- `call()` in server.c
- `performEvictions()` in evict.c
- `deleteExpiredKeyAndPropagateWithDictIndex()` in db.c
- `call()` in server.c
- `dbReclaimExpiredFields()` in db.c
- `dbReclaimExpiredFields()` in db.c
- `signalFlushedDb()` in db.c
- `dbAddRDBLoad()` in db.c
- `loadAppendOnlyFiles()` in aof.c
- `clusterSlotStatReset()` in cluster_slot_stats.c
- `call()` in server.c

Incremental rehash overhead tracking
The one exception to the "all changes go through writes" rule: hashtable bucket overhead can change during read commands.
Hash and set values using hashtable encoding undergo incremental rehashing. Every `findBucket()` call, including those from read commands like HGET and SISMEMBER, migrates one bucket from the old table to the new table. This changes `hashtableMemUsage()` (part of `objectLogicalSize`) without any data modification and without going through the write tracking path.

Each rehash step can change overhead by `sizeof(bucket)` (64 bytes) as child buckets are freed. When rehashing completes, the old table is freed entirely, so overhead can drop by hundreds of bytes in a single step.

Solution: A lightweight check in `call()` after `cmd->proc(c)`. It applies when `dirty` didn't change (otherwise the write path already handled it), the command belongs to `COMMAND_GROUP_HASH` or `COMMAND_GROUP_SET`, and `clusterSlotStatsEnabled` is set.
- Before `cmd->proc`: `clusterSlotStatsSnapshotRehashOverhead(c)` looks up `argv[1]` and checks whether it is hashtable-encoded AND mid-rehash. If not, it returns 0 (skip). If yes, it snapshots the current `objectLogicalSize`.
- After `cmd->proc`: `clusterSlotStatsApplyRehashOverhead(c)` re-reads `objectLogicalSize` and applies the delta to `slot_stats`.

Only `argv[1]` is checked because all hash/set read commands are single-key. The only multi-key command that touches a second hash/set value is SMOVE, which is a write command and is already covered by the full before/after write path.

Cost: For non-hash/set commands: zero beyond the group check. For hash/set reads where the hashtable isn't rehashing: one `dbFind` + encoding check + `hashtableIsRehashing`, all of which return early. The actual delta application only runs when overhead genuinely changed (rare, only during active rehashing).

Self-correction without this fix: Without this check, the drift self-corrects on the next successful write command to that key, which refreshes the before/after baseline.
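The snapshot/apply pattern described above can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the PR's code: `toyObject`, `toyLogicalSize`, `toyReadStep`, `rehashSnapshot`, `snapshotRehashOverhead`, `applyRehashOverhead`, and the `slot_memory_bytes` array are hypothetical stand-ins for the real object, `objectLogicalSize`, a read command, and `slot_stats`. Only the 64-byte bucket size is taken from the description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_SLOTS 16384
#define TOY_BUCKET_BYTES 64 /* bucket size quoted in the description */

/* Hypothetical stand-in for a hashtable-encoded hash or set value. */
typedef struct {
    size_t data_bytes;   /* user data */
    size_t bucket_bytes; /* bucket array overhead */
    int rehashing;       /* non-zero while incremental rehash is active */
} toyObject;

/* Hypothetical per-slot counter, standing in for slot_stats. */
static uint64_t slot_memory_bytes[NUM_SLOTS];

/* O(1) size: reads tracked fields only, never walks the container. */
static size_t toyLogicalSize(const toyObject *o) {
    return o->data_bytes + o->bucket_bytes;
}

/* Models a read that performs one rehash step and frees one bucket. */
static void toyReadStep(toyObject *o) {
    if (o->rehashing && o->bucket_bytes >= TOY_BUCKET_BYTES) {
        o->bucket_bytes -= TOY_BUCKET_BYTES;
    }
}

typedef struct {
    int active;    /* 0 means "skip": nothing to apply afterwards */
    size_t before; /* logical size captured before the command ran */
} rehashSnapshot;

/* Before the command: snapshot only if the value is mid-rehash. */
static rehashSnapshot snapshotRehashOverhead(const toyObject *o) {
    rehashSnapshot s = {0, 0};
    if (o == NULL || !o->rehashing) return s; /* early return, no cost */
    s.active = 1;
    s.before = toyLogicalSize(o);
    return s;
}

/* After the command: re-read the size and apply the signed delta. */
static void applyRehashOverhead(int slot, const toyObject *o, rehashSnapshot s) {
    assert(slot >= 0 && slot < NUM_SLOTS);
    if (!s.active || o == NULL) return;
    size_t after = toyLogicalSize(o);
    if (after >= s.before) {
        slot_memory_bytes[slot] += (uint64_t)(after - s.before);
    } else {
        slot_memory_bytes[slot] -= (uint64_t)(s.before - after);
    }
}
```

In this model, wrapping a read in `snapshotRehashOverhead` / `applyRehashOverhead` keeps the slot counter equal to `toyLogicalSize` after a rehash step, while a value that is not rehashing takes the early return and never touches the counter.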
Struct changes
- `slotStat` in cluster_legacy.h gains one field.
- `client` in server.h gains one field for per-command state.

DEBUG SLOT-VERIFY-MEMORY
New debug subcommand that independently walks all keys in a slot using `computeObjectExpectedSize()`, an O(n) walk that does NOT use `objectLogicalSize` or any tracking field, only pre-existing APIs (`sdslen`, `sdsHdrSize`, `lpBytes`, `intsetBlobLen`, `hashtableMemUsage`, `raxComputeLogicalSize`, etc.). It compares the walk result against `slot_stats` and returns OK or an error with mismatch details.

Tcl integration tests
26 tests added to `tests/unit/cluster/slot-stats.tcl`: