feat: add statistics and adaptive growth to TagValueSeriesIDCache #27480
davidby-influx wants to merge 11 commits
Add statistics for hits, misses, and cache capacity to TagValueSeriesIDCache to permit adaptive cache growth up to a configured max capacity when the hit rate is below a configured target. Includes new configuration parameters and tests. Closes #27479
Pull request overview
This PR adds observability and adaptive growth for the TSI1 TagValueSeriesIDCache, allowing cache capacity to grow toward a configured maximum when observed hit rate is below target.
Changes:
- Adds cache hit/miss/eviction/size/capacity statistics and wires them through engine statistics.
- Adds adaptive cache capacity growth policy and related configuration validation.
- Adds unit and integration tests for statistics, adaptive resizing, and engine stats plumbing.
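
As a rough illustration of the hit/miss/eviction counters listed above, the following sketch tracks them with atomic counters and derives a hit rate. The `cacheStats` type and its field names are hypothetical and are not taken from the PR's code.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// cacheStats is a hypothetical stand-in for the counters the PR describes.
// The type and field names are illustrative assumptions.
type cacheStats struct {
	hits, misses, evictions atomic.Uint64
}

// hitRate returns hits / (hits + misses), or 0 when no Get has been observed.
func (s *cacheStats) hitRate() float64 {
	h, m := s.hits.Load(), s.misses.Load()
	if h+m == 0 {
		return 0
	}
	return float64(h) / float64(h+m)
}

func main() {
	var s cacheStats
	s.hits.Add(3)
	s.misses.Add(1)
	fmt.Printf("hit rate: %.2f\n", s.hitRate()) // prints "hit rate: 0.75"
}
```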
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tsdb/index/tsi1/index.go | Adds cache sizing options, adaptive cache construction, and index statistics forwarding. |
| tsdb/index/tsi1/cache.go | Implements cache statistics, adaptive growth policy, resize logging, and capacity tracking. |
| tsdb/index/tsi1/cache_test.go | Adds tests for statistics and adaptive cache behavior. |
| tsdb/index/inmem/inmem.go | Adds no-op statistics method for interface compatibility. |
| tsdb/index.go | Extends the index interface with statistics reporting. |
| tsdb/engine/tsm1/engine.go | Includes index statistics in engine statistics output. |
| tsdb/engine/tsm1/engine_test.go | Tests TSI1 cache statistics are emitted through engine statistics. |
| tsdb/config.go | Adds adaptive cache config fields and validation. |
| tsdb/config_test.go | Adds validation coverage for adaptive cache configuration. |
Shut up the AI about an overflow if the cache capacity is the number of stars in the galaxy
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
The TSI tag-value series-ID cache now exposes hit, miss, eviction, size, and capacity statistics and can size itself within operator-set bounds instead of using a single fixed capacity. Adaptive sizing is enabled by setting both series-id-set-cache-max-size and series-id-set-cache-target-hit-rate; it stays disabled (fixed-capacity, as before) when either is zero. The cache grows by doubling toward the max ceiling when its windowed hit rate falls below target while under eviction pressure, and decays back down when the working set shrinks: it trims capacity and sheds only the least-recently-used entries left untouched over a self-tuning observation window. A cooldown between resizes prevents grow/shrink oscillation, and behavior is unchanged for deployments that do not opt in.
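
The growth half of the policy described above can be sketched as follows. This is a minimal illustration assuming a hypothetical helper `nextCapacity`; the PR's actual function names, windowing, and cooldown logic are not shown here.

```go
package main

import "fmt"

// nextCapacity sketches the growth rule: double toward max when the windowed
// hit rate is below target and the cache is evicting. Hypothetical helper,
// not the PR's implementation.
func nextCapacity(capacity, max int, hitRate, target float64, evicting bool) int {
	if max == 0 || target == 0 {
		return capacity // adaptive sizing disabled: fixed capacity, as before
	}
	if !evicting || hitRate >= target || capacity >= max {
		return capacity
	}
	// Compare against max/2 instead of doubling first, so the doubling
	// itself can never overflow.
	if capacity > max/2 {
		return max
	}
	return capacity * 2
}

func main() {
	c := 100
	for i := 0; i < 3; i++ {
		c = nextCapacity(c, 350, 0.40, 0.90, true)
		fmt.Println(c) // prints 200, then 350, then 350
	}
}
```

Clamping to the max on the last step keeps the capacity within the operator-set bound even when the max is not a power-of-two multiple of the starting size.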
// the observed Get hit rate falls below target. Resizes are driven from
// the eviction path: after every `capacity` evictions, the cache samples
// its windowed hit rate and grows once if the policy fires. The cache
// never shrinks. minSamples is a floor on the number of Get operations

# starts at series-id-set-cache-size and grows (doubling) up to
# series-id-set-cache-max-size whenever its observed query hit rate falls
# below series-id-set-cache-target-hit-rate while it is evicting. The cache
# never shrinks. Adaptive sizing trades higher heap usage for fewer cache

if c.SeriesIDSetCacheTargetHitRate < 0 || c.SeriesIDSetCacheTargetHitRate >= 1 {
	return ErrSeriesIDSetCacheTargetHitRateRange

if target <= 0 || target >= 1 {
	panic(fmt.Sprintf("NewAdaptiveTagValueSeriesIDCache: target must be in (0, 1), got %v", target))
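
The two range checks quoted above differ at zero: the config check accepts 0, while the constructor requires a target strictly inside (0, 1). One way these could fit together is sketched below, assuming the constructor is only reached when adaptive sizing is enabled. The names `validateTarget`, `adaptiveEnabled`, and `errTargetHitRateRange` are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// errTargetHitRateRange is a hypothetical error, standing in for the one
// returned by the config validation quoted above.
var errTargetHitRateRange = errors.New("target hit rate must be in [0, 1)")

// validateTarget mirrors the quoted config check: 0 is accepted because it
// means adaptive sizing is disabled.
func validateTarget(target float64) error {
	if target < 0 || target >= 1 {
		return errTargetHitRateRange
	}
	return nil
}

// adaptiveEnabled encodes the assumption that adaptive sizing needs both a
// non-zero max size and a non-zero target hit rate.
func adaptiveEnabled(maxSize int, target float64) bool {
	return maxSize > 0 && target > 0
}

func main() {
	for _, t := range []float64{0, 0.9, 1} {
		fmt.Println(t, validateTarget(t) == nil, adaptiveEnabled(1000, t))
	}
}
```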
The TSI tag-value series-ID cache now exposes hit, miss, eviction, size, and capacity statistics and can size itself within operator-set bounds instead of using a single fixed capacity. Adaptive sizing is enabled by setting both series-id-set-cache-max-size and series-id-set-cache-target-hit-rate; it stays disabled (fixed-capacity, as before) when either is zero. The cache grows by doubling toward the max when its windowed hit rate falls below target while under eviction pressure, and decays back down when the working set shrinks: after a coverage-sized window of Gets it trims capacity toward the observed LRU footprint, shedding only the least-recently-used entries left untouched over the window. A Gets-based cooldown between resizes and a conservatism-tunable eviction gate (a confidence bound on the at-target eviction distribution) prevent grow/shrink oscillation. Behavior is unchanged by default. Closes #27479
Split the eviction counter into Evictions (forced under Put pressure) and ShrinkEvictions (voluntary trim) so only forced evictions feed the binomial eviction-gate.

Fix two adjacent bugs: evictLRULocked recedes deepestTouched to e.Prev() instead of nil-ing it when the LRU is the boundary, preventing a spurious shrink that sheds warm entries after a Put on a fully-warm cache; checkShrink rewinds the hit/miss baseline by one when opening a new window so the window-opening Get is counted in hitsW/missesW.

Cap a single shrink event at maxShrinkEvictPerEvent (1024) to bound write-lock hold time on huge caches; decay continues over later windows.

Add regression tests for both fixes, a policy-table row pinning the per-event cap, and an integration test (ShrinkRepeatsWhenCapped) that exercises two back-to-back capped shrinks and verifies decay continues when a single event would have shed more. Adapt existing tests for the new ShrinkEvictions stat field and get()'s new (set, hit) return.
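
The per-event shrink cap can be illustrated with a small sketch. Only the constant name `maxShrinkEvictPerEvent` and its value of 1024 come from the commit message; the helper `shrinkEvictions` and the sizes used are hypothetical.

```go
package main

import "fmt"

// maxShrinkEvictPerEvent bounds how many entries one shrink event may shed.
// The name and value come from the commit message above.
const maxShrinkEvictPerEvent = 1024

// shrinkEvictions is a hypothetical helper: it returns how many entries a
// single shrink event sheds when trimming size toward targetSize.
func shrinkEvictions(size, targetSize int) int {
	excess := size - targetSize
	if excess <= 0 {
		return 0
	}
	if excess > maxShrinkEvictPerEvent {
		return maxShrinkEvictPerEvent
	}
	return excess
}

func main() {
	size, target := 3000, 500
	for size > target {
		n := shrinkEvictions(size, target)
		size -= n
		fmt.Println(n, size) // prints "1024 1976", "1024 952", "452 500"
	}
}
```

Spreading a large trim over several windows keeps each write-lock hold short, at the cost of a slower decay to the target size.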