Key cache by HMAC of mapping content, auto-reload on mtime change #36
Merged
The cache was keyed by Podman's hex secret ID, which Podman regenerates every time setup runs its delete+create cycle. This caused the cache to be silently useless after every refresh timer fire: serve's in-memory dict had the old IDs, Podman was handing out new IDs, 100% cache miss rate until a manual serve restart.

Observed on the test server: 1554 lookups across 30 minutes after a refresh fire, every single one fell through to the provider. When Infisical then went down, every container secret lookup failed with 502, the exact outage the cache was built to prevent.

Root-cause fix, one change:
- Cache keys are HMAC-SHA256 of the mapping's canonical JSON bytes, not the Podman hex ID. Same mapping always yields the same key, no matter how often Podman churns the hex IDs. The HMAC key is random, per-host, stored inside the encrypted cache envelope, so mapping hashes cannot be correlated across deployments.
- Serve calls cache.maybe_reload() on the lookup hot path. A stat() per request (~1μs); actual reload only when setup has rewritten the file. Rotations propagate to serve without a restart.

Cleanups enabled by the new design:
- Drop _prune_stale_cache_entries from setup: no stale entries to prune when keys are content-derived.
- Drop the id_map return from _register_secrets: cache doesn't need hex IDs any more.
- Drop ExecStart=systemctl try-restart psi-secrets.service from the refresh wrapper: auto-reload handles it.

Legacy v1 payloads (hex-ID keyed) are discarded on load; next save rewrites in v2 format with a freshly generated HMAC key. Container lookups during the one-time transition fall through to the provider exactly once.
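The content-derived key described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this PR: the standalone functions canonical_bytes and cache_key, the example mapping fields, and the choice of sorted keys with compact separators as the "canonical JSON" form are all hypothetical.

```python
import hashlib
import hmac
import json
import secrets


def canonical_bytes(mapping: dict) -> bytes:
    # Assumption: "canonical" means sorted keys and fixed separators, so that
    # one logical mapping always serializes to the same byte string.
    return json.dumps(mapping, sort_keys=True, separators=(",", ":")).encode("utf-8")


def cache_key(hmac_key: bytes, mapping: dict) -> str:
    # HMAC-SHA256 over the canonical bytes; the result is independent of
    # whatever hex ID Podman currently assigns to the secret.
    return hmac.new(hmac_key, canonical_bytes(mapping), hashlib.sha256).hexdigest()


# Hypothetical per-host key; per the description the real one is stored
# inside the encrypted cache envelope.
host_key = secrets.token_bytes(32)

# Hypothetical mapping fields, for illustration only.
mapping = {"provider": "infisical", "path": "/app", "key": "DB_PASSWORD"}
reordered = {"key": "DB_PASSWORD", "path": "/app", "provider": "infisical"}
```

With this shape, cache_key(host_key, mapping) and cache_key(host_key, reordered) are equal, while the same mapping under a different host key produces an unrelated digest.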
Summary
The cache was keyed by Podman's hex secret ID, which Podman regenerates every time setup runs its delete+create cycle. This made the cache silently useless every time the refresh timer fired: serve's in-memory dict held the old IDs while Podman handed out new ones, so the cache miss rate was 100% until serve was manually restarted.
Observed on the test server: 1554 lookups across 30 minutes after a refresh fire, every single one fell through to the provider. When Infisical then went down, every container secret lookup failed with 502 — the exact outage the cache was built to prevent.
Root-cause fix
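- Cache keys are HMAC-SHA256 of the mapping's canonical JSON bytes, not the Podman hex ID. The same mapping always yields the same key, no matter how often Podman churns the hex IDs. The HMAC key is random, per-host, and stored inside the encrypted cache envelope, so mapping hashes cannot be correlated across deployments.
- Serve calls cache.maybe_reload() on the lookup hot path. This costs one stat() per request (~1μs); an actual reload happens only when setup has rewritten the file. Rotations propagate to serve without a restart.
Cleanups enabled by the new design
- Drop _prune_stale_cache_entries from setup: there are no stale entries to prune when keys are content-derived.
- Drop the id_map return from _register_secrets: the cache doesn't need hex IDs any more.
- Drop ExecStart=systemctl try-restart psi-secrets.service from the refresh wrapper: auto-reload handles it.
Migration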
Legacy v1 payloads (hex-ID keyed) are discarded on load; next save rewrites in v2 format with a freshly generated HMAC key. Container lookups during the one-time transition fall through to the provider exactly once.
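The auto-reload and migration behaviour can be sketched together as below. This is a hedged illustration only: the Cache class, its attribute names, the plain-JSON file layout, and the "version" field are assumptions made for the example, and the encrypted envelope is omitted entirely. Only the name maybe_reload comes from this PR.

```python
from __future__ import annotations

import json
import secrets
from pathlib import Path


class Cache:
    """Hypothetical stand-in for the real cache; encryption is omitted."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.entries: dict[str, str] = {}
        self.hmac_key = secrets.token_bytes(32)
        self._mtime_ns: int | None = None
        self.load()

    def load(self) -> None:
        try:
            mtime_ns = self.path.stat().st_mtime_ns
            payload = json.loads(self.path.read_text())
        except FileNotFoundError:
            # No file yet: start empty and keep the generated key.
            self.entries, self._mtime_ns = {}, None
            return
        self._mtime_ns = mtime_ns
        if payload.get("version") != 2:
            # Legacy v1 payload (hex-ID keyed): discard the entries and use a
            # fresh HMAC key; the next save() rewrites the file as v2.
            self.entries = {}
            self.hmac_key = secrets.token_bytes(32)
            return
        self.entries = payload["entries"]
        self.hmac_key = bytes.fromhex(payload["hmac_key"])

    def save(self) -> None:
        payload = {"version": 2, "hmac_key": self.hmac_key.hex(), "entries": self.entries}
        self.path.write_text(json.dumps(payload))
        self._mtime_ns = self.path.stat().st_mtime_ns

    def maybe_reload(self) -> bool:
        # One stat() per call; the file is re-read only if its mtime changed.
        try:
            mtime_ns = self.path.stat().st_mtime_ns
        except FileNotFoundError:
            return False  # assumption: keep in-memory entries if the file is gone
        if mtime_ns == self._mtime_ns:
            return False
        self.load()
        return True
```

In this sketch a lookup handler would call maybe_reload() before consulting entries, so a rewrite by another process becomes visible on the next request without restarting the server.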
Test plan
- pytest: 344 tests pass.
- TestCacheKey: HMAC stable across save/load, different mappings produce different keys, per-host keys prevent cross-correlation.
- TestMaybeReload: no reload when mtime unchanged, reload when another writer updates the file, graceful handling of missing file.
- TestLegacyV1PayloadDiscarded: v1 payloads ignored cleanly with fresh HMAC key.
- test_serve_offline.py updated to use cache.cache_key() for key construction.
- ruff check / ruff format --check / ty check: clean.