Problem
When Midaz scales horizontally or a pod restarts, the new pod starts with an empty in-memory tenant config cache. If the tenant-manager is unavailable at that moment, the pod cannot serve tenant requests. Existing pods are only slightly better off: with the current 1h in-memory TTL, they tolerate at most 1h of tenant-manager downtime before their cached entries expire.
Root causes:
- In-memory cache is per-pod (not shared across replicas)
- Cache doesn't survive pod restarts — new pods start cold
- No shared cache layer between pods
Goal
Use Redis/Valkey as the primary shared cache (shared across all pods, survives pod restarts), fronted by a fast per-pod in-memory L1. Tenant-manager downtime is then tolerated for up to the Redis TTL (12-24h).
Architecture
```
Request → L1: In-Memory (fast, 5min TTL, per-pod)
            ↓ (miss)
          L2: Redis/Valkey (shared, 12h TTL, survives restarts)
            ↓ (miss)
          HTTP to tenant-manager → Secrets Manager
            ↓ (success)
          Write to L2 (Redis) + L1 (in-memory)
```
Invalidation flow (suspend/purge):
revalidatePoolSettings calls GetTenantConfig(WithSkipCache()) → bypasses L1+L2 → gets 403 → evicts connection + deletes from both caches
Implementation Plan
Phase 1: Redis ConfigCache Adapter
New file: commons/tenant-manager/cache/redis.go
Implement the existing ConfigCache interface (cache/config_cache.go:19-37) on top of Redis:
```go
type RedisCache struct {
	client redis.UniversalClient // go-redis client (standalone, cluster or sentinel)
	prefix string                // key namespace, e.g. "tm-config:"
}

func NewRedisCache(client redis.UniversalClient, opts ...RedisCacheOption) *RedisCache

func (c *RedisCache) Get(ctx context.Context, key string) (string, error)                // redis GET; returns ErrCacheMiss on redis.Nil
func (c *RedisCache) Set(ctx context.Context, key, value string, ttl time.Duration) error // redis SET with TTL
func (c *RedisCache) Del(ctx context.Context, key string) error                           // redis DEL
```
Tests: commons/tenant-manager/cache/redis_test.go — use miniredis
Phase 2: TieredCache Wrapper
New file: commons/tenant-manager/cache/tiered.go
Composes two ConfigCache implementations:
```go
type TieredCache struct {
	l1    ConfigCache
	l2    ConfigCache
	l1TTL time.Duration
	l2TTL time.Duration
}

func NewTieredCache(l1, l2 ConfigCache, l1TTL, l2TTL time.Duration) *TieredCache
```
Get: L1 → miss → L2 → miss → ErrCacheMiss. On L2 hit, write back to L1 with l1TTL.
Set: write to L1 (short TTL) + L2 (long TTL).
Del: delete from both.
Tests: commons/tenant-manager/cache/tiered_test.go
Phase 3: WithTieredCache Convenience Option
File: commons/tenant-manager/client/client.go
```go
func WithTieredCache(redisClient redis.UniversalClient, opts ...TieredCacheOption) ClientOption
```
Options: WithL1TTL(time.Duration), WithL2TTL(time.Duration), WithRedisPrefix(string)
Defaults: L1=5min, L2=12h, prefix=tm-config:
Builds a TieredCache(InMemoryCache, RedisCache) and passes it through the existing WithCache option.
Existing Code to Reuse
| Item | Location |
|---|---|
| ConfigCache interface | cache/config_cache.go:19-37 |
| InMemoryCache | cache/memory.go (used as L1) |
| WithCache(cc ConfigCache) | client/client.go:138-144 |
| ErrCacheMiss | cache/config_cache.go:14 |
| defaultCacheTTL | client/client.go:29 |
Key Files
| File | Action |
|---|---|
| commons/tenant-manager/cache/redis.go | New |
| commons/tenant-manager/cache/redis_test.go | New |
| commons/tenant-manager/cache/tiered.go | New |
| commons/tenant-manager/cache/tiered_test.go | New |
| commons/tenant-manager/client/client.go | Edit: add WithTieredCache option |
Backward Compatibility
- Without WithTieredCache, behavior is unchanged (in-memory only)
- WithCache still works for custom implementations
- WithCacheTTL still controls the L1 TTL when using WithTieredCache
Downstream Integration (midaz — separate PR)
After this is merged, midaz components will use WithTieredCache:
| File | Change |
|---|---|
| components/ledger/internal/bootstrap/config.go | Add env vars + WithTieredCache |
| components/onboarding/internal/bootstrap/config.go | Add env vars |
| components/transaction/internal/bootstrap/config.go | Add env vars |
| components/crm/internal/bootstrap/config.go | Add env vars |
| components/crm/internal/bootstrap/config.tenant.go | Use WithTieredCache |
Env vars: MULTI_TENANT_CACHE_L1_TTL_SEC (default 300, i.e. 5min) and MULTI_TENANT_CACHE_L2_TTL_SEC (default 43200, i.e. 12h)
Verification
- Start Midaz with the env vars set → verify that a Redis key is created on the first request
- Restart a pod → its first request reads from Redis (no HTTP call to tenant-manager)
- Scale to 3 pods → all pods read from the shared Redis
- Stop tenant-manager → requests continue to be served from Redis for up to 12h (the L2 TTL)
- InvalidateConfig → both L1 and L2 cleared
- Suspend tenant → revalidation bypasses the cache → detects 403 → evicts both layers
- Run all tests:
```sh
go test ./commons/tenant-manager/... -count=1 -race
```