
[DRAFT] Cache organization usage counts for limit enforcement #3528

@dkrizan

Description


1. Problem

When a user saves a translation (empty → text), StringKeysLimitListeners runs an org-wide COUNT(DISTINCT ...) query to check if the organization is within its plan limits. This query joins translation → key → project across the entire organization.

Even with proper indexes, this query is inherently O(org size) and can slow down requests for large organizations. Running a full count on every translation or key creation is inefficient — the limit check should be O(1).

Today's baseline: every translation/key creation pays for a full org-wide count query in the request path. This affects the PUT /v2/projects/{projectId}/translations endpoint, key creation, and imports.

2. Appetite

2 weeks for one developer.

The incremental tracking logic already exists (getUsageIncreaseAmount() returns +1/-1 correctly). The work is: build a cached count layer, wire it into the existing listeners, add background reconciliation, and test edge cases (imports, deletions, branching).

3. Solution

Cached usage counts (Redis + DB fallback)

Store the current key count and translation count per organization in Redis. The limit check reads from Redis instead of running the COUNT query.

Element 1: Redis counter per organization

  • Keys: org:{orgId}:key_count, org:{orgId}:translation_count
  • On translation/key create: INCR
  • On translation/key delete: DECR
  • The existing getUsageIncreaseAmount() already calculates +1/-1 correctly — wire it to Redis instead of (or before) the DB count
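The increment path can be sketched as follows. This is a hypothetical sketch, not Tolgee's actual code: `OrgUsageCounter` and its method names are invented, and a `ConcurrentHashMap` of `AtomicLong`s stands in for Redis, whose INCR/DECR are likewise atomic per key.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: per-organization counters keyed the same way as the
// proposed Redis keys (org:{orgId}:key_count, org:{orgId}:translation_count).
// The in-memory map stands in for Redis; each addAndGet models one INCRBY.
public class OrgUsageCounter {
    private final Map<String, AtomicLong> store = new ConcurrentHashMap<>();

    private String key(long orgId, String kind) {
        return "org:" + orgId + ":" + kind + "_count";
    }

    // Apply the delta from getUsageIncreaseAmount(): +1 on create, -1 on delete.
    public long applyDelta(long orgId, String kind, long delta) {
        return store.computeIfAbsent(key(orgId, kind), k -> new AtomicLong())
                    .addAndGet(delta);
    }

    public long get(long orgId, String kind) {
        AtomicLong v = store.get(key(orgId, kind));
        return v == null ? 0 : v.get();
    }
}
```

With real Redis the same shape maps to `INCRBY org:{orgId}:translation_count 1` on create and `-1` on delete.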

Element 2: Limit check path

  • StringKeysLimitListeners.onActivity() reads from Redis instead of organizationStatsService.getTranslationCount()
  • If Redis is unavailable, fall back to allowing the operation (performance > exactness)
  • The transaction-scoped bean pattern stays — within one transaction, the Redis read happens once, then increments are tracked in memory
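The check itself reduces to a few lines. A hypothetical sketch of the fail-open behavior described above; `LimitChecker` and `LimitExceededException` are invented names, and the `LongSupplier` stands in for the Redis read:

```java
import java.util.function.LongSupplier;

// Hypothetical sketch of the check in StringKeysLimitListeners.onActivity():
// read the cached count instead of running the org-wide COUNT query; if the
// cache (Redis) is unavailable, allow the operation rather than block it.
public class LimitChecker {
    public static class LimitExceededException extends RuntimeException {}

    /**
     * @param cachedCount reads the Redis counter; may throw when Redis is down
     * @param increase    delta of the current operation (getUsageIncreaseAmount())
     * @param planLimit   maximum allowed count for the organization's plan
     */
    public static void checkLimit(LongSupplier cachedCount, long increase, long planLimit) {
        long current;
        try {
            current = cachedCount.getAsLong();
        } catch (RuntimeException redisUnavailable) {
            return; // fail open: performance > exactness
        }
        if (current + increase > planLimit) {
            throw new LimitExceededException();
        }
    }
}
```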

Element 3: Background reconciliation

  • A scheduled job runs periodically (e.g., every 5-10 minutes)
  • Executes the actual COUNT query per organization (can be spread out, not all at once)
  • Compares result to Redis value and corrects if there's drift
  • This ensures the cache doesn't drift far from reality over time
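The per-organization reconciliation step can be sketched like this. `Reconciler` is an invented name; the suppliers stand in for the authoritative COUNT query and the Redis read, and the consumer for a Redis SET:

```java
import java.util.function.LongConsumer;
import java.util.function.LongSupplier;

// Hypothetical sketch of reconciliation for one organization: run the real
// COUNT query, compare it to the cached value, and overwrite the cache when
// they have drifted. Returns the drift so it can be logged and monitored.
public class Reconciler {
    public static long reconcile(LongSupplier dbCount, LongSupplier cachedCount,
                                 LongConsumer cacheWriter) {
        long actual = dbCount.getAsLong();     // authoritative COUNT(DISTINCT ...) result
        long cached = cachedCount.getAsLong(); // current Redis value
        long drift = cached - actual;
        if (drift != 0) {
            cacheWriter.accept(actual);        // correct the cache (e.g. Redis SET)
        }
        return drift;
    }
}
```

Persistent nonzero drift in the logs would point at a missed increment path (e.g. a deletion cascade that bypasses the listeners).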

Element 4: Cache initialization

  • On first access (cache miss in Redis), run the COUNT query once and store the result
  • After that, only incremental updates
  • On application startup or Redis flush, counts get lazily re-initialized on first access
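Lazy initialization is essentially a compute-if-absent over the counter key. A hypothetical sketch (invented class name, in-memory map standing in for Redis; with real Redis a SETNX-style "set if absent" would play the same role):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical sketch of lazy initialization: on a cache miss the expensive
// COUNT query runs once to seed the counter; afterwards reads hit the cache.
public class LazyCountCache {
    private final Map<String, Long> cache = new ConcurrentHashMap<>(); // Redis stand-in
    private int dbQueries = 0; // exposed for the sketch: how often the COUNT ran

    public long getCount(String key, LongSupplier expensiveDbCount) {
        return cache.computeIfAbsent(key, k -> {
            dbQueries++;
            return expensiveDbCount.getAsLong(); // runs only on a miss
        });
    }

    public int dbQueryCount() { return dbQueries; }
}
```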

Element 5: Usage reporting (StringsKeysUsageListener)

  • The usage storing listener can also read from Redis for the count it stores in UsageEvent
  • Or it can continue using the DB count since it only fires for ADD/DEL (less frequent)

4. Rabbit holes

  • Imports: A large import may create thousands of translations in one batch. The existing EntityPreCommitEvent fires per entity, so Redis would receive thousands of individual INCRs. Redis handles that throughput fine, but the existing StringKeysLimitListeners is transaction-scoped and caches the base count, so the Redis read should also be cached per transaction: read once at the start, increment in memory, and write the final delta to Redis at commit.

  • Multi-pod consistency: Redis is shared across pods, so INCR/DECR is atomic and consistent. No pod-local cache divergence.

  • Transaction rollback: If a transaction increments Redis but then rolls back, the Redis count will be wrong until reconciliation fixes it. The safer option is @TransactionalEventListener(phase = TransactionPhase.AFTER_COMMIT), so Redis is only updated after a successful commit.

  • Branching count correctness: The cached count should reflect the corrected query from #3527 ("Organization stats count queries are slow and incorrectly count branch keys"), which excludes branch keys when branching is disabled. The reconciliation job must use that corrected query.
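The two transaction-related points above (caching the base count per transaction, and touching Redis only after a successful commit) can be sketched together. Everything here is hypothetical: `TxScopedCount` is an invented stand-in for the transaction-scoped bean, a `long[]` stands in for Redis, and `commit()`/`rollback()` model AFTER_COMMIT semantics.

```java
import java.util.function.LongConsumer;
import java.util.function.LongSupplier;

// Hypothetical sketch of the transaction-scoped pattern: read the base count
// from Redis once, accumulate the delta in memory while the transaction runs,
// and flush a single INCRBY on commit. On rollback nothing is written, so the
// cache never sees aborted work (rollback drift cannot occur).
public class TxScopedCount {
    private final long base;            // single Redis read at transaction start
    private long delta = 0;             // in-memory, per transaction
    private final LongConsumer redisIncrBy;

    public TxScopedCount(LongSupplier redisRead, LongConsumer redisIncrBy) {
        this.base = redisRead.getAsLong();
        this.redisIncrBy = redisIncrBy;
    }

    public void track(long increase) { delta += increase; } // e.g. +1 per created translation

    public long currentCount() { return base + delta; }     // what the limit check reads

    public void commit() {                                   // AFTER_COMMIT: one INCRBY
        if (delta != 0) redisIncrBy.accept(delta);
    }

    public void rollback() { /* intentionally nothing: Redis stays untouched */ }
}
```

A 10,000-row import then costs one Redis read and one Redis write per transaction instead of 10,000 INCRs.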

5. No-gos

  • No real-time exact counts — the cache may be slightly off between reconciliation runs, and that's acceptable
  • No per-project counts — only org-level counts are cached (that's what the limit check needs)
  • No Redis Cluster setup — use the existing Redis instance
  • No migration of historical usage events — only the live count cache

6. Success criteria

  • PUT /translations (empty → text) limit check becomes O(1) regardless of org size
  • Limit enforcement still blocks users who exceed their plan (within a small margin)
  • No regression in billing accuracy (reconciliation keeps counts aligned)

Labels: pitch (the pitch according to the Shape Up approach)