Conversation


@HardMax71 HardMax71 commented Jan 25, 2026


Summary by cubic

Typed core models and streamlined the rate limit service and metrics to reduce complexity and make admin usage stats consistent across algorithms. Updated Grafana, OpenAPI, and docs to match the new metrics and types.

  • Refactors

    • Rewrote RateLimitService with Pydantic TypeAdapter-based config, simpler Redis usage, and compact sliding window/token bucket logic.
    • Introduced typed models: EndpointUsageStats, ReplayError, ExecutionResultSummary, KeyStrategy for idempotency. Moved ResourceUsageDomain to events. ExecutionResultDomain.metadata is now EventMetadata.
    • DLQBatchRetryResponse.details now returns DLQRetryResult. Admin user rate limits current_usage returns EndpointUsageStats.
    • Metrics simplified to request/allowed/rejected/bypass counters with minimal attributes. Grafana dashboard updated accordingly.
    • Event statistics now use typed lists (EventTypeCount, ServiceEventCount) instead of dicts.
  • Migration

    • Import ResourceUsageDomain from app.domain.events; treat ExecutionResultDomain.metadata as EventMetadata; stop using model_dump() in producers/handlers.
    • Update clients: read EndpointUsageStats.remaining and algorithm in current_usage; parse DLQ retry details as DLQRetryResult; handle EventStatistics lists shape.
    • Use KeyStrategy enum instead of string values for idempotency key_strategy.
    • Remove references to deprecated metrics (check_duration, redis_duration, algorithm_duration, quota_usage, token_bucket_*). Update dashboards/alerts to use the new counters.
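As a sketch of what the KeyStrategy migration looks like for callers; the member names and helper below are illustrative, not necessarily the PR's actual values:

```python
from enum import Enum


class KeyStrategy(str, Enum):
    """Illustrative stand-in for the new idempotency key strategy enum."""
    EVENT_ID = "event_id"
    CONTENT_HASH = "content_hash"


def make_idempotency_key(strategy: KeyStrategy, event_id: str) -> str:
    # Callers now pass an enum member instead of a raw string, so an
    # unknown strategy fails at the type level rather than deep in Redis.
    return f"idempotency:{strategy.value}:{event_id}"


# Before: manager.check(key_strategy="event_id", ...)
# After:  manager.check(key_strategy=KeyStrategy.EVENT_ID, ...)
key = make_idempotency_key(KeyStrategy.EVENT_ID, "evt-123")
```

Because the enum mixes in str, existing serialized values (e.g. "content_hash") still round-trip via KeyStrategy("content_hash").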

Written for commit b8dda69. Summary will update on new commits.

Summary by CodeRabbit

  • New Features

    • Per-endpoint rate-limit usage shown with algorithm and remaining counts; dashboard panels simplified for clarity.
    • New "mark all as read" notification event and richer notification event types.
    • DLQ batch retry and replay errors return structured, typed details.
  • Bug Fixes

    • Execution metadata and replay error reporting made more consistent.
    • Pod/container status events now deliver structured status data (not raw strings).
  • Documentation

    • Rate-limiting docs updated with Usage Statistics and per-user override options.



coderabbitai bot commented Jan 25, 2026

📝 Walkthrough

Consolidates typing and domain models (ResourceUsageDomain moved; new ExecutionResultSummary, ReplayError, EndpointUsageStats), refactors rate-limit metrics and service, replaces DLQ response assembly with pydantic model_validate using DLQRetryResult, and modernizes annotations/typing across many modules.

Changes

  • DLQ & Schemas
    Files: backend/app/api/routes/dlq.py, backend/app/schemas_pydantic/dlq.py, docs/reference/openapi.json
    DLQBatchRetryResponse.details now typed as DLQRetryResult list; route constructs response via DLQBatchRetryResponse.model_validate(result, from_attributes=True); OpenAPI and frontend types updated.

  • Rate limit metrics & service
    Files: backend/app/core/metrics/rate_limit.py, backend/app/services/rate_limit_service.py, backend/app/domain/rate_limit/*.py, backend/grafana/provisioning/dashboards/rate-limiting-dashboard.json, docs/architecture/rate-limiting.md
    Metrics counters renamed to underscored internals and four new record_* methods added; RateLimitService refactored to use EndpointUsageStats, TypeAdapter-based config, and revised check/get_usage flows; Grafana dashboard and docs updated.

  • Domain model reorganization
    Files: backend/app/domain/events/typed.py, backend/app/domain/execution/models.py, backend/app/domain/events/event_models.py, backend/app/domain/replay/*.py, backend/app/domain/admin/*.py
    ResourceUsageDomain moved into events domain; added ExecutionResultSummary, ReplayError, EndpointUsageStats, EventTypeCount/ServiceEventCount; replay/admin models and repositories updated to use structured typed lists instead of raw dict maps.

  • Event bus & notification events
    Files: backend/app/services/event_bus.py, backend/app/services/notification_service.py
    EventBus switched to accept/publish DomainEvent/BaseEvent objects; NotificationService now publishes typed notification events (including NotificationAllReadEvent) with EventMetadata.

  • Idempotency key strategy enum
    Files: backend/app/domain/idempotency/models.py, backend/app/services/idempotency/*, backend/app/services/*/*consumer_wrappers*.py, backend/tests/**/idempotency/*.py
    Introduces KeyStrategy enum and replaces string key_strategy parameters/usages across idempotency manager, middleware, wrappers, workers, orchestrators, and tests.

  • Replay & event stats repository changes
    Files: backend/app/db/repositories/admin/admin_events_repository.py, backend/app/db/repositories/event_repository.py
    Aggregation outputs now return lists of typed count objects (EventTypeCount, ServiceEventCount) instead of dict maps; repository code builds those models.

  • Type hint modernization & annotations
    Files: many modules (e.g., backend/app/core/*, backend/app/schemas_pydantic/*, backend/app/services/*, backend/app/db/docs/*)
    Widespread migration from typing generics to built-in generics and PEP 604 unions; added from __future__ import annotations in core modules; many signature/type refinements.

  • Miscellaneous behavioral tweaks
    Files: backend/app/services/execution_service.py, backend/app/core/providers.py, backend/app/services/result_processor/processor.py, backend/app/services/pod_monitor/event_mapper.py
    Terminal execution metadata set to None (not {}); Redis teardown uses close(); idempotency default strategies switched to enum values; PodRunningEvent.container_statuses now a list of ContainerStatusInfo objects; result processor emits EventMetadata object instead of dumped dict.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

enhancement

Poem

🐰 I hopped through types with nimble paws,
Moved resource data to kinder laws,
Metrics learned new songs to sing,
DLQ replies now match the schema ring,
A tiny rabbit cheers these tidy paws!

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage (⚠️ Warning): docstring coverage is 35.96%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Title Check (❓ Inconclusive): the title "new types, updated rate limit service" is vague and generic, using non-descriptive terms that do not convey the main scope of this significant refactoring. Resolution: consider a more specific title that highlights the primary change, such as "Refactor rate limit service with typed models and simplified metrics" or "Introduce typed domain models and modernize rate limit implementation".

✅ Passed checks (1 passed)

  • Description Check (✅ Passed): check skipped because CodeRabbit's high-level summary is enabled.





@cubic-dev-ai cubic-dev-ai bot left a comment


No issues found across 32 files


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
backend/app/services/sse/redis_bus.py (1)

37-41: Use aclose() instead of deprecated close() for PubSub cleanup.

In redis-py 7.1.0, PubSub.close() is deprecated and marked with "Use aclose() instead". Although close() internally calls aclose() (so it still works), the code should use await self._pubsub.aclose() to follow current best practices and align with how the asyncio Redis client handles cleanup.

backend/app/domain/execution/models.py (1)

35-43: Metadata field is not persisted to the database and will be silently lost.

Line 42 changes ExecutionResultDomain.metadata to EventMetadata | None, but the write_terminal_result method (backend/app/db/repositories/execution_repository.py, line 48) does not persist the metadata field. While resource_usage is correctly serialized with .model_dump() and error_type is set, metadata is completely omitted from the database update. This causes metadata to be silently discarded whenever an ExecutionResultDomain is saved. Either add metadata persistence to write_terminal_result (which requires adding a metadata field to ExecutionDocument) or remove the field from ExecutionResultDomain if it's not intended for storage.

🤖 Fix all issues with AI agents
In `@backend/app/core/providers.py`:
- Around line 118-121: The async Redis client cleanup uses the deprecated
close() alias; update the finally block that currently does "await
client.close()" to call "await client.aclose()" instead so the
redis.asyncio.Redis client is closed with the modern API (locate the provider
that yields client and replace the await client.close() call with await
client.aclose()).
🧹 Nitpick comments (3)
backend/app/domain/replay/models.py (1)

11-19: Consider using datetime type for timestamp field.

The timestamp field is typed as str but stores ISO-formatted datetime strings. Other models in this file (e.g., ReplaySessionState.created_at, started_at, completed_at) use the datetime type directly. Using str here creates inconsistency and loses type safety benefits.

♻️ Suggested refactor
 class ReplayError(BaseModel):
     """Error details for replay operations."""

     model_config = ConfigDict(from_attributes=True)

-    timestamp: str
+    timestamp: datetime
     error: str
     type: str | None = None  # Present for session-level errors
     event_id: str | None = None  # Present for event-level errors

This would also require updating the call sites in replay_service.py to pass datetime objects directly instead of calling .isoformat().

backend/app/services/event_replay/replay_service.py (1)

174-177: Consider including type for event-level errors as well.

For consistency with _handle_session_error (line 149) and improved debugging, consider populating the type field with the exception class name.

♻️ Suggested refactor
         err = ReplayError(
-            timestamp=datetime.now(timezone.utc).isoformat(), event_id=str(event.event_id), error=str(error)
+            timestamp=datetime.now(timezone.utc).isoformat(),
+            event_id=str(event.event_id),
+            error=str(error),
+            type=type(error).__name__,
         )
backend/app/settings.py (1)

124-124: Document the requirement or hardcode the Redis decode setting.

Using Literal[True] is supported in pydantic-settings 2.5 and enforces the value at both type-check and runtime. However, this introduces a validation-time breaking change: any existing .env with REDIS_DECODE_RESPONSES=false will fail on startup.

Since decode_responses=True appears required by the Redis client instantiation in backend/core/providers.py, either:

  1. Add a comment explaining why it must be True (e.g., downstream code assumes decoded strings).
  2. Remove it as a configurable setting and hardcode decode_responses=True directly in the Redis client setup.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/app/infrastructure/kafka/mappings.py (1)

38-45: Add missing NOTIFICATION_ALL_READ event type to mapping.

The EventType.NOTIFICATION_ALL_READ is defined in the enum (line 44 of events.py) but not included in EVENT_TYPE_TO_TOPIC. This causes it to fall back to SYSTEM_EVENTS (line 99), inconsistent with all other NOTIFICATION_* events that route to NOTIFICATION_EVENTS.

Proposed fix
     EventType.NOTIFICATION_READ: KafkaTopic.NOTIFICATION_EVENTS,
+    EventType.NOTIFICATION_ALL_READ: KafkaTopic.NOTIFICATION_EVENTS,
     EventType.NOTIFICATION_CLICKED: KafkaTopic.NOTIFICATION_EVENTS,
🤖 Fix all issues with AI agents
In `@backend/app/db/repositories/admin/admin_events_repository.py`:
- Line 146: The list comprehension building events_by_type can crash on unknown
DB values because EventType(t["_id"]) raises for invalid enums; update the logic
in the events_by_type construction (where EventTypeCount and EventType are used
with top_types) to safely convert or skip unknowns — e.g., check
EventType._value2member_map_.get(t["_id"]) (or use a try/except around
EventType(...)) and only create EventTypeCount for valid enum members, skipping
or logging legacy/unknown types so the stats endpoint doesn't fail.

In `@backend/app/db/repositories/event_repository.py`:
- Around line 247-254: The previously suggested membership check against
EventType.__members__.values() risks filtering incorrectly; check against
the enum's string values instead (e.g. k in [e.value for e in EventType])
before calling EventType(k), or simply rely on the existing Pydantic
validation (EventDocument(**data)) and leave the current EventType(k)
conversion in the events_by_type mapping as-is. Either way, update the
events_by_type logic (and any related EventTypeCount construction) to use a
correct membership check so invalid strings are filtered rather than
raising.
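Both repository fixes above come down to the same move: converting raw aggregation strings to EventType members without raising on unknown values. A minimal stdlib sketch (the enum members here are an illustrative subset, not the project's real ones):

```python
from __future__ import annotations

from enum import Enum


class EventType(str, Enum):
    # Illustrative subset of the real enum.
    EXECUTION_STARTED = "execution_started"
    EXECUTION_COMPLETED = "execution_completed"


def to_event_type(raw: str) -> EventType | None:
    """Return the matching member, or None for legacy/unknown values."""
    # _value2member_map_ maps values to members without raising,
    # unlike EventType(raw), which throws ValueError on unknowns.
    return EventType._value2member_map_.get(raw)


rows = [{"_id": "execution_started", "count": 5}, {"_id": "legacy", "count": 2}]
counts = [
    (member, row["count"])
    for row in rows
    if (member := to_event_type(row["_id"])) is not None
]
```

Unknown or legacy values are simply skipped, so the stats endpoint keeps working when old documents remain in the database.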

In `@backend/app/services/user_settings_service.py`:
- Around line 72-76: Replace the wildcard subscription pattern with the exact
event type: update the call to bus.subscribe currently using
f"{EventType.USER_SETTINGS_UPDATED}*" to subscribe to
EventType.USER_SETTINGS_UPDATED directly; keep the existing _handle function and
its isinstance(evt, UserSettingsUpdatedEvent) check and ensure the returned
subscription id is still assigned to self._subscription_id and that _handle
continues to call self.invalidate_cache(evt.user_id) as before.

In `@docs/reference/openapi.json`:
- Around line 8494-8505: The examples for EventStatistics and related schemas
use uppercase EventType strings that don't match the EventType enum (which uses
lowercase); update all example values in the EventStatistics examples and any
EventTypeCountSchema/ServiceEventCountSchema example objects to use the
lowercase enum values (e.g., "request", "response", etc.) so generated SDKs and
clients get correct casing; search for examples referencing EventType in
EventStatistics, EventTypeCountSchema and ServiceEventCountSchema and replace
uppercase strings with the corresponding lowercase enum values consistently
across the file.

In `@frontend/src/lib/api/types.gen.ts`:
- Around line 3676-3680: The type for the field container_statuses has changed
from string to Array<ContainerStatusInfo>, so update any code that reads or
assigns this property to handle an array of ContainerStatusInfo objects rather
than a string; locate usages of container_statuses and adjust parsing/iteration
logic (e.g., map/filter loops, type annotations, and any JSON parsing) and
update any variable/type declarations or function signatures that consume it to
expect ContainerStatusInfo[] instead of string so callers of the affected types
(including functions referencing ContainerStatusInfo or container_statuses)
compile and handle the new structure correctly.
♻️ Duplicate comments (1)
backend/app/core/providers.py (1)

118-121: Use aclose() instead of close() for async Redis client cleanup.

This was previously flagged: close() is deprecated since redis-py 5.0.1 and is an alias for aclose(). The code should use the modern API.

🔧 Suggested fix
         try:
             yield client
         finally:
-            await client.close()
+            await client.aclose()
🧹 Nitpick comments (4)
backend/tests/e2e/services/rate_limit/test_rate_limit_service.py (1)

141-142: Consider testing via public API rather than private attributes.

Asserting on _requests (a private attribute by Python convention) couples this test to internal implementation details. If the metrics internals are refactored, this test could break even though the public behavior remains correct.

Since the metrics refactor introduced public recording methods (record_request, etc.), consider verifying wiring by invoking the public API or simply removing this assertion if the other tests already exercise the metrics path.

♻️ Alternative approach
-    # Verify metrics object has _requests counter for checks
-    assert hasattr(svc.metrics, '_requests')
+    # Verify metrics object exposes public recording API
+    assert callable(getattr(svc.metrics, 'record_request', None))

Or remove entirely if metric recording is already validated by other tests.

backend/app/schemas_pydantic/user_settings.py (1)

20-20: Avoid mutable default list for channels.
A list literal can be shared/mutated across instances; prefer a default_factory for isolation.

♻️ Suggested change
-    channels: list[NotificationChannel] = [NotificationChannel.IN_APP]
+    channels: list[NotificationChannel] = Field(default_factory=lambda: [NotificationChannel.IN_APP])
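The hazard behind this nitpick is the classic shared-mutable-default pitfall, shown here with plain stdlib constructs (the function and class names are hypothetical, not from the PR):

```python
from dataclasses import dataclass, field


def add_channel(channel: str, channels: list[str] = []) -> list[str]:
    # Anti-pattern: the default list is created once at definition time,
    # so every call without an explicit argument mutates the same object.
    channels.append(channel)
    return channels


@dataclass
class Prefs:
    # default_factory builds a fresh list per instance, the same isolation
    # Field(default_factory=...) provides in pydantic.
    channels: list[str] = field(default_factory=lambda: ["in_app"])
```

Calling add_channel("in_app") and then add_channel("email") yields ["in_app", "email"] on the second call even though only one element was passed; two Prefs() instances, by contrast, never share state.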
backend/tests/e2e/services/events/test_event_bus.py (1)

28-28: Prefer explicit enum values for the pattern/key.
Avoids reliance on Enum __str__/encode behavior and keeps the intent unambiguous.

♻️ Suggested change
-    await bus.subscribe(f"{EventType.USER_SETTINGS_UPDATED}*", handler)
+    await bus.subscribe(f"{EventType.USER_SETTINGS_UPDATED.value}*", handler)
...
-            key=EventType.USER_SETTINGS_UPDATED.encode("utf-8"),
+            key=EventType.USER_SETTINGS_UPDATED.value.encode("utf-8"),

Also applies to: 45-45

backend/app/services/rate_limit_service.py (1)

310-344: Consider handling missing rules in token bucket stats.

For sliding window stats (line 332-337), when no matching rule is found, limit is set to 0, resulting in remaining = max(0, 0 - count) = 0. However, for token bucket stats (lines 338-344), the bucket's stored tokens are used directly without considering whether a rule exists.

This inconsistency could be intentional (token bucket stores actual remaining tokens), but consider adding a comment explaining the difference, or returning None for endpoints without matching rules if that's more appropriate.
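The asymmetry exists because the two algorithms store different state: the sliding window derives "remaining" from a rule's limit minus a timestamp count, while the token bucket stores the remaining budget directly. An in-memory sketch of both (the real service keeps this state in Redis; class and method names are illustrative):

```python
from collections import deque


class SlidingWindowLimiter:
    def __init__(self, limit: int, window: float) -> None:
        self.limit, self.window = limit, window
        self._hits: deque[float] = deque()

    def _evict(self, now: float) -> None:
        # Drop timestamps that fell out of the window.
        while self._hits and self._hits[0] <= now - self.window:
            self._hits.popleft()

    def allow(self, now: float) -> bool:
        self._evict(now)
        if len(self._hits) >= self.limit:
            return False
        self._hits.append(now)
        return True

    def remaining(self, now: float) -> int:
        # Derived: the rule's limit minus the observed count,
        # so a missing rule (limit 0) yields remaining 0.
        self._evict(now)
        return max(0, self.limit - len(self._hits))


class TokenBucket:
    def __init__(self, capacity: int, refill_rate: float) -> None:
        self.capacity, self.refill_rate = capacity, refill_rate
        self.tokens, self.updated = float(capacity), 0.0

    def _refill(self, now: float) -> None:
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.updated = now

    def allow(self, now: float) -> bool:
        self._refill(now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

    def remaining(self, now: float) -> int:
        # The stored token count is itself the remaining budget; no rule
        # lookup happens, which is the inconsistency noted above.
        self._refill(now)
        return int(self.tokens)
```

Timestamps are passed in explicitly to keep the sketch deterministic; a real implementation would use a clock and, per the PR, Redis-side atomicity.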


@HardMax71 HardMax71 merged commit 5e71fbf into main Jan 26, 2026
24 checks passed
@HardMax71 HardMax71 deleted the fix/more-typed-objs branch January 26, 2026 17:37