
fix: recover auto chat mode when quota state stalls#472

Open
double2tea wants to merge 1 commit into chenyme:main from double2tea:fix/auto-chat-quota-recovery

Conversation


double2tea commented Apr 13, 2026

Summary

  • add a shared account-selection helper for chat handlers
  • fall back from AUTO to FAST and then EXPERT when chat quota state is stale
  • trigger one throttled on-demand quota refresh before returning "No available accounts"
  • cover the new selection behavior with focused unit tests

Problem

Some deployments can get stuck returning "No available accounts for this model tier" for chat requests on AUTO models, even though quota becomes available again after a manual refresh or a process restart.

In practice there were two missing recovery paths:

  • chat handlers treated AUTO as a hard requirement and did not fall back to other chat modes that still had quota
  • when the in-memory/runtime quota state reported no available account, request handling returned immediately instead of forcing a fresh quota sync and retrying once

This makes the service dependent on long periodic refresh intervals or manual intervention.

What This Changes

1. Chat-mode fallback

For chat models whose public mode is AUTO, account selection now tries:

  • AUTO
  • FAST
  • EXPERT

This behavior is enabled by default and guarded by:

  • features.auto_chat_mode_fallback = true
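The fallback order can be sketched as follows. `select_chat_account` and `pool.pick` are illustrative stand-ins for the PR's actual helper, and the `fallback_enabled` parameter mirrors the `features.auto_chat_mode_fallback` flag:

```python
from typing import Optional

# Fallback order described above; when the requested public mode is AUTO,
# selection walks AUTO -> FAST -> EXPERT until an account has quota.
FALLBACK_ORDER = ["AUTO", "FAST", "EXPERT"]

def select_chat_account(pool, requested_mode: str,
                        fallback_enabled: bool = True) -> Optional[str]:
    """Return the first account with quota, or None if every mode is exhausted."""
    if requested_mode == "AUTO" and fallback_enabled:
        modes = FALLBACK_ORDER
    else:
        modes = [requested_mode]
    for mode in modes:
        account = pool.pick(mode)  # assumed to return None when no quota remains
        if account is not None:
            return account
    return None
```

With the flag off, AUTO behaves as before and no other mode is attempted.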

2. Empty-pool refresh retry

If account selection still finds no candidate, the service now:

  • runs one throttled refresh_on_demand()
  • retries account selection once

This behavior is enabled by default and guarded by:

  • features.on_empty_retry_enabled = true
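The retry path can be sketched like this. `ThrottledRefresher` and the 30-second window are illustrative assumptions standing in for the existing refresh-service throttle; `retry_enabled` mirrors `features.on_empty_retry_enabled`:

```python
import time

class ThrottledRefresher:
    """Runs a refresh callback at most once per min_interval_s (assumed window)."""

    def __init__(self, min_interval_s: float = 30.0):
        self.min_interval_s = min_interval_s
        self._last_refresh = 0.0

    def refresh_on_demand(self, do_refresh) -> bool:
        """Invoke do_refresh() unless throttled; report whether it actually ran."""
        now = time.monotonic()
        if now - self._last_refresh < self.min_interval_s:
            return False
        self._last_refresh = now
        do_refresh()
        return True

def select_with_retry(select, refresher, do_refresh, retry_enabled: bool = True):
    """Select once; on an empty pool, refresh (if allowed) and retry exactly once."""
    account = select()
    if account is None and retry_enabled and refresher.refresh_on_demand(do_refresh):
        account = select()
    return account
```

Because the refresher reports whether it ran, a throttled refresh skips the second selection pass entirely, so a stale empty pool never triggers a refresh storm.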

3. Shared implementation

The recovery logic is centralized in a shared products-layer helper so OpenAI Chat Completions, OpenAI Responses, and Anthropic Messages stay consistent.
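Putting the two behaviors together, the shared helper might look roughly like the sketch below. All names and signatures here are assumptions for illustration; the PR's real helper lives in app/products/_account_selection.py:

```python
def select_account_with_recovery(pick, mode, refresh_once, features):
    """Single selection entry point shared by all three chat handlers.

    pick(mode)     -> account or None (no quota in that mode)
    refresh_once() -> bool, True only when the throttled refresh actually ran
    features       -> dict of feature flags, both defaulting to enabled
    """
    if mode == "AUTO" and features.get("auto_chat_mode_fallback", True):
        modes = ("AUTO", "FAST", "EXPERT")
    else:
        modes = (mode,)
    for attempt in range(2):  # second pass only after a successful refresh
        for m in modes:
            account = pick(m)
            if account is not None:
                return account
        if attempt == 0 and features.get("on_empty_retry_enabled", True) and refresh_once():
            continue
        break
    return None
```

Each protocol handler (Chat Completions, Responses, Messages) would call this one function, so the fallback order and the single-retry rule cannot drift apart between endpoints.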

Why This Is Safe

  • the fallback is limited to chat requests only
  • image/video behavior is unchanged
  • the on-demand refresh path is already throttled by existing refresh-service logic
  • the feature flags keep the behavior configurable

Verification

  • ./.venv/bin/python -m unittest tests.test_account_selection
  • ./.venv/bin/python -m py_compile app/products/_account_selection.py tests/test_account_selection.py app/products/openai/chat.py app/products/openai/responses.py app/products/anthropic/messages.py

Notes

I observed this against a live deployment where:

  • cached AUTO quota had stalled at zero
  • FAST / EXPERT quota was still available
  • a manual refresh or service restart restored AUTO

This patch addresses both the stale-state case and the no-self-recovery case.
