fix(auth): retire rate_limit_until, unify on cachedQuota single truth#492
Merged
Conversation
The proxy used to maintain two independent signals for "this account is
locked": entry.usage.rate_limit_until (local lock from 429 retry-after)
and entry.cachedQuota.<bucket>.limit_reached (passive collection from
upstream rate_limits headers). The two drift because four callers of
markStatus(_, "active") in refresh-scheduler / account-mutation flip the
status away from "rate_limited" without clearing rate_limit_until. The
orphaned field then outlives the upstream window: dashboard shows
"已限速" while cachedQuota reads 0% used, and free-plan users with a 7-day
primary bucket can stay locked for a week past the actual reset.
Single-source-of-truth fix:
- Delete markRateLimited / clearRateLimit / markQuotaExhausted; add
applyRateLimit429() that writes the 429 retry-after into
cachedQuota.rate_limit.{limit_reached=true, reset_at}. Never shrinks
an existing reset_at — next passive header collection corrects the
bucket if the 429 was actually secondary.
- Drop "rate_limited" from AccountStatus enum. status is now a pure
rotation marker; pool exclusion flows through hasReachedCachedQuota,
which already checks all 3 buckets (primary / secondary / code_review).
- refreshStatus loses its rate-limit clearing branch (resetExpiredQuotaWindow
already auto-clears limit_reached when reset_at passes).
- isAuthenticated and getPoolSummary consult cachedQuota too, so an
all-exhausted pool no longer falsely reports authenticated.
- accounts.json one-shot migration: migrateLegacyRateLimit in
loadPersisted converts legacy status="rate_limited" + rate_limit_until
to status="active" + synthesized cachedQuota.rate_limit (only when
the local lock is fresher than cachedQuota). Next persist drops the
legacy field.
- Dashboard derivedStatus(account) helper renders "rate_limited" badge
from cachedQuota.*.limit_reached, fixing the "已活跃 + 7d 已达上限"
contradiction for accounts with secondary exhaustion.
Diagnostic findings folded into the changelog:
- Native rust transport upgraded "Stream error: {e}" to error_chain(&e)
so hyper cause chains are visible. Confirmed the long-running
"premature stream close" reports as
"error decoding response body: unexpected EOF during chunk size line"
— upstream cuts HTTP/1.1 chunked mid-chunk after ~120s of reasoning
without output_text.
- UpstreamPrematureCloseError introduced (codex-event-extractor.ts) and
collectCodexResponse throws it instead of EmptyResponseError when the
stream never emits a terminal event. proxy-handler returns 504
fail-fast rather than burning 3 accounts on cross-account retry.
Tests: full suite 1927 green; npx tsc --noEmit clean. Migration smoke
on live data confirmed miles.halloway's orphan lock cleared and
a1432350557's weekly-exhausted status now surfaces correctly in
dashboard.
Address review feedback on #492: - web/src/components/AccountTable.tsx: drop the unused `derivedStatus` import. AssignmentAccount has no `quota` field so the helper would always pass through to `acct.status` — the import was dead. Replace with a comment explaining why the table renders raw status while AccountList/AccountCard receive full Account and do derive. - Revert native/src/lib.rs + native/codex-tls.darwin-arm64.node to origin/dev. The rust error_chain upgrade is diagnostic-only (helps inspect hyper cause chains in stderr) and was rebuilt only for darwin-arm64. Shipping it without rebuilt win32/linux binaries would leave non-Mac operators on the old short error format — inconsistent across platforms. Ship the rust change separately with full per-platform binary rebuilds. - CHANGELOG: trim the rust-specific callout from the UpstreamPrematureCloseError entry to match the slimmer scope of this PR (the user-visible 504 fail-fast and proxy-handler logic stay; the diagnostic upgrade is gone). Tests + tsc green.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Eliminate the dual-source-of-truth between local
rate_limit_until+status === "rate_limited"andcachedQuota.<bucket>.limit_reachedthat has been causing stale locks. The two signals drift apart whenevermarkStatus(_, "active")is called from refresh-scheduler / account-mutation (which never clearrate_limit_until), leaving accounts locally locked long after the upstream window has actually reset. Visible to users as "已限速 badge + 0% 已使用" contradiction; for free-plan accounts (7-day primary window), the orphan lock could persist for up to a week past the real reset.cachedQuotais now the only signal. All "is this account exhausted" decisions flow throughhasReachedCachedQuota(which already covers all 3 buckets — primary / secondary / code_review).While auditing this we also pinned down the long-running "premature stream close" complaint: native rust transport now emits the full reqwest cause chain, which revealed upstream chatgpt.com hard-cuts HTTP/1.1 chunked encoding mid-chunk after ~120s of reasoning without
output_text. Surface this distinctly so cross-account empty-response retry doesn't burn 3 accounts × 120s on a deterministic upstream cap.Changes
Backend (
src/auth/):markRateLimited/clearRateLimit/markQuotaExhausted; addapplyRateLimit429()that writes intocachedQuota.rate_limit.{limit_reached=true, reset_at}and never shrinks an existing reset_at (src/auth/account-registry.ts,src/auth/account-pool.ts)."rate_limited"fromAccountStatusenum; status is now pure rotation marker (src/auth/types.ts).refreshStatusloses its rate-limit clearing branch —resetExpiredQuotaWindowalready auto-clearslimit_reachedwhenreset_atpasses (src/auth/account-registry.ts).isAuthenticated+getPoolSummaryconsulthasReachedCachedQuotaso an all-exhausted pool reports correctly.isQuotaExhausted(quota)helper (src/auth/quota-skip.ts).accounts.jsonmigration inloadPersistedviamigrateLegacyRateLimit— converts legacystatus="rate_limited" + rate_limit_untiltostatus="active" + synthesized cachedQuota.rate_limitonly when the local lock is fresher than cachedQuota (src/auth/account-persistence.ts).Routes / proxy plumbing:
src/routes/shared/proxy-error-handler.ts).src/routes/shared/proxy-handler.ts).Premature-close diagnostics (bonus, surfaced during the audit):
UpstreamPrematureCloseError(src/translation/codex-event-extractor.ts);collectCodexResponsethrows it when stream ends withoutresponse.completed/failed/errorterminal event. proxy-handler returns 504 fail-fast instead of triggering empty-response cross-account retry (src/translation/codex-to-openai.ts).eprintln!("Stream error: {e}")toerror_chain(&e)so hyper cause chains are visible in stderr (native/src/lib.rs).Dashboard:
derivedStatus(account)+isQuotaExhausted(quota)helper (web/src/lib/accountStatus.ts).AccountCard,AccountListfilter dropdown + counts,AccountTable,AccountManagementfilter ladder all wired to render"rate_limited"badge based on cachedQuota — fixes the "已活跃 + 7d 已达上限" contradiction.Tests:
tests/unit/auth/account-pool-rate-limit-429.test.ts(9 cases) covering primary-bucket write, never-shrink rule, hasReachedCachedQuota exclusion, status not mutated, request counting, auto-clear via resetExpiredQuotaWindow.tests/unit/auth/account-persistence-migration.test.ts(7 cases) covering legacy-future-lock synthesis, fresh-cachedQuota-trust, stale-cachedQuota-overwrite, orphan-past-lock cleanup.account-pool-quota,account-pool-config-di,account-pool,account-pool-has-available,account-pool-sticky,account-routing,proxy-handlerintegration,quota-refreshe2e,secondary-quotastress,proxy-error-handlerunit) to assert viaapplyRateLimit429+isQuotaExhausted(account.quota)instead of the retired fields.CHANGELOG
[Unreleased] → Fixedentries cover both the dual-truth fix (with free-plan severity callout) and the premature-close 504 fail-fast.Test Plan
npm test— 1927 passed, 1 skipped, 0 failed (1 pre-existing flaky usage-stats history test confirmed unrelated viagit stashbaseline check)npx tsc --noEmit— cleannpm run build— vite bundle + tsc both cleandata/accounts.jsonto.bak-pre-refactor, restarted prod proxy, confirmed viacurl /auth/accounts | jqthat previously-lockedmiles.hallowayhad orphanrate_limit_untilcleared anda1432350557correctly reportssecondary_rate_limit.limit_reached=truefrom cachedQuotaderivedStatusreturns"rate_limited"for thea1432350557-style account (status=active + secondary.limit_reached=true) — fixes Bug Bnpm run test:realnot run this session (network-bound upstream test)Notes
rate_limit_until: nullandstatus="active"on previously-locked accounts (brief lock amnesia → handful of extra 429s, acceptable for a one-time migration).UpstreamPrematureCloseErroris a new client-visible status code for a previously-502 condition. Codex CLI and Claude Code both treat 504 as retryable, so end-user behavior should improve (no longer burns 3 accounts before reporting).