fix(opencode): classify provider billing/quota failures instead of masking them#1466

Open
Astro-Han wants to merge 1 commit into dev from claude/i1105-provider-failure-classify

Conversation

@Astro-Han Astro-Han commented Jun 22, 2026

Owner

Summary

Fixes the first defect behind the "Connection lost. Please check whether the last operation completed before resending." misreport: provider billing/quota failures (DeepSeek 402 "Insufficient Balance" and friends) were classified as unknown, and the nested provider message was dropped — so the real reason never reached the UI. This PR is classification-only (packages/opencode/src/provider/error.ts); the run-incident halt overwrite that finishes masking the message is PR2.

Changes in provider/error.ts:

  • 402 → quota_exhausted: an account that can no longer pay calls for the same user action as a depleted quota (top up or switch model), so no new payment_required kind is added.
  • Strong/weak BILLING_PATTERNS so billing failures reported under inconsistent statuses still classify as quota_exhausted. Strong patterns (insufficient balance / out of credits / payment required / arrears / 余额不足 / 欠费) match unconditionally; weak patterns (quota exceeded / billing issue) only on a billing-shaped status {400,402,403} with no rate-limit signal. 429 is excluded — it is Too Many Requests, so "quota exceeded" wording (e.g. Google's per-minute request quota) stays a retryable rate_limit.
  • Exclude opencode FreeUsageLimitError from billing: it must stay a retry-time free_quota_exhausted concept (countdown card), never the terminal quota_exhausted kind, which would stop classifyRetry before the free-quota branch.
  • Fix message() to prefer the nested {error:{message}} string so DeepSeek's "Insufficient Balance" surfaces instead of dumping the raw body (body.error is an object and short-circuited the old || chain).
  • Unify provider code extraction (error.code / code / error.type / Google-style status) behind extractProviderCode.
  • parseStreamError now classifies known auth (authentication_error / invalid_api_key / permission_denied) and rate-limit (rate_limit_exceeded / too_many_requests / rate_limited) codes, plus strong-billing messages.
  • Typed-error middle path: a typed ({type:"error"} + error object) provider error body with an unhandled code now becomes a structured APIError(kind="unknown") preserving code/responseBody for the frontend, instead of an opaque UnknownError. A bare {code} body is not upgraded (indistinguishable from a Node runtime error like EACCES) and stays UnknownError. Retryability is read from the code only (never the free-text message), so a terminal error whose message merely mentions "unavailable" is not wrongly retried; a transient code (resource_exhausted/unavailable/overloaded/rate-limit) stays retryable, and a typed FreeUsageLimitError stays retryable so classifyRetry routes it to free_quota_exhausted.

resource_exhausted retry semantics are intentionally left unchanged.
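
The strong/weak billing split described above can be sketched roughly as follows. This is a minimal illustration, not the code in provider/error.ts: the names `billingKind`, `classify`, `Signals`, and `hasRateLimitSignal` are hypothetical, the regexes only approximate the wording listed in the bullets, and what counts as a rate-limit signal is an assumption.

```typescript
// Hypothetical sketch of the strong/weak billing split; names are illustrative only.
type FailureKind = "quota_exhausted" | "rate_limit" | "unknown"

interface Signals {
  hasRateLimitSignal?: boolean // assumed: e.g. a retry-after style hint was present
  isFreeUsageLimit?: boolean // assumed: the body was identified as FreeUsageLimitError
}

// Strong patterns: unambiguous billing wording, trusted on any status.
const STRONG_BILLING: RegExp[] = [
  /insufficient balance/i,
  /out of credits/i,
  /payment required/i,
  /arrears/i,
  /余额不足/,
  /欠费/,
]

// Weak patterns: wording that providers also use for per-minute rate limits.
const WEAK_BILLING: RegExp[] = [/quota exceeded/i, /billing issue/i]

// Statuses on which a weak pattern is trusted. 429 is deliberately absent.
const BILLING_SHAPED_STATUS = new Set<number>([400, 402, 403])

function billingKind(
  status: number | undefined,
  message: string,
  signals: Signals = {},
): "quota_exhausted" | undefined {
  // The free-usage limit must never become the terminal quota_exhausted kind.
  if (signals.isFreeUsageLimit) return undefined
  if (STRONG_BILLING.some((re) => re.test(message))) return "quota_exhausted"
  const weak = WEAK_BILLING.some((re) => re.test(message))
  const billingShaped = status !== undefined && BILLING_SHAPED_STATUS.has(status)
  if (weak && billingShaped && !signals.hasRateLimitSignal) return "quota_exhausted"
  return undefined
}

function classify(
  status: number | undefined,
  message: string,
  signals: Signals = {},
): FailureKind {
  const billing = billingKind(status, message, signals)
  if (billing) return billing
  if (status === 402) return "quota_exhausted"
  if (status === 429) return "rate_limit"
  return "unknown"
}
```

Under these assumptions, `classify(400, "Insufficient Balance")` yields `quota_exhausted`, while `classify(429, "Quota exceeded for requests per minute")` stays `rate_limit` because 429 is not a billing-shaped status.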

Why

A Windows user on DeepSeek (out of balance) saw repeated "Connection lost" messages. Root cause is a 3-layer defect; the classification layer is fixed here: 402 fell through apiCallErrorKind to unknown, and message() dropped the nested "Insufficient Balance" string. The goal is to classify all common provider error types correctly and pass the real reason through, not just patch 402.
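
To illustrate the message() half of the defect, here is a hedged sketch of extraction that prefers the nested `{error:{message}}` string. The function name `extractMessage` and the fallback order after the nested string are assumptions for illustration; the actual message() in provider/error.ts may differ.

```typescript
// Hypothetical sketch: prefer the nested provider message over the raw body.
function extractMessage(rawBody: string): string {
  let body: unknown
  try {
    body = JSON.parse(rawBody)
  } catch (_err) {
    // Not JSON: the raw text is the best message available.
    return rawBody
  }
  if (typeof body !== "object" || body === null) return rawBody
  const record = body as Record<string, unknown>
  const err = record.error

  // 1. Nested {error:{message}} string, e.g. DeepSeek's "Insufficient Balance".
  if (typeof err === "object" && err !== null) {
    const nested = (err as Record<string, unknown>).message
    if (typeof nested === "string" && nested.length > 0) return nested
  }
  // 2. Top-level message string (assumed fallback).
  if (typeof record.message === "string" && record.message.length > 0) return record.message
  // 3. error given directly as a string (assumed fallback).
  if (typeof err === "string" && err.length > 0) return err
  // 4. Nothing usable: fall back to the raw body.
  return rawBody
}
```

The explicit `typeof` checks are the point of the sketch: a truthiness-based `||` chain treats an object-valued `body.error` as a usable value, which is how the nested string was lost.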

Related Issue

Refs #1123 (classify-then-render umbrella), #1105 (classification spine).

Human Review Status

Pending

Review Focus

  • The retry-safety invariants: FreeUsageLimitError must stay free_quota_exhausted; resource_exhausted must stay retryable; a terminal quota_exhausted code must not be retried.
  • The strong/weak billing split and the 429 exclusion (false-positive risk).
  • The typed-vs-bare middle-path boundary (typed upgrades to APIError; bare stays UnknownError).
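
For reviewers, the typed-vs-bare boundary and the code-only retry verdict could look roughly like the sketch below. Everything here is an assumption made for illustration: `parseUnhandled`, `Parsed`, and `TRANSIENT_CODES` are hypothetical names, the transient code list is a guess based on the description, and the FreeUsageLimitError special case is omitted.

```typescript
// Hypothetical sketch of the typed-vs-bare middle path; not the PR's actual code.
type Parsed =
  | { name: "APIError"; kind: "unknown"; code?: string; isRetryable: boolean; responseBody: string }
  | { name: "UnknownError" }

// Assumed set of transient codes; retryability is decided from the code alone.
const TRANSIENT_CODES = new Set<string>([
  "resource_exhausted",
  "unavailable",
  "overloaded",
  "rate_limit_exceeded",
])

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null
}

function parseUnhandled(body: unknown): Parsed {
  // Only a typed envelope ({type:"error"} plus an error object) is upgraded.
  if (!isRecord(body) || body.type !== "error") return { name: "UnknownError" }
  const err = body.error
  if (!isRecord(err)) return { name: "UnknownError" }

  const code = typeof err.code === "string" ? err.code : undefined
  return {
    name: "APIError",
    kind: "unknown",
    code,
    // The free-text message is never consulted, so "unavailable" in prose is ignored.
    isRetryable: code !== undefined && TRANSIENT_CODES.has(code.toLowerCase()),
    responseBody: JSON.stringify(body),
  }
}
```

A bare `{code: "EACCES"}` body fails the envelope check and stays `UnknownError`, which matches the stated reason for not upgrading it: it cannot be told apart from a Node runtime error.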

Risk Notes

Behavior/contract change (intentional, covered by tests): typed provider error bodies with unhandled codes now serialize as APIError(kind="unknown") instead of UnknownError. Two existing guard tests were updated to the new contract, and the meaningful guarantee (such errors stay non-retryable) is preserved. No data migration. There is no UI in this PR, so the "Screenshots" item is N/A (backend classification only).

How To Verify

bun test test/session/message-v2.test.ts src/session/retry.test.ts → 105 pass, 0 fail
bun test test/session src/session test/provider src/provider → 1332 pass, 4 skip, 1 todo, 0 fail
tsgo --noEmit (packages/opencode) → clean
codex review (xhigh, 4 rounds) → final fresh-eye pass: no P0/P1/P2/P3

Key regressions pinned: 402 → quota_exhausted; DeepSeek "Insufficient Balance" at 402 and 400 → quota_exhausted; FreeUsageLimitError 429 stays rate_limit and routes to free_quota_exhausted; 429 "quota exceeded" stays rate_limit; typed terminal quota_exhausted code not retried; "unavailable" in message stays non-retryable; bare Node EACCES error stays UnknownError; untyped envelope stays UnknownError.

Screenshots or Recordings

N/A — backend error-classification change, no UI.

Checklist

  • Type label — this PR carries exactly one of bug, enhancement, task, documentation. Type labels are author-added; the labeler bot does NOT assign them. Add the label in the GitHub UI, then tick this.
  • Routing labels — this PR carries at least one of app, ui, platform, harness, ci. The labeler bot assigns these on PR open based on changed paths. Confirm the bot's choice (or override if wrong), then tick this.
  • Priority label — this PR carries exactly one of P0, P1, P2, P3. The priority-triage bot suggests one on PR open. Confirm or override, then tick this.
  • Human Review Status above is set to Pending, Approved by @<reviewer>, or Not required: <reason> (default is Pending; "not required" is restricted to bot-authored low-risk PRs).
  • I linked the related issue, or stated in Summary why there is no issue.

https://claude.ai/code/session_015bW9JQSkuB156gkNQdxCzi

Summary by CodeRabbit

  • Bug Fixes

    • Improved detection and classification of billing and quota exhaustion errors
    • Enhanced authentication error handling in API responses
    • Improved error message extraction from API responses
    • Fixed retryability logic for various error scenarios
  • Tests

    • Expanded error classification test coverage for provider API failures

…sking them

A DeepSeek account out of balance returns 402 `{"error":{"message":
"Insufficient Balance",...}}`, but it surfaced to users as "Connection lost.
Please check whether the last operation completed before resending." The
first defect is in classification: 402 fell through to `unknown` and the
nested provider message was dropped, so the real reason never reached the UI.

Provider error classification (provider/error.ts):
- Map 402 -> quota_exhausted (account can no longer pay = same user action as
  a depleted quota: top up or switch model; no new payment_required kind).
- Add strong/weak BILLING_PATTERNS so billing failures that providers report
  under inconsistent statuses still classify as quota_exhausted. Strong patterns
  (insufficient balance / out of credits / payment required / arrears / 余额不足
  / 欠费) match unconditionally; weak patterns (quota exceeded / billing issue)
  only on a billing-shaped status {400,402,403} with no rate-limit signal. 429
  is excluded: it is Too Many Requests, so "quota exceeded" wording (e.g.
  Google's per-minute request quota) stays a retryable rate_limit.
- Exclude opencode FreeUsageLimitError from billing: it must stay a retry-time
  free_quota_exhausted concept (countdown card), never the terminal
  quota_exhausted kind which would stop classifyRetry before the free-quota
  branch.
- Fix message(): prefer the nested {error:{message}} string so DeepSeek's
  "Insufficient Balance" is surfaced instead of dumping the raw body (body.error
  is an object and short-circuited the old `||` chain).
- Unify provider code extraction (error.code / code / error.type / Google-style
  status) behind extractProviderCode.
- parseStreamError: classify known auth (authentication_error / invalid_api_key
  / permission_denied) and rate-limit (rate_limit_exceeded / too_many_requests
  / rate_limited) codes, and strong-billing messages.
- PR1c middle path: a typed ({type:"error"} + error object) provider error body
  with an unhandled code now becomes a structured APIError(kind="unknown")
  preserving code/responseBody for the frontend, instead of an opaque
  UnknownError. A bare {code} body is NOT upgraded (indistinguishable from a
  Node runtime error like EACCES) and stays UnknownError. Retryability is read
  from the code only (never free-text message), so a terminal error whose
  message mentions "unavailable" is not wrongly retried; a transient-looking
  code (exhausted/unavailable/rate limit) stays retryable, and a typed
  FreeUsageLimitError stays retryable so classifyRetry routes it to
  free_quota_exhausted. Untyped nested envelopes still stay UnknownError.

resource_exhausted retry semantics are intentionally left unchanged.

Tests (message-v2.test.ts, retry.test.ts): 402 -> quota_exhausted; DeepSeek
"Insufficient Balance" at 402 and 400 -> quota_exhausted; FreeUsageLimitError
429 stays rate_limit; 429 "quota exceeded" stays rate_limit; stream
auth/rate/billing codes; typed-unknown -> APIError(kind=unknown) with retry
verdict from code; "unavailable" in message stays non-retryable; bare Node
EACCES error stays UnknownError; typed FreeUsageLimitError stream routes to
free_quota_exhausted; untyped envelope stays UnknownError.

Refs #1105 #1123

Claude-Session: https://claude.ai/code/session_015bW9JQSkuB156gkNQdxCzi
@Astro-Han Astro-Han added the bug Something isn't working label Jun 22, 2026
coderabbitai Bot commented Jun 22, 2026

Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: e28d34e2-63d4-495b-83fa-93be117ab949

📥 Commits

Reviewing files that changed from the base of the PR and between f433411 and f6b56a6.

📒 Files selected for processing (4)
  • packages/opencode/src/provider/error.ts
  • packages/opencode/src/session/retry.test.ts
  • packages/opencode/test/session/message-v2.test.ts
  • packages/opencode/test/session/retry.test.ts

📝 Walkthrough

packages/opencode/src/provider/error.ts gains new billing/transient classification helpers (billingKindFor, extractProviderCode, strong/weak billing regex sets), maps HTTP 402 to quota_exhausted, updates message() to prefer nested body.error.message, expands parseStreamError with auth/rate-limit/billing branches, and applies a billing override in parseAPICallError. Tests are updated throughout to reflect the changed classification behavior.
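
As a rough illustration of multi-shape code extraction, the sketch below checks the four shapes named in the PR description (error.code, code, error.type, Google-style status) in that order. The function name `extractCode`, the choice to accept only non-empty strings, and the extra top-level `status` lookup are assumptions, not the confirmed behavior of extractProviderCode.

```typescript
// Hypothetical sketch of unified provider code extraction; the real helper may differ.
function isPlainRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null
}

function extractCode(body: unknown): string | undefined {
  if (!isPlainRecord(body)) return undefined
  const rawErr = body.error
  const err = isPlainRecord(rawErr) ? rawErr : undefined

  // Candidate locations, in priority order.
  const candidates: unknown[] = [
    err?.code, // {error:{code:"invalid_api_key"}}
    body.code, // {code:"rate_limited"}
    err?.type, // {error:{type:"authentication_error"}}
    err?.status, // Google-style {error:{status:"RESOURCE_EXHAUSTED"}}
    body.status, // assumed: status at the top level
  ]
  for (const candidate of candidates) {
    // Assumption: only non-empty strings count, so a numeric error.code is skipped.
    if (typeof candidate === "string" && candidate.length > 0) return candidate
  }
  return undefined
}
```

With this ordering, a Google-style body such as `{error:{code:429,status:"RESOURCE_EXHAUSTED"}}` resolves to `"RESOURCE_EXHAUSTED"` because the numeric `code` is not a string.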

Changes

Provider error classification improvements

| Layer / File(s) | Summary |
| --- | --- |
| Core classification helpers, 402 mapping, and message parsing (packages/opencode/src/provider/error.ts) | Maps HTTP 402 to quota_exhausted in apiCallErrorKind. Adds billingKindFor (strong/weak billing regex with rate-limit exclusion), extractProviderCode (multi-shape payload extraction), and transient-code detection. Updates message() to prefer body.error.message over top-level fields. |
| parseStreamError: auth, rate-limit, and billing classification (packages/opencode/src/provider/error.ts) | Derives code from the resolved typed error object only. Adds explicit auth branches for auth-related codes, rate_limit branches for rate-limit code variants, and an early billingKindFor-based override. Aligns unknown fallback isRetryable with isFreeUsageLimit/looksTransientCode. |
| parseAPICallError: extractProviderCode and billing override (packages/opencode/src/provider/error.ts) | Replaces direct body.error.code access with extractProviderCode(body). Computes billingKind via billingKindFor (including response headers and status code) and uses it to override the final kind when billing-shaped. |
| Test coverage for new classification rules (packages/opencode/src/session/retry.test.ts, packages/opencode/test/session/message-v2.test.ts, packages/opencode/test/session/retry.test.ts) | Updates expectation for typed unknown-code stream errors from UnknownError to APIError(unknown). Adds HTTP 402 → quota_exhausted test case. Adds a comprehensive PR1 classification completeness suite covering DeepSeek 402/400 balance failures, FreeUsageLimitError routing, auth/rate-limit/billing stream bodies, retryability rules, and over-match regression guards. |
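
The stream-side code branches can be pictured as a simple lookup. The sketch below uses the auth and rate-limit codes listed in the PR description; the kind name `"auth"`, the function name `kindForStreamCode`, and the case-insensitive comparison are assumptions for illustration only.

```typescript
// Hypothetical sketch of mapping known stream error codes to a failure kind.
type StreamKind = "auth" | "rate_limit" | "unknown"

const AUTH_CODES = new Set<string>([
  "authentication_error",
  "invalid_api_key",
  "permission_denied",
])

const RATE_LIMIT_CODES = new Set<string>([
  "rate_limit_exceeded",
  "too_many_requests",
  "rate_limited",
])

function kindForStreamCode(code: string | undefined): StreamKind {
  if (!code) return "unknown"
  // Assumption: providers vary in casing, so compare case-insensitively.
  const normalized = code.toLowerCase()
  if (AUTH_CODES.has(normalized)) return "auth"
  if (RATE_LIMIT_CODES.has(normalized)) return "rate_limit"
  return "unknown"
}
```

Any code outside the two sets falls through to `"unknown"`, which is where a billing-message override or the typed middle path would take over in the described design.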

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • Astro-Han/pawwork#302: Modifies parseStreamError in the same packages/opencode/src/provider/error.ts file, specifically the stream error parsing and inner-message payload handling that this PR further expands.
  • Astro-Han/pawwork#1108: Introduces the providerFailure.kind/code vocabulary and the base apiCallErrorKind/parseStreamError population that this PR extends with billing overrides, auth branches, and defensive code extraction.

Poem

🐰 A 402 once slipped right through,
No quota_exhausted — that just wouldn't do!
Now billing regexes, strong ones and weak,
Find every depleted balance we seek.
Auth codes, rate limits, all in their place —
The rabbit hops fast through each error case! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 30.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the main change: fixing provider billing/quota failure classification instead of masking them as unknown errors. |
| Description check | ✅ Passed | The description comprehensively covers all required template sections: Summary, Why, Related Issue, Human Review Status, Review Focus, Risk Notes, How To Verify, Screenshots/Recordings, and a fully-ticked Checklist. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint install failed. For unrecoverable errors, disable the tool in CodeRabbit configuration.


@github-actions github-actions Bot added harness Model harness, prompts, tool descriptions, and session mechanics P2 Medium priority labels Jun 22, 2026

@github-actions github-actions Bot left a comment

Suggested priority: P2 (includes non-doc, non-test paths outside the low-risk bucket).

P1/P0 are reserved for maintainer confirmation. Please relabel manually if this is a release blocker, security issue, data-loss risk, or updater/runtime failure.
