fix(opencode): classify provider API rejections as a terminal cause and stop overwriting their message#1468
fix(opencode): classify provider API rejections as a terminal cause and stop overwriting their message#1468Astro-Han wants to merge 1 commit into
Conversation
…nd stop overwriting their message A DeepSeek direct account in arrears returns 402 "Insufficient Balance", but it surfaced as "Connection lost. Please check whether the last operation completed before resending." Two of the three root-cause layers live here (the third, 402 classification, is PR #1466): 1. run-incident had only watchdog/transport terminal causes, so every provider API rejection was recorded as provider_transport_disconnect. 2. halt() overwrote the real provider message with a generic connection-lost recovery string. Changes: - New provider_api_error TerminalCause (run-incident) and Classification (run-observability), subcategory from providerFailure.kind. Bumps both RUN_INCIDENT_SCHEMA_VERSION and RunObservability SCHEMA_VERSION to 2; export.test.ts schema/version assertions updated. - The processor parses the failure once and passes providerFailure (with HTTP evidence) to the recorder, which routes a real provider API rejection to provider_api_error instead of defaulting it to a transport disconnect. classificationForIncident and retrySafetyFor gain provider_api_error branches (retryable=false -> provider_terminal_failure, aligned with #1118). - recoveryFor: a terminal provider API error stops with reason provider_api_error (out of the connection-lost set); a retryable one (rate_limit / server_overload) flows through the existing auto-retry tree. - The terminal halt no longer overwrites a provider API rejection's real message with the connection-lost recovery string. Lifecycle-close and user-cancel halts keep their authoritative interruption messages. - isProviderApiError gates the catch-all "unknown" kind on HTTP evidence (status code or response body) so a wrapped connection failure is not mislabeled a provider API error. Combined with PR #1466 (402 -> quota_exhausted), a billing failure now surfaces with its real provider message instead of "Connection lost". Refs #1105, #1123. Claude-Session: https://claude.ai/code/session_015bW9JQSkuB156gkNQdxCzi
📝 WalkthroughWalkthroughIntroduces a ChangesProvider API Error Classification and Retry Routing
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related issues
Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint install failed. For unrecoverable errors, disable the tool in CodeRabbit configuration. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
packages/opencode/src/session/processor.ts (1)
1732-1760: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick winSingle-parse design with structured evidence forwarding.
The rework correctly parses the error once and extracts
providerFailurewith all necessary fields (kind,code,statusCode,hasResponseBody) for downstream routing. ThehasResponseBodycheck is appropriately defensive (empty string or non-string types yieldfalse).♻️ Optional: Extract inline type for clarity
The inline type annotation at lines 1734-1736 could be extracted to a named type if this return shape is referenced elsewhere or grows more complex:
type RetrySignalProviderFailure = { kind: ProviderFailureKind code?: string statusCode?: number hasResponseBody?: boolean }However, since this is a local return type and not referenced elsewhere, the current inline form is acceptable.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/opencode/src/session/processor.ts` around lines 1732 - 1760, The current implementation with the inline type annotation for the return shape (containing kind, code, statusCode, and hasResponseBody fields) is acceptable and correctly handles the providerFailure extraction with defensive checks. The refactoring to extract this inline type definition into a named type like RetrySignalProviderFailure is purely optional for improved code clarity and maintainability, and should only be pursued if this return type shape is referenced elsewhere in the codebase or if the team prefers explicit named types for consistency. No mandatory changes are required.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@packages/opencode/src/session/processor.ts`:
- Around line 1732-1760: The current implementation with the inline type
annotation for the return shape (containing kind, code, statusCode, and
hasResponseBody fields) is acceptable and correctly handles the providerFailure
extraction with defensive checks. The refactoring to extract this inline type
definition into a named type like RetrySignalProviderFailure is purely optional
for improved code clarity and maintainability, and should only be pursued if
this return type shape is referenced elsewhere in the codebase or if the team
prefers explicit named types for consistency. No mandatory changes are required.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 012387f9-11d0-4371-ae8f-21eb52dc6c8c
📒 Files selected for processing (12)
packages/opencode/src/provider/error.tspackages/opencode/src/session/processor.tspackages/opencode/src/session/run-incident/derive.tspackages/opencode/src/session/run-incident/index.tspackages/opencode/src/session/run-incident/policy.tspackages/opencode/src/session/run-incident/presentation.tspackages/opencode/src/session/run-incident/types.tspackages/opencode/src/session/run-observability/recorder.tspackages/opencode/src/session/run-observability/types.tspackages/opencode/test/session/export.test.tspackages/opencode/test/session/processor-effect.test.tspackages/opencode/test/session/run-observability.test.ts
Summary
Adds a
provider_api_errorterminal cause so a provider's API-level rejection is classified as what it is, and stopshalt()from overwriting that rejection's real message with a generic connection-lost string. This is the run-incident + render half of the error-classification work; the classification half (402 →quota_exhausted) is PR #1466.provider_api_erroris added torun-incident'sTerminalCauseand torun-observability'sClassification, withsubcategorytaken fromproviderFailure.kind(auth/rate_limit/quota_exhausted/server_overload/invalid_request/unknown). BothRUN_INCIDENT_SCHEMA_VERSIONandRunObservabilitySCHEMA_VERSIONbump to2; the export pipeline preserves each existing package's own version (export.ts), so old bundles still read back as1.providerFailure(with HTTP evidence) intorecordAttemptFailureAndDeriveRecovery. The recorder routes a real provider API rejection toprovider_api_errorinstead of defaulting it toprovider_transport_disconnect. The recorder does not parse a third time.classificationForIncidentandretrySafetyForgainprovider_api_errorbranches:retryable === false→do_not_auto_retry/provider_terminal_failure, aligned with the fix(opencode): stop recommending auto-retry for terminal provider failures #1118retry_safetyaxis; a retryable one stayscandidate_safe_auto_retry.recoveryForstops a terminal provider API error with reasonprovider_api_error(kept out of the connection-lost reason set); a retryable provider API error (rate_limit/server_overload) flows through the existing auto-retry tree viaretryableProviderFailure. The terminalhalt()no longer overwrites a provider API rejection's real message with the connection-lost recovery string — while lifecycle-close and user-cancel halts keep their authoritative interruption messages.isProviderApiErrorgates the catch-allunknownkind on HTTP evidence (a status code or a response body). A wrapped connection failure — e.g. anAPICallErrorwith no HTTP response, or a DNS failure the stream classifier didn't recognize — stays on the transport path rather than being mislabeled a provider API error. Explicit kinds are self-evidencing.Why
A Windows user's DeepSeek-direct account in arrears returned
402 {"error":{"message":"Insufficient Balance",...}}, but it surfaced as "Connection lost. Please check whether the last operation completed before resending." Three layers caused this: (1) 402 classified asunknown; (2) run-incident had no provider-API category, so it defaulted toprovider_transport_disconnect; (3)halt()overwrote the real message with a recovery string. Layers 2 and 3 are fixed here; layer 1 is PR #1466. Combined, a billing failure now surfaces with its real provider message instead of "Connection lost".Related Issue
Refs #1105 (classification spine), #1123 (classify-then-render umbrella).
Human Review Status
Pending
Review Focus
isProviderApiError): explicit kinds always route;unknownonly routes with HTTP evidence so a wrapped connection failure isn't mislabeled.transport_disconnectanddecompressionstay on the transport path.recoveryForsplit: terminal provider errors stop; retryable ones (rate_limit/server_overload) must still auto-retry throughretryableProviderFailure— confirm the existing retry tests still pass.2; the decode path preserves each bundle's own version, and the only freshly-stamped assertions updated are inexport.test.ts.Risk Notes
Data-contract change (intentional, tested): two schema versions bump to
2and two enums gain aprovider_api_errorvalue. Additive and backward-compatible — old bundles carry and keep their ownschema_version: 1; the enums are not pinned byz.literal, so widening them does not reject old data. No SDK regeneration: these diagnostics enums are not exposed insdk/js. No UI in this PR (per-kind cards and i18n are a follow-up). Behavior change: a terminal provider API error no longer auto-retries and shows its real message; a retryable provider API error keeps retrying exactly as before.How To Verify
Pinned: 402 (kind
unknown, status 402) →provider_api_error, real message preserved;quota_exhaustedretryable=false →do_not_auto_retry/provider_terminal_failure;rate_limitretryable=true still auto-retries;transport_disconnect/decompression/ unknown-without-evidence → transport path; lifecycle-close halt keeps its message.Screenshots or Recordings
N/A — backend classification + message-passthrough change, no UI.
Checklist
bug,enhancement,task,documentation. Type labels are author-added; the labeler bot does NOT assign them. Add the label in the GitHub UI, then tick this.app,ui,platform,harness,ci. The labeler bot assigns these on PR open based on changed paths. Confirm the bot's choice (or override if wrong), then tick this.P0,P1,P2,P3. The priority-triage bot suggests one on PR open. Confirm or override, then tick this.Pending,Approved by @<reviewer>, orNot required: <reason>(default isPending; "not required" is restricted to bot-authored low-risk PRs).dev, and my PR title and commit messages use Conventional Commits in English.https://claude.ai/code/session_015bW9JQSkuB156gkNQdxCzi
Summary by CodeRabbit
New Features
Improvements