fix(deepseek-web): fix SSE parser, prompt format, and error handling#2502
fix(deepseek-web): fix SSE parser, prompt format, and error handling#2502ovehbe wants to merge 48 commits into
Conversation
…auth gaps, error sanitization, resilience - plugin.mjs: replace execSync string-template with spawnSync arg-array (command injection) - electron/main.js: gate unsafe-eval on !app.isPackaged instead of URL substring; merge duplicate connect-src - budget/bulk, resilience/reset: add requireManagementAuth guard - gemini-web, claude-web, copilot-web executors: route catch-block messages through sanitizeErrorMessage - oauth route, agents/tasks routes: route catch-block errors through sanitizeErrorMessage - codex.ts: return null (not error-object) from refreshCredentials on failure - tokenRefresh.ts: safe unknown-error access in catch block - combo.ts: reset exhaustedProviders per set-retry iteration so providers get a second chance - circuitBreaker.ts: persist and restore lastFailureKind via options JSON column
…souzapw#2433) Integrated into release/v3.8.1 — applied review suggestion (deduplicate removeHeaderCaseInsensitive)
…ity client profile fix by @Gi99lin
…diegosouzapw#2439) Integrated into release/v3.8.1
…yle auth (diegosouzapw#2436) Integrated into release/v3.8.1
…ey (diegosouzapw#2437) Integrated into release/v3.8.1
…iegosouzapw#2438) Integrated into release/v3.8.1
…osouzapw#2440) Integrated into release/v3.8.1
Integrated into release/v3.8.1
…diegosouzapw#2407) openAIToClaude now extracts developer-role messages into systemParts (alongside system role) and filters them from nonSystemMessages, so identity context injected via the Responses API developer role reaches the Claude system field instead of silently becoming an assistant turn.
…-up turns - map is_error to status "error"/"success" instead of hardcoding "success" - add serializeToolResultContent() to handle image/JSON content blocks; avoids sending text:\"\" which Kiro rejects as improperly formed - use deterministic uuidv5 for toolUseId when tool_call has no id, preventing id mismatch between assistant toolUse and subsequent tool_result Closes diegosouzapw#2446
…diegosouzapw#2447) fix(translator): fix Kiro 400 on follow-up turns carrying tool_result
Adds an opt-in toggle for Anthropic Fast Mode (speed:"fast") on the
Settings > AI page, mirroring the existing CodexFastTierTab pattern. The
behavior pairs with a CLIProxyAPI build that opts SDK-shaped traffic into
the Fast Mode entrypoint; OmniRoute signals the intent via a single
outbound header. Builds of CPA that do not recognize the header forward
it harmlessly.
- New helper: src/lib/providers/claudeFastMode.ts exposes
isClaudeFastModeEnabled / getClaudeFastModeSupportedModels /
shouldRequestClaudeFastMode plus the CPA_FORCE_FAST_MODE_HEADER
constant ("X-CPA-Force-Fast-Mode").
- New Card component: ClaudeFastModeTab.tsx — same Card+Toggle shape
as CodexFastTierTab.tsx, with a collapsible per-model checkbox group
(defaults to claude-opus-4-7 and claude-opus-4-6, matching the binary
KT() gate observed in claude-code v2.1.145).
- Schema: claudeFastMode { enabled?, supportedModels? } added to
updateSettingsSchema and to the getSettings() defaults.
- Middleware: buildUpstreamHeadersForExecute in chatCore.ts now sets
X-CPA-Force-Fast-Mode: 1 when provider === "claude" and the model
matches the configured supportedModels list.
- i18n: 6 new keys added with hand translations for en/fr/es/de and
__MISSING__: sentinels for the remaining 38 locales so a subsequent
scripts/i18n/sync-ui-keys.mjs pass can fill them in.
Default behavior is unchanged: the toggle defaults off, so accounts
that have not opted in keep their existing claude request flow. Even
when enabled, Anthropic enforces subscription tier and Fast Mode credit
balance server-side — the toggle may still surface out_of_credits or a
similar 429 from upstream.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up to PR diegosouzapw#2440 review. Today the toggle is a single boolean that injects service_tier=priority globally for every Codex request. Two small extensions that came up: - Tier dropdown (default / priority / flex). Default = no override; flex routes the request through OpenAI's lower-priority queue and is cheap enough that some users want to opt in globally for batch/eval work. - Per-model gate. The toggle's intent is "Fast tier for the models that actually support it" — gpt-5.5 and gpt-5.4 per OpenAI's models_cache.json (service_tiers: priority). Other Codex slugs were silently receiving the service_tier header and getting a tier-related error from OpenAI. The checkbox list lets users curate the supported set without code changes when a future Codex release adds Fast eligibility to more slugs. Schema (settingsSchemas.ts): codexServiceTier: z.object({ enabled: z.boolean().optional(), tier: z.enum(["default", "priority", "flex"]).optional(), supportedModels: z.array(z.string()).optional(), }).optional() Back-compat: rows with just `{ enabled: true }` from PR diegosouzapw#2440 still work — resolveCodexGlobalFastServiceTier() defaults `tier` to "priority" and `supportedModels` to ["gpt-5.5", "gpt-5.4"] when those fields are absent. The legacy boolean shape and the older `codexFastServiceTier: true` flag are also still honored. Middleware (open-sse/handlers/chatCore.ts): - applyCodexGlobalFastServiceTier now takes an optional { model, body }. - Gate: skip injection when the request's target model does not match the supportedModels prefix list (case-insensitive). Calls without a model argument keep the original behavior, so any other call sites stay safe. - For tier=flex, the helper writes body.service_tier directly because requestDefaults goes through normalizeCodexServiceTier which only canonicalizes priority/fast. Per-connection requestDefaults.serviceTier still wins over the global override. UI (CodexFastTierTab.tsx): - Top-level boolean toggle stays unchanged. - When enabled: tier <Select> and a collapsible "Applied to models" list with checkboxes. Initial selection matches the Fast-eligible catalog; users can add slugs by storing them in supportedModels. i18n: en/fr/es/de hand-translated. The other 38 locales get the standard __MISSING__: sentinel so scripts/i18n/sync-ui-keys.mjs can fill them in a subsequent translator pass — same pattern PR diegosouzapw#2440 used. Default behavior is unchanged: existing connections keep their requestDefaults.serviceTier precedence, the toggle still defaults off, and old enabled=true rows continue to inject service_tier=priority for gpt-5.5 / gpt-5.4. Happy to revise if you'd prefer a different shape. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace all Database.Database and import("better-sqlite3").Database
references in migrationRunner.ts, stats.ts, jsonMigration.ts, and
compliance/index.ts with SqliteAdapter from db/adapters/types.ts.
Also removes the now-unnecessary @ts-expect-error comment on
db.immediate() in migrationRunner.ts.
…#2448) Replace compile-time-only 'as string' casts with runtime String() conversions and instanceof Date check. gray-matter parses unquoted YAML dates as Date objects, causing 'Objects are not valid as a React child' crashes on docs pages. Co-authored-by: Mr. Meowgi <ovehbe@users.noreply.github.com>
Repeat markup stripping until the input stabilizes so malformed or nested tags do not survive a single replacement pass. This prevents incomplete sanitization from leaving tag fragments in removed text and improves i18n key candidate extraction from diffs.
…, i18n llm.txt sync
Integrated into release/v3.8.1
…diegosouzapw#2452) Integrated into release/v3.8.1
…uzapw#2457) Integrated into release/v3.8.1
Close two defense-in-depth gaps surfaced by the v3.8.1 release vulnerability scan: - Field-level credential encryption (src/lib/db/encryption.ts) and the CLI decrypt path (bin/cli/encryption.mjs) now pass authTagLength: 16 to every createDecipheriv call. Authentication was already enforced via setAuthTag + final(), but pinning the tag length rejects truncated GCM tags up front, closing the tag-truncation forgery vector (Semgrep gcm-no-tag-length). - MermaidDiagram switches securityLevel from "loose" to "strict" (Mermaid's default), so the rendered SVG is sanitized by Mermaid's bundled DOMPurify before the innerHTML assignment. Adds a regression test asserting decrypt() fails closed on a truncated auth tag.
Integrated into release/v3.8.1
… names, security hardening, multi-driver SQLite
…TaskId/ActivityId and fix URL hostname check in test (diegosouzapw#2461) Co-authored-by: diegosouzapw <diego.souza.pw@gmail.com>
- Handle all 3 DeepSeek SSE stream formats: initial fragments, APPEND operations, and bare string tokens (fixes truncated responses) - Simplify prompt builder to send system + last user message only (DeepSeek web API is single-turn, full history caused marker leakage) - Check json.code before token extraction (fixes "did not return access token: Authorization" on code 40003 with HTTP 200) - Clear session cache alongside token cache on auth errors - Add dev origin for remote testing Co-authored-by: Cursor <cursoragent@cursor.com>
There was a problem hiding this comment.
Code Review
This pull request enhances the DeepSeek-Web executor by adding support for incremental content updates and bare string tokens in SSE streams. It also introduces a native prompt builder and improves error handling by validating response codes and clearing caches upon token expiration. Feedback focuses on correcting a greedy regular expression in the prompt builder and ensuring consistent cache key usage when evicting session entries.
- Use non-greedy quantifiers in markdown image strip regex to avoid eating text between multiple images on the same line - Use accessToken as fallback key for sessionCache.delete to match the key logic in createSession (connectionId || accessToken) Co-authored-by: Cursor <cursoragent@cursor.com>
| } | ||
|
|
||
| // Extract content from DeepSeek fragments | ||
| const fragments = (data as any)?.v?.response?.fragments; |
There was a problem hiding this comment.
WARNING: Potential double-processing of SSE fragments. The condition at line 198 checks for data?.v?.response?.fragments (nested format), while line 216 checks for data?.p === "response/fragments" with Array.isArray(data?.v). If a response has both structures, the same fragments would be processed and emitted twice, causing duplicate content in the stream. Consider adding an else if or a guard to prevent double-processing.
| emittedRole = true; | ||
| chunk({ role: "assistant", content: "" }); | ||
| } | ||
| for (const frag of (data as any).v) { |
There was a problem hiding this comment.
WARNING: The iteration at line 221 assumes data.v is Frag[] when p === "response/fragments". However, the preceding condition at line 198 checks for Array.isArray(data?.v?.response?.fragments) — meaning data.v could be an object with a nested response property rather than a direct array. If both conditions match the same payload, iterating data.v directly could fail or produce unexpected results.
| signal: signal ?? undefined, | ||
| }); | ||
|
|
||
| if (resp.status === 401 || resp.status === 403) { |
There was a problem hiding this comment.
SUGGESTION: This HTTP 401/403 check before !resp.ok is redundant — these status codes will always satisfy !resp.ok. The early throw is useful to surface a specific error message, but the subsequent !resp.ok block (lines 403-404) would also be reached. Consider whether the specific message here adds value over the generic users/current HTTP ${resp.status} error below, or consolidate into a single error-handling path.
| if (!token) { | ||
| throw new Error(`No token in users/current response: code=${json?.code} msg=${json?.msg}`); | ||
| if (json?.code && json.code !== 0) { | ||
| const errMsg = json.msg || json?.data?.biz_msg || `error code ${json.code}`; |
There was a problem hiding this comment.
WARNING: If both json.msg and json?.data?.biz_msg are undefined/null, the fallback template literal produces error code undefined instead of a meaningful message. Add a final fallback: json.msg || json?.data?.biz_msg || json?.code || "Unknown error".
| Referer: `${DEEPSEEK_WEB_BASE}/`, | ||
| Authorization: `Bearer ${accessToken}`, | ||
| "X-Ds-Pow-Response": powAnswer, | ||
| "X-Client-Timezone-Offset": String(new Date().getTimezoneOffset() * -60), |
There was a problem hiding this comment.
SUGGESTION: getTimezoneOffset() is deprecated in modern JavaScript. Consider using Intl.DateTimeFormat().resolvedOptions().timeZone or a timezone offset calculation that doesn't rely on the deprecated method. While it still works, it may be removed in future Node.js versions.
|
|
||
| export const deepseekWebExecutor = new DeepSeekWebExecutor(); | ||
|
|
||
| // Re-export for auto-refresh executor and tests |
There was a problem hiding this comment.
SUGGESTION: Re-exporting tokenCache and sessionCache (module-level mutable state) as public exports exposes internal implementation details that could be mutated by consumers. Consider keeping these private and exposing only controlled operations (e.g., clearCaches() function) if external cache invalidation is needed. Re-exporting acquireAccessToken for testing is acceptable via @ts-expect-error or test-specific patterns.
| @@ -514,311 +466,189 @@ test("isSessionValid starts false", () => { | |||
| // ─── Abort handling ────────────────────────────────────────────────────── | |||
|
|
|||
| test("execute: handles abort signal gracefully", async () => { | |||
| const dsMod4 = await import("../../open-sse/executors/deepseek-web.ts"); | |||
| if (dsMod4.tokenCache) dsMod4.tokenCache.clear(); | |||
| const executor = new DeepSeekWebExecutor(); | |||
| const controller = new AbortController(); | |||
| controller.abort(); | |||
| const result = await executor.execute({ | |||
| model: "default", | |||
There was a problem hiding this comment.
WARNING: Test assertion changed from assert.equal(capturedBody.model_type, "deepseek_r1") to assert.equal(capturedBody.model_type, "default") for the R1 model test. Under the new resolveModelOptions() logic, m.includes("r1") no longer sets modelType = "deepseek_r1" — it only sets thinkingEnabled = true. The model_type is now always "expert" or "default". This changes the API behavior sent to DeepSeek. Verify that DeepSeek's /chat/completion endpoint accepts model_type: "default" for R1 models with thinking_enabled: true.
| } | ||
| return new Response("", { status: 404 }); | ||
| }; | ||
| const mock = await mockDeepSeekFlow(); | ||
| try { |
There was a problem hiding this comment.
WARNING: Test assertion changed from assert.equal(capturedBody.thinking_enabled, true) to assert.equal(capturedBody.thinking_enabled, false) for the expert model test. The model name "expert" contains neither "r1" nor "think" nor "reason", so thinkingEnabled is false under the new logic. This is a behavioral change — verify that expert models should default to thinking disabled, or that users must explicitly enable thinking via thinking_enabled: true in the body.
Code Review SummaryStatus: 8 Issues Found | Recommendation: Address before merge Overview
Issue Details (click to expand)WARNING
SUGGESTION
Files Reviewed (2 files)
Reviewed by qwen3.6-plus · 553,216 tokens |
Summary
json.codebefore token extraction (fixes "did not return access token: Authorization" on code 40003 with HTTP 200)Related Issues
Validation
npm run lintnpm run typecheck:corenpm run check:cyclesnpm run check:any-budget:t11Tests Added Or Updated
tests/unit/deepseek-web.test.ts— Updated prompt assertion for new single-turn prompt builder.Coverage Notes
open-sse/executors/deepseek-web.ts— SSE parser changes covered by existing streaming test (mocked flow test verifies content extraction). Error code handling covered by 40003 INVALID_TOKEN test.Reviewer Notes
{"p":"response/fragments/-1/content","o":"APPEND","v":"word"}), and bare string tokens ({"v":"word"}).<|User|>/<|Assistant|>markers caused the model to echo markers as literal text. Now sends only system prompt + last user message.192.168.0.111toallowedDevOriginsfor remote dev testing — can be removed before release if not needed.Made with Cursor