Skip to content

fix(deepseek-web): fix SSE parser, prompt format, and error handling#2502

Open
ovehbe wants to merge 48 commits into
diegosouzapw:release/v3.8.2from
ovehbe:fix/deepseek-web-sse-parser
Open

fix(deepseek-web): fix SSE parser, prompt format, and error handling#2502
ovehbe wants to merge 48 commits into
diegosouzapw:release/v3.8.2from
ovehbe:fix/deepseek-web-sse-parser

Conversation

@ovehbe
Copy link
Copy Markdown

@ovehbe ovehbe commented May 21, 2026

Summary

  • Handle all 3 DeepSeek SSE stream formats: initial fragments, APPEND operations, and bare string tokens (fixes truncated responses on expert/pro models)
  • Simplify prompt builder to send system + last user message only (DeepSeek web API is single-turn, full history caused chat marker leakage in responses)
  • Check json.code before token extraction (fixes "did not return access token: Authorization" on code 40003 with HTTP 200)
  • Clear session cache alongside token cache on auth errors

Related Issues

Validation

  • npm run lint
  • npm run typecheck:core
  • npm run check:cycles
  • npm run check:any-budget:t11
  • Unit tests: 24/24 pass

Tests Added Or Updated

  • tests/unit/deepseek-web.test.ts — Updated prompt assertion for new single-turn prompt builder.

Coverage Notes

  • open-sse/executors/deepseek-web.ts — SSE parser changes covered by existing streaming test (mocked flow test verifies content extraction). Error code handling covered by 40003 INVALID_TOKEN test.

Reviewer Notes

  • SSE parser: DeepSeek sends 3 different stream event formats depending on model type. The original parser only handled format 1 (initial fragments), causing expert/pro models to return truncated single-word responses. Now handles all 3: initial fragments, APPEND operations ({"p":"response/fragments/-1/content","o":"APPEND","v":"word"}), and bare string tokens ({"v":"word"}).
  • Prompt format: DeepSeek's web API is single-turn — sending full conversation history with <|User|> / <|Assistant|> markers caused the model to echo markers as literal text. Now sends only system prompt + last user message.
  • next.config.mjs: Added 192.168.0.111 to allowedDevOrigins for remote dev testing — can be removed before release if not needed.

Made with Cursor

diegosouzapw and others added 30 commits May 20, 2026 11:11
…auth gaps, error sanitization, resilience

- plugin.mjs: replace execSync string-template with spawnSync arg-array (command injection)
- electron/main.js: gate unsafe-eval on !app.isPackaged instead of URL substring; merge duplicate connect-src
- budget/bulk, resilience/reset: add requireManagementAuth guard
- gemini-web, claude-web, copilot-web executors: route catch-block messages through sanitizeErrorMessage
- oauth route, agents/tasks routes: route catch-block errors through sanitizeErrorMessage
- codex.ts: return null (not error-object) from refreshCredentials on failure
- tokenRefresh.ts: safe unknown-error access in catch block
- combo.ts: reset exhaustedProviders per set-retry iteration so providers get a second chance
- circuitBreaker.ts: persist and restore lastFailureKind via options JSON column
…souzapw#2433)

Integrated into release/v3.8.1 — applied review suggestion (deduplicate removeHeaderCaseInsensitive)
…diegosouzapw#2407)

openAIToClaude now extracts developer-role messages into systemParts
(alongside system role) and filters them from nonSystemMessages, so
identity context injected via the Responses API developer role reaches
the Claude system field instead of silently becoming an assistant turn.
…-up turns

- map is_error to status "error"/"success" instead of hardcoding "success"
- add serializeToolResultContent() to handle image/JSON content blocks;
  avoids sending text:\"\" which Kiro rejects as improperly formed
- use deterministic uuidv5 for toolUseId when tool_call has no id, preventing
  id mismatch between assistant toolUse and subsequent tool_result

Closes diegosouzapw#2446
…diegosouzapw#2447)

fix(translator): fix Kiro 400 on follow-up turns carrying tool_result
Adds an opt-in toggle for Anthropic Fast Mode (speed:"fast") on the
Settings > AI page, mirroring the existing CodexFastTierTab pattern. The
behavior pairs with a CLIProxyAPI build that opts SDK-shaped traffic into
the Fast Mode entrypoint; OmniRoute signals the intent via a single
outbound header. Builds of CPA that do not recognize the header forward
it harmlessly.

- New helper: src/lib/providers/claudeFastMode.ts exposes
  isClaudeFastModeEnabled / getClaudeFastModeSupportedModels /
  shouldRequestClaudeFastMode plus the CPA_FORCE_FAST_MODE_HEADER
  constant ("X-CPA-Force-Fast-Mode").
- New Card component: ClaudeFastModeTab.tsx — same Card+Toggle shape
  as CodexFastTierTab.tsx, with a collapsible per-model checkbox group
  (defaults to claude-opus-4-7 and claude-opus-4-6, matching the binary
  KT() gate observed in claude-code v2.1.145).
- Schema: claudeFastMode { enabled?, supportedModels? } added to
  updateSettingsSchema and to the getSettings() defaults.
- Middleware: buildUpstreamHeadersForExecute in chatCore.ts now sets
  X-CPA-Force-Fast-Mode: 1 when provider === "claude" and the model
  matches the configured supportedModels list.
- i18n: 6 new keys added with hand translations for en/fr/es/de and
  __MISSING__: sentinels for the remaining 38 locales so a subsequent
  scripts/i18n/sync-ui-keys.mjs pass can fill them in.

Default behavior is unchanged: the toggle defaults off, so accounts
that have not opted in keep their existing claude request flow. Even
when enabled, Anthropic enforces subscription tier and Fast Mode credit
balance server-side — the toggle may still surface out_of_credits or a
similar 429 from upstream.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up to PR diegosouzapw#2440 review. Today the toggle is a single boolean that
injects service_tier=priority globally for every Codex request. Two small
extensions that came up:

- Tier dropdown (default / priority / flex). Default = no override; flex
  routes the request through OpenAI's lower-priority queue and is cheap
  enough that some users want to opt in globally for batch/eval work.
- Per-model gate. The toggle's intent is "Fast tier for the models that
  actually support it" — gpt-5.5 and gpt-5.4 per OpenAI's models_cache.json
  (service_tiers: priority). Other Codex slugs were silently receiving the
  service_tier header and getting a tier-related error from OpenAI. The
  checkbox list lets users curate the supported set without code changes
  when a future Codex release adds Fast eligibility to more slugs.

Schema (settingsSchemas.ts):

  codexServiceTier: z.object({
    enabled: z.boolean().optional(),
    tier: z.enum(["default", "priority", "flex"]).optional(),
    supportedModels: z.array(z.string()).optional(),
  }).optional()

Back-compat: rows with just `{ enabled: true }` from PR diegosouzapw#2440 still work —
resolveCodexGlobalFastServiceTier() defaults `tier` to "priority" and
`supportedModels` to ["gpt-5.5", "gpt-5.4"] when those fields are absent.
The legacy boolean shape and the older `codexFastServiceTier: true` flag
are also still honored.

Middleware (open-sse/handlers/chatCore.ts):
- applyCodexGlobalFastServiceTier now takes an optional { model, body }.
- Gate: skip injection when the request's target model does not match the
  supportedModels prefix list (case-insensitive). Calls without a model
  argument keep the original behavior, so any other call sites stay safe.
- For tier=flex, the helper writes body.service_tier directly because
  requestDefaults goes through normalizeCodexServiceTier which only
  canonicalizes priority/fast. Per-connection requestDefaults.serviceTier
  still wins over the global override.

UI (CodexFastTierTab.tsx):
- Top-level boolean toggle stays unchanged.
- When enabled: tier <Select> and a collapsible "Applied to models" list
  with checkboxes. Initial selection matches the Fast-eligible catalog;
  users can add slugs by storing them in supportedModels.

i18n: en/fr/es/de hand-translated. The other 38 locales get the standard
__MISSING__: sentinel so scripts/i18n/sync-ui-keys.mjs can fill them in a
subsequent translator pass — same pattern PR diegosouzapw#2440 used.

Default behavior is unchanged: existing connections keep their
requestDefaults.serviceTier precedence, the toggle still defaults off,
and old enabled=true rows continue to inject service_tier=priority for
gpt-5.5 / gpt-5.4.

Happy to revise if you'd prefer a different shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace all Database.Database and import("better-sqlite3").Database
references in migrationRunner.ts, stats.ts, jsonMigration.ts, and
compliance/index.ts with SqliteAdapter from db/adapters/types.ts.
Also removes the now-unnecessary @ts-expect-error comment on
db.immediate() in migrationRunner.ts.
…#2448)

Replace compile-time-only 'as string' casts with runtime String() conversions
and instanceof Date check. gray-matter parses unquoted YAML dates as Date
objects, causing 'Objects are not valid as a React child' crashes on docs pages.

Co-authored-by: Mr. Meowgi <ovehbe@users.noreply.github.com>
diegosouzapw and others added 17 commits May 20, 2026 19:18
Repeat markup stripping until the input stabilizes so malformed or
nested tags do not survive a single replacement pass.

This prevents incomplete sanitization from leaving tag fragments in
removed text and improves i18n key candidate extraction from diffs.
Close two defense-in-depth gaps surfaced by the v3.8.1 release vulnerability scan:

- Field-level credential encryption (src/lib/db/encryption.ts) and the CLI
  decrypt path (bin/cli/encryption.mjs) now pass authTagLength: 16 to every
  createDecipheriv call. Authentication was already enforced via setAuthTag +
  final(), but pinning the tag length rejects truncated GCM tags up front,
  closing the tag-truncation forgery vector (Semgrep gcm-no-tag-length).
- MermaidDiagram switches securityLevel from "loose" to "strict" (Mermaid's
  default), so the rendered SVG is sanitized by Mermaid's bundled DOMPurify
  before the innerHTML assignment.

Adds a regression test asserting decrypt() fails closed on a truncated auth tag.
… names, security hardening, multi-driver SQLite
…TaskId/ActivityId and fix URL hostname check in test (diegosouzapw#2461)

Co-authored-by: diegosouzapw <diego.souza.pw@gmail.com>
- Handle all 3 DeepSeek SSE stream formats: initial fragments,
  APPEND operations, and bare string tokens (fixes truncated responses)
- Simplify prompt builder to send system + last user message only
  (DeepSeek web API is single-turn, full history caused marker leakage)
- Check json.code before token extraction (fixes "did not return
  access token: Authorization" on code 40003 with HTTP 200)
- Clear session cache alongside token cache on auth errors
- Add dev origin for remote testing

Co-authored-by: Cursor <cursoragent@cursor.com>
@ovehbe ovehbe requested a review from diegosouzapw as a code owner May 21, 2026 15:27
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enhances the DeepSeek-Web executor by adding support for incremental content updates and bare string tokens in SSE streams. It also introduces a native prompt builder and improves error handling by validating response codes and clearing caches upon token expiration. Feedback focuses on correcting a greedy regular expression in the prompt builder and ensuring consistent cache key usage when evicting session entries.

Comment thread open-sse/executors/deepseek-web.ts Outdated
Comment thread open-sse/executors/deepseek-web.ts Outdated
Comment thread open-sse/executors/deepseek-web.ts Outdated
@ovehbe ovehbe changed the base branch from release/v3.8.1 to main May 21, 2026 15:30
- Use non-greedy quantifiers in markdown image strip regex to avoid
  eating text between multiple images on the same line
- Use accessToken as fallback key for sessionCache.delete to match
  the key logic in createSession (connectionId || accessToken)

Co-authored-by: Cursor <cursoragent@cursor.com>
}

// Extract content from DeepSeek fragments
const fragments = (data as any)?.v?.response?.fragments;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WARNING: Potential double-processing of SSE fragments. The condition at line 198 checks for data?.v?.response?.fragments (nested format), while line 216 checks for data?.p === "response/fragments" with Array.isArray(data?.v). If a response has both structures, the same fragments would be processed and emitted twice, causing duplicate content in the stream. Consider adding an else if or a guard to prevent double-processing.

emittedRole = true;
chunk({ role: "assistant", content: "" });
}
for (const frag of (data as any).v) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WARNING: The iteration at line 221 assumes data.v is Frag[] when p === "response/fragments". However, the preceding condition at line 198 checks for Array.isArray(data?.v?.response?.fragments) — meaning data.v could be an object with a nested response property rather than a direct array. If both conditions match the same payload, iterating data.v directly could fail or produce unexpected results.

signal: signal ?? undefined,
});

if (resp.status === 401 || resp.status === 403) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SUGGESTION: This HTTP 401/403 check before !resp.ok is redundant — these status codes will always satisfy !resp.ok. The early throw is useful to surface a specific error message, but the subsequent !resp.ok block (lines 403-404) would also be reached. Consider whether the specific message here adds value over the generic users/current HTTP ${resp.status} error below, or consolidate into a single error-handling path.

if (!token) {
throw new Error(`No token in users/current response: code=${json?.code} msg=${json?.msg}`);
if (json?.code && json.code !== 0) {
const errMsg = json.msg || json?.data?.biz_msg || `error code ${json.code}`;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WARNING: If both json.msg and json?.data?.biz_msg are undefined/null, the fallback template literal produces error code undefined instead of a meaningful message. Add a final fallback: json.msg || json?.data?.biz_msg || json?.code || "Unknown error".

Referer: `${DEEPSEEK_WEB_BASE}/`,
Authorization: `Bearer ${accessToken}`,
"X-Ds-Pow-Response": powAnswer,
"X-Client-Timezone-Offset": String(new Date().getTimezoneOffset() * -60),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SUGGESTION: getTimezoneOffset() is deprecated in modern JavaScript. Consider using Intl.DateTimeFormat().resolvedOptions().timeZone or a timezone offset calculation that doesn't rely on the deprecated method. While it still works, it may be removed in future Node.js versions.


export const deepseekWebExecutor = new DeepSeekWebExecutor();

// Re-export for auto-refresh executor and tests
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SUGGESTION: Re-exporting tokenCache and sessionCache (module-level mutable state) as public exports exposes internal implementation details that could be mutated by consumers. Consider keeping these private and exposing only controlled operations (e.g., clearCaches() function) if external cache invalidation is needed. Re-exporting acquireAccessToken for testing is acceptable via @ts-expect-error or test-specific patterns.

@@ -514,311 +466,189 @@ test("isSessionValid starts false", () => {
// ─── Abort handling ──────────────────────────────────────────────────────

test("execute: handles abort signal gracefully", async () => {
const dsMod4 = await import("../../open-sse/executors/deepseek-web.ts");
if (dsMod4.tokenCache) dsMod4.tokenCache.clear();
const executor = new DeepSeekWebExecutor();
const controller = new AbortController();
controller.abort();
const result = await executor.execute({
model: "default",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WARNING: Test assertion changed from assert.equal(capturedBody.model_type, "deepseek_r1") to assert.equal(capturedBody.model_type, "default") for the R1 model test. Under the new resolveModelOptions() logic, m.includes("r1") no longer sets modelType = "deepseek_r1" — it only sets thinkingEnabled = true. The model_type is now always "expert" or "default". This changes the API behavior sent to DeepSeek. Verify that DeepSeek's /chat/completion endpoint accepts model_type: "default" for R1 models with thinking_enabled: true.

}
return new Response("", { status: 404 });
};
const mock = await mockDeepSeekFlow();
try {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WARNING: Test assertion changed from assert.equal(capturedBody.thinking_enabled, true) to assert.equal(capturedBody.thinking_enabled, false) for the expert model test. The model name "expert" contains neither "r1" nor "think" nor "reason", so thinkingEnabled is false under the new logic. This is a behavioral change — verify that expert models should default to thinking disabled, or that users must explicitly enable thinking via thinking_enabled: true in the body.

@kilo-code-bot
Copy link
Copy Markdown

kilo-code-bot Bot commented May 21, 2026

Code Review Summary

Status: 8 Issues Found | Recommendation: Address before merge

Overview

Severity Count
CRITICAL 0
WARNING 5
SUGGESTION 3
Issue Details (click to expand)

WARNING

File Line Issue
open-sse/executors/deepseek-web.ts 198 Potential double-processing of SSE fragments — conditions at line 198 and 216 could both match the same payload
open-sse/executors/deepseek-web.ts 221 Iterating data.v assumes it's Frag[], but line 198's check implies data.v could be an object with nested response property
open-sse/executors/deepseek-web.ts 409 Error message fallback could produce error code undefined if both json.msg and json?.data?.biz_msg are missing
tests/unit/deepseek-web.test.ts 475 R1 model test assertion changed from model_type: "deepseek_r1" to "default" — verify DeepSeek accepts this
tests/unit/deepseek-web.test.ts 574 Expert model test assertion changed from thinking_enabled: true to false — verify this is intended behavior

SUGGESTION

File Line Issue
open-sse/executors/deepseek-web.ts 400 HTTP 401/403 check before !resp.ok is redundant; consider consolidating error handling
open-sse/executors/deepseek-web.ts 571 getTimezoneOffset() is deprecated in modern JavaScript
open-sse/executors/deepseek-web.ts 727 Re-exporting module-level mutable state (tokenCache, sessionCache) exposes internals to consumers
Files Reviewed (2 files)
  • open-sse/executors/deepseek-web.ts — 7 issues (SSE parser changes, error handling, cache management, header format, re-exports)
  • tests/unit/deepseek-web.test.ts — 2 issues (behavioral changes in test assertions for R1 and expert models)

Reviewed by qwen3.6-plus · 553,216 tokens

@diegosouzapw diegosouzapw changed the base branch from main to release/v3.8.2 May 21, 2026 18:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants