
fix(translate): prevent content loss in long-form translation#169

Open
luandro wants to merge 18 commits into main from issue-166

Conversation


@luandro luandro commented Mar 19, 2026

Problem

Closes #166

Long-form Notion pages (troubleshooting, create/edit observation, view/edit track, understanding exchange, etc.) were silently dropping sections during automatic translation. The root cause: feeding very large chunks (up to 500 K chars) to the model saturated its effective attention window, causing it to omit headings and paragraphs without raising any error.

Solution

Two complementary mechanisms:

1. Proactive aggressive chunking

Lowered TRANSLATION_CHUNK_MAX_CHARS from 500,000 → 120,000 chars. Each translation request now stays well within the model's reliable attention range, reducing the chance of content being dropped in the first place.

2. Structural completeness validation + retry

After every translation call, the result is compared against the source using structural markers:

  • Heading count (any heading loss triggers a retry)
  • Fenced code blocks
  • Bullet / numbered lists (threshold: ≥ 3 items)
  • Table lines
  • Severe length shrinkage (< 55% of source length when source ≥ 4,000 chars)

If incompleteness is detected, the chunk limit is halved and the translation is retried (up to TRANSLATION_COMPLETENESS_MAX_RETRIES = 2 times, with a floor of TRANSLATION_MIN_CHUNK_MAX_CHARS = 8,000 chars). After exhausting retries a non-critical error is surfaced so the pipeline can log and continue.
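The validation-plus-retry loop described above can be sketched roughly as follows (a minimal sketch, not the actual implementation: `translateOnce` and `isIncomplete` stand in for the real single-pass translator and `isSuspiciouslyIncompleteTranslation`; only the constant names and values are taken from this PR):

```typescript
// Constants mirror the PR description; everything else is illustrative.
const TRANSLATION_CHUNK_MAX_CHARS = 120_000;
const TRANSLATION_MIN_CHUNK_MAX_CHARS = 8_000;
const TRANSLATION_COMPLETENESS_MAX_RETRIES = 2;

type Translate = (source: string, chunkLimit: number) => string;
type Validator = (source: string, translated: string) => boolean;

function translateWithCompletenessRetry(
  source: string,
  translateOnce: Translate,
  isIncomplete: Validator
): { markdown: string; attempts: number } {
  let chunkLimit = TRANSLATION_CHUNK_MAX_CHARS;
  let attempts = 0;
  for (;;) {
    attempts += 1;
    const result = translateOnce(source, chunkLimit);
    if (!isIncomplete(source, result)) {
      return { markdown: result, attempts };
    }
    if (attempts > TRANSLATION_COMPLETENESS_MAX_RETRIES) {
      // Non-critical: the caller logs this and continues the pipeline.
      throw new Error("incomplete translation after retries");
    }
    // Halve the chunk limit, never dropping below the documented floor.
    chunkLimit = Math.max(
      Math.floor(chunkLimit / 2),
      TRANSLATION_MIN_CHUNK_MAX_CHARS
    );
  }
}
```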

Test plan

  • New: chunks long-form content proactively below model-derived maximums
  • New: retries with smaller chunks when a valid response omits a section
  • New: fails (non-critically) when repeated completeness retries still return incomplete content
  • New: treats heavy structural shrinkage as incomplete translation
  • New: preserves complete heading structures when chunking by sections
  • Existing: token overflow errors still classified as non-critical token_overflow
  • Existing: single-call fast path for small content
  • Existing: chunks large content and calls API once per chunk
  • All tests pass (bunx vitest run scripts/notion-translate/translateFrontMatter.test.ts)

Long-form Notion pages (troubleshooting, create/edit observation, etc.)
were silently dropping sections during automatic translation. The likely
cause: feeding very large chunks to the model saturated its effective
attention window, causing it to omit headings and paragraphs without
raising an error.

Changes:
- Lower proactive chunk ceiling from 500 K → 120 K chars so each
  translation request stays well within reliable model attention range
- Add structural completeness validation after every translation call:
  checks heading count, fenced code blocks, bullet/numbered lists,
  table lines, and severe length shrinkage (< 55 % of source)
- Retry with progressively smaller chunks (halved each attempt, floor
  8 K chars) up to TRANSLATION_COMPLETENESS_MAX_RETRIES (2) times when
  incompleteness is detected, then surface a non-critical error

Closes #166


github-actions bot commented Mar 19, 2026

🐳 Docker Image Published

Your Docker image has been built and pushed for this PR.

Image Reference: docker.io/communityfirst/comapeo-docs-api:pr-169

Platforms: linux/amd64, linux/arm64

Testing

To test this image:

docker pull docker.io/communityfirst/comapeo-docs-api:pr-169
docker run -p 3001:3001 docker.io/communityfirst/comapeo-docs-api:pr-169

Built with commit d15da43


luandro commented Mar 19, 2026

@codex review


kilo-code-bot bot commented Mar 19, 2026

Code Review Summary

Status: No Issues Found | Recommendation: Merge

Overview

This increment adds frontmatter integrity failure to the recoverable retry conditions, along with test coverage for the retry behavior.

Severity Count
CRITICAL 0
WARNING 0
SUGGESTION 0
Changes in This Increment
File Change
scripts/notion-translate/translateFrontMatter.ts Extended recoverable error conditions to include schema_invalid errors with "Frontmatter integrity check failed" message, enabling retries when frontmatter fields are dropped
scripts/notion-translate/translateFrontMatter.test.ts Added test for frontmatter integrity retry behavior - first call drops slug, retry preserves it
Verification

The source code change at line ~1153 adds a second condition to isRecoverableCompletenessFailure:

  • error.code === "schema_invalid" && /Frontmatter integrity check failed/.test(error.message)
  • This enables the retry mechanism when assertFrontmatterIntegrity() detects dropped critical fields (slug, sidebar_position, sidebar_label, id, title)
  • The test validates this by mocking a first response that drops slug, then a retry that preserves it
Previously Resolved Issues (carried forward)
File Line Issue Resolution
scripts/notion-translate/translateFrontMatter.ts ~310 Fenced code regex doesn't handle indented fences ✅ Fixed - now uses CommonMark-compliant tracking with up to 3 leading spaces
scripts/notion-translate/translateFrontMatter.ts ~739 Heading loss check is too permissive ✅ Fixed - now uses strict < comparison (no tolerance)
scripts/notion-translate/translateFrontMatter.ts ~893 JSON truncation not handled ✅ Fixed - catches finish_reason:length before JSON.parse and converts to token_overflow
Files Reviewed (2 files)
  • scripts/notion-translate/translateFrontMatter.ts - Extended retry logic verified
  • scripts/notion-translate/translateFrontMatter.test.ts - Test coverage verified


github-actions bot commented Mar 19, 2026

🚀 Preview Deployment

Your documentation preview is ready!

Preview URL: https://pr-169.comapeo-docs.pages.dev

🔄 Content: Regenerated 5 pages from Notion (script changes detected)

💡 Tip: Add label fetch-all-pages to test with full content, or fetch-10-pages for broader coverage.

This preview will update automatically when you push new commits to this PR.


Built with commit d15da43

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 33c1581bba

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".


luandro commented Mar 19, 2026

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 33c1581bba


… overhead

P1: TRANSLATION_COMPLETENESS_MAX_RETRIES was 2, so halving from 120k only
    reached 60k → 30k before giving up. Reaching the 8k floor requires 4
    halvings (120k→60k→30k→15k→8k), so raise the constant to 4.

P2: getChunkContentBudget was flooring the *content* budget at
    TRANSLATION_MIN_CHUNK_MAX_CHARS (8k), ignoring prompt overhead (~2.6k).
    This made the actual request larger than the documented 8k minimum.
    Fix: subtract overhead from the total limit and floor the content budget
    at 1; the 8k total-request floor is already enforced by the retry caller.

Update the "preserves heading structures" test to use a chunkLimit that
reflects a realistic total-request budget (3_200 chars) rather than a
raw content size (500 chars), which the old incorrect floor had masked.
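The corrected budget computation might look roughly like this (the ~2.6 K overhead figure and the 8 K total-request floor come from the commit message above; the function name mirrors the PR, the rest is illustrative):

```typescript
// Approximate prompt overhead per request, per the commit message (~2.6k chars).
const TRANSLATION_PROMPT_OVERHEAD_CHARS = 2_600;

function getChunkContentBudget(totalRequestLimit: number): number {
  // Subtract prompt overhead from the total request limit. The 8k
  // total-request floor is enforced by the retry caller, so the content
  // budget itself only needs to stay positive.
  return Math.max(totalRequestLimit - TRANSLATION_PROMPT_OVERHEAD_CHARS, 1);
}
```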


luandro commented Mar 19, 2026

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fd3b4d24b7


collectMarkdownStructureMetrics only matched ATX headings (# Heading).
CommonMark/Docusaurus also accept setext headings (underline with === or ---).
If the model reformats a heading into setext style the count would drop and
translateText would incorrectly treat the translation as incomplete.

Add a multiline regex for setext headings and include them in headingCount.


luandro commented Mar 19, 2026

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c36dfe9b16


The previous regex /^.+\n[=\-]{2,}\s*/gm matched any non-empty line
followed by --- or ===, which also matches list items before thematic
breaks (e.g. "- Item\n---"). This caused isSuspiciouslyIncompleteTranslation
to count spurious headings in the source and falsely flag complete
translations as incomplete.

Fix: only match === underlines (setext H1). The = character has no other
CommonMark meaning, so this is unambiguous. Setext H2 (--- underline) is
skipped because it cannot be distinguished from a thematic break without
a full parser. Notion content uses ATX headings exclusively anyway.
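Under these constraints the heading counter might look roughly like this (the function name is illustrative; only the ATX-plus-setext-H1 behaviour is taken from the commit message):

```typescript
// Count ATX headings plus setext H1 (=== underline). Setext H2 (--- underline)
// is intentionally not counted here, per the commit message above.
function countHeadings(markdown: string): number {
  // ATX: up to 3 leading spaces, 1-6 hashes, then a space.
  const atx = markdown.match(/^ {0,3}#{1,6} .*$/gm) ?? [];
  // Setext H1: a non-empty line underlined by ===. The = character has no
  // other CommonMark meaning, so this is unambiguous.
  const setextH1 = markdown.match(/^.+\n={2,}[ \t]*$/gm) ?? [];
  return atx.length + setextH1.length;
}
```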


luandro commented Mar 19, 2026

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4459a152a0


… check

Setext H2 headings (Heading\n---): re-introduce detection with a negative
lookahead that excludes lines starting with list markers or block-level
prefixes, which avoids the thematic-break false-positive while still
catching genuine section headings.

Admonitions (:::type … :::): Docusaurus callout blocks can be silently
dropped by the model without triggering any of the existing checks.
Count opening+closing ::: pairs (like fenced code blocks) and treat
a drop in admonitionCount as an incompleteness signal.
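A rough sketch of these two checks (the exact lookahead character set and the function names are assumptions; only the behaviour is taken from the commit message):

```typescript
// Setext H2 (--- underline) with a negative lookahead that rejects lines
// starting with list markers or other block-level prefixes, so a list item
// followed by a thematic break is not miscounted as a heading.
function countSetextH2(markdown: string): number {
  const re = /^(?![-*+>#`~ ]|\d+\.).+\n-{2,}[ \t]*$/gm;
  return markdown.match(re)?.length ?? 0;
}

// Docusaurus admonitions open with ":::type" and close with ":::".
// Count both markers, like fenced code blocks; a drop signals incompleteness.
function countAdmonitionMarkers(markdown: string): number {
  return markdown.match(/^:::/gm)?.length ?? 0;
}
```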


luandro commented Mar 19, 2026

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ecdf12235a


…ection

P1 — table detection: the previous regex only matched GFM table rows with
outer pipes (| A | B |). Models sometimes emit pipeless form (A | B | C).
Switch to matching GFM table *separator* rows instead — these are the
unambiguous per-spec indicator of a table and work regardless of outer-pipe
style. Threshold lowered to 1 separator (from 2 data lines). Regex uses a
simple character-class + .filter() to avoid ReDoS-unsafe nested quantifiers.

P2 — fenced code content: structural markers inside fenced code blocks
(headings, list items, table rows, admonitions) were counted as real
document structure. Strip fenced block interiors before running all
regex checks so that code samples do not inflate source counts and
cause false-positive incompleteness failures.
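These two fixes might be sketched as follows (function names are assumptions; the three-backtick marker is built from an escape so the sample itself stays valid markdown):

```typescript
// Three backticks, built via an escape so this sample contains no literal fence.
const FENCE = "\u0060".repeat(3);

// P1: count GFM table *separator* rows. A separator row contains only pipes,
// colons, dashes, and spaces, with at least one dash and one pipe, so it works
// for both "| --- | --- |" and the pipeless "--- | ---" style, while a bare
// "---" thematic break (no pipe) is excluded.
function countTableSeparators(markdown: string): number {
  return markdown
    .split("\n")
    .filter(
      (line) =>
        /^[|:\- ]+$/.test(line) && line.includes("-") && line.includes("|")
    ).length;
}

// P2: drop the interior of fenced code blocks so code samples cannot inflate
// structural counts; the fence markers themselves are kept.
function stripFencedCodeContent(markdown: string): string {
  const out: string[] = [];
  let inFence = false;
  for (const line of markdown.split("\n")) {
    const trimmed = line.trimStart();
    const indent = line.length - trimmed.length;
    if (indent <= 3 && (trimmed.startsWith(FENCE) || trimmed.startsWith("~~~"))) {
      inFence = !inFence;
      out.push(line);
    } else if (!inFence) {
      out.push(line);
    }
  }
  return out.join("\n");
}
```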


luandro commented Mar 19, 2026

@codex review


Fixes a TypeScript 'includes does not exist on type never' error and allows up to 3 spaces of optional indentation for fenced code blocks in stripFencedCodeContent.

Co-authored-by: Junie <junie@jetbrains.com>
luandro added 2 commits March 19, 2026 21:49
Bullet lists inside YAML frontmatter (e.g. keywords lists) were being
counted as structural elements, causing false-positive incomplete
translation detections when the model reformatted them as inline arrays.

Strip frontmatter before collecting structure metrics so that only
translatable content body is evaluated.
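A minimal frontmatter-stripping sketch, assuming the standard `---`-delimited YAML block at the top of the document (the function name is illustrative):

```typescript
// Remove a leading "---" ... "---" YAML frontmatter block, if present,
// so structure metrics only see the translatable content body.
function stripFrontmatter(markdown: string): string {
  const match = markdown.match(/^---\n[\s\S]*?\n---\n?/);
  return match ? markdown.slice(match[0].length) : markdown;
}
```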
…ce content

- Relax heading-loss threshold from strict (< source) to (< source - 1)
  so a single reformatted heading no longer triggers a spurious retry
- Buffer lines inside unclosed fenced code blocks and restore them on
  EOF instead of silently dropping remaining content, preventing false
  positive completeness failures on malformed markdown
- Update retries/failure tests to use 4-section documents that still
  demonstrate retry behaviour under the new heading tolerance
@capy-ai capy-ai bot left a comment

Added 1 comment

@tomasciccola
Contributor

Ok, I did a first round of review and found what I think is a bug. To reproduce:

  1. Obtain a long content document. For this, since fetching docs from the repo was failing, I searched the git history for the longest version of creating-a-new-observation.md and pulled the file
  2. I ran a simple script to read that file and translate it:
import { translateText } from "./scripts/notion-translate/translateFrontMatter";
import { readFileSync, writeFileSync } from "node:fs";

const text = readFileSync("creating-a-new-observation.md").toString();
const { markdown, title } = await translateText(
  text,
  "Creating a new observation",
  "es-AR"
);

console.log(`translated with title ${title}`);
writeFileSync("obs_es.md", markdown);

The script threw from scripts/notion-translate/translateFrontMatter.ts:215, where parseTranslationPayload lives, with the error TranslationError: OpenAI returned invalid JSON translation payload.
I logged the payload content before the error is thrown (the error basically happens on parsed = JSON.parse(content);).
What I saw is that the content is malformed JSON; it seems to be a truncated JSON payload with the shape

{
"markdown": "long formed content...."
}

but the payload is cut off (there is no closing brace).

Potential Issue

The response is being truncated, but we parse the payload as if it were the full JSON response, hence the failure when parsing it as JSON.

The headingLoss condition tolerated losing one heading silently
(`< headingCount - 1`), weakening the regression guard for long-form
content loss. Align it with all other structural checks (zero-tolerance)
by changing to `< headingCount`.

Addresses review feedback on PR #169.
@capy-ai capy-ai bot left a comment

Added 1 comment

luandro added 2 commits March 25, 2026 20:40
When OpenAI truncates output due to hitting the token budget it signals
finish_reason: "length". Previously the code would fall through and try
to parse truncated JSON, producing parse errors or garbage. Now it
throws a non-critical token_overflow TranslationError immediately, which
the existing translateChunkWithOverflowFallback retry path handles by
splitting the chunk and retrying.

Adds two tests: one verifying the error classification, one verifying
the end-to-end retry-with-smaller-chunks behaviour.
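The guard described here might look roughly like this (the TranslationError shape and the code/isCritical fields follow the PR discussion; everything else is illustrative):

```typescript
// Illustrative TranslationError carrying the code and criticality fields
// referenced in the PR.
class TranslationError extends Error {
  constructor(
    message: string,
    public code: string,
    public isCritical: boolean
  ) {
    super(message);
  }
}

function parsePayload(content: string, finishReason: string): { markdown: string } {
  if (finishReason === "length") {
    // The model hit its output token budget, so the JSON is truncated.
    // Surface a non-critical token_overflow so the caller splits and retries
    // instead of falling through to JSON.parse on a partial payload.
    throw new TranslationError(
      "OpenAI truncated the translation output",
      "token_overflow",
      false
    );
  }
  return JSON.parse(content);
}
```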

luandro commented Mar 26, 2026

@tomasciccola

This is addressed now for the failure mode described there.

We now check choice.finish_reason before attempting to parse the JSON payload in scripts/notion-translate/translateFrontMatter.ts. If OpenAI returns finish_reason: "length", the code converts that into a non-critical token_overflow and retries with smaller chunks instead of falling through to JSON.parse on a partial payload.

Relevant points:

  • fix landed in 014bd81 and is still present in current HEAD 7168988
  • the finish_reason:length classification happens before JSON parsing
  • the fallback path retries with smaller chunks
  • there is explicit coverage in translateFrontMatter.test.ts for both classification of finish_reason:length and retrying with smaller chunks

I also re-ran the targeted finish_reason:length tests locally and they passed.

@capy-ai capy-ai bot left a comment

Added 1 comment

luandro added 5 commits March 26, 2026 07:09
… misparse

The fence state machine now tracks the opening fence character and length
per CommonMark spec. A fence is only closed by a closing marker of the same
character with at least the opening length and no info string.

fix(translate): add frontmatter integrity check for critical Docusaurus fields

Added parseFrontmatterKeys() to extract top-level YAML keys and
assertFrontmatterIntegrity() to verify that translated markdown preserves
all frontmatter keys present in the source. Detects missing or unexpectedly
added critical fields (slug, sidebar_position, etc.) and fails the translation
with a non-critical error so retries are possible.

chore(changelog): scope Unreleased entries to translation-integrity work only

Removed unrelated entries from CHANGELOG.md to focus on translation-integrity
improvements. Kept only the entries relevant to this fix:
- Translation Completeness
- Long-form Content Translation
- Build Scripts
…g retry

assertFrontmatterIntegrity threw schema_invalid / isCritical:false and
its docstring promised the caller would retry, but translateText only
retried on unexpected_error + /incomplete/ messages.  Frontmatter
violations therefore bypassed all retry logic and aborted translation
immediately.

Extend isRecoverableCompletenessFailure to also match schema_invalid
errors whose message contains "Frontmatter integrity check failed", so
they share the same chunk-halving retry path as completeness failures.

Add a test that verifies a first-attempt frontmatter drop triggers a
retry and succeeds when the subsequent attempt preserves all keys.
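The extended predicate could be sketched as follows (the error shape is an assumption; the code and message strings are the ones named above):

```typescript
interface TranslationErrorLike {
  code: string;
  message: string;
}

function isRecoverableCompletenessFailure(error: TranslationErrorLike): boolean {
  // Original condition: completeness failures surfaced as unexpected_error.
  if (error.code === "unexpected_error" && /incomplete/.test(error.message)) {
    return true;
  }
  // New: frontmatter integrity violations share the chunk-halving retry path.
  return (
    error.code === "schema_invalid" &&
    /Frontmatter integrity check failed/.test(error.message)
  );
}
```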
@tomasciccola
Contributor

I did another round of testing with the changes:
Translated create-a-new-observation.md (it's probably not the latest version, but long enough, at 275 lines):
time bun run translation-test.ts

________________________________________________________
Executed in   20.66 mins    fish           external
   usr time    1.46 secs    0.21 millis    1.46 secs
   sys time    0.83 secs    1.25 millis    0.83 secs

As seen, translating this long-form article took 20 minutes, which is pretty long.

Translating a smaller file (4 lines) yielded a result in 5 seconds.
I think we should consider the viability of this, since translations could become a huge bottleneck. But for now I think this makes some sense.

Additionally, I'm wondering why the main translation function lives in translateFrontMatter.ts. The filename is misleading, since it suggests that it only translates the frontmatter metadata. We should see how we can refactor this for maintainability; but maybe I'm missing something?

@tomasciccola
Contributor

I did more thorough testing, trying to find where the bottleneck lies. It seems the issue is incorrect chunking, which leads to repeated and unnecessary API calls. Each API call takes around a minute, and given how things are chunked, a 300-line file should take around 2 API calls.

Here's the output of the test I did, with an AI-assisted analysis. If needed, I can also add the script I used to test this (but I don't want to pollute the git history for now...).

📄 Input: 732043 chars (714.9 KB)
🖼️ Data URLs: 1 (total 710394 chars)
📄 After masking: 21687 chars (21.2 KB)

======================================================================
🚀 Starting translateText at 2026-03-26T18:46:35.741Z

🔵 [2026-03-26T18:46:35.744Z] API #1 start (input: 24.0 KB, model: deepseek-chat)
🟢 [2026-03-26T18:48:34.007Z] API #1 done in 118.3s (output: 10.4 KB, finish: length)

🔵 [2026-03-26T18:48:34.009Z] API #2 start (input: 10.5 KB, model: deepseek-chat)
🟢 [2026-03-26T18:50:04.645Z] API #2 done in 90.6s (output: 8.3 KB, finish: stop)

🔵 [2026-03-26T18:50:04.645Z] API #3 start (input: 3.4 KB, model: deepseek-chat)
🟢 [2026-03-26T18:50:13.311Z] API #3 done in 8.7s (output: 1.0 KB, finish: stop)

🔵 [2026-03-26T18:50:13.312Z] API #4 start (input: 10.0 KB, model: deepseek-chat)
🟢 [2026-03-26T18:52:08.110Z] API #4 done in 114.8s (output: 6.6 KB, finish: length)

🔵 [2026-03-26T18:52:08.110Z] API #5 start (input: 6.5 KB, model: deepseek-chat)
🟢 [2026-03-26T18:53:10.973Z] API #5 done in 62.9s (output: 3.9 KB, finish: stop)

🔵 [2026-03-26T18:53:10.974Z] API #6 start (input: 6.0 KB, model: deepseek-chat)
🟢 [2026-03-26T18:54:10.776Z] API #6 done in 59.8s (output: 3.6 KB, finish: stop)

🔵 [2026-03-26T18:54:10.777Z] API #7 start (input: 7.8 KB, model: deepseek-chat)
🟢 [2026-03-26T18:55:11.188Z] API #7 done in 60.4s (output: 5.7 KB, finish: stop)

🔵 [2026-03-26T18:55:11.192Z] API #8 start (input: 24.0 KB, model: deepseek-chat)
🟢 [2026-03-26T18:57:05.402Z] API #8 done in 114.2s (output: 10.4 KB, finish: length)

🔵 [2026-03-26T18:57:05.403Z] API #9 start (input: 10.5 KB, model: deepseek-chat)
🟢 [2026-03-26T18:58:33.692Z] API #9 done in 88.3s (output: 8.3 KB, finish: stop)

🔵 [2026-03-26T18:58:33.692Z] API #10 start (input: 3.4 KB, model: deepseek-chat)
🟢 [2026-03-26T18:58:41.704Z] API #10 done in 8.0s (output: 1.0 KB, finish: stop)

🔵 [2026-03-26T18:58:41.704Z] API #11 start (input: 10.0 KB, model: deepseek-chat)
🟢 [2026-03-26T19:00:30.773Z] API #11 done in 109.1s (output: 6.6 KB, finish: length)

🔵 [2026-03-26T19:00:30.773Z] API #12 start (input: 6.5 KB, model: deepseek-chat)
🟢 [2026-03-26T19:01:36.410Z] API #12 done in 65.6s (output: 3.9 KB, finish: stop)

🔵 [2026-03-26T19:01:36.410Z] API #13 start (input: 6.0 KB, model: deepseek-chat)
🟢 [2026-03-26T19:02:37.851Z] API #13 done in 61.4s (output: 3.6 KB, finish: stop)

🔵 [2026-03-26T19:02:37.852Z] API #14 start (input: 7.8 KB, model: deepseek-chat)
🟢 [2026-03-26T19:03:41.746Z] API #14 done in 63.9s (output: 5.7 KB, finish: stop)

🔵 [2026-03-26T19:03:41.749Z] API #15 start (input: 24.0 KB, model: deepseek-chat)
🟢 [2026-03-26T19:05:38.735Z] API #15 done in 117.0s (output: 10.4 KB, finish: length)

🔵 [2026-03-26T19:05:38.736Z] API #16 start (input: 10.5 KB, model: deepseek-chat)
🟢 [2026-03-26T19:07:08.159Z] API #16 done in 89.4s (output: 8.3 KB, finish: stop)

🔵 [2026-03-26T19:07:08.160Z] API #17 start (input: 3.4 KB, model: deepseek-chat)
🟢 [2026-03-26T19:07:16.081Z] API #17 done in 7.9s (output: 1.0 KB, finish: stop)

🔵 [2026-03-26T19:07:16.081Z] API #18 start (input: 10.0 KB, model: deepseek-chat)
🟢 [2026-03-26T19:09:09.524Z] API #18 done in 113.4s (output: 6.6 KB, finish: length)

🔵 [2026-03-26T19:09:09.524Z] API #19 start (input: 6.5 KB, model: deepseek-chat)
🟢 [2026-03-26T19:10:14.753Z] API #19 done in 65.2s (output: 3.9 KB, finish: stop)

🔵 [2026-03-26T19:10:14.753Z] API #20 start (input: 6.0 KB, model: deepseek-chat)
🟢 [2026-03-26T19:11:15.844Z] API #20 done in 61.1s (output: 3.6 KB, finish: stop)

🔵 [2026-03-26T19:11:15.851Z] API #21 start (input: 7.8 KB, model: deepseek-chat)
🟢 [2026-03-26T19:12:19.676Z] API #21 done in 63.8s (output: 5.7 KB, finish: stop)

🔵 [2026-03-26T19:12:19.690Z] API #22 start (input: 10.5 KB, model: deepseek-chat)
🟢 [2026-03-26T19:13:50.611Z] API #22 done in 90.9s (output: 8.3 KB, finish: stop)

🔵 [2026-03-26T19:13:50.625Z] API #23 start (input: 13.3 KB, model: deepseek-chat)
🟢 [2026-03-26T19:15:43.945Z] API #23 done in 113.3s (output: 7.3 KB, finish: length)

🔵 [2026-03-26T19:15:43.946Z] API #24 start (input: 3.4 KB, model: deepseek-chat)
🟢 [2026-03-26T19:15:52.761Z] API #24 done in 8.8s (output: 1.0 KB, finish: stop)

🔵 [2026-03-26T19:15:52.762Z] API #25 start (input: 6.5 KB, model: deepseek-chat)
🟢 [2026-03-26T19:17:00.675Z] API #25 done in 67.9s (output: 3.9 KB, finish: stop)

🔵 [2026-03-26T19:17:00.675Z] API #26 start (input: 7.7 KB, model: deepseek-chat)
🟢 [2026-03-26T19:18:34.658Z] API #26 done in 94.0s (output: 5.2 KB, finish: stop)

🔵 [2026-03-26T19:18:34.658Z] API #27 start (input: 3.3 KB, model: deepseek-chat)
🟢 [2026-03-26T19:18:43.438Z] API #27 done in 8.8s (output: 0.9 KB, finish: stop)

🔵 [2026-03-26T19:18:43.438Z] API #28 start (input: 5.4 KB, model: deepseek-chat)
🟢 [2026-03-26T19:19:11.211Z] API #28 done in 27.8s (output: 3.2 KB, finish: stop)

🔵 [2026-03-26T19:19:11.215Z] API #29 start (input: 6.5 KB, model: deepseek-chat)
🟢 [2026-03-26T19:19:51.351Z] API #29 done in 40.1s (output: 4.1 KB, finish: stop)

🔵 [2026-03-26T19:19:51.352Z] API #30 start (input: 6.6 KB, model: deepseek-chat)
🟢 [2026-03-26T19:20:43.974Z] API #30 done in 52.6s (output: 4.2 KB, finish: stop)

🔵 [2026-03-26T19:20:43.975Z] API #31 start (input: 3.4 KB, model: deepseek-chat)
🟢 [2026-03-26T19:20:52.586Z] API #31 done in 8.6s (output: 1.0 KB, finish: stop)

🔵 [2026-03-26T19:20:52.586Z] API #32 start (input: 6.5 KB, model: deepseek-chat)
🟢 [2026-03-26T19:21:59.043Z] API #32 done in 66.5s (output: 3.9 KB, finish: stop)

🔵 [2026-03-26T19:21:59.043Z] API #33 start (input: 7.7 KB, model: deepseek-chat)
🟢 [2026-03-26T19:23:24.658Z] API #33 done in 85.6s (output: 5.2 KB, finish: stop)

🔵 [2026-03-26T19:23:24.659Z] API #34 start (input: 6.2 KB, model: deepseek-chat)
🟢 [2026-03-26T19:23:55.061Z] API #34 done in 30.4s (output: 4.0 KB, finish: stop)

======================================================================
❌ Failed after 2239.3s: Translated markdown appears incomplete after chunk reassembly
code=unexpected_error isCritical=false

📊 SUMMARY
──────────────────────────────────────────────────────────────────────
Total wall time: 2239.3s
Total API calls: 34
Total API time: 2239.3s
Total non-API time: 0.1s (0.0%)

📊 PER-CALL DETAIL (with inter-call gaps)
──────────────────────────────────────────────────────────────────────
API # 1: 118.3s | @ 0.0s | in: 24.0 KB | out: 10.4 KB | finish: length
API # 2: 90.6s | @ 118.3s | in: 10.5 KB | out: 8.3 KB | finish: stop
API # 3: 8.7s | @ 208.9s | in: 3.4 KB | out: 1.0 KB | finish: stop
API # 4: 114.8s | @ 217.6s | in: 10.0 KB | out: 6.6 KB | finish: length
API # 5: 62.9s | @ 332.4s | in: 6.5 KB | out: 3.9 KB | finish: stop
API # 6: 59.8s | @ 395.2s | in: 6.0 KB | out: 3.6 KB | finish: stop
API # 7: 60.4s | @ 455.0s | in: 7.8 KB | out: 5.7 KB | finish: stop
API # 8: 114.2s | @ 515.5s | in: 24.0 KB | out: 10.4 KB | finish: length
API # 9: 88.3s | @ 629.7s | in: 10.5 KB | out: 8.3 KB | finish: stop
API #10: 8.0s | @ 718.0s | in: 3.4 KB | out: 1.0 KB | finish: stop
API #11: 109.1s | @ 726.0s | in: 10.0 KB | out: 6.6 KB | finish: length
API #12: 65.6s | @ 835.0s | in: 6.5 KB | out: 3.9 KB | finish: stop
API #13: 61.4s | @ 900.7s | in: 6.0 KB | out: 3.6 KB | finish: stop
API #14: 63.9s | @ 962.1s | in: 7.8 KB | out: 5.7 KB | finish: stop
API #15: 117.0s | @ 1026.0s | in: 24.0 KB | out: 10.4 KB | finish: length
API #16: 89.4s | @ 1143.0s | in: 10.5 KB | out: 8.3 KB | finish: stop
API #17: 7.9s | @ 1232.4s | in: 3.4 KB | out: 1.0 KB | finish: stop
API #18: 113.4s | @ 1240.3s | in: 10.0 KB | out: 6.6 KB | finish: length
API #19: 65.2s | @ 1353.8s | in: 6.5 KB | out: 3.9 KB | finish: stop
API #20: 61.1s | @ 1419.0s | in: 6.0 KB | out: 3.6 KB | finish: stop
API #21: 63.8s | @ 1480.1s | in: 7.8 KB | out: 5.7 KB | finish: stop
API #22: 90.9s | @ 1543.9s | in: 10.5 KB | out: 8.3 KB | finish: stop
API #23: 113.3s | @ 1634.9s | in: 13.3 KB | out: 7.3 KB | finish: length
API #24: 8.8s | @ 1748.2s | in: 3.4 KB | out: 1.0 KB | finish: stop
API #25: 67.9s | @ 1757.0s | in: 6.5 KB | out: 3.9 KB | finish: stop
API #26: 94.0s | @ 1824.9s | in: 7.7 KB | out: 5.2 KB | finish: stop
API #27: 8.8s | @ 1918.9s | in: 3.3 KB | out: 0.9 KB | finish: stop
API #28: 27.8s | @ 1927.7s | in: 5.4 KB | out: 3.2 KB | finish: stop
API #29: 40.1s | @ 1955.5s | in: 6.5 KB | out: 4.1 KB | finish: stop
API #30: 52.6s | @ 1995.6s | in: 6.6 KB | out: 4.2 KB | finish: stop
API #31: 8.6s | @ 2048.2s | in: 3.4 KB | out: 1.0 KB | finish: stop
API #32: 66.5s | @ 2056.8s | in: 6.5 KB | out: 3.9 KB | finish: stop
API #33: 85.6s | @ 2123.3s | in: 7.7 KB | out: 5.2 KB | finish: stop
API #34: 30.4s | @ 2208.9s | in: 6.2 KB | out: 4.0 KB | finish: stop

📊 CALL SIZE PATTERN
──────────────────────────────────────────────────────────────────────
24.0 → 10.5 → 3.4 → 10.0 → 6.5 → 6.0 → 7.8 → 24.0 → 10.5 → 3.4 → 10.0 → 6.5 → 6.0 → 7.8 →
24.0 → 10.5 → 3.4 → 10.0 → 6.5 → 6.0 → 7.8 → 10.5 → 13.3 → 3.4 → 6.5 → 7.7 → 3.3 → 5.4 → 6.5 →
6.6 → 3.4 → 6.5 → 7.7 → 6.2 KB


Executed in 37.32 mins fish external
usr time 3.30 secs 0.00 micros 3.30 secs
sys time 1.68 secs 924.00 micros 1.68 secs

Analysis Report

Root Cause: 100% DeepSeek API time

   Total wall time:    2239s (37 min)
   Total API time:     2239s (100%)
   Local code time:    0.1s  (0.0%)

The bottleneck is entirely DeepSeek. Your local code is not the problem at all.

What's happening: a cascade of wasted API calls

The 34 calls break into clear phases driven by two bugs working together:

Phase 1–3 (calls 1–21): Three identical retry rounds, ~510s each

Each round follows the same pattern:

  1. Full 24KB text sent → finish_reason: length (output truncated, ~118s wasted)
  2. translateChunkWithOverflowFallback splits into sub-chunks and translates them (~6 calls,
    ~350s)
  3. Reassembly → isSuspiciouslyIncompleteTranslation → fails → completeness retry with halved
    chunkLimit

But at depths 0, 1, 2 the chunkLimit (120K→60K→30K) is still larger than the 21KB masked
text, so it takes the single-call fast path every time and hits the exact same truncation.
Three identical rounds completely wasted.

Phase 4 (calls 22–34): Different chunking, still fails

At depth 3 (chunkLimit=15K), the text finally gets pre-split differently. But call #23 still
hits finish_reason: length. After all sub-chunks complete, reassembly still fails the
completeness check → final error.

The two problems

  1. finish_reason: length on the first call. DeepSeek's default max_tokens isn't enough to
    output the full translation of the 21KB input. The code correctly detects this and splits,
    but...
  2. Completeness retries are wasteful. The retry loop halves chunkLimit, but until it drops
    below the text size (~21KB), the text isn't actually split differently. Depths 0→1→2 all
    produce identical API calls (you can see calls 1–7 ≡ 8–14 ≡ 15–21 exactly).

Cost of the cascade

┌────────────────┬───────┬───────┬──────────────────────────────┐
│ Phase          │ Calls │ Time  │ Outcome                      │
├────────────────┼───────┼───────┼──────────────────────────────┤
│ Depth 0 (120K) │ 7     │ ~515s │ Wasted — identical to next   │
│ Depth 1 (60K)  │ 7     │ ~510s │ Wasted — identical to next   │
│ Depth 2 (30K)  │ 7     │ ~518s │ Wasted — identical to next   │
│ Depth 3 (15K)  │ 13    │ ~696s │ Different split, still fails │
│ Total          │ 34    │ 2239s │ Error                        │
└────────────────┴───────┴───────┴──────────────────────────────┘

~1543s (69%) spent on three identical retry rounds that could never produce a different
result.

Suggested fixes

  1. Set max_tokens explicitly on the DeepSeek API call (e.g., 8192 or 16384). The 21KB input
    should produce a similar-sized translation, and the current default is clearly too low,
    causing the length truncation that triggers the entire cascade.
  2. Skip no-op completeness retries. When nextChunkLimit is still larger than the actual
    masked text, halving it won't change the chunking. The retry should detect this and jump to
    a chunkLimit that actually forces a split (e.g., maskedText.length / 2).
  3. Consider parallelizing chunk translations. Calls 2–7 (the sub-chunks) are independent but
    run sequentially. Translating them in parallel would cut ~350s per round to ~115s.
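Suggestion 2 above could be sketched as follows (a hypothetical helper; only the 8 K floor and the halving behaviour come from this PR):

```typescript
const TRANSLATION_MIN_CHUNK_MAX_CHARS = 8_000;

// Pick the next chunk limit so that it actually forces a different split.
function nextEffectiveChunkLimit(currentLimit: number, textLength: number): number {
  let next = Math.floor(currentLimit / 2);
  // Halving is a no-op while the limit still exceeds the text: the single-call
  // fast path fires again and produces the exact same API request. Jump
  // straight to a limit that forces a split instead.
  if (next >= textLength) {
    next = Math.floor(textLength / 2);
  }
  return Math.max(next, TRANSLATION_MIN_CHUNK_MAX_CHARS);
}
```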


Development

Successfully merging this pull request may close these issues.

Translation: Automatic translation is skipping chunks in long form content
