Long-form Notion pages (troubleshooting, create/edit observation, etc.) were silently dropping sections during automatic translation. The likely cause: feeding very large chunks to the model saturated its effective attention window, causing it to omit headings and paragraphs without raising an error.

Changes:
- Lower the proactive chunk ceiling from 500 K → 120 K chars so each translation request stays well within the model's reliable attention range
- Add structural completeness validation after every translation call: checks heading count, fenced code blocks, bullet/numbered lists, table lines, and severe length shrinkage (< 55 % of source)
- Retry with progressively smaller chunks (halved each attempt, floor 8 K chars) up to TRANSLATION_COMPLETENESS_MAX_RETRIES (2) times when incompleteness is detected, then surface a non-critical error

Closes #166
🐳 Docker Image Published

Your Docker image has been built and pushed for this PR.

Image Reference:
Platforms: linux/amd64, linux/arm64

Testing

To test this image:

docker pull docker.io/communityfirst/comapeo-docs-api:pr-169
docker run -p 3001:3001 docker.io/communityfirst/comapeo-docs-api:pr-169

Built with commit d15da43
@codex review
Code Review Summary

Status: No Issues Found | Recommendation: Merge

Overview

This increment adds frontmatter integrity failure to the recoverable retry conditions, along with test coverage for the retry behavior.

Changes in This Increment

Verification

The source code change at line ~1153 adds a second condition to

Previously Resolved Issues (carried forward)

Files Reviewed (2 files)
🚀 Preview Deployment

Your documentation preview is ready!

Preview URL: https://pr-169.comapeo-docs.pages.dev

🔄 Content: Regenerated 5 pages from Notion (script changes detected)

This preview will update automatically when you push new commits to this PR. Built with commit d15da43
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 33c1581bba
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 33c1581bba
… overhead

P1: TRANSLATION_COMPLETENESS_MAX_RETRIES was 2, so halving from 120k only reached 60k → 30k before giving up. Reaching the 8k floor requires 4 halvings (120k→60k→30k→15k→8k), so raise the constant to 4.

P2: getChunkContentBudget was flooring the *content* budget at TRANSLATION_MIN_CHUNK_MAX_CHARS (8k), ignoring prompt overhead (~2.6k). This made the actual request larger than the documented 8k minimum. Fix: subtract overhead from the total limit and floor the content budget at 1; the 8k total-request floor is already enforced by the retry caller.

Update the "preserves heading structures" test to use a chunkLimit that reflects a realistic total-request budget (3_200 chars) rather than a raw content size (500 chars), which the old incorrect floor had masked.
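The two fixes above can be sketched together as follows. This is a minimal illustration, not the repo's implementation: constant names mirror the PR, and the ~2.6k overhead value is taken from the comment above.

```typescript
// Illustrative sketch of the retry halving schedule (P1) and the
// overhead-aware content budget (P2). Constant names mirror the PR;
// the real code may differ.
const TRANSLATION_CHUNK_MAX_CHARS = 120_000;
const TRANSLATION_MIN_CHUNK_MAX_CHARS = 8_000;
const PROMPT_OVERHEAD_CHARS = 2_600; // approximate prompt overhead

// P1: halve the chunk limit repeatedly, clamped at the 8k floor.
function halvingSchedule(start: number, floor: number): number[] {
  const steps = [start];
  let current = start;
  while (current > floor) {
    current = Math.max(Math.floor(current / 2), floor);
    steps.push(current);
  }
  return steps;
}

// P2: the content budget is the total request limit minus prompt
// overhead, floored at 1; the 8k total-request floor is enforced by
// the retry caller, not here.
function getChunkContentBudget(totalLimit: number): number {
  return Math.max(totalLimit - PROMPT_OVERHEAD_CHARS, 1);
}

const schedule = halvingSchedule(
  TRANSLATION_CHUNK_MAX_CHARS,
  TRANSLATION_MIN_CHUNK_MAX_CHARS
);
// 120k → 60k → 30k → 15k → 8k: four halvings, hence MAX_RETRIES = 4.
```

Walking the schedule makes the P1 arithmetic concrete: the array has five entries, i.e. four halvings from the 120k ceiling down to the 8k floor.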
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fd3b4d24b7
collectMarkdownStructureMetrics only matched ATX headings (# Heading). CommonMark/Docusaurus also accept setext headings (underline with === or ---). If the model reformats a heading into setext style the count would drop and translateText would incorrectly treat the translation as incomplete. Add a multiline regex for setext headings and include them in headingCount.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c36dfe9b16
The previous regex /^.+\n[=\-]{2,}\s*/gm matched any non-empty line followed by --- or ===, which also matches list items before thematic breaks (e.g. "- Item\n---"). This caused isSuspiciouslyIncompleteTranslation to count spurious headings in the source and falsely flag complete translations as incomplete.

Fix: only match === underlines (setext H1). The = character has no other CommonMark meaning, so this is unambiguous. Setext H2 (--- underline) is skipped because it cannot be distinguished from a thematic break without a full parser. Notion content uses ATX headings exclusively anyway.
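A possible shape of the narrowed check (a hypothetical helper, not the repo's actual code):

```typescript
// Count only setext H1 headings: a non-empty line underlined with ===.
// A --- underline is deliberately ignored, since it is ambiguous with a
// thematic break without a full CommonMark parser.
function countSetextH1(markdown: string): number {
  return (markdown.match(/^[^\n]+\n={2,}[ \t]*$/gm) ?? []).length;
}
```

With this, a "Title" line underlined by === counts as one heading, while "- Item" followed by --- is no longer flagged as a spurious heading.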
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4459a152a0
… check

Setext H2 headings (Heading\n---): re-introduce detection with a negative lookahead that excludes lines starting with list markers or block-level prefixes, which avoids the thematic-break false positive while still catching genuine section headings.

Admonitions (:::type … :::): Docusaurus callout blocks can be silently dropped by the model without triggering any of the existing checks. Count opening and closing ::: pairs (like fenced code blocks) and treat a drop in admonitionCount as an incompleteness signal.
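The admonition-counting idea can be sketched like this (a hypothetical helper; the real metric collection may differ):

```typescript
// Count Docusaurus admonition fence markers (:::type … :::). Like
// fenced code blocks, a complete admonition contributes an opening and
// a closing marker, so a dropped block lowers the count by two.
function countAdmonitionMarkers(markdown: string): number {
  return (markdown.match(/^:{3,}.*$/gm) ?? []).length;
}
```

A translation whose marker count falls below the source's would then be treated as incomplete, the same way a lost fenced code block is.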
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ecdf12235a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…ection

P1 — table detection: the previous regex only matched GFM table rows with outer pipes (| A | B |). Models sometimes emit the pipeless form (A | B | C). Switch to matching GFM table *separator* rows instead — these are the unambiguous per-spec indicator of a table and work regardless of outer-pipe style. Threshold lowered to 1 separator (from 2 data lines). The regex uses a simple character class plus .filter() to avoid ReDoS-unsafe nested quantifiers.

P2 — fenced code content: structural markers inside fenced code blocks (headings, list items, table rows, admonitions) were counted as real document structure. Strip fenced block interiors before running all regex checks so that code samples do not inflate source counts and cause false-positive incompleteness failures.
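The separator-row approach from P1 might look roughly like this (a sketch under the assumptions above, not the repo's exact regex):

```typescript
// Detect GFM tables via their separator row (e.g. "--- | ---" or
// "| :-- | --: |"), which is the per-spec indicator of a table and
// appears whether or not the author uses outer pipes. A plain character
// class plus .filter() keeps the check free of nested quantifiers.
function countTableSeparatorRows(markdown: string): number {
  return markdown
    .split("\n")
    .filter(
      (line) =>
        /^\s*\|?[\s:|-]+\|[\s:|-]+\|?\s*$/.test(line) && line.includes("-")
    ).length;
}
```

Both pipe styles yield one separator per table, while data rows (which contain letters or digits) never match.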
@codex review
Fixes a TypeScript 'includes does not exist on type never' error and allows up to 3 spaces of optional indentation for fenced code blocks in stripFencedCodeContent.

Co-authored-by: Junie <junie@jetbrains.com>
Bullet lists inside YAML frontmatter (e.g. keywords lists) were being counted as structural elements, causing false-positive incomplete translation detections when the model reformatted them as inline arrays. Strip frontmatter before collecting structure metrics so that only translatable content body is evaluated.
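One way to sketch that frontmatter stripping (a hypothetical helper, not the actual repo code):

```typescript
// Remove a leading YAML frontmatter block (--- … ---) so its lists and
// other YAML constructs are not counted as document structure.
function stripFrontmatter(markdown: string): string {
  return markdown.replace(/^---\n[\s\S]*?\n---\n?/, "");
}
```

A keywords list in frontmatter then contributes nothing to the bullet-list count, so the model reformatting it as an inline array can no longer trip the incompleteness check.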
…ce content

- Relax heading-loss threshold from strict (< source) to (< source - 1) so a single reformatted heading no longer triggers a spurious retry
- Buffer lines inside unclosed fenced code blocks and restore them on EOF instead of silently dropping remaining content, preventing false-positive completeness failures on malformed markdown
- Update retries/failure tests to use 4-section documents that still demonstrate retry behaviour under the new heading tolerance
OK, I did a first round of review and found what I think is a bug. To reproduce:

```typescript
import { translateText } from "./scripts/notion-translate/translateFrontMatter";
import { readFileSync, writeFileSync } from "node:fs";

const text = readFileSync("creating-a-new-observation.md").toString();
const { markdown, title } = await translateText(
  text,
  "Creating a new observation",
  "es-AR"
);
console.log(`translated with title ${title}`);
writeFileSync("obs_es.md", markdown);
```

The script threw. The payload starts with:

```json
{
"markdown": "long formed content...."
```

but it never ends (there's no closing brace).

Potential Issue

The response is being chunked, but we parse the payload as if it were the full JSON response, hence the failure when parsing it as JSON.
The headingLoss condition tolerated losing one heading silently (`< headingCount - 1`), weakening the regression guard for long-form content loss. Align it with all other structural checks (zero-tolerance) by changing to `< headingCount`. Addresses review feedback on PR #169.
When OpenAI truncates output due to hitting the token budget it signals finish_reason: "length". Previously the code would fall through and try to parse truncated JSON, producing parse errors or garbage. Now it throws a non-critical token_overflow TranslationError immediately, which the existing translateChunkWithOverflowFallback retry path handles by splitting the chunk and retrying. Adds two tests: one verifying the error classification, one verifying the end-to-end retry-with-smaller-chunks behaviour.
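A minimal sketch of that fast-fail, assuming a simplified error class (the real TranslationError and the retry plumbing live in the repo):

```typescript
// Throw a non-critical token_overflow error as soon as the API reports
// a truncated completion, instead of attempting to parse partial JSON.
class TranslationError extends Error {
  constructor(
    message: string,
    public readonly code: string,
    public readonly isCritical: boolean
  ) {
    super(message);
  }
}

function assertNotTruncated(finishReason: string | null): void {
  if (finishReason === "length") {
    // Non-critical: the overflow fallback catches this, splits the
    // chunk, and retries with smaller requests.
    throw new TranslationError(
      "Model output truncated by token budget",
      "token_overflow",
      false
    );
  }
}
```

Calling this right after reading the completion's finish reason means truncated output never reaches the JSON parser at all.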
This is addressed now for the failure mode described there. We now check

Relevant points:

I also re-ran the targeted
… misparse

The fence state machine now tracks the opening fence character and length per the CommonMark spec. A fence is only closed by a closing marker of the same character, with at least the opening length and no info string.

fix(translate): add frontmatter integrity check for critical Docusaurus fields

Added parseFrontmatterKeys() to extract top-level YAML keys and assertFrontmatterIntegrity() to verify that translated markdown preserves all frontmatter keys present in the source. Detects missing or unexpectedly added critical fields (slug, sidebar_position, etc.) and fails the translation with a non-critical error so retries are possible.

chore(changelog): scope Unreleased entries to translation-integrity work only

Removed unrelated entries from CHANGELOG.md to focus on translation-integrity improvements. Kept only the entries relevant to this fix:
- Translation Completeness
- Long-form Content Translation
- Build Scripts
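The fence tracking described in the first commit might look roughly like this (a sketch of the CommonMark rule, not the repo's exact code):

```typescript
// Track the opening fence character and length. A fence closes only on
// a marker of the same character, at least as long as the opener, with
// no info string after it (per the CommonMark spec).
interface FenceState {
  char: "`" | "~";
  length: number;
}

function updateFenceState(
  line: string,
  open: FenceState | null
): FenceState | null {
  const m = line.match(/^ {0,3}(`{3,}|~{3,})(.*)$/);
  if (!m) return open; // not a fence marker: state unchanged
  const char = m[1][0] as "`" | "~";
  const length = m[1].length;
  if (open === null) return { char, length }; // opening fence
  if (char === open.char && length >= open.length && m[2].trim() === "") {
    return null; // valid closing fence
  }
  return open; // mismatched marker is just content inside the block
}
```

Feeding each line through this keeps a ~~~ line inside a backtick block (or a shorter/infotagged marker) from being misread as the close.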
…g retry

assertFrontmatterIntegrity threw schema_invalid / isCritical:false and its docstring promised the caller would retry, but translateText only retried on unexpected_error + /incomplete/ messages. Frontmatter violations therefore bypassed all retry logic and aborted translation immediately.

Extend isRecoverableCompletenessFailure to also match schema_invalid errors whose message contains "Frontmatter integrity check failed", so they share the same chunk-halving retry path as completeness failures. Add a test that verifies a first-attempt frontmatter drop triggers a retry and succeeds when the subsequent attempt preserves all keys.
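The widened predicate can be sketched as follows (the error shape here is an assumption mirroring the commit message):

```typescript
// Recoverable failures share the chunk-halving retry path: completeness
// failures and, now, frontmatter integrity violations.
interface TranslationFailure {
  code: string;
  message: string;
}

function isRecoverableCompletenessFailure(err: TranslationFailure): boolean {
  if (err.code === "unexpected_error" && /incomplete/i.test(err.message)) {
    return true;
  }
  return (
    err.code === "schema_invalid" &&
    err.message.includes("Frontmatter integrity check failed")
  );
}
```

Any other schema_invalid error (e.g. malformed JSON) still aborts immediately rather than burning retries.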
I did another round of testing with the changes:

________________________________________________________
Executed in   20.66 mins    fish           external
   usr time    1.46 secs    0.21 millis    1.46 secs
   sys time    0.83 secs    1.25 millis    0.83 secs

As seen, translating this long-form article took 20 mins, which is pretty long. A smaller file (4 lines) yielded a translation in 5 seconds.

Additionally, I'm wondering why the main translation function lives in
I did more thorough testing, trying to find where the bottleneck lies. It seems the issue is incorrect chunking, which leads to repeated and unnecessary API calls. Each API call takes around a minute, and given how things are chunked, a 300-line file should take around 2 API calls. Here's the output of the test I did, with an AI-assisted analysis. If needed, I can also add the script I used to test this (but I don't want to pollute the git history for now...)

📄 Input: 732043 chars (714.9 KB)
======================================================================
Problem
Closes #166
Long-form Notion pages (troubleshooting, create/edit observation, view/edit track, understanding exchange, etc.) were silently dropping sections during automatic translation. The root cause: feeding very large chunks (up to 500 K chars) to the model saturated its effective attention window, causing it to omit headings and paragraphs without raising any error.
Solution
Two complementary mechanisms:
1. Proactive aggressive chunking
Lowered TRANSLATION_CHUNK_MAX_CHARS from 500,000 → 120,000 chars. Each translation request now stays well within the model's reliable attention range, reducing the chance of content being dropped in the first place.

2. Structural completeness validation + retry
After every translation call, the result is compared against the source using structural markers:
If incompleteness is detected, the chunk limit is halved and the translation is retried (up to TRANSLATION_COMPLETENESS_MAX_RETRIES = 2 times, with a floor of TRANSLATION_MIN_CHUNK_MAX_CHARS = 8,000 chars). After exhausting retries, a non-critical error is surfaced so the pipeline can log and continue.

Test plan

- token_overflow
- bunx vitest run scripts/notion-translate/translateFrontMatter.test.ts