Skip to content

feat: commentIdBase prop for collision-free collab comment IDs (#257)#956

Draft
jedrazb wants to merge 9 commits into
mainfrom
fix/257-comment-id-base-v2
Draft

feat: commentIdBase prop for collision-free collab comment IDs (#257)#956
jedrazb wants to merge 9 commits into
mainfrom
fix/257-comment-id-base-v2

Conversation

@jedrazb

@jedrazb jedrazb commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

Supersedes #942

In a collaborative session every peer's createCommentIdAllocator() started at 1, so the first comment from each peer got the same id and replies threaded onto the wrong card. This adds optional commentIdBase/commentIdStride props (React + Vue) that partition the ID space: pass ydoc.clientID * 1_000_000 with commentIdStride={1_000_000} and each peer mints from a disjoint range. Default 0/Infinity preserves the single-editor 1, 2, 3, … sequence. Fixes #257.

Review fixes applied on top of the original work:

  • Scope the collaboration-example dev server to Playwright runs that include the chromium project, so project-scoped runs (notably the release publish-path parity smoke gate) don't boot an unrelated server.
  • Time-box the example's startup GitHub stars fetch (2s) so a restricted network can't hang the webServer startup window.
  • Document the two preconditions for cross-peer collision-freedom (distinct clientIDs; fewer than stride allocations per session).
  • Use Vue binding syntax in the Vue prop's JSDoc example.

Known follow-up (not addressed here): DocxEditor.tsx crossed 2000 lines so the max-lines cap was nudged to 2020. The real fix is the planned file split, tracked separately.


View with Codesmith Autofix with Codesmith
Need help on this PR? Tag /codesmith with what you need. Autofix is disabled.

samcorcos and others added 9 commits June 19, 2026 10:28
seedAbove now ignores IDs outside (base, base+stride] so a peer's synced
tracked-change revisionId can't pull this peer's allocator into another
partition; commentIdStride prop threads the stride through React/Vue and
the collab example. Floating add-comment button gets a data-testid so the
e2e test no longer matches on computed CSS, and getAttribute null is
guarded for a clearer failure.
- Scope the collaboration-example dev server to Playwright runs that
  include the chromium project, so project-scoped runs (notably the
  release publish-path parity smoke gate) don't boot an unrelated server.
- Time-box the example's startup GitHub stars fetch (2s) so a restricted
  network can't hang the webServer startup window.
- Document the two preconditions for cross-peer collision-freedom
  (distinct clientIDs; fewer than `stride` allocations per session).
- Use Vue binding syntax in the Vue prop's JSDoc example.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@vercel

vercel Bot commented Jun 20, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
docx-editor Ready Ready Preview, Comment Jun 20, 2026 6:28pm

Request Review

@eigenpal-release-pal

Copy link
Copy Markdown
Contributor

All contributors have signed the CLA ✍️ ✅

Posted by the CLA bot.

@greptile-apps

greptile-apps Bot commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

Adds optional commentIdBase / commentIdStride props (React + Vue) that partition the comment/revision ID space across collaborating peers, fixing the collision where every peer's allocator started at 1. The default values (0 / Infinity) preserve the existing single-editor 1, 2, 3… sequence with no behavioural change.

  • Core allocator (commentIdAllocator.ts): createCommentIdAllocator gains base and stride params; seedAbove now ignores IDs outside (base, base+stride] so synced peer revision marks cannot cross-contaminate the local counter.
  • React + Vue: new props threaded to the allocator at mount; parity contract and API snapshots updated; DocxEditorPagedArea gains data-testid="floating-add-comment" to support the E2E test.
  • E2E + Playwright config: hermetic two-peer BroadcastChannel test added; collab dev server conditionally started only when the chromium project is in scope; GitHub stars fetch time-boxed to 2 s to avoid hanging CI startup.

Confidence Score: 4/5

Safe to merge; the core logic is correct for all documented usage patterns and the backward-compatible defaults preserve existing single-editor behaviour.

The allocator correctly filters out-of-partition IDs in seedAbove, both React and Vue wire up the props symmetrically, unit tests cover the key collab edge cases, and the parity contract is updated. Two observations hold the score: next() is unbounded and will silently overflow ceiling if a session mints more than stride IDs (the ceiling variable exists but is unused in next()), and the E2E test's .last() card picker could non-deterministically grab a peer-synced card depending on sidebar sort order under slow Yjs sync.

packages/core/src/prosemirror/commentIdAllocator.ts — the next() overflow path; e2e/tests/collab-comment-id.spec.ts — the .last() card-ID picker.

Important Files Changed

Filename Overview
packages/core/src/prosemirror/commentIdAllocator.ts Core change: adds base/stride params for ID partitioning; seedAbove correctly clamps to partition, but next() is unbounded and silently overflows ceiling
packages/core/src/prosemirror/tests/commentIdAllocator.test.ts Good unit coverage: default sequence, partition isolation, in-partition seed, out-of-partition seed, and cross-peer contamination all tested
packages/react/src/components/DocxEditor.tsx Correctly threads commentIdBase/commentIdStride into useRef(createCommentIdAllocator(...)); read-once-at-mount behavior is documented and intentional
packages/vue/src/components/DocxEditor.vue Vue component initializes allocator with props at setup time; commentIdStride defaults to undefined (which correctly maps to Infinity via function default)
e2e/tests/collab-comment-id.spec.ts Hermetic two-peer BroadcastChannel test; the .last() card picker could race against Yjs sync delivering the peer's card out of order
playwright.config.ts Conditionally boots the collab dev server only when chromium project is in scope; requestedProjects parsing handles both --project=x and --project x forms correctly
examples/collaboration/src/useCollaboration.ts Derives commentIdBase = ydoc.clientID * 1_000_000 inside the roomName-scoped useMemo and exports both props; clean integration
examples/collaboration/vite.config.ts AbortSignal.timeout(2000) added to the GitHub stars fetch to prevent hanging the Playwright webServer startup window on restricted networks

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["DocxEditor mount\n(commentIdBase, commentIdStride)"] --> B["createCommentIdAllocator(base, stride)\nnextId = base + 1\nceiling = base + stride"]
    B --> C["seedCommentAllocator called\non document load"]
    C --> D{"For each comment/\nrevision mark ID"}
    D --> E["allocator.seedAbove(id)"]
    E --> F{"id >= nextId\nAND id <= ceiling?"}
    F -- Yes --> G["nextId = id + 1\n(in-partition seed)"]
    F -- No --> H["Ignored\n(out-of-partition\nor already passed)"]
    G --> I["allocator.next()\nreturns nextId++"]
    H --> I
    I --> J{"nextId > ceiling?\n(overflow)"}
    J -- "No (common case)" --> K["ID minted in own partition\n✓ No collision"]
    J -- "Yes (> stride IDs minted)" --> L["ID crosses into\nadjacent partition\n⚠ Silent overflow"]
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    A["DocxEditor mount\n(commentIdBase, commentIdStride)"] --> B["createCommentIdAllocator(base, stride)\nnextId = base + 1\nceiling = base + stride"]
    B --> C["seedCommentAllocator called\non document load"]
    C --> D{"For each comment/\nrevision mark ID"}
    D --> E["allocator.seedAbove(id)"]
    E --> F{"id >= nextId\nAND id <= ceiling?"}
    F -- Yes --> G["nextId = id + 1\n(in-partition seed)"]
    F -- No --> H["Ignored\n(out-of-partition\nor already passed)"]
    G --> I["allocator.next()\nreturns nextId++"]
    H --> I
    I --> J{"nextId > ceiling?\n(overflow)"}
    J -- "No (common case)" --> K["ID minted in own partition\n✓ No collision"]
    J -- "Yes (> stride IDs minted)" --> L["ID crosses into\nadjacent partition\n⚠ Silent overflow"]
Loading

Reviews (1): Last reviewed commit: "fix(comment-id): address review on colla..." | Re-trigger Greptile

Comment on lines +55 to 63
export function createCommentIdAllocator(base = 0, stride = Infinity): CommentIdAllocator {
let nextId = base + 1;
const ceiling = base + stride;
return {
next: () => nextId++,
seedAbove(maxId: number) {
if (maxId >= nextId) nextId = maxId + 1;
if (maxId >= nextId && maxId <= ceiling) nextId = maxId + 1;
},
};

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 next() is unbounded — silent partition overflow

ceiling is computed but only consulted in seedAbove; next() never checks it. Once a peer mints its 1,000,001st ID it silently emits values in the adjacent partition. From that point, the other peer's seedAbove will accept those values (they fall within (adjacentBase, adjacentBase + stride]) and advance the victim's counter, causing the exact collision this feature is meant to prevent — with no error or warning.

Adding a dev-mode guard directly in next() (e.g., if (process.env.NODE_ENV !== 'production' && nextId > ceiling) console.warn(...)) would make the overflow immediately visible during development without any production overhead.

Comment on lines +24 to +28
const card = page.locator(`${SIDEBAR} .docx-comment-card[data-comment-id]`).last();
await card.waitFor({ state: 'visible' });
const id = await card.getAttribute('data-comment-id');
if (id == null) throw new Error('comment card rendered without data-comment-id');
return Number(id);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 .last() card picker is racing against Yjs sync

By the time peer B presses Enter and the assertion resolves, peer A's comment card may have already synced into peer B's sidebar. If the sidebar renders synced cards after the locally-minted card (e.g., by comment ID ascending, where peer B's base is higher than peer A's), .last() returns the correct card. But if the sidebar sorts by document position or Y.Array insertion order, A's card might appear after B's, causing .last() to return A's ID — making idB === idA and tripping the distinctness assertion non-deterministically.

A more resilient approach is to snapshot visible IDs before adding the comment and then wait for a new ID to appear, rather than relying on positional order.

@jedrazb

jedrazb commented Jun 20, 2026

Copy link
Copy Markdown
Contributor Author

Heads up before this lands: the per-peer ID scheme produces comment/revision IDs that Word won't accept, so collaborative saves will come back corrupt.

The relevant attributes (w:comment/@w:id, w:ins/@w:id, w:del/@w:id, w:commentReference/@w:id, w:commentRangeStart/End) are all ST_DecimalNumber, which the schema defines as plain xsd:integer, no upper bound:

<xsd:simpleType name="ST_DecimalNumber">
  <xsd:restriction base="xsd:integer"/>
</xsd:simpleType>

The schema doesn't match what Word actually does, though. Word reads these IDs as signed 32-bit ints, so anything above 2,147,483,647 (0x7FFFFFFF) gets rejected: the file opens into "unreadable content" / Document Recovery and the comment or revision is dropped or renumbered. We already depend on this exact limit elsewhere. packages/core/src/utils/hexId.ts caps generated IDs at MAX_HEX_ID_EXCLUSIVE = 0x7fffffff, with a comment that it's there "to survive both Word ('Document Recovery') and strict OOXML validators." Same ceiling applies to the decimal w:id fields.

The scheme here is base = ydoc.clientID * 1_000_000. clientID is a random uint32 (up to ~4.3e9), so the base passes 0x7FFFFFFF once clientID is over ~2147, which is almost always: a random clientID lands at or under 2147 about 1 in 2 million times. So in practice every peer's first comment is already out of range (e.g. clientID 4e9 gives an id near 4e15). The serializer writes comment.id straight through (it only clamps negatives), so those values reach the XML unchanged.

The partitioning idea itself is fine for keeping IDs unique; the problem is just that clientID is too large to use as the multiplier.

There are really two separate requirements:

I'd keep the per-peer partition for the live side, and renumber the comment + revision IDs down to a dense 1..N at serialize time, only when something is actually out of range so normal single-editor saves stay untouched, with a hard 0x7FFFFFFF clamp as a backstop. One thing to watch: the editor uses a single ID space shared across comments and revisions, so the renumber has to draw from one sequence rather than two. This also covers oversized IDs coming from a malformed input file, not just collab.

Suggest holding the merge until the save side is handled. The partition on its own isn't safe for Word.

@jedrazb jedrazb marked this pull request as draft June 20, 2026 19:07
@samcorcos

Copy link
Copy Markdown
Contributor

Follow-up patch addressing the int32 w:id concern raised above, plus the two Greptile P2s. Applies cleanly on top of 44910a02.

What it does:

  • New packages/core/src/docx/serializer/decimalIdRemap.ts: pre-pass run from repackDocx / repackDocxFromRaw that walks the Document (body, comments, headers/footers, notes, tables, SDTs) and, only when any comment/revision ID exceeds 0x7FFFFFFF, remaps all of them to a dense 1..N from a single shared sequence. No-op otherwise, so single-editor saves are byte-identical. attemptSelectiveSave bails to full repack when a remap is needed (range markers in unchanged paragraphs).
  • New decimalIdRemap.test.ts with an output-level invariant: createDocx() a doc whose IDs are ~4×10¹⁵, unzip, assert every w:id in document.xml + comments.xml is ≤ 0x7FFFFFFF and that commentRangeStartw:comment IDs still match. This test fails on the current head of this branch (Received: 4000000001000001) and passes with the patch.
  • commentIdAllocator.ts: JSDoc now states the int32 reality and points at the remap; next() dev-warns once if it crosses ceiling.
  • collab-comment-id.spec.ts: addComment() snapshots existing IDs and polls for the new one instead of .last().
  • Changeset summary updated.

Verified locally: bun run typecheck, bun test packages/core/src/docx/ (376 pass), npx playwright test collab-comment-id --project=chromium (pass).

Patch (apply with git apply)
diff --git a/.changeset/comment-id-base-collab.md b/.changeset/comment-id-base-collab.md
index eb849e32..95f8fcc0 100644
--- a/.changeset/comment-id-base-collab.md
+++ b/.changeset/comment-id-base-collab.md
@@ -4,4 +4,4 @@
 '@eigenpal/docx-editor-vue': minor
 ---
 
-Add `commentIdBase`/`commentIdStride` props to partition comment/revision IDs per collaborating peer. Fixes #257.
+Add `commentIdBase`/`commentIdStride` props to partition comment/revision IDs per collaborating peer. On save, oversized IDs are renumbered to fit Word's signed-int32 `w:id` limit so collaborative documents open cleanly. Fixes #257.
diff --git a/e2e/tests/collab-comment-id.spec.ts b/e2e/tests/collab-comment-id.spec.ts
index 2797f666..578eeba6 100644
--- a/e2e/tests/collab-comment-id.spec.ts
+++ b/e2e/tests/collab-comment-id.spec.ts
@@ -15,17 +15,26 @@ const SIDEBAR = '.docx-unified-sidebar';
 const ADD_COMMENT_BTN = '[data-testid="floating-add-comment"]';
 
 async function addComment(page: Page, body: string): Promise<number> {
+  // Snapshot existing IDs first, then wait for a NEW one to appear — `.last()`
+  // would race against the peer's comment syncing into this sidebar.
+  const before = new Set(await visibleCommentIds(page));
   await page.locator('.layout-page-content').click({ clickCount: 3 });
   await page.locator(ADD_COMMENT_BTN).click();
   const ta = page.locator(`${SIDEBAR} textarea`).last();
   await ta.waitFor({ state: 'visible' });
   await ta.fill(body);
   await ta.press('Enter');
-  const card = page.locator(`${SIDEBAR} .docx-comment-card[data-comment-id]`).last();
-  await card.waitFor({ state: 'visible' });
-  const id = await card.getAttribute('data-comment-id');
-  if (id == null) throw new Error('comment card rendered without data-comment-id');
-  return Number(id);
+  let added: number | undefined;
+  await expect
+    .poll(
+      async () => {
+        added = (await visibleCommentIds(page)).find((id) => !before.has(id));
+        return added;
+      },
+      { timeout: 10_000 }
+    )
+    .toBeDefined();
+  return added!;
 }
 
 async function visibleCommentIds(page: Page): Promise<number[]> {
diff --git a/packages/core/src/docx/rezip.ts b/packages/core/src/docx/rezip.ts
index 42811c19..2f787315 100644
--- a/packages/core/src/docx/rezip.ts
+++ b/packages/core/src/docx/rezip.ts
@@ -39,6 +39,7 @@
 import JSZip from 'jszip';
 import type { Document } from '../types/document';
 import { serializeDocument } from './serializer/documentSerializer';
+import { remapOversizedRevisionIds } from './serializer/decimalIdRemap';
 import { type RawDocxContent } from './unzip';
 import { escapeXml } from './serializer/xmlUtils';
 
@@ -106,6 +107,10 @@ export async function repackDocx(doc: Document, options: RepackOptions = {}): Pr
   const { compressionLevel = 6, updateModifiedDate = true, modifiedBy } = options;
   const exportDocument = doc;
 
+  // Compact collab-partitioned comment/revision IDs down to Word's signed-int32
+  // range before serializing any part. No-op for normal single-editor saves.
+  remapOversizedRevisionIds(exportDocument);
+
   // Load the original ZIP
   const originalZip = await JSZip.loadAsync(doc.originalBuffer);
 
@@ -205,6 +210,8 @@ export async function repackDocxFromRaw(
   const { compressionLevel = 6, updateModifiedDate = true, modifiedBy } = options;
   const exportDocument = doc;
 
+  remapOversizedRevisionIds(exportDocument);
+
   // Create a new ZIP with all original files
   const newZip = new JSZip();
 
diff --git a/packages/core/src/docx/selectiveSave.ts b/packages/core/src/docx/selectiveSave.ts
index 949edc33..1acf6b03 100644
--- a/packages/core/src/docx/selectiveSave.ts
+++ b/packages/core/src/docx/selectiveSave.ts
@@ -10,6 +10,7 @@
 
 import type { Document, BlockContent } from '../types/document';
 import { serializeDocument } from './serializer/documentSerializer';
+import { remapOversizedRevisionIds } from './serializer/decimalIdRemap';
 import {
   serializeCommentsWithInfo,
   serializeCommentsExtended,
@@ -106,6 +107,10 @@ export async function attemptSelectiveSave(
   const content = doc.package.document.content;
   if (hasNewImagesOrHyperlinks(content)) return null;
 
+  // Collab-partitioned comment/revision IDs need a document-wide renumber that
+  // touches commentRange markers in unchanged paragraphs — full repack only.
+  if (remapOversizedRevisionIds(doc)) return null;
+
   const comments = doc.package.document.comments;
   const hasComments = comments && comments.length > 0;
   const headerFooterUpdates = collectHeaderFooterUpdates(doc);
diff --git a/packages/core/src/docx/serializer/decimalIdRemap.test.ts b/packages/core/src/docx/serializer/decimalIdRemap.test.ts
new file mode 100644
index 00000000..d7e4a723
--- /dev/null
+++ b/packages/core/src/docx/serializer/decimalIdRemap.test.ts
@@ -0,0 +1,158 @@
+import { describe, test, expect } from 'bun:test';
+import JSZip from 'jszip';
+import type { Document, Comment, Paragraph } from '../../types/document';
+import { remapOversizedRevisionIds, MAX_DECIMAL_ID } from './decimalIdRemap';
+import { serializeDocument } from './documentSerializer';
+import { serializeComments } from './commentSerializer';
+import { createDocx } from '../rezip';
+
+// Typical collab ID: ydoc.clientID (~4e9) * 1_000_000 — far above signed int32.
+const HUGE_A = 4_000_000_001_000_001;
+const HUGE_B = 1_234_567_001_000_001;
+
+function para(content: Paragraph['content']): Paragraph {
+  return { type: 'paragraph', content };
+}
+
+function run(text: string) {
+  return { type: 'run' as const, content: [{ type: 'text' as const, text }] };
+}
+
+function makeDoc(commentId: number, revisionId: number, replyId?: number): Document {
+  const comments: Comment[] = [
+    { id: commentId, author: 'A', content: [para([run('c')])] },
+    ...(replyId != null
+      ? [{ id: replyId, author: 'B', parentId: commentId, content: [para([run('r')])] }]
+      : []),
+  ];
+  return {
+    package: {
+      document: {
+        content: [
+          para([
+            { type: 'commentRangeStart', id: commentId },
+            run('hello '),
+            {
+              type: 'insertion',
+              info: { id: revisionId, author: 'A', date: '2025-01-01T00:00:00Z' },
+              content: [run('world')],
+            },
+            { type: 'commentRangeEnd', id: commentId },
+          ]),
+        ],
+        comments,
+      },
+    },
+  };
+}
+
+/** Every `w:id="N"` in `xml` as a number. */
+function emittedWIds(xml: string): number[] {
+  return [...xml.matchAll(/w:id="(\d+)"/g)].map((m) => Number(m[1]));
+}
+
+describe('remapOversizedRevisionIds', () => {
+  test('no-op when every ID already fits in signed int32', () => {
+    const doc = makeDoc(3, 7);
+    expect(remapOversizedRevisionIds(doc)).toBe(false);
+    expect(doc.package.document.comments![0].id).toBe(3);
+    const body = doc.package.document.content[0] as Paragraph;
+    expect((body.content[0] as { id: number }).id).toBe(3);
+  });
+
+  test('remaps to dense 1..N preserving relative order across comments + revisions', () => {
+    const doc = makeDoc(HUGE_A, HUGE_B, HUGE_A + 1);
+    expect(remapOversizedRevisionIds(doc)).toBe(true);
+
+    // Single shared sequence, sorted: HUGE_B → 1, HUGE_A → 2, HUGE_A+1 → 3.
+    const [comment, reply] = doc.package.document.comments!;
+    expect(comment.id).toBe(2);
+    expect(reply.id).toBe(3);
+    expect(reply.parentId).toBe(2);
+
+    const body = doc.package.document.content[0] as Paragraph;
+    expect(body.content[0]).toEqual({ type: 'commentRangeStart', id: 2 });
+    expect(body.content[3]).toEqual({ type: 'commentRangeEnd', id: 2 });
+    expect((body.content[2] as { info: { id: number } }).info.id).toBe(1);
+  });
+
+  test('idempotent — second call is a no-op', () => {
+    const doc = makeDoc(HUGE_A, HUGE_B);
+    expect(remapOversizedRevisionIds(doc)).toBe(true);
+    expect(remapOversizedRevisionIds(doc)).toBe(false);
+  });
+
+  test('descends into tables and pPr-level tracked changes', () => {
+    const doc: Document = {
+      package: {
+        document: {
+          content: [
+            {
+              type: 'table',
+              rows: [
+                {
+                  type: 'tableRow',
+                  structuralChange: {
+                    type: 'tableRowInsertion',
+                    info: { id: HUGE_A, author: 'A' },
+                  },
+                  cells: [
+                    {
+                      type: 'tableCell',
+                      content: [
+                        { ...para([run('x')]), pPrIns: { id: HUGE_B, author: 'A' } } as Paragraph,
+                      ],
+                    },
+                  ],
+                },
+              ],
+            },
+          ],
+        },
+      },
+    };
+    expect(remapOversizedRevisionIds(doc)).toBe(true);
+    for (const id of emittedWIds(serializeDocument(doc))) {
+      expect(id).toBeLessThanOrEqual(MAX_DECIMAL_ID);
+    }
+  });
+});
+
+describe('Word int32 w:id invariant (PR #956 review)', () => {
+  // This is the test that would have caught the original PR's bug without
+  // knowing the implementation: it asserts the SERIALIZED OUTPUT obeys Word's
+  // signed-int32 cap on every `w:id`, regardless of how big the in-memory IDs
+  // are. The schema says `ST_DecimalNumber` is unbounded `xsd:integer`; Word
+  // does not honour that, so neither can we.
+  test('createDocx never emits a w:id above 0x7FFFFFFF in document.xml or comments.xml', async () => {
+    const doc = makeDoc(HUGE_A, HUGE_B, HUGE_A + 1);
+    const buf = await createDocx(doc);
+    const zip = await JSZip.loadAsync(buf);
+
+    const documentXml = await zip.file('word/document.xml')!.async('text');
+    const commentsXml = await zip.file('word/comments.xml')!.async('text');
+
+    const all = [...emittedWIds(documentXml), ...emittedWIds(commentsXml)];
+    expect(all.length).toBeGreaterThan(0);
+    for (const id of all) {
+      expect(id).toBeLessThanOrEqual(MAX_DECIMAL_ID);
+    }
+
+    // Referential integrity survives the remap: the comment's range markers in
+    // document.xml still point at the comment's id in comments.xml.
+    const commentIdMatch = commentsXml.match(/<w:comment w:id="(\d+)"/);
+    expect(commentIdMatch).not.toBeNull();
+    expect(documentXml).toContain(`<w:commentRangeStart w:id="${commentIdMatch![1]}"`);
+  });
+
+  test('serializer-level invariant holds for the body alone', () => {
+    const doc = makeDoc(HUGE_A, HUGE_B);
+    remapOversizedRevisionIds(doc);
+    for (const id of emittedWIds(serializeDocument(doc))) {
+      expect(id).toBeLessThanOrEqual(MAX_DECIMAL_ID);
+    }
+    for (const id of emittedWIds(serializeComments(doc.package.document.comments!))) {
+      expect(id).toBeLessThanOrEqual(MAX_DECIMAL_ID);
+    }
+  });
+});
diff --git a/packages/core/src/docx/serializer/decimalIdRemap.ts b/packages/core/src/docx/serializer/decimalIdRemap.ts
new file mode 100644
index 00000000..ee0d1df9
--- /dev/null
+++ b/packages/core/src/docx/serializer/decimalIdRemap.ts
@@ -0,0 +1,176 @@
+/**
+ * Save-time renumbering of comment + tracked-change `w:id` values.
+ *
+ * The OOXML schema types these IDs as `ST_DecimalNumber` (unbounded
+ * `xsd:integer`), but Word reads them as **signed int32** — anything
+ * above `0x7FFFFFFF` triggers "unreadable content" / Document Recovery and
+ * the comment or revision is dropped. This is the decimal counterpart to
+ * the hex-id cap in `utils/hexId.ts`.
+ *
+ * Collaborative sessions partition the live ID space per peer (e.g.
+ * `ydoc.clientID * 1_000_000`, see `commentIdAllocator.ts`), which routinely
+ * lands IDs in the 10^12–10^15 range. That's fine in memory; on save we
+ * compact every comment + revision ID down to a dense `1..N` sequence so the
+ * file opens cleanly in Word. The remap is a no-op when every ID already fits,
+ * so single-editor saves stay byte-for-byte identical.
+ *
+ * Mutates the `Document` in place — same convention as `processNewImages`
+ * and the other rezip pre-passes.
+ */
+
+import type {
+  Document,
+  BlockContent,
+  Paragraph,
+  ParagraphContent,
+  Run,
+  Hyperlink,
+  SimpleField,
+  ComplexField,
+  InlineSdt,
+  Table,
+} from '../../types/document';
+
+/** Word's effective upper bound for `ST_DecimalNumber` `w:id` attributes. */
+export const MAX_DECIMAL_ID = 0x7fffffff;
+
+type IdVisitor = (id: number) => number;
+
+function visitRun(run: Run, fn: IdVisitor): void {
+  for (const pc of run.propertyChanges ?? []) pc.info.id = fn(pc.info.id);
+}
+
+function visitHyperlink(link: Hyperlink, fn: IdVisitor): void {
+  for (const child of link.children) {
+    if (child.type === 'run') visitRun(child, fn);
+  }
+}
+
+function visitField(field: SimpleField | ComplexField, fn: IdVisitor): void {
+  if (field.type === 'simpleField') {
+    for (const c of field.content) {
+      if (c.type === 'run') visitRun(c, fn);
+      else if (c.type === 'hyperlink') visitHyperlink(c, fn);
+    }
+  } else {
+    for (const r of field.fieldCode) visitRun(r, fn);
+    for (const r of field.fieldResult) visitRun(r, fn);
+  }
+}
+
+function visitInlineSdt(sdt: InlineSdt, fn: IdVisitor): void {
+  for (const item of sdt.content) {
+    if (item.type === 'run') visitRun(item, fn);
+    else if (item.type === 'hyperlink') visitHyperlink(item, fn);
+    else if (item.type === 'simpleField' || item.type === 'complexField') visitField(item, fn);
+    else if (item.type === 'inlineSdt') visitInlineSdt(item, fn);
+  }
+}
+
+function visitParagraphContent(item: ParagraphContent, fn: IdVisitor): void {
+  switch (item.type) {
+    case 'run':
+      return visitRun(item, fn);
+    case 'hyperlink':
+      return visitHyperlink(item, fn);
+    case 'simpleField':
+    case 'complexField':
+      return visitField(item, fn);
+    case 'inlineSdt':
+      return visitInlineSdt(item, fn);
+    case 'commentRangeStart':
+    case 'commentRangeEnd':
+      item.id = fn(item.id);
+      return;
+    case 'insertion':
+    case 'deletion':
+    case 'moveFrom':
+    case 'moveTo':
+      item.info.id = fn(item.info.id);
+      for (const c of item.content) {
+        if (c.type === 'run') visitRun(c, fn);
+        else if (c.type === 'hyperlink') visitHyperlink(c, fn);
+      }
+      return;
+    case 'moveFromRangeStart':
+    case 'moveFromRangeEnd':
+    case 'moveToRangeStart':
+    case 'moveToRangeEnd':
+      item.id = fn(item.id);
+      return;
+    default:
+      return;
+  }
+}
+
+function visitParagraph(p: Paragraph, fn: IdVisitor): void {
+  if (p.pPrIns) p.pPrIns.id = fn(p.pPrIns.id);
+  if (p.pPrDel) p.pPrDel.id = fn(p.pPrDel.id);
+  for (const pc of p.propertyChanges ?? []) pc.info.id = fn(pc.info.id);
+  for (const item of p.content) visitParagraphContent(item, fn);
+}
+
+function visitTable(t: Table, fn: IdVisitor): void {
+  for (const pc of t.propertyChanges ?? []) pc.info.id = fn(pc.info.id);
+  for (const row of t.rows) {
+    for (const pc of row.propertyChanges ?? []) pc.info.id = fn(pc.info.id);
+    if (row.structuralChange) row.structuralChange.info.id = fn(row.structuralChange.info.id);
+    for (const cell of row.cells) {
+      for (const pc of cell.propertyChanges ?? []) pc.info.id = fn(pc.info.id);
+      if (cell.structuralChange) cell.structuralChange.info.id = fn(cell.structuralChange.info.id);
+      visitBlocks(cell.content, fn);
+    }
+  }
+}
+
+function visitBlocks(blocks: BlockContent[], fn: IdVisitor): void {
+  for (const block of blocks) {
+    if (block.type === 'paragraph') visitParagraph(block, fn);
+    else if (block.type === 'table') visitTable(block, fn);
+    else if (block.type === 'blockSdt') visitBlocks(block.content, fn);
+  }
+}
+
+/** Visit every comment/revision `w:id` in the package via `fn` (write-back). */
+function visitDocumentIds(doc: Document, fn: IdVisitor): void {
+  const pkg = doc.package;
+  visitBlocks(pkg.document.content, fn);
+  for (const c of pkg.document.comments ?? []) {
+    c.id = fn(c.id);
+    if (c.parentId != null) c.parentId = fn(c.parentId);
+  }
+  for (const hf of pkg.headers?.values() ?? []) visitBlocks(hf.content, fn);
+  for (const hf of pkg.footers?.values() ?? []) visitBlocks(hf.content, fn);
+  for (const n of pkg.footnotes ?? []) visitBlocks(n.content, fn);
+  for (const n of pkg.endnotes ?? []) visitBlocks(n.content, fn);
+  for (const n of pkg.footnoteSeparators ?? []) visitBlocks(n.content, fn);
+  for (const n of pkg.endnoteSeparators ?? []) visitBlocks(n.content, fn);
+}
+
+/**
+ * If any comment or tracked-change ID in `doc` exceeds Word's signed-int32
+ * limit, remap **all** of them to a dense `1..N` sequence (single sequence
+ * shared across comments and revisions, since the editor uses one ID space for
+ * both). Returns `true` when a remap was applied.
+ *
+ * No-op (returns `false`) when every ID already fits, so non-collab saves are
+ * untouched.
+ */
+export function remapOversizedRevisionIds(doc: Document): boolean {
+  const ids = new Set<number>();
+  visitDocumentIds(doc, (id) => {
+    ids.add(id);
+    return id;
+  });
+
+  let max = 0;
+  for (const id of ids) if (id > max) max = id;
+  if (max <= MAX_DECIMAL_ID) return false;
+
+  const sorted = [...ids].sort((a, b) => a - b);
+  const remap = new Map<number, number>();
+  for (let i = 0; i < sorted.length; i++) remap.set(sorted[i], i + 1);
+
+  visitDocumentIds(doc, (id) => remap.get(id) ?? Math.min(id, MAX_DECIMAL_ID));
+  return true;
+}
diff --git a/packages/core/src/prosemirror/commentIdAllocator.ts b/packages/core/src/prosemirror/commentIdAllocator.ts
index 5458a507..a2f737b1 100644
--- a/packages/core/src/prosemirror/commentIdAllocator.ts
+++ b/packages/core/src/prosemirror/commentIdAllocator.ts
@@ -37,9 +37,11 @@ export interface CommentIdAllocator {
  * @param base - Offset for the first allocated ID (`base + 1`). Pair with
  * `stride` to partition the ID space across collaborating peers so concurrent
  * allocations never collide — e.g. `ydoc.clientID * 1_000_000` with
- * `stride = 1_000_000`. Yjs `clientID` is a uint32, so the product stays well
- * under `Number.MAX_SAFE_INTEGER` (OOXML `w:id` is `xsd:integer`, no upper
- * bound). Defaults to `0`, preserving the original `1, 2, 3, …` sequence.
+ * `stride = 1_000_000`. Defaults to `0`, preserving the original `1, 2, 3, …`
+ * sequence. These IDs are **in-memory only**: the OOXML schema types `w:id` as
+ * unbounded `xsd:integer`, but Word reads it as signed int32, so the serializer
+ * compacts to `1..N` on save when any ID exceeds `0x7FFFFFFF`
+ * (`docx/serializer/decimalIdRemap.ts`).
  * @param stride - Width of this allocator's partition. `seedAbove` ignores IDs
  * outside `(base, base + stride]` so a peer's synced revision marks can't pull
  * this allocator into their partition. Defaults to `Infinity` (single editor:
@@ -48,15 +50,30 @@ export interface CommentIdAllocator {
  * Collision-freedom across peers holds under two assumptions: (1) peers pass
  * distinct `base` values — with `clientID * stride` this means distinct Yjs
  * `clientID`s, which Yjs already relies on; and (2) a peer mints fewer than
- * `stride` IDs per session, since `next()` is unbounded and would climb past
+ * `stride` IDs per session. `next()` warns (dev only) if it climbs past
  * `base + stride` into the neighbouring partition. Both hold comfortably for
  * realistic sessions (`stride = 1_000_000`).
  */
 export function createCommentIdAllocator(base = 0, stride = Infinity): CommentIdAllocator {
   let nextId = base + 1;
   const ceiling = base + stride;
+  let warned = false;
   return {
-    next: () => nextId++,
+    next: () => {
+      const id = nextId++;
+      if (
+        !warned &&
+        id > ceiling &&
+        typeof process !== 'undefined' &&
+        process.env?.NODE_ENV !== 'production'
+      ) {
+        warned = true;
+        console.warn(
+          `[docx-editor] commentIdAllocator overflowed its partition (base=${base}, stride=${stride}).`
+        );
+      }
+      return id;
+    },
     seedAbove(maxId: number) {
       if (maxId >= nextId && maxId <= ceiling) nextId = maxId + 1;
     },

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Comment IDs can collide between collaborating peers (Comment.id is a module-level scalar)

2 participants