Skip to content

fix(core): harden clipboard HTML paste against XSS and ReDoS#960

Open
jedrazb wants to merge 1 commit into
mainfrom
fix/codeql-clipboard-html-hardening
Open

fix(core): harden clipboard HTML paste against XSS and ReDoS#960
jedrazb wants to merge 1 commit into
mainfrom
fix/codeql-clipboard-html-hardening

Conversation

@jedrazb

@jedrazb jedrazb commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

Closes three high-severity CodeQL alerts in packages/core/src/utils/clipboard.ts, all in the pasted-HTML trust boundary:

  • js/xss — pasted clipboard HTML was assigned to innerHTML. It is now sanitized with DOMPurify (scripts, event handlers, javascript: URLs and dangerous tags stripped) and parsed into an inert document, then walked only for text and formatting. Nothing is inserted into the live DOM.
  • js/incomplete-multi-character-sanitization — the <!--…--> regex could leave a stray <!-- behind.
  • js/polynomial-redos — the Word conditional-comment regex backtracked polynomially on hostile input.

Comment stripping is now a single linear scan (stripHtmlComments) that handles downlevel conditional comments, drops unterminated comments through end-of-string, and cannot backtrack.

Adds dompurify as a dependency of @eigenpal/docx-editor-core. Added clipboard-html.test.ts covering comment stripping, the sanitize+inert-parse behavior (an img onerror payload stays inert), and a ReDoS guard that must finish near-instantly on adversarial input.

🤖 Generated with Claude Code

@vercel

vercel Bot commented Jun 20, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
docx-editor Ready Ready Preview, Comment Jun 20, 2026 9:14pm

Request Review

@eigenpal-release-pal

eigenpal-release-pal Bot commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

All contributors have signed the CLA ✍️ ✅

Posted by the CLA bot.

Comment thread packages/core/src/utils/clipboard.ts Fixed
@greptile-apps

greptile-apps Bot commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR hardens the clipboard HTML paste boundary in packages/core/src/utils/clipboard.ts against three CodeQL-flagged vulnerabilities: XSS via innerHTML, an incomplete multi-character sanitization that could leave a stray <!--, and a polynomial-backtracking regex on comment stripping.

  • XSS fix: htmlToRuns now sanitizes pasted HTML through DOMPurify before parsing it into an inert DOMParser document, so nothing is ever written to innerHTML on the live DOM.
  • Comment stripping: both regexes in cleanWordHtml are replaced by a single linear-scan function (stripHtmlComments) that handles downlevel conditional comments and drops unterminated openers through end-of-string.
  • Test coverage: a new clipboard-html.test.ts validates comment stripping, the onerror payload inertness, and a ReDoS timing guard for the comment-scan path; the <o:>/<w:> lazy-regex paths in cleanWordHtml remain unfixed and are not covered by the timing test.

Confidence Score: 4/5

Safe to merge for the three addressed vulnerabilities; the <o:>/<w:> lazy-regex paths in cleanWordHtml still expose polynomial backtracking on hostile Word HTML and were flagged in the previous review round without being resolved in this iteration.

The XSS and comment-stripping fixes are correct and well-tested. The remaining concern is the two [\s\S]*? patterns for Office namespace tags in cleanWordHtml; they are reachable via the Word HTML code path and were identified in the prior review cycle but are unchanged here. Until those patterns are linearized, a motivated attacker who can get the editor to paste from a crafted Word document can trigger CPU-bound denial of service.

The <o:> and <w:> removal regexes in cleanWordHtml (lines 470–475 of packages/core/src/utils/clipboard.ts) are the remaining risk area.

Important Files Changed

Filename Overview
packages/core/src/utils/clipboard.ts Adds DOMPurify sanitization before DOMParser inert-document parsing (XSS fix), replaces comment-stripping regexes with a linear scan (ReDoS + stray-opener fix); two lazy [\s\S]*? patterns in cleanWordHtml for <o:> and <w:> tags remain unfixed and still allow polynomial backtracking on hostile input.
packages/core/src/utils/tests/clipboard-html.test.ts New test suite covering comment stripping, XSS inertness, and ReDoS guard for the comment-scan path; o:/w: tag ReDoS path is not covered by the timing guard.
packages/core/package.json Adds dompurify ^3.2.0 as a runtime dependency of @eigenpal/docx-editor-core.
.changeset/clipboard-html-hardening.md Hand-written changeset file; CLAUDE.md explicitly prohibits this and requires bun changeset to generate it (already flagged in a previous review comment).
bun.lock Lock file updated to add dompurify 3.4.10 and @types/trusted-types 2.0.7; all workspace package versions bump from 1.5.0 to 1.8.3, consistent with the fixed-group changeset workflow.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Paste event / ClipboardEvent] --> B{isWordHtml?}
    B -- yes --> C[cleanWordHtml\nstripHtmlComments LINEAR SCAN\n+ o:/w: regex strip\n+ mso- style strip]
    B -- no --> D{isEditorHtml?}
    D -- yes --> E{data-docx-editor-content\nattribute present?}
    E -- yes --> F[JSON.parse runs\n← returned directly]
    E -- no --> G[htmlToRuns]
    C --> G
    D -- no --> G
    G --> H[DOMPurify.sanitize\nstrips scripts / event handlers\njavascript: URLs / dangerous tags]
    H --> I[DOMParser.parseFromString\ninert document — no resource fetch\nno script execution]
    I --> J[processNode walk\ntext + formatting only]
    J --> K[Run array returned]

    style C fill:#fffbe6,stroke:#f0ad4e
    style H fill:#e6ffe6,stroke:#28a745
    style I fill:#e6ffe6,stroke:#28a745
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    A[Paste event / ClipboardEvent] --> B{isWordHtml?}
    B -- yes --> C[cleanWordHtml\nstripHtmlComments LINEAR SCAN\n+ o:/w: regex strip\n+ mso- style strip]
    B -- no --> D{isEditorHtml?}
    D -- yes --> E{data-docx-editor-content\nattribute present?}
    E -- yes --> F[JSON.parse runs\n← returned directly]
    E -- no --> G[htmlToRuns]
    C --> G
    D -- no --> G
    G --> H[DOMPurify.sanitize\nstrips scripts / event handlers\njavascript: URLs / dangerous tags]
    H --> I[DOMParser.parseFromString\ninert document — no resource fetch\nno script execution]
    I --> J[processNode walk\ntext + formatting only]
    J --> K[Run array returned]

    style C fill:#fffbe6,stroke:#f0ad4e
    style H fill:#e6ffe6,stroke:#28a745
    style I fill:#e6ffe6,stroke:#28a745
Loading

Reviews (3): Last reviewed commit: "fix(core): harden clipboard HTML paste a..." | Re-trigger Greptile

Comment thread .changeset/clipboard-html-hardening.md Outdated
Comment on lines +1 to +5
---
'@eigenpal/docx-editor-core': patch
---

Harden clipboard HTML paste against script injection and slow-input denial of service. Pasted HTML is now parsed through an inert `DOMParser` document instead of being assigned to `innerHTML`, so embedded markup cannot run, and Word comment stripping uses a single linear scan that cannot backtrack on hostile input or leave a stray comment opener behind.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Hand-written changeset file

CLAUDE.md explicitly prohibits hand-writing .changeset/*.md files: "Generate the changeset with bun changeset — never hand-write the .changeset/*.md file. The interactive prompt picks the correct package name and bump and writes the right frontmatter. Hand-writing risks a wrong/typo'd package name, which crashes the post-merge Release workflow." This file should be regenerated with bun changeset in a terminal.

Context Used: CLAUDE.md (source)

Sanitize pasted clipboard HTML with DOMPurify and parse it into an
inert document instead of assigning it to innerHTML, so embedded
scripts, event handlers, and javascript: URLs cannot run. Replace the
regex-based Word comment stripping with a single linear scan that
cannot backtrack polynomially on hostile input and never leaves a
stray comment opener behind.

Resolves CodeQL js/xss, js/incomplete-multi-character-sanitization,
and js/polynomial-redos in packages/core/src/utils/clipboard.ts.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants