Parse and render first-page headers with VML/EMF images by samcorcos · Pull Request #1 · USTreasury/docx-editor

samcorcos · 2026-06-21T00:44:59Z

Summary

Word header logos and OLE preview pictures are commonly EMF metafiles, which browsers can't decode — they rendered as a broken <img> box. This PR extracts the PNG/JPEG bitmap that's almost always embedded inside the metafile and uses it as the display URL, so the logo just shows up. Original EMF bytes stay on MediaFile.data so round-trip is unaffected.

Approach shipped: embedded-raster extraction (not full EMF rasterization). For the residual case — a vector-only metafile with no embedded bitmap — there's a new parseDocx(buf, { mediaResolver }) hook so a host can rasterize server-side and hand back a data:/blob: URL.
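The embedded-raster idea can be illustrated with a minimal sketch for the PNG case. This is not the PR's `extractMetafileRaster`; `findEmbeddedPng` and `matchesAt` are hypothetical names, and the sketch assumes the PNG sits in the metafile as one contiguous byte run. It scans for the 8-byte PNG signature, then walks the chunks (4-byte big-endian length, 4-byte type, data, 4-byte CRC) until `IEND`.

```typescript
// Hypothetical helper for illustration; not the PR's extractMetafileRaster.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

function matchesAt(bytes: Uint8Array, offset: number, pattern: number[]): boolean {
  if (offset + pattern.length > bytes.length) return false;
  for (let i = 0; i < pattern.length; i++) {
    if (bytes[offset + i] !== pattern[i]) return false;
  }
  return true;
}

// Returns the bytes of the first complete PNG found inside `bytes`, or null.
function findEmbeddedPng(bytes: Uint8Array): Uint8Array | null {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  for (let start = 0; start + PNG_SIGNATURE.length <= bytes.length; start++) {
    if (!matchesAt(bytes, start, PNG_SIGNATURE)) continue;
    let pos = start + PNG_SIGNATURE.length;
    // Each chunk occupies 12 bytes of framing plus `length` bytes of data.
    while (pos + 12 <= bytes.length) {
      const length = view.getUint32(pos); // big-endian by default
      const type = String.fromCharCode(
        bytes[pos + 4], bytes[pos + 5], bytes[pos + 6], bytes[pos + 7],
      );
      const next = pos + 12 + length;
      if (next > bytes.length) break; // truncated or false-positive signature
      if (type === "IEND") return bytes.slice(start, next);
      pos = next;
    }
  }
  return null;
}
```

Walking chunks to `IEND` rather than searching for the `IEND` bytes avoids cutting the image short if those four bytes happen to occur inside compressed data. The result could then be encoded as a `data:image/png` URL for display.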

Also in this PR:

- w:smartTag is a transparent wrapper; recurse into it instead of dropping its runs (the Treasury template lost "WASHINGTON" without this).
- Unedited headers/footers re-emit their original XML on save (HeaderFooter.verbatimXml), so w:object/OLE/VML the model can't fully represent round-trips byte-identically. The HF inline editor and every model-mutation site clear it on first edit.
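To show what treating `w:smartTag` as a transparent wrapper means, here is a minimal sketch over a hypothetical node shape. `XmlNode` and `collectRuns` are illustrative names, not the parser's actual types; the real paragraph parser handles many more element kinds.

```typescript
// Hypothetical minimal XML node; the real parser's types are not shown in this PR text.
interface XmlNode {
  name: string;
  text?: string;
  children: XmlNode[];
}

// Collects w:r runs from a paragraph, descending through w:smartTag wrappers
// (which may themselves be nested) instead of skipping them.
function collectRuns(node: XmlNode): XmlNode[] {
  const runs: XmlNode[] = [];
  for (const child of node.children) {
    if (child.name === "w:r") {
      runs.push(child);
    } else if (child.name === "w:smartTag") {
      runs.push(...collectRuns(child));
    }
    // Other element kinds are ignored in this sketch.
  }
  return runs;
}
```

A paragraph whose only run sits inside nested smart tags yields that run, where a non-recursive loop would return nothing.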

Visual verification

Treasury information-memo template, page 1:

Before	After

Header close-up (after):

Tests

- packages/core/src/docx/__tests__/header-vml-emf.test.ts — extractMetafileRaster on a real EMF; parseDocx populates headers, image src is data:image/png, smartTag runs survive, mediaResolver overrides; round-trip leaves header1.xml / footer1.xml / image1.emf byte-identical; default vs first+titlePg both populate.
- e2e/tests/header-vml-emf.spec.ts — seal <img> decodes (naturalWidth > 0) inside .layout-page-header on page 1; body text starts below the header band; clicking body still places the caret; header renders under externalContent mode.
- 1785/1785 unit tests pass; targeted header/footer/image e2e suite green (3 pre-existing failures on main confirmed unrelated).

Notes for upstream

Needs a bun changeset (minor — additive public API: MediaResolver, extractMetafileRaster, HeaderFooter.verbatimXml). Left for the upstream PR per the changeset workflow.

Word header logos and OLE preview pictures are commonly EMF metafiles that browsers can't decode, so they rendered as broken `<img>` boxes. The metafile almost always wraps a single PNG/JPEG bitmap record; extract it at media-load time and use that as the display URL while keeping the original bytes for round-trip. Adds an optional `mediaResolver` parse hook for hosts that want to rasterize the vector-only residual server-side. Also: recurse into `w:smartTag` (its runs were silently dropped), and capture each header/footer's original XML so unedited parts re-emit byte-identically on save instead of being rebuilt from the model.
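The verbatim re-emit behaviour can be sketched as follows. This is a simplified illustration, not the PR's serializer: `HeaderFooterSketch`, `serializeHeader`, and `editParagraph` are hypothetical, the model is reduced to a list of paragraph strings, and namespaces and XML escaping are omitted.

```typescript
// Hypothetical, heavily simplified stand-in for the HeaderFooter model.
interface HeaderFooterSketch {
  verbatimXml?: string; // original part XML, kept only while the part is unedited
  paragraphs: string[];
}

// Unedited parts are returned byte-for-byte; edited parts are rebuilt from the model.
function serializeHeader(hf: HeaderFooterSketch): string {
  if (hf.verbatimXml !== undefined) return hf.verbatimXml;
  const body = hf.paragraphs
    .map((text) => `<w:p><w:r><w:t>${text}</w:t></w:r></w:p>`)
    .join("");
  return `<w:hdr>${body}</w:hdr>`;
}

// Any mutation must drop the captured XML, otherwise the edit would be lost on save.
function editParagraph(hf: HeaderFooterSketch, index: number, text: string): void {
  hf.paragraphs[index] = text;
  delete hf.verbatimXml;
}
```

The key invariant is that every code path that mutates the model also clears the captured XML; a missed path would silently export the stale original.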

- clear HeaderFooter.verbatimXml in the React/Vue overlay save paths (useHeaderFooterEditing / usePagesPointer) so edits aren't lost on export
- mediaResolver: resolve files in parallel and swallow per-file errors so one bad conversion doesn't abort the whole parse
- reuse mediaToDataUrl from unzip instead of a local duplicate
- metafileRaster: drop the brittle GIF extractor; scan JPEG for the last EOI so an EXIF thumbnail doesn't truncate the outer image
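Why the last EOI matters: a JPEG with an EXIF thumbnail contains a complete inner JPEG, so the first `FF D9` marker belongs to the thumbnail, not the outer image. A minimal sketch of the scan follows, assuming the JPEG is one contiguous run in the buffer; `findEmbeddedJpeg` is a hypothetical name and the PR's actual scan may differ.

```typescript
// Hypothetical helper for illustration; not the PR's metafileRaster code.
// Finds the first SOI marker (FF D8 FF) and the LAST EOI marker (FF D9) after it.
function findEmbeddedJpeg(bytes: Uint8Array): Uint8Array | null {
  let start = -1;
  for (let i = 0; i + 2 < bytes.length; i++) {
    if (bytes[i] === 0xff && bytes[i + 1] === 0xd8 && bytes[i + 2] === 0xff) {
      start = i;
      break;
    }
  }
  if (start < 0) return null;
  // Scan backwards so an embedded thumbnail's EOI does not end the slice early.
  for (let end = bytes.length - 2; end > start + 2; end--) {
    if (bytes[end] === 0xff && bytes[end + 1] === 0xd9) {
      return bytes.slice(start, end + 2);
    }
  }
  return null;
}
```

One trade-off of this simple form: bytes after the image that happen to contain `FF D9` would extend the slice, although decoders generally tolerate trailing bytes after the real EOI.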

samcorcos added 2 commits June 20, 2026 20:22

samcorcos wants to merge 2 commits into main from feat/open-headers
