Parse and render first-page headers with VML/EMF images #1
Open
samcorcos wants to merge 2 commits into
Word header logos and OLE preview pictures are commonly EMF metafiles that browsers can't decode, so they rendered as broken `<img>` boxes. The metafile almost always wraps a single PNG/JPEG bitmap record; extract it at media-load time and use that as the display URL while keeping the original bytes for round-trip. Adds an optional `mediaResolver` parse hook for hosts that want to rasterize the vector-only residual server-side. Also: recurse into `w:smartTag` (its runs were silently dropped), and capture each header/footer's original XML so unedited parts re-emit byte-identically on save instead of being rebuilt from the model.
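A minimal sketch of the embedded-raster idea described above, not the PR's actual `extractMetafileRaster`. It scans the raw metafile bytes for a PNG signature (ending at the IEND chunk) or a JPEG SOI marker (ending at the last EOI marker). The names `findEmbeddedRaster`, `Raster`, `indexOfBytes` and `lastIndexOfBytes` are hypothetical. A byte scan like this assumes the bitmap is stored contiguously and that no stray `FF D9` pair follows the JPEG data.

```typescript
// Hypothetical names for illustration; the PR's real implementation may differ.
type Raster = { mime: "image/png" | "image/jpeg"; bytes: Uint8Array };

const PNG_SIG = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
// "IEND" chunk type followed by its fixed CRC; marks the end of every PNG stream.
const PNG_IEND = [0x49, 0x45, 0x4e, 0x44, 0xae, 0x42, 0x60, 0x82];
const JPEG_SOI = [0xff, 0xd8, 0xff];
const JPEG_EOI = [0xff, 0xd9];

// First offset >= from where needle occurs in hay, or -1.
function indexOfBytes(hay: Uint8Array, needle: number[], from = 0): number {
  outer: for (let i = from; i <= hay.length - needle.length; i++) {
    for (let j = 0; j < needle.length; j++) {
      if (hay[i + j] !== needle[j]) continue outer;
    }
    return i;
  }
  return -1;
}

// Last offset >= from where needle occurs in hay, or -1.
function lastIndexOfBytes(hay: Uint8Array, needle: number[], from = 0): number {
  outer: for (let i = hay.length - needle.length; i >= from; i--) {
    for (let j = 0; j < needle.length; j++) {
      if (hay[i + j] !== needle[j]) continue outer;
    }
    return i;
  }
  return -1;
}

function findEmbeddedRaster(emf: Uint8Array): Raster | null {
  const pngStart = indexOfBytes(emf, PNG_SIG);
  if (pngStart >= 0) {
    const iend = indexOfBytes(emf, PNG_IEND, pngStart + PNG_SIG.length);
    if (iend >= 0) {
      return { mime: "image/png", bytes: emf.slice(pngStart, iend + PNG_IEND.length) };
    }
  }
  const jpgStart = indexOfBytes(emf, JPEG_SOI);
  if (jpgStart >= 0) {
    // Use the LAST EOI so an embedded thumbnail's EOI does not cut the image short.
    const eoi = lastIndexOfBytes(emf, JPEG_EOI, jpgStart + JPEG_SOI.length);
    if (eoi >= 0) {
      return { mime: "image/jpeg", bytes: emf.slice(jpgStart, eoi + JPEG_EOI.length) };
    }
  }
  return null; // vector-only metafile: nothing to extract
}
```

A `null` result corresponds to the vector-only case, which is where a host-supplied `mediaResolver` would take over.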
- clear HeaderFooter.verbatimXml in the React/Vue overlay save paths (useHeaderFooterEditing / usePagesPointer) so edits aren't lost on export
- mediaResolver: resolve files in parallel and swallow per-file errors so one bad conversion doesn't abort the whole parse
- reuse mediaToDataUrl from unzip instead of a local duplicate
- metafileRaster: drop the brittle GIF extractor; scan JPEG for the last EOI so an EXIF thumbnail doesn't truncate the outer image
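The "resolve in parallel and swallow per-file errors" behavior can be sketched roughly as follows. The `MediaFile` and `MediaResolver` shapes here are assumptions made for the example (including the `displayUrl` field), and `applyMediaResolver` is a hypothetical helper, not a function from the PR.

```typescript
// Assumed shapes for illustration only; the PR's real types may differ.
interface MediaFile {
  path: string;
  data: Uint8Array;    // original bytes, kept for round-trip
  displayUrl?: string; // hypothetical field holding the browser-displayable URL
}
type MediaResolver = (file: MediaFile) => Promise<string | null | undefined> | string | null | undefined;

// Runs the resolver for every file concurrently. A failure for one file is
// swallowed so it cannot reject the combined promise and abort the parse.
async function applyMediaResolver(files: MediaFile[], resolve: MediaResolver): Promise<void> {
  await Promise.all(
    files.map(async (file) => {
      try {
        const url = await resolve(file);
        if (url) file.displayUrl = url; // only override when the host returned something
      } catch {
        // Per-file error: keep whatever display URL the file already had.
      }
    })
  );
}
```

Wrapping each call in its own `async` callback also turns a synchronous throw from the resolver into a caught error, so a non-async resolver cannot break the `map` either.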
Summary
Word header logos and OLE preview pictures are commonly EMF metafiles, which browsers can't decode, so they rendered as a broken `<img>` box. This PR extracts the PNG/JPEG bitmap that's almost always embedded inside the metafile and uses it as the display URL, so the logo just shows up. Original EMF bytes stay on `MediaFile.data` so round-trip is unaffected.

Approach shipped: embedded-raster extraction (not full EMF rasterization). For the residual case, a vector-only metafile with no embedded bitmap, there's a new `parseDocx(buf, { mediaResolver })` hook so a host can rasterize server-side and hand back a `data:`/`blob:` URL.

Also in this PR:
- `w:smartTag` is a transparent wrapper; recurse into it instead of dropping its runs (the Treasury template lost "WASHINGTON" without this).
- Each header/footer's original XML is captured (`HeaderFooter.verbatimXml`), so `w:object`/OLE/VML the model can't fully represent round-trips byte-identically. The HF inline editor and every model-mutation site clear it on first edit.

Visual verification
Treasury information-memo template, page 1:
Header close-up (after):
Tests
- `packages/core/src/docx/__tests__/header-vml-emf.test.ts`: `extractMetafileRaster` on a real EMF; `parseDocx` populates `headers`, image src is `data:image/png`, smartTag runs survive, `mediaResolver` overrides; round-trip leaves `header1.xml`/`footer1.xml`/`image1.emf` byte-identical; default vs first+titlePg both populate.
- `e2e/tests/header-vml-emf.spec.ts`: seal `<img>` decodes (`naturalWidth > 0`) inside `.layout-page-header` on page 1; body text starts below the header band; clicking body still places the caret; header renders under `externalContent` mode.
- […] `main` confirmed unrelated).

Notes for upstream
Needs a `bun changeset` (minor; additive public API: `MediaResolver`, `extractMetafileRaster`, `HeaderFooter.verbatimXml`). Left for the upstream PR per the changeset workflow.