Releases: AryanBV/pdf-edit-engine
v0.1.1 — ARY-276 + ARY-278 bugfix release
Bugfix release — fixes three classes of CIDFont/Identity-H text-replacement failures discovered on real-world Chrome and Word PDFs. Fully backwards-compatible: public API unchanged.
[0.1.1] — 2026-04-15
Fixed
- ARY-276: Identity-H CIDFont replacement on large-font titles with per-glyph `Tm`+`Tj` emission (Word and Chrome generators) no longer garbles spacing. The operator merge logic now has an all-narrow anchor fallback that collapses chains of narrow `Tm`+`Tj` operators into a single anchor, so replacement text flows past the original operator boundaries as the PDF spec allows (`surgeon.py` F0 fallback, commit `f2b4aad`).
- ARY-278: Narrow Identity-H subsets (e.g., Chrome's 179-glyph ArialMT) now extend via in-place glyph injection. Missing glyphs are appended to the embedded font at fresh GIDs, preserving every pre-existing CID→GID mapping. The previous Tier 2 subset-and-replace approach renumbered CIDs and corrupted unrelated content-stream text (the `1ova ,ndustries` Mode 2 symptom); it has been replaced entirely (`fonts.py` `_extend_tier2`, commits `4c262d4..77d3912`).
- Cross-font resolver pollution in `replace_all`: `_apply_single_replacement` now always re-fetches the resolver from `match.characters[0].font_name`, discarding any stale resolver passed in by the caller. Previously, `replace_all`'s per-page loop reused one pre-fetched resolver across every match on the page. When matches used different fonts, the stale resolver validated `can_encode` against the wrong font, extension was skipped, and content-stream operators were encoded with the wrong font's CIDs. Symptom on real Chrome PDFs with multiple Identity-H fonts per page: `"ova ndustries"` extraction because the emitted CIDs only mapped to N/I in the other font's ToUnicode CMap. Pre-existing bug, surfaced during 0.1.1 real-PDF validation.
- `FontResolverCache`: now evicts by font-dict object generation number, so pages that share a font via indirect reference are invalidated together after font mutation (`encoding.py`, commit `8acbd49`).
- `/W` and `/ToUnicode`: entries are now deduplicated on repeat `extend_subset` calls to prevent bloat (`fonts.py`, commit `60a1697`).
- mypy strict: resolved 15 pre-existing strict-mode errors in `structural.py` and `reflow.py`. The CI mypy step is now blocking (previously had `|| true`).
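
To illustrate why the ARY-278 fix appends glyphs at fresh GIDs instead of rebuilding the subset, here is a toy model. It is not the engine's code: the font is reduced to a list of characters indexed by GID, CIDs are assumed to equal GIDs (Identity-H with an identity CID→GID map), and the helper names `decode`, `extend_in_place`, and `subset_and_replace` are hypothetical. The sorted rebuild is only a stand-in for whatever renumbering a subset-and-replace step performs.

```python
# Toy model (assumption): a "font" is a list of characters indexed by GID,
# and every CID in the content stream is used directly as a GID.

def decode(cids, glyphs):
    """Map each CID back to the character stored at that glyph index."""
    return "".join(glyphs[cid] for cid in cids)

def extend_in_place(glyphs, needed):
    """Append missing characters at fresh GIDs; existing indices never move."""
    extended = list(glyphs)
    for ch in needed:
        if ch not in extended:
            extended.append(ch)
    return extended

def subset_and_replace(glyphs, needed):
    """Rebuild the subset from scratch; sorting stands in for renumbering."""
    return sorted(set(glyphs) | set(needed))

# Hypothetical narrow subset already embedded in the PDF.
original = ["o", "v", "a", " ", "n", "d", "u", "s", "t", "r", "i", "e"]
# Unrelated text elsewhere on the page, encoded against the original subset.
stream = [original.index(ch) for ch in "dust"]
needed = "NI"  # glyphs the replacement text requires

in_place = extend_in_place(original, needed)
rebuilt = subset_and_replace(original, needed)

print(decode(stream, in_place))  # still "dust": old CIDs hit the same glyphs
print(decode(stream, rebuilt))   # no longer "dust": old CIDs now hit other glyphs
```

In the in-place case the new glyphs land at the end of the list, so content that was never edited keeps decoding correctly. In the rebuilt case every existing index can shift, which is the kind of corruption the Mode 2 symptom describes.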
Verified
- Tested against real-world Chrome (Skia/PDF m147) and Microsoft Word PDFs that reproduced the original ARY-276 garble. Both round-trip cleanly with no Mode-1 or Mode-2 garble tokens in extracted text and no silent font substitutions.
- 636 tests passing (up from 628), mypy strict clean on all 16 source files, ruff clean.
Known scope limits
- CFF / Type1 embedded fonts still raise `FontNotFoundError` with a clear message when the engine needs to inject glyphs into them. Tier 1.5 handles TrueType only; CFF support is tracked in ARY-279 for 0.2.0.
v0.1.0
[0.1.0] — 2026-04-07
Initial release — format-preserving PDF text editing.
- Text search, replacement, and batch editing at the content stream operator level
- Two-tier font subset extension (CMap-only fast path + full re-embed)
- FidelityReport on every edit — programmatic quality verification
- 15 PDF wrapper operations (merge, split, rotate, encrypt, etc.)
- Paragraph detection and greedy line-breaking reflow
- 628 tests, 85% coverage
- Zero external binaries, zero API keys, zero network calls
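
For readers unfamiliar with greedy line breaking, the reflow strategy named in the list above can be sketched as follows. This is a generic textbook version, not the engine's `reflow.py`: the function name `greedy_wrap` and its parameters are hypothetical, and character count stands in for real glyph advance widths.

```python
def greedy_wrap(words, max_width, measure=len, space_width=1):
    """Put as many words on each line as fit, then start a new line.

    `measure` returns the width of one word; it defaults to character
    count as an assumption, where a PDF engine would sum glyph advances.
    """
    lines, current, width = [], [], 0
    for word in words:
        w = measure(word)
        # A word that is not first on its line also costs one space.
        extra = w if not current else space_width + w
        if current and width + extra > max_width:
            lines.append(" ".join(current))
            current, width = [word], w
        else:
            # A word wider than max_width is still placed alone on a line.
            current.append(word)
            width += extra
    if current:
        lines.append(" ".join(current))
    return lines


print(greedy_wrap("format preserving PDF text editing".split(), 12))
# ['format', 'preserving', 'PDF text', 'editing']
```

Greedy breaking decides each line locally and never revisits earlier lines, which keeps it simple and fast at the cost of sometimes leaving uneven line lengths.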