Release v0.9.3 #158

Merged: titusz merged 17 commits into main from v0.9.3 on Jun 4, 2026

titusz commented Jun 1, 2026

Summary

Release branch for v0.9.3, collecting EPUB cover/thumbnail robustness fixes, code_iscc_mt() parallelism improvements, and dependency updates (including the iscc-schema>=0.7.0 bump).

Changes

  • Added SVG cover image support for EPUB thumbnails (rasterized via resvg)
  • Added IsccThumbExtractionError for recoverable thumbnail extraction failures
  • Changed code_iscc() to handle thumbnail extraction failures gracefully (logs warning, continues without thumbnail instead of raising) and to generate thumbnails early, before heavy content processing
  • Removed EPUB cover fallback to first manifest image (only explicit cover references are used)
  • Fixed EPUB3 cover-image detection for manifests with multiple space-separated property tokens
  • Fixed EPUB cover extraction for CP437→UTF-8 mojibake filenames and OPF hrefs containing '..' or '.' segments
  • Fixed PNG cover thumbnail extraction failing on Photoshop-exported covers with large zTXt metadata chunks
  • Wrapped iscc-tika parse failures as IsccExtractionError in text_extract / text_meta_extract
  • Refactored code_iscc_mt() for improved parallelism; verified output matches code_iscc()
  • Removed redundant onnxruntime from sci/sct optional dependency groups
  • Updated iscc-schema floor to >=0.7.0 (version-pinned @context/$schema URLs now resolve to 0.7.0)
  • CI: skip semantic code tests on macOS Python 3.12 (no onnxruntime 1.26.0 wheels)
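The EPUB3 cover-image fix listed above comes down to treating the manifest `properties` attribute as a whitespace-separated token list rather than as a single string. A minimal standard-library sketch of that idea follows; `find_cover_href` is a hypothetical helper written for illustration and is not the function changed in this PR.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Namespace used by EPUB package documents (OPF).
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def find_cover_href(opf_xml: str) -> Optional[str]:
    """Return the href of the manifest item flagged as cover-image, if any.

    Hypothetical helper: the attribute is split into tokens so that values
    such as "foo cover-image" still match.
    """
    root = ET.fromstring(opf_xml)
    for item in root.iterfind(".//opf:manifest/opf:item", OPF_NS):
        tokens = item.get("properties", "").split()
        if "cover-image" in tokens:
            return item.get("href")
    return None
```

A plain equality check (`properties == "cover-image"`) would miss any item that carries more than one token, which is the case the fix addresses.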

Testing

uv run poe all passes locally (lint, build-docs, tests @ 100% coverage). CI green across Python 3.11–3.14 on Windows, Ubuntu, and macOS.

Note

pyproject.toml version is still 0.9.2 and the changelog entry reads 0.9.3 - Unreleased — version bump / release-date finalization not yet done.

titusz added 14 commits April 29, 2026 13:16
Some EPUBs store UTF-8 filename bytes without setting the ZIP UTF-8
flag (bit 11), causing Python's zipfile to decode entries as CP437
mojibake and fail cover lookup. Recover such names by re-encoding
CP437→UTF-8 in a new _resolve_archive_path helper.
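The recovery described above can be sketched as follows. This is only an illustration of the CP437→UTF-8 round trip; `recover_cp437_mojibake` and `find_entry` are hypothetical names, and the real `_resolve_archive_path` helper may be structured differently.

```python
from typing import Iterable, Optional


def recover_cp437_mojibake(name: str) -> Optional[str]:
    """Undo a CP437 decode of bytes that were really UTF-8.

    Returns None when the name cannot be round-tripped this way.
    """
    try:
        return name.encode("cp437").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return None


def find_entry(target: str, names: Iterable[str]) -> Optional[str]:
    """Find the archive entry for target, tolerating CP437 mojibake names."""
    names = list(names)
    if target in names:  # an exact match always wins
        return target
    for name in names:
        if recover_cp437_mojibake(name) == target:
            return name  # return the name as the archive lists it
    return None
```

The function returns the name as stored in the archive listing, since that is the key a later read of the entry would need.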

Also normalize cover paths with posixpath.normpath so OPF hrefs that
use '..'/'.' segments (relative to a nested OPF) resolve to the actual
archive entry — restoring behavior the previous endswith-basename
fallback masked.
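A small sketch of the path normalization described above, assuming the href is resolved relative to the directory of the OPF file; `resolve_href` is a hypothetical name used only for this example.

```python
import posixpath


def resolve_href(opf_path: str, href: str) -> str:
    """Resolve an OPF-relative href to a normalized archive path."""
    base = posixpath.dirname(opf_path)
    # normpath collapses '.' and '..' segments; posixpath is used because
    # ZIP entry names always use forward slashes.
    return posixpath.normpath(posixpath.join(base, href))


print(resolve_href("OEBPS/content/package.opf", "../images/./cover.jpg"))
# prints: OEBPS/images/cover.jpg
```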

Removes the # pragma: no cover on epub_cover and adds tests for every
branch (cover-detection methods, error paths, both new fixes).

Photoshop-exported PNGs commonly carry zTXt chunks (e.g. tiff:37724
ImageSourceData) that decompress past PIL's 1 MB MAX_TEXT_CHUNK guard,
causing thumbnail extraction to fail for valid EPUB covers. Raise the
limit to 4 MB.
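To illustrate why a size guard rejects these chunks, here is a standard-library sketch of bounded decompression, similar in spirit to the check Pillow applies to compressed PNG text chunks. `bounded_decompress` is a hypothetical stand-in, not Pillow's code; with Pillow itself the limit is the module attribute `PngImagePlugin.MAX_TEXT_CHUNK`, and how exactly this PR raises it is not shown here.

```python
import zlib

MB = 1024 * 1024


def bounded_decompress(data: bytes, limit: int) -> bytes:
    """Decompress data, refusing output larger than limit bytes."""
    dobj = zlib.decompressobj()
    out = dobj.decompress(data, limit)  # stops after `limit` output bytes
    if dobj.unconsumed_tail:  # input left over means the limit was hit
        raise ValueError(f"decompressed data exceeds {limit} bytes")
    return out


# A 2 MB payload stands in for a large Photoshop metadata chunk.
chunk = zlib.compress(b"\x00" * (2 * MB))
try:
    bounded_decompress(chunk, 1 * MB)
except ValueError as exc:
    print("1 MB guard:", exc)
print("4 MB guard:", len(bounded_decompress(chunk, 4 * MB)), "bytes")
```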
Tika parse failures (e.g. TIKA-237 SAXException on EPUBs with deeply
nested XHTML) surfaced as bare TypeError, breaking the IsccExtractionError
contract that callers rely on. text_extract and text_meta_extract now
catch the TypeError and re-raise as IsccExtractionError with the original
Tika message preserved. Upstream tracked in iscc/iscc-tika#7.
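The wrapping pattern described above might look roughly like this. The exception class below is a local stand-in for the library's `IsccExtractionError`, and `extract_text` with its `parse` argument is a hypothetical simplification of `text_extract`.

```python
class IsccExtractionError(Exception):
    """Local stand-in for the library's exception class."""


def extract_text(parse, data):
    """Call a parser and convert its TypeError into IsccExtractionError."""
    try:
        return parse(data)
    except TypeError as exc:
        # Keep the original message and chain the cause for debugging.
        raise IsccExtractionError(f"Text extraction failed: {exc}") from exc
```

Chaining with `from exc` keeps the original traceback available while callers only need to catch the one documented exception type.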
zuban 0.7.1's default "typed" mode flags every io.BytesIO(...) call as
"Cannot instantiate abstract class" because typeshed declares BytesIO's
parents (BinaryIO/IOBase via Generic[AnyStr]) with abstract methods.
mypy mode does not flag these. Disable the abstract error code globally
since we don't define abstract classes ourselves.

Prevents _resolve_archive_path from matching an unrelated UTF-8 entry
whose correctly decoded name happens to collide with the CP437
re-encoding of the target path.

- Support SVG cover images in EPUBs by rasterizing via resvg
- Add IsccThumbExtractionError for recoverable thumbnail failures
- Generate thumbnails early in code_iscc() and continue without one on
  failure instead of raising
- Fix EPUB3 cover-image detection for multi-token properties attributes
- Remove fallback to first manifest image (only explicit cover refs)
- Rename _resolve_archive_path to resolve_archive_path (public API)
- Improve API docstrings for code_iscc, code_iscc_mt, code_content,
  code_text options
- Update dependencies (pydantic, onnxruntime, huggingface-hub, etc.)

…untime dep

Refactored code_iscc_mt() for better parallelism by extracting text
upfront, overlapping thumbnail generation with sum/meta computation,
and aligning result merge order with code_iscc(). Removed redundant
onnxruntime from sci/sct optional dependency groups since it is already
a transitive dependency. Skipped semantic tests on macOS Python 3.12
where onnxruntime 1.26.0 lacks wheels.
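The scheduling idea in this commit (extract text once, run the thumbnail alongside other work, merge in a fixed order) can be sketched with a thread pool. Everything here is hypothetical: `code_iscc_mt_sketch`, its callable parameters, and the result keys are illustrative and do not reflect the actual `code_iscc_mt()` internals.

```python
from concurrent.futures import ThreadPoolExecutor


def code_iscc_mt_sketch(path, extract_text, make_thumbnail, units):
    """Illustrative only.

    units: ordered mapping of name -> callable(path, text) returning a dict.
    """
    text = extract_text(path)  # extracted once, before any worker starts
    with ThreadPoolExecutor() as pool:
        # The thumbnail runs concurrently with the unit computations.
        thumb_future = pool.submit(make_thumbnail, path)
        unit_futures = {
            name: pool.submit(fn, path, text) for name, fn in units.items()
        }
        result = {}
        # Merge in declaration order, not completion order, so the output
        # is deterministic regardless of which worker finishes first.
        for future in unit_futures.values():
            result.update(future.result())
        result["thumbnail"] = thumb_future.result()
    return result
```

Merging by declaration order rather than by completion order is what makes a byte-for-byte comparison against a single-threaded run meaningful.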
Add a parametrized equivalence test covering all processing modes (image,
audio, video, text via PDF and DOCX). Asserts the multithreaded
code_iscc_mt produces output identical to the single-threaded code_iscc,
with add_units and granular enabled to guard ISCC-UNIT and feature
ordering through the parallelism refactor.

iscc-schema 0.7.0 version-pins the embedded @context/$schema URLs, which
now resolve to 0.7.0. Update test assertions and changelog accordingly.

codecov Bot commented Jun 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.73%. Comparing base (1ab12e7) to head (1ef0c71).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #158      +/-   ##
==========================================
+ Coverage   99.72%   99.73%   +0.01%     
==========================================
  Files          23       23              
  Lines        1809     1894      +85     
==========================================
+ Hits         1804     1889      +85     
  Misses          5        5              

titusz added 3 commits June 4, 2026 12:31
Hard-coded 0.7.0 in the @context/$schema URL assertions broke when
iscc-schema bumped to 0.8.0. Derive the version from
iscc_schema.__version__ so these tests stay green across schema
releases while still verifying serialization structure.

Finalize the v0.9.3 release:
- Bump version 0.9.2 -> 0.9.3 (pyproject, uv.lock, version test)
- Date the changelog heading and correct the stale iscc-schema floor
  bullet (>=0.7.0 -> >=0.8.0)

Also harden thumbnail error handling surfaced in the release review:
code_iscc()/code_iscc_mt() now re-raise fatal IsccExtractionError
(corrupt/invalid source files) from the optional thumbnail step instead
of swallowing it, while missing-cover and other thumbnailer errors stay
recoverable. epub_cover() raises the recoverable IsccThumbExtractionError
when a declared cover is absent from the archive.
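A rough sketch of the error policy described above for the optional thumbnail step. The two exception classes are local stand-ins (their real hierarchy in the library is not stated here), and `optional_thumbnail` is a hypothetical wrapper, not the code in `code_iscc()`.

```python
import logging

log = logging.getLogger(__name__)


class IsccExtractionError(Exception):
    """Stand-in: fatal, the source file is corrupt or invalid."""


class IsccThumbExtractionError(Exception):
    """Stand-in: recoverable, e.g. a declared cover is missing."""


def optional_thumbnail(make_thumbnail, path):
    """Return a thumbnail, or None when the failure is recoverable."""
    try:
        return make_thumbnail(path)
    except IsccThumbExtractionError as exc:
        # Checked first so it stays recoverable even if the real class
        # turns out to be a subclass of IsccExtractionError.
        log.warning("No thumbnail for %s: %s", path, exc)
        return None
    except IsccExtractionError:
        raise  # fatal: propagate instead of swallowing
    except Exception as exc:  # other thumbnailer errors stay recoverable
        log.warning("Thumbnail generation failed for %s: %s", path, exc)
        return None
```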
titusz merged commit d7447d7 into main on Jun 4, 2026
33 of 34 checks passed
titusz deleted the v0.9.3 branch on June 4, 2026 11:30