Spun out of the senior-dev review of #1.
Background
letter_set.v1 currently makes source.scan_url optional. Variants are anchored by source.scan_entry_id, which resolves against the upstream entries.jsonl index in HeOCR/public-domain-hand-written-hebrew-scans.
That works for downstream consumers that re-resolve the upstream index, but it leaves a gap for consumers that want to fetch the source image directly without re-resolving.
Decision needed
Either:
- Keep optional, document the dereference path. Add a note in docs/letter_set_v1.md saying consumers must dereference via the upstream index pinned in the document's upstream.repo / upstream.revision, and that source.scan_url is a non-authoritative convenience copy.
- Make required. Then we need to also decide whether scan_url must point at the canonical upstream URL or whether mirrors are allowed, and whether the URL is stable across upstream revisions.
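For the first option, the dereference path a consumer would follow might look roughly like this. It is a minimal sketch, not the project's implementation: it assumes the index text has already been fetched at the pinned upstream.revision, and that each line of entries.jsonl is a JSON object with `id` and `url` keys. Those two field names and the `resolve_scan_url` helper are hypothetical, since the upstream index shape is not specified here.

```python
import json
from typing import Optional


def resolve_scan_url(variant: dict, entries_jsonl: str) -> Optional[str]:
    """Resolve a variant's source image URL via the upstream index.

    Assumes (hypothetically) that each index line is a JSON object with
    "id" and "url" keys; adjust to the real entries.jsonl shape.
    """
    source = variant["source"]
    entry_id = source["scan_entry_id"]
    for line in entries_jsonl.splitlines():
        if not line.strip():
            continue  # tolerate blank lines in the index
        entry = json.loads(line)
        if entry.get("id") == entry_id:
            # The pinned index is authoritative; any scan_url copy is ignored.
            return entry.get("url")
    # Entry not found in the pinned index: fall back to the
    # non-authoritative convenience copy, if the variant carries one.
    return source.get("scan_url")
```

Falling back to source.scan_url only when the index lookup fails keeps the index authoritative while still letting the convenience copy be useful. Whether that fallback is acceptable is part of the decision above.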
Scope
- Update the schema (src/hletterscriptgen/schemas/letter_set.schema.json) if scan_url becomes required (option 2).
- Update docs/letter_set_v1.md and docs/upstream_integration.md either way.
- Add tests in tests/test_validation.py if the shape changes.
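If scan_url does become required, a test for the shape change could be sketched along these lines. The schema fragments and the `missing_required` helper are hypothetical stand-ins: the real letter_set.schema.json may be structured differently, and the project's actual tests would presumably go through its own validation entry point instead of this hand-rolled check, which only inspects the `required` keyword.

```python
# Hypothetical fragments of the `source` object schema; the real
# letter_set.schema.json may be structured differently.
SOURCE_OPTIONAL = {
    "type": "object",
    "required": ["scan_entry_id"],
}
SOURCE_REQUIRED = {
    "type": "object",
    "required": ["scan_entry_id", "scan_url"],
}


def missing_required(schema: dict, instance: dict) -> list:
    """Return required keys absent from instance (checks `required` only)."""
    return [key for key in schema.get("required", []) if key not in instance]


def test_scan_url_required_only_when_schema_says_so():
    source = {"scan_entry_id": "entry-0001"}  # deliberately has no scan_url
    # Current shape: scan_url is optional, so nothing is missing.
    assert missing_required(SOURCE_OPTIONAL, source) == []
    # Proposed shape: the same document must now be rejected.
    assert missing_required(SOURCE_REQUIRED, source) == ["scan_url"]
```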
Impact
This is a contract surface. Resolve before any consumer outside the HeOCR org takes a dependency on letter_set.v1.