hletterscriptgen

Generator framework for per-writer Hebrew handwritten letter-glyph image sets. It turns rights-clean HASH scan records plus human-reviewed glyph annotations into deterministic letter_set.v1 outputs for HeOCR/hletterscript.

At a Glance

| Field | Value |
| --- | --- |
| Role | Generate and validate per-writer Hebrew letter-set outputs |
| Input | HASH scan metadata, scan image files, and generation profiles |
| Output | `letter_set.v1` JSON plus cropped glyph PNG assets |
| Public contract | `docs/letter_set_v1.md` |
| Example fixture | `examples/letter_set/writer_example.json` |
| Main CLI | `hletterscriptgen` |
| Code license | MIT |
| Generated glyph rights | Per-variant rights inherited from upstream scans |

What This Repository Owns

This repository contains the Python package, CLI, schema, and validation contracts for creating writer-level Hebrew letter sets. It does not host the published glyph dataset itself; generated and curated data belongs in HeOCR/hletterscript.

What lives here:

hletterscriptgen, the Python package and command-line interface.
The letter_set.v1 JSON Schema and fixture example.
Generation-profile parsing, upstream eligibility checks, glyph extraction helpers, checksum calculation, and output validation.
CI, release workflow, and rights-carryover policy.
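
As a rough illustration of the checksum and dedupe responsibilities listed above, the sketch below hashes asset bytes and drops byte-identical repeats in a deterministic order. It is not the package's actual code: the function names `sha256_hex` and `dedupe_by_checksum` are hypothetical, and the choice of SHA-256 is an assumption, since this README does not name the checksum algorithm.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of raw asset bytes as lowercase hex.

    SHA-256 is an assumption; the real checksum algorithm may differ.
    """
    return hashlib.sha256(data).hexdigest()


def dedupe_by_checksum(assets: dict[str, bytes]) -> dict[str, str]:
    """Map each surviving asset name to its checksum, dropping identical repeats.

    Names are visited in sorted order so the same input always yields the
    same survivors, regardless of dict insertion order.
    """
    seen: set[str] = set()
    kept: dict[str, str] = {}
    for name in sorted(assets):
        digest = sha256_hex(assets[name])
        if digest in seen:
            continue  # an identical crop was already kept under another name
        seen.add(digest)
        kept[name] = digest
    return kept
```

Visiting names in sorted order is one simple way to keep a run reproducible: two byte-identical crops always resolve to the same surviving file name.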

What lives elsewhere:

Page-scan ingestion and rights curation: HeOCR/hash.
Published letter-glyph image sets: HeOCR/hletterscript.
Synthetic document composition: HeOCR/hocrsyngen.
Dataset orchestration and release assembly: HeOCR/hocrgen, HeOCR/HeOCR, and HeOCR/HeOCRsynth.

Pipeline Position

flowchart LR
    HASH["HeOCR/hash<br/>rights-clean scans"] --> PROFILE["generation profile<br/>writer + glyph bboxes"]
    PROFILE --> GEN["hletterscriptgen<br/>crop, hash, dedupe, validate"]
    GEN --> DATA["HeOCR/hletterscript<br/>letter_set.v1 + glyph PNGs"]
    DATA --> SYN["HeOCR/hocrsyngen<br/>synthetic pages"]
    SYN --> OCR["HeOCR / HeOCRsynth<br/>OCR and HTR datasets"]

See docs/repository_scope.md for the full ecosystem boundary map and per-repository responsibilities.

Install

python -m pip install -e ".[test]"

For development:

python -m pip install -e ".[dev]"

Requires Python 3.11 or newer.

CLI

hletterscriptgen version
hletterscriptgen schema --format json
hletterscriptgen validate examples/letter_set/writer_example.json
hletterscriptgen check-eligible path/to/hash/data/index/entries.jsonl
hletterscriptgen scan-blobs path/to/scan.png --format json
hletterscriptgen generate --profile generate_profile.json --output ./out

The `generate` command expects a human-curated generation profile that names writer IDs, upstream scan entries, and glyph bounding boxes. The output is one directory per writer, each containing `letter_set.json` and the surviving cropped glyph assets.
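
To make the idea of a generation profile more concrete, here is a minimal sketch of what one might hold and how a bounding box selects a glyph from a scan. Everything in it is an assumption for illustration: the `GlyphBox` class, the `crop` helper, the profile keys (`scans`, `scan_id`, `glyphs`), and the `(x, y, width, height)` box layout are hypothetical, and a plain nested list stands in for a real scan image. The actual profile format is defined by the package, not by this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlyphBox:
    """One annotated glyph; field names are illustrative, not real profile keys."""

    letter: str
    x: int
    y: int
    width: int
    height: int


def crop(pixels: list[list[int]], box: GlyphBox) -> list[list[int]]:
    """Cut the box out of a row-major grayscale raster, rejecting bad boxes."""
    rows = len(pixels)
    cols = len(pixels[0]) if rows else 0
    if box.width <= 0 or box.height <= 0:
        raise ValueError("bounding box must have positive size")
    if box.x < 0 or box.y < 0 or box.x + box.width > cols or box.y + box.height > rows:
        raise ValueError("bounding box falls outside the scan")
    return [row[box.x : box.x + box.width] for row in pixels[box.y : box.y + box.height]]


# A hypothetical profile: one writer, one upstream scan entry, two glyph boxes.
profile = {
    "writer_id": "writer_example",
    "scans": [
        {
            "scan_id": "scan-0001",
            "glyphs": [GlyphBox("א", 0, 0, 2, 2), GlyphBox("ב", 2, 1, 2, 2)],
        }
    ],
}

# A tiny 3x4 stand-in for a scan image.
scan = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
]
crops = {g.letter: crop(scan, g) for g in profile["scans"][0]["glyphs"]}
```

Rejecting out-of-bounds boxes up front, rather than silently clipping them, keeps a mistyped annotation from producing a truncated glyph asset.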

The `letter_set.v1` Contract

The bundled schema describes one writer's letter-glyph collection:

`writer_id` identifies the writer set.
`writer_provenance` records how the writer attribution was established.
`upstream` pins the exact HASH revision used by the generation run.
`letters` maps Hebrew letters and final forms to one or more glyph variants.
Each variant carries an asset path, checksum, image metadata, source scan ID, bounding box, license, and rights evidence.
`license_summary` summarizes the distinct variant-level licenses but does not replace per-variant rights metadata.
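
The relationship between per-variant licenses and `license_summary` can be sketched as follows. The five top-level keys come from the list above; everything else is assumed for illustration, including the variant keys `asset` and `license`, the placeholder values, the representation of `license_summary` as a sorted list of strings, and the helper names `summarize_licenses` and `missing_top_level_keys`. Whether the schema marks all five keys as required is also an assumption.

```python
def summarize_licenses(letters: dict[str, list[dict]]) -> list[str]:
    """Collect the distinct variant-level licenses in sorted order."""
    return sorted(
        {variant["license"] for variants in letters.values() for variant in variants}
    )


def missing_top_level_keys(letter_set: dict) -> list[str]:
    """List the top-level keys named in this README that the document lacks."""
    expected = ("writer_id", "writer_provenance", "upstream", "letters", "license_summary")
    return [key for key in expected if key not in letter_set]


# Hypothetical variants: real ones also carry checksum, image metadata,
# source scan ID, bounding box, and rights evidence.
letters = {
    "א": [
        {"asset": "glyphs/alef_0.png", "license": "CC0-1.0"},
        {"asset": "glyphs/alef_1.png", "license": "CC-BY-4.0"},
    ],
    "ם": [{"asset": "glyphs/final_mem_0.png", "license": "CC0-1.0"}],
}

letter_set = {
    "writer_id": "writer_example",
    "writer_provenance": {"method": "placeholder"},
    "upstream": {"revision": "placeholder"},
    "letters": letters,
    # Derived from the variants; each variant still keeps its own license.
    "license_summary": summarize_licenses(letters),
}
```

Deriving the summary from the variants, instead of writing it by hand, means it cannot drift away from the per-variant rights metadata it summarizes.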

Read the full contract in docs/letter_set_v1.md.

Validate Locally

python -m ruff check .
python -m mypy
python -m pytest
hletterscriptgen validate examples/letter_set/writer_example.json

Licensing

Code in this repository is MIT licensed. See LICENSE.
Generated letter sets carry per-variant upstream rights. See LICENSE-POLICY.md; the generator records rights evidence but does not relicense glyphs.

Contributing

See CONTRIBUTING.md. Agent collaborators should also read AGENTS.md.

Credits

Created by Shay Palachy Affek.
