Merged
1 change: 1 addition & 0 deletions .gitignore
@@ -5,3 +5,4 @@ __pycache__/
.venv/
venv/
.pytest_cache/
.review_feedback.json
2 changes: 1 addition & 1 deletion CITATION.cff
@@ -3,7 +3,7 @@ cff-version: 1.2.0
message: Please cite this dataset using the metadata below.
type: dataset
title: Hebrew Handwritten Per-Letter Image Dataset
-abstract: 'Per-letter image crops of handwritten Hebrew letters, grouped into sets by writer. Each crop is a derivative of a permissively-licensed upstream scan in HeOCR/public-domain-hand-written-hebrew-scans, with per-image rights inherited and attribution recorded. The index is line-oriented JSON (JSONL). Release 0.0.0-rc is the initial-setup release: the corpus contains no per-letter image entries yet. The repository ships the schemas, validation tooling, CI, and licensing policy needed to start ingesting.'
+abstract: Per-letter image crops of handwritten Hebrew letters, grouped into sets by writer. Each crop is a derivative of a permissively-licensed upstream scan in HeOCR/public-domain-hand-written-hebrew-scans, with per-image rights inherited and attribution recorded. The index is line-oriented JSON (JSONL). Release 0.0.0-rc contains 7 per-letter image entries drawn from 1 verified writers (7 PDM-1.0).
authors:
- name: Shay Palachy-Affek
version: 0.0.0-rc
2 changes: 1 addition & 1 deletion NOTICE.md
@@ -7,7 +7,7 @@ Repository-authored metadata is dedicated to the public domain under CC0 1.0 Uni
Per-letter image crops are derivatives of upstream scans in [HeOCR/public-domain-hand-written-hebrew-scans](https://github.com/HeOCR/public-domain-hand-written-hebrew-scans) and carry per-entry rights inherited from the source page. The entries listed below carry a license that requires attribution (currently CC-BY-4.0, CC-BY-SA-4.0). Anyone redistributing or reusing these crops must keep the listed credit and link to the source page on which the rights claim was verified.

- Corpus release: `0.0.0-rc`
-- Released at (corpus state): `2026-05-12T00:00:00Z`
+- Released at (corpus state): `2026-05-12T22:30:00Z`

## Attribution-required entries

7 changes: 7 additions & 0 deletions data/index/entries.jsonl
@@ -0,0 +1,7 @@
{"entry_id": "chaim_nachman_bialik__bet__v0001", "extraction": {"extracted_at": "2026-05-12T22:30:00Z", "extracted_by": "Shay Palachy-Affek", "method": "manual", "notes": "Manual seed crop from upstream scan commons__bialik_el_hazippor__p0001 (Bialik manuscript draft of 'El Hatzippor'). Bbox picked from line 4 of the scan; cursive Hebrew handwriting. Crop produced via Pillow.", "tool": "manual", "tool_version": "0.0.0-manual"}, "image": {"background": "original", "bytes": 3301, "height_px": 22, "local_path": "data/letters/chaim_nachman_bialik/bet/chaim_nachman_bialik__bet__v0001.png", "mime_type": "image/png", "sha256": "f699ee63a92ee3459377547bce0ff1188e8ad2fb8086e7e683f5133e2347a627", "width_px": 15}, "letter": {"codepoint": "U+05D1", "form": "regular", "name": "bet", "notes": null, "style": "cursive_ashkenazi", "unicode_char": "ב"}, "quality": {"exclusion_reasons": [], "legibility": "medium", "notes": "Seed crop at native upstream resolution; cursive Ashkenazi-style hand. Low-resolution upstream scan (409x253) limits per-letter pixel detail.", "usable_for_htr": true, "usable_for_syngen": true}, "rights": {"attribution_required": false, "attribution_text": null, "attribution_url": null, "commercial_use_allowed": true, "derivatives_allowed": true, "evidence_text": "Inherited from upstream entry commons__bialik_el_hazippor__p0001 (PDM-1.0; this work is in the public domain in its country of origin and other countries where the copyright term is the author's life plus 70 years or fewer). 
Bialik died in 1934; the work is public domain in Israel and most jurisdictions.", "license_expression": "PDM-1.0", "redistribution_allowed": true, "rights_basis": "public_domain", "verification_status": "inherited_from_upstream", "verified_at": "2026-05-12"}, "upstream": {"bbox": {"h": 22, "w": 15, "x": 343, "y": 203}, "commit": "df07bd3825405ed93c15fd61fe4d7967fc60885e", "entry_id": "commons__bialik_el_hazippor__p0001", "release_tag": null, "sha256": "bdd6f1a3b9f8821bbca0c0c836eebf3914a335f816662f1a7f0c4495e45e624e", "source_id": "commons__bialik_el_hazippor"}, "writer_id": "chaim_nachman_bialik"}
{"entry_id": "chaim_nachman_bialik__kaf__v0001", "extraction": {"extracted_at": "2026-05-12T22:30:00Z", "extracted_by": "Shay Palachy-Affek", "method": "manual", "notes": "Manual seed crop from upstream scan commons__bialik_el_hazippor__p0001 (Bialik manuscript draft of 'El Hatzippor'). Bbox picked from line 4 of the scan; cursive Hebrew handwriting. Crop produced via Pillow.", "tool": "manual", "tool_version": "0.0.0-manual"}, "image": {"background": "original", "bytes": 3093, "height_px": 16, "local_path": "data/letters/chaim_nachman_bialik/kaf/chaim_nachman_bialik__kaf__v0001.png", "mime_type": "image/png", "sha256": "f64f3120eaa7f07244bcd2b730683f7e7ee72316abb7f783b6daaeed0883915a", "width_px": 12}, "letter": {"codepoint": "U+05DB", "form": "regular", "name": "kaf", "notes": null, "style": "cursive_ashkenazi", "unicode_char": "כ"}, "quality": {"exclusion_reasons": [], "legibility": "medium", "notes": "Split from original kaf+yod double-letter bbox; right (kaf) portion only. Low-resolution upstream scan.", "usable_for_htr": true, "usable_for_syngen": true}, "rights": {"attribution_required": false, "attribution_text": null, "attribution_url": null, "commercial_use_allowed": true, "derivatives_allowed": true, "evidence_text": "Inherited from upstream entry commons__bialik_el_hazippor__p0001 (PDM-1.0; this work is in the public domain in its country of origin and other countries where the copyright term is the author's life plus 70 years or fewer). 
Bialik died in 1934; the work is public domain in Israel and most jurisdictions.", "license_expression": "PDM-1.0", "redistribution_allowed": true, "rights_basis": "public_domain", "verification_status": "inherited_from_upstream", "verified_at": "2026-05-12"}, "upstream": {"bbox": {"h": 16, "w": 12, "x": 330, "y": 203}, "commit": "df07bd3825405ed93c15fd61fe4d7967fc60885e", "entry_id": "commons__bialik_el_hazippor__p0001", "release_tag": null, "sha256": "bdd6f1a3b9f8821bbca0c0c836eebf3914a335f816662f1a7f0c4495e45e624e", "source_id": "commons__bialik_el_hazippor"}, "writer_id": "chaim_nachman_bialik"}
{"entry_id": "chaim_nachman_bialik__lamed__v0001", "extraction": {"extracted_at": "2026-05-12T22:30:00Z", "extracted_by": "Shay Palachy-Affek", "method": "manual", "notes": "Manual seed crop from upstream scan commons__bialik_el_hazippor__p0001 (Bialik manuscript draft of 'El Hatzippor'). Bbox picked from line 4 of the scan; cursive Hebrew handwriting. Crop produced via Pillow.", "tool": "manual", "tool_version": "0.0.0-manual"}, "image": {"background": "original", "bytes": 3468, "height_px": 28, "local_path": "data/letters/chaim_nachman_bialik/lamed/chaim_nachman_bialik__lamed__v0001.png", "mime_type": "image/png", "sha256": "b9b291b0c6b701c759ac1475ae74106d98cd4e22faa121e53c5eea2f91dc085c", "width_px": 15}, "letter": {"codepoint": "U+05DC", "form": "regular", "name": "lamed", "notes": null, "style": "cursive_ashkenazi", "unicode_char": "ל"}, "quality": {"exclusion_reasons": [], "legibility": "medium", "notes": "Seed crop at native upstream resolution; cursive Ashkenazi-style hand. Low-resolution upstream scan (409x253) limits per-letter pixel detail.", "usable_for_htr": true, "usable_for_syngen": true}, "rights": {"attribution_required": false, "attribution_text": null, "attribution_url": null, "commercial_use_allowed": true, "derivatives_allowed": true, "evidence_text": "Inherited from upstream entry commons__bialik_el_hazippor__p0001 (PDM-1.0; this work is in the public domain in its country of origin and other countries where the copyright term is the author's life plus 70 years or fewer). 
Bialik died in 1934; the work is public domain in Israel and most jurisdictions.", "license_expression": "PDM-1.0", "redistribution_allowed": true, "rights_basis": "public_domain", "verification_status": "inherited_from_upstream", "verified_at": "2026-05-12"}, "upstream": {"bbox": {"h": 28, "w": 15, "x": 200, "y": 192}, "commit": "df07bd3825405ed93c15fd61fe4d7967fc60885e", "entry_id": "commons__bialik_el_hazippor__p0001", "release_tag": null, "sha256": "bdd6f1a3b9f8821bbca0c0c836eebf3914a335f816662f1a7f0c4495e45e624e", "source_id": "commons__bialik_el_hazippor"}, "writer_id": "chaim_nachman_bialik"}
{"entry_id": "chaim_nachman_bialik__mem__v0001", "extraction": {"extracted_at": "2026-05-12T22:30:00Z", "extracted_by": "Shay Palachy-Affek", "method": "manual", "notes": "Manual seed crop from upstream scan commons__bialik_el_hazippor__p0001 (Bialik manuscript draft of 'El Hatzippor'). Bbox picked from line 4 of the scan; cursive Hebrew handwriting. Crop produced via Pillow.", "tool": "manual", "tool_version": "0.0.0-manual"}, "image": {"background": "original", "bytes": 3098, "height_px": 23, "local_path": "data/letters/chaim_nachman_bialik/mem/chaim_nachman_bialik__mem__v0001.png", "mime_type": "image/png", "sha256": "cbb73d4644f2f42751f06c18224efeb2ff7bdcc9cb674283d466540582a35b72", "width_px": 12}, "letter": {"codepoint": "U+05DE", "form": "regular", "name": "mem", "notes": null, "style": "cursive_ashkenazi", "unicode_char": "מ"}, "quality": {"exclusion_reasons": [], "legibility": "low", "notes": "Collapsed mem form in cursive Ashkenazi hand; visually resembles yod or a small square. Hard/ambiguous example — treat with care for HTR training.", "usable_for_htr": true, "usable_for_syngen": false}, "rights": {"attribution_required": false, "attribution_text": null, "attribution_url": null, "commercial_use_allowed": true, "derivatives_allowed": true, "evidence_text": "Inherited from upstream entry commons__bialik_el_hazippor__p0001 (PDM-1.0; this work is in the public domain in its country of origin and other countries where the copyright term is the author's life plus 70 years or fewer). 
Bialik died in 1934; the work is public domain in Israel and most jurisdictions.", "license_expression": "PDM-1.0", "redistribution_allowed": true, "rights_basis": "public_domain", "verification_status": "inherited_from_upstream", "verified_at": "2026-05-12"}, "upstream": {"bbox": {"h": 23, "w": 12, "x": 290, "y": 202}, "commit": "df07bd3825405ed93c15fd61fe4d7967fc60885e", "entry_id": "commons__bialik_el_hazippor__p0001", "release_tag": null, "sha256": "bdd6f1a3b9f8821bbca0c0c836eebf3914a335f816662f1a7f0c4495e45e624e", "source_id": "commons__bialik_el_hazippor"}, "writer_id": "chaim_nachman_bialik"}
{"entry_id": "chaim_nachman_bialik__resh__v0001", "extraction": {"extracted_at": "2026-05-12T22:30:00Z", "extracted_by": "Shay Palachy-Affek", "method": "manual", "notes": "Manual seed crop from upstream scan commons__bialik_el_hazippor__p0001 (Bialik manuscript draft of 'El Hatzippor'). Bbox picked from line 4 of the scan; cursive Hebrew handwriting. Crop produced via Pillow.", "tool": "manual", "tool_version": "0.0.0-manual"}, "image": {"background": "original", "bytes": 3193, "height_px": 22, "local_path": "data/letters/chaim_nachman_bialik/resh/chaim_nachman_bialik__resh__v0001.png", "mime_type": "image/png", "sha256": "1092de5374576bc96965fd1a10b089c311bc7ba7ea975d8179cb784aa441298a", "width_px": 13}, "letter": {"codepoint": "U+05E8", "form": "regular", "name": "resh", "notes": null, "style": "cursive_ashkenazi", "unicode_char": "ר"}, "quality": {"exclusion_reasons": [], "legibility": "medium", "notes": "Seed crop at native upstream resolution; cursive Ashkenazi-style hand. Low-resolution upstream scan (409x253) limits per-letter pixel detail.", "usable_for_htr": true, "usable_for_syngen": true}, "rights": {"attribution_required": false, "attribution_text": null, "attribution_url": null, "commercial_use_allowed": true, "derivatives_allowed": true, "evidence_text": "Inherited from upstream entry commons__bialik_el_hazippor__p0001 (PDM-1.0; this work is in the public domain in its country of origin and other countries where the copyright term is the author's life plus 70 years or fewer). 
Bialik died in 1934; the work is public domain in Israel and most jurisdictions.", "license_expression": "PDM-1.0", "redistribution_allowed": true, "rights_basis": "public_domain", "verification_status": "inherited_from_upstream", "verified_at": "2026-05-12"}, "upstream": {"bbox": {"h": 22, "w": 13, "x": 278, "y": 202}, "commit": "df07bd3825405ed93c15fd61fe4d7967fc60885e", "entry_id": "commons__bialik_el_hazippor__p0001", "release_tag": null, "sha256": "bdd6f1a3b9f8821bbca0c0c836eebf3914a335f816662f1a7f0c4495e45e624e", "source_id": "commons__bialik_el_hazippor"}, "writer_id": "chaim_nachman_bialik"}
{"entry_id": "chaim_nachman_bialik__tav__v0001", "extraction": {"extracted_at": "2026-05-12T22:30:00Z", "extracted_by": "Shay Palachy-Affek", "method": "manual", "notes": "Manual seed crop from upstream scan commons__bialik_el_hazippor__p0001 (Bialik manuscript draft of 'El Hatzippor'). Bbox picked from line 4 of the scan; cursive Hebrew handwriting. Crop produced via Pillow.", "tool": "manual", "tool_version": "0.0.0-manual"}, "image": {"background": "original", "bytes": 3195, "height_px": 20, "local_path": "data/letters/chaim_nachman_bialik/tav/chaim_nachman_bialik__tav__v0001.png", "mime_type": "image/png", "sha256": "db0a29002f767438e82bc04d35d4e727f4c0f99bac8f6fee7c2f59e68d22e628", "width_px": 14}, "letter": {"codepoint": "U+05EA", "form": "regular", "name": "tav", "notes": null, "style": "cursive_ashkenazi", "unicode_char": "ת"}, "quality": {"exclusion_reasons": [], "legibility": "medium", "notes": "Seed crop at native upstream resolution; cursive Ashkenazi-style hand. Low-resolution upstream scan (409x253) limits per-letter pixel detail.", "usable_for_htr": true, "usable_for_syngen": true}, "rights": {"attribution_required": false, "attribution_text": null, "attribution_url": null, "commercial_use_allowed": true, "derivatives_allowed": true, "evidence_text": "Inherited from upstream entry commons__bialik_el_hazippor__p0001 (PDM-1.0; this work is in the public domain in its country of origin and other countries where the copyright term is the author's life plus 70 years or fewer). 
Bialik died in 1934; the work is public domain in Israel and most jurisdictions.", "license_expression": "PDM-1.0", "redistribution_allowed": true, "rights_basis": "public_domain", "verification_status": "inherited_from_upstream", "verified_at": "2026-05-12"}, "upstream": {"bbox": {"h": 20, "w": 14, "x": 309, "y": 202}, "commit": "df07bd3825405ed93c15fd61fe4d7967fc60885e", "entry_id": "commons__bialik_el_hazippor__p0001", "release_tag": null, "sha256": "bdd6f1a3b9f8821bbca0c0c836eebf3914a335f816662f1a7f0c4495e45e624e", "source_id": "commons__bialik_el_hazippor"}, "writer_id": "chaim_nachman_bialik"}
{"entry_id": "chaim_nachman_bialik__yod__v0001", "extraction": {"extracted_at": "2026-05-12T22:30:00Z", "extracted_by": "Shay Palachy-Affek", "method": "manual", "notes": "Manual seed crop from upstream scan commons__bialik_el_hazippor__p0001 (Bialik manuscript draft of 'El Hatzippor'). Bbox picked from line 4 of the scan; cursive Hebrew handwriting. Crop produced via Pillow.", "tool": "manual", "tool_version": "0.0.0-manual"}, "image": {"background": "original", "bytes": 2879, "height_px": 16, "local_path": "data/letters/chaim_nachman_bialik/yod/chaim_nachman_bialik__yod__v0001.png", "mime_type": "image/png", "sha256": "5cc808ed43f33a788e392ae66fecd580a6c183fb83b49c08fda21a66538c0207", "width_px": 7}, "letter": {"codepoint": "U+05D9", "form": "regular", "name": "yod", "notes": null, "style": "cursive_ashkenazi", "unicode_char": "י"}, "quality": {"exclusion_reasons": [], "legibility": "medium", "notes": "Split from original kaf+yod double-letter bbox; left (yod) portion only. Small stroke consistent with cursive yod. Low-resolution upstream scan.", "usable_for_htr": true, "usable_for_syngen": true}, "rights": {"attribution_required": false, "attribution_text": null, "attribution_url": null, "commercial_use_allowed": true, "derivatives_allowed": true, "evidence_text": "Inherited from upstream entry commons__bialik_el_hazippor__p0001 (PDM-1.0; this work is in the public domain in its country of origin and other countries where the copyright term is the author's life plus 70 years or fewer). 
Bialik died in 1934; the work is public domain in Israel and most jurisdictions.", "license_expression": "PDM-1.0", "redistribution_allowed": true, "rights_basis": "public_domain", "verification_status": "inherited_from_upstream", "verified_at": "2026-05-12"}, "upstream": {"bbox": {"h": 16, "w": 7, "x": 324, "y": 203}, "commit": "df07bd3825405ed93c15fd61fe4d7967fc60885e", "entry_id": "commons__bialik_el_hazippor__p0001", "release_tag": null, "sha256": "bdd6f1a3b9f8821bbca0c0c836eebf3914a335f816662f1a7f0c4495e45e624e", "source_id": "commons__bialik_el_hazippor"}, "writer_id": "chaim_nachman_bialik"}
1 change: 1 addition & 0 deletions data/index/writers.jsonl
@@ -0,0 +1 @@
{"also_known_as": ["Hayyim Nahman Bialik", "Haim Nahman Bialik", "H. N. Bialik", "חיים נחמן ביאליק", "חיים נחמן ביאַליק"], "dates": {"birth_precision": "exact", "birth_year": 1873, "death_precision": "exact", "death_year": 1934}, "description": "Russian-born Hebrew poet (1873-1934), widely regarded as Israel's national poet. Among the pioneers of modern Hebrew poetry; his manuscript drafts and personal letters are a primary source of early-20th-century handwritten modern Hebrew.", "display_name": "Chaim Nachman Bialik", "ingest": {"agent_notes": "Seed writer for v0 ingest. First per-letter crops drawn from a single manuscript page (commons__bialik_el_hazippor__p0001) to validate the manual-extraction pipeline end-to-end.", "blocked_reason": null}, "languages_written": ["he", "yi"], "period": {"end": "1934", "precision": "year", "start": "1890"}, "references": [{"citation": "Wikipedia: Hayim Nahman Bialik", "kind": "secondary_url", "quote": null, "url": "https://en.wikipedia.org/wiki/Hayim_Nahman_Bialik"}, {"citation": "VIAF authority record 27069388 (Bialik, Ḥayyim Naḥman, 1873-1934)", "kind": "authority_record", "quote": null, "url": "https://viaf.org/viaf/27069388/"}, {"citation": "Wikimedia Commons: manuscript draft of 'El Hatzippor' (autograph).", "kind": "primary_url", "quote": null, "url": "https://commons.wikimedia.org/wiki/File:Bialik_El_hazippor.jpg"}], "scripts_written": ["Hebr"], "status": "verified", "writer_id": "chaim_nachman_bialik"}
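The seven records added to `data/index/entries.jsonl` repeat a few fields that should agree with each other: the `entry_id` is built from `writer_id` and the letter name, `letter.codepoint` names the same character as `letter.unicode_char`, and the `upstream.bbox` size equals the stored crop size. The sketch below checks those invariants with the standard library only. It is not the repository's validation tooling; `check_entry`, `check_jsonl`, and `SAMPLE` are hypothetical names, and the naming and path rules are inferred from the entries in this diff rather than taken from a schema. `SAMPLE` is a trimmed copy of the `bet` entry.

```python
import json


def check_entry(entry: dict) -> list[str]:
    """Return human-readable problems; an empty list means the entry is consistent."""
    problems = []
    writer = entry["writer_id"]
    letter = entry["letter"]
    image = entry["image"]
    bbox = entry["upstream"]["bbox"]

    # Assumed convention (inferred from the diff): <writer_id>__<letter name>__vNNNN
    parts = entry["entry_id"].split("__")
    if len(parts) != 3 or parts[0] != writer or parts[1] != letter["name"]:
        problems.append("entry_id does not match writer_id and letter name")
    elif not (len(parts[2]) == 5 and parts[2][0] == "v" and parts[2][1:].isdigit()):
        problems.append("entry_id version suffix is not 'v' plus four digits")

    # "U+05D1" should decode to the character stored in unicode_char.
    if chr(int(letter["codepoint"][2:], 16)) != letter["unicode_char"]:
        problems.append("codepoint and unicode_char disagree")

    # A crop taken at native resolution has the same size as its bbox.
    if (bbox["w"], bbox["h"]) != (image["width_px"], image["height_px"]):
        problems.append("bbox size differs from image size")

    # Assumed layout (inferred): data/letters/<writer_id>/<letter>/<entry_id>.png
    expected = f"data/letters/{writer}/{letter['name']}/{entry['entry_id']}.png"
    if image["local_path"] != expected:
        problems.append("local_path does not follow the inferred layout")

    return problems


def check_jsonl(path: str) -> dict[int, list[str]]:
    """Map 1-based line numbers of a JSONL index to the problems found there."""
    report = {}
    with open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            if line.strip():
                found = check_entry(json.loads(line))
                if found:
                    report[number] = found
    return report


# Trimmed copy of the 'bet' entry from this diff (only the fields checked above).
SAMPLE = {
    "entry_id": "chaim_nachman_bialik__bet__v0001",
    "writer_id": "chaim_nachman_bialik",
    "letter": {"codepoint": "U+05D1", "name": "bet", "unicode_char": "\u05d1"},
    "image": {
        "width_px": 15,
        "height_px": 22,
        "local_path": "data/letters/chaim_nachman_bialik/bet/"
                      "chaim_nachman_bialik__bet__v0001.png",
    },
    "upstream": {"bbox": {"h": 22, "w": 15, "x": 343, "y": 203}},
}

print(check_entry(SAMPLE))  # → []
```

Calling `check_jsonl("data/index/entries.jsonl")` on a checkout would return an empty dict if every line satisfies these assumed rules. The function expects one complete JSON object per line.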