Merged
2 changes: 1 addition & 1 deletion CITATION.cff
@@ -3,7 +3,7 @@ cff-version: 1.2.0
message: Please cite this dataset using the metadata below.
type: dataset
title: Hebrew Handwritten Per-Letter Image Dataset
abstract: Per-letter image crops of handwritten Hebrew letters, grouped into sets by writer. Each crop is a derivative of a permissively-licensed upstream scan in HeOCR/public-domain-hand-written-hebrew-scans, with per-image rights inherited and attribution recorded. The index is line-oriented JSON (JSONL). Release 0.0.0-rc contains 49 per-letter image entries drawn from 2 verified writers (18 LicenseRef-Public-Domain-Israel, 31 PDM-1.0).
abstract: Per-letter image crops of handwritten Hebrew letters, grouped into sets by writer. Each crop is a derivative of a permissively-licensed upstream scan in HeOCR/public-domain-hand-written-hebrew-scans, with per-image rights inherited and attribution recorded. The index is line-oriented JSON (JSONL). Release 0.0.0-rc contains 48 per-letter image entries drawn from 2 verified writers (18 LicenseRef-Public-Domain-Israel, 30 PDM-1.0).
authors:
- name: Shay Palachy-Affek
version: 0.0.0-rc
2 changes: 1 addition & 1 deletion NOTICE.md
@@ -7,7 +7,7 @@ Repository-authored metadata is dedicated to the public domain under CC0 1.0 Uni
Per-letter image crops are derivatives of upstream scans in [HeOCR/public-domain-hand-written-hebrew-scans](https://github.com/HeOCR/public-domain-hand-written-hebrew-scans) and carry per-entry rights inherited from the source page. The entries listed below carry a license that requires attribution (currently CC-BY-4.0, CC-BY-SA-4.0). Anyone redistributing or reusing these crops must keep the listed credit and link to the source page on which the rights claim was verified.

- Corpus release: `0.0.0-rc`
- Released at (corpus state): `2026-05-13T21:01:19Z`
- Released at (corpus state): `2026-05-13T21:37:46Z`

## Attribution-required entries

47 changes: 23 additions & 24 deletions data/index/entries.jsonl

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion data/index/writers.jsonl
@@ -1,2 +1,2 @@
{"also_known_as": ["Hayyim Nahman Bialik", "Haim Nahman Bialik", "H. N. Bialik", "חיים נחמן ביאליק", "חיים נחמן ביאַליק"], "dates": {"birth_precision": "exact", "birth_year": 1873, "death_precision": "exact", "death_year": 1934}, "description": "Russian-born Hebrew poet (1873-1934), widely regarded as Israel's national poet. Among the pioneers of modern Hebrew poetry; his manuscript drafts and personal letters are a primary source of early-20th-century handwritten modern Hebrew.", "display_name": "Chaim Nachman Bialik", "ingest": {"agent_notes": "Seed writer for v0 ingest. First per-letter crops drawn from a single manuscript page (commons__bialik_el_hazippor__p0001) to validate the manual-extraction pipeline end-to-end.", "blocked_reason": null}, "languages_written": ["he", "yi"], "period": {"end": "1934", "precision": "year", "start": "1890"}, "references": [{"citation": "Wikipedia: Hayim Nahman Bialik", "kind": "secondary_url", "quote": null, "url": "https://en.wikipedia.org/wiki/Hayim_Nahman_Bialik"}, {"citation": "VIAF authority record 27069388 (Bialik, Ḥayyim Naḥman, 1873-1934)", "kind": "authority_record", "quote": null, "url": "https://viaf.org/viaf/27069388/"}, {"citation": "Wikimedia Commons: manuscript draft of 'El Hatzippor' (autograph).", "kind": "primary_url", "quote": null, "url": "https://commons.wikimedia.org/wiki/File:Bialik_El_hazippor.jpg"}], "scripts_written": ["Hebr"], "status": "verified", "writer_id": "chaim_nachman_bialik"}
{"also_known_as": ["Rachel the Poetess", "רחל המשוררת", "Rachel Bluwstein-Sela", "Рахель Блувштейн", "רחל בלובשטיין"], "dates": {"birth_precision": "exact", "birth_year": 1890, "death_precision": "exact", "death_year": 1931}, "description": "Ukrainian-born Hebrew poet (1890-1931), known as 'Rachel the Poetess'. One of the most beloved poets of the Hebrew literary renaissance; her clear, lyrical cursive manuscripts are among the most legible surviving examples of early-20th-century handwritten modern Hebrew.", "display_name": "Rachel Bluwstein", "ingest": {"agent_notes": "Second writer added to corpus (C03). Four upstream PDM-1.0 scans available; primary letter crops drawn from commons__rachel_gan_naul__p0001 (5184x3456, highest-resolution scan in the dataset).", "blocked_reason": null}, "languages_written": ["he", "ru"], "period": {"end": "1931", "precision": "year", "start": "1910"}, "references": [{"citation": "Wikipedia: Rachel Bluwstein", "kind": "secondary_url", "quote": null, "url": "https://en.wikipedia.org/wiki/Rachel_Bluwstein"}, {"citation": "VIAF authority record 5722988 (Raḥel, 1890-1931)", "kind": "authority_record", "quote": null, "url": "https://viaf.org/viaf/5722988/"}, {"citation": "Wikimedia Commons: 'Gan Naul' manuscript photograph (Gnazim Institute archive, 2019).", "kind": "primary_url", "quote": null, "url": "https://commons.wikimedia.org/wiki/File:Gan_Naul_-_Rachel_IMG_3496.JPG"}], "scripts_written": ["Hebr"], "status": "verified", "writer_id": "rachel_bluwstein"}
{"also_known_as": ["Rachel the Poetess", "רחל המשוררת", "Rachel Bluwstein-Sela", "Рахель Блувштейн", "רחל בלובשטיין"], "dates": {"birth_precision": "exact", "birth_year": 1890, "death_precision": "exact", "death_year": 1931}, "description": "Ukrainian-born Hebrew poet (1890-1931), known as 'Rachel the Poetess'. One of the most beloved poets of the Hebrew literary renaissance; her clear, lyrical cursive manuscripts are among the most legible surviving examples of early-20th-century handwritten modern Hebrew.", "display_name": "Rachel Bluwstein", "ingest": {"agent_notes": "Second writer added to corpus (C03). Four upstream PDM-1.0 scans available; primary letter crops drawn from commons__rachel_aqara_1928__p0001 (653×1024, highest-contrast scan in the dataset). 23 letter forms extracted from stanza 1 (lines 1-3) and stanza 2 (line 1) of the 1928 poem 'Aqara'. Missing from this ingest: mem_final, kaf_final, pe, pe_final, tsadi, tsadi_final. These forms were not found in the explored lines; the gan_naul and begani_netatikha scans were not exhaustively explored and may contain them.", "blocked_reason": null}, "languages_written": ["he", "ru"], "period": {"end": "1931", "precision": "year", "start": "1910"}, "references": [{"citation": "Wikipedia: Rachel Bluwstein", "kind": "secondary_url", "quote": null, "url": "https://en.wikipedia.org/wiki/Rachel_Bluwstein"}, {"citation": "VIAF authority record 5722988 (Raḥel, 1890-1931)", "kind": "authority_record", "quote": null, "url": "https://viaf.org/viaf/5722988/"}, {"citation": "Wikimedia Commons: 'Gan Naul' manuscript photograph (Gnazim Institute archive, 2019).", "kind": "primary_url", "quote": null, "url": "https://commons.wikimedia.org/wiki/File:Gan_Naul_-_Rachel_IMG_3496.JPG"}], "scripts_written": ["Hebr"], "status": "verified", "writer_id": "rachel_bluwstein"}

This file was deleted.

18 changes: 9 additions & 9 deletions datapackage.json
@@ -38,21 +38,21 @@
],
"name": "hletterscript",
"profile": "data-package",
"released_at": "2026-05-13T21:01:19Z",
"released_at": "2026-05-13T21:37:46Z",
"resources": [
{
"bytes": 95725,
"bytes": 92837,
"description": "Per-letter image index. One JSON object per cropped letter image, with upstream provenance, extraction provenance, file checksums, and inherited rights.",
"encoding": "utf-8",
"format": "jsonl",
"mediatype": "application/x-ndjson",
"name": "entries",
"path": "data/index/entries.jsonl",
"profile": "data-resource",
"record_count": 49
"record_count": 48
},
{
"bytes": 3057,
"bytes": 3382,
"description": "Writer-level catalog. One JSON object per writer; each writer defines a 'set' of letter images.",
"encoding": "utf-8",
"format": "jsonl",
@@ -70,7 +70,7 @@
"stats": {
"attribution_required_count": 0,
"entry_writer_count": 2,
"image_byte_count": 75195,
"image_byte_count": 70340,
"letter_breakdown": {
"alef": 2,
"ayin": 2,
@@ -95,17 +95,17 @@
"tet": 1,
"tsadi": 1,
"vav": 2,
"yod": 3,
"yod": 2,
"zayin": 2
},
"license_breakdown": {
"LicenseRef-Public-Domain-Israel": 18,
"PDM-1.0": 31
"PDM-1.0": 30
},
"record_count": 49,
"record_count": 48,
"writer_breakdown": {
"chaim_nachman_bialik": 25,
"rachel_bluwstein": 24
"rachel_bluwstein": 23
},
"writer_record_count": 2,
"writer_status_breakdown": {
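The counts touched by this change (resource `bytes` and `record_count`, plus the `license_breakdown` and `writer_breakdown` totals) have to stay in step with the JSONL files on disk. The following is a minimal sketch of how that could be checked, not the repository's own tooling. It assumes `stats` is a top-level key of `datapackage.json` and that each resource `path` is relative to the package root; the function name `check_package` is hypothetical.

```python
import json
from pathlib import Path


def check_package(package_dir):
    """Return a list of mismatch messages; an empty list means consistent.

    Assumptions (not confirmed by the diff): `stats` sits at the top level of
    datapackage.json, and resource paths are relative to `package_dir`.
    """
    root = Path(package_dir)
    package = json.loads((root / "datapackage.json").read_text(encoding="utf-8"))
    problems = []

    # Compare each resource's declared size and record count with the file itself.
    for resource in package.get("resources", []):
        if "path" not in resource:
            continue
        raw = (root / resource["path"]).read_bytes()
        # JSONL: one JSON object per non-blank line.
        records = [line for line in raw.decode("utf-8").splitlines() if line.strip()]
        if "bytes" in resource and resource["bytes"] != len(raw):
            problems.append(
                f"{resource['path']}: bytes declared {resource['bytes']}, actual {len(raw)}"
            )
        if "record_count" in resource and resource["record_count"] != len(records):
            problems.append(
                f"{resource['path']}: record_count declared "
                f"{resource['record_count']}, actual {len(records)}"
            )

    # Each breakdown should sum to the overall record count (e.g. 18 + 30 = 48).
    stats = package.get("stats", {})
    total = stats.get("record_count")
    for key in ("license_breakdown", "writer_breakdown"):
        breakdown = stats.get(key)
        if total is not None and breakdown is not None:
            subtotal = sum(breakdown.values())
            if subtotal != total:
                problems.append(f"stats.{key} sums to {subtotal}, expected {total}")

    return problems
```

Calling `check_package(".")` from the repository root would return an empty list when the declared values match the files, and one message per mismatch otherwise.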