Source suggestion
The TAU Modern Hebrew Baseline Dataset (Figshare, Tel Aviv University) looks like an excellent fit for this repository.
What it contains:
- 3,960 individual Hebrew character crops
- 18 contemporary writers × 22 Hebrew letters × ~10 samples each
- Images are BMP files, organized as Writer_N/Alphabet/NN_LetterName/*.bmp
- Sizes range from 45×45 px to 438×438 px per crop
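To sanity-check the layout and the 18 × 22 × ~10 = 3,960 count after unpacking, something like the following sketch could be used. It assumes the folder pattern described above holds exactly, that file extensions are lowercase `.bmp`, and that the letter folder splits on its first underscore; the function names `index_crops` and `summarize` are hypothetical helpers, not part of the dataset.

```python
from collections import Counter
from pathlib import Path


def index_crops(root):
    """Return (writer, letter, path) for every BMP under the assumed layout."""
    records = []
    # Assumed layout: <root>/Writer_N/Alphabet/NN_LetterName/*.bmp
    for path in sorted(Path(root).glob("Writer_*/Alphabet/*/*.bmp")):
        letter_dir = path.parent.name             # the NN_LetterName folder
        writer = path.parent.parent.parent.name   # the Writer_N folder
        # Drop the numeric prefix; keep the full name if there is no underscore.
        _, _, letter = letter_dir.partition("_")
        records.append((writer, letter or letter_dir, path))
    return records


def summarize(records):
    """Return (total crops, writer count, letter count, crops per letter)."""
    writers = {writer for writer, _, _ in records}
    letters = {letter for _, letter, _ in records}
    per_letter = Counter(letter for _, letter, _ in records)
    return len(records), len(writers), len(letters), per_letter
```

If the description above is accurate, `summarize(index_crops(root))` on the unpacked archive should report 3,960 crops, 18 writers, and 22 letters; any other result would mean the layout differs from what is assumed here.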
License: CC BY 4.0 — permissive, attribution required, confirmed via Figshare API.
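For anyone who wants to re-check the license, a rough sketch of the lookup follows. It assumes the numeric part of the DOI is the Figshare article ID, that the public endpoint `https://api.figshare.com/v2/articles/{id}` serves the record, and that the JSON carries a `license` object with a `name` field; the helper names are hypothetical.

```python
import json
import re
from urllib.request import urlopen

DOI = "10.6084/m9.figshare.11423352.v1"


def article_id_from_doi(doi):
    """Extract the numeric article ID from a Figshare-style DOI."""
    match = re.search(r"figshare\.(\d+)", doi)
    if match is None:
        raise ValueError(f"not a Figshare DOI: {doi!r}")
    return int(match.group(1))


def license_name(article):
    """Return the license name from an article record, or None if absent."""
    # Assumption: the record looks like {"license": {"name": "...", ...}, ...}
    return (article.get("license") or {}).get("name")


def fetch_article(article_id, timeout=30):
    """Fetch the public article record (requires network access)."""
    url = f"https://api.figshare.com/v2/articles/{article_id}"
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `license_name(fetch_article(article_id_from_doi(DOI)))` should then return the license string recorded by Figshare, provided the assumptions above hold.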
Why it didn't land in the sister repo:
HeOCR/public-domain-hand-written-hebrew-scans is a page-level HTR corpus, so character-level crops are out of scope there. The source was evaluated for that repo and explicitly rejected, with a pointer to this one.
Direct download: https://ndownloader.figshare.com/files/20364123 (~6.4 MB zip)
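A minimal sketch of fetching and unpacking the archive, using only the standard library. It assumes the URL above returns a plain zip and that only the `.bmp` members are wanted; `download` and `extract_bmps` are hypothetical helper names.

```python
import zipfile
from pathlib import Path
from urllib.request import urlretrieve

URL = "https://ndownloader.figshare.com/files/20364123"


def download(url, dest):
    """Download url to dest unless the file already exists; return the path."""
    dest = Path(dest)
    if not dest.exists():
        urlretrieve(url, str(dest))
    return dest


def extract_bmps(zip_path, out_dir):
    """Extract only the .bmp members of the archive; return how many there were."""
    out_dir = Path(out_dir)
    with zipfile.ZipFile(zip_path) as archive:
        members = [
            name for name in archive.namelist()
            if name.lower().endswith(".bmp")
        ]
        archive.extractall(out_dir, members=members)
    return len(members)
```

Typical use would be `extract_bmps(download(URL, "tau_hebrew.zip"), "data/")`, where both paths are placeholders.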
Attribution: cite as the TAU Modern Hebrew Baseline Dataset, Figshare DOI 10.6084/m9.figshare.11423352.v1.