Source suggestion: TAU Modern Hebrew Baseline Dataset (Figshare, CC BY 4.0) #14

@shaypal5

Source suggestion

The TAU Modern Hebrew Baseline Dataset (Figshare, Tel Aviv University) looks like an excellent fit for this repository.

What it contains:

  • 3,960 individual Hebrew character crops
  • 18 contemporary writers × 22 Hebrew letters × ~10 samples each
  • Images are BMP files, organized as Writer_N/Alphabet/NN_LetterName/*.bmp
  • Sizes range from 45×45 px to 438×438 px per crop
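
The layout above could be indexed with a short script before import. This is a minimal sketch, assuming the extracted archive follows the `Writer_N/Alphabet/NN_LetterName/*.bmp` pattern exactly as described; the function names `index_crops` and `summarize` and the two regular expressions are illustrative, not part of the dataset or this repository.

```python
import re
from collections import Counter
from pathlib import Path

# Assumed folder-name patterns, inferred from "Writer_N/Alphabet/NN_LetterName/*.bmp".
WRITER_RE = re.compile(r"^Writer_(\d+)$")
LETTER_RE = re.compile(r"^(\d+)_(.+)$")


def index_crops(root):
    """Return (writer_id, letter_index, letter_name, path) for each matching .bmp file."""
    records = []
    for path in sorted(Path(root).rglob("*.bmp")):
        parts = path.relative_to(root).parts
        # Expect at least: Writer_N / Alphabet / NN_LetterName / file.bmp
        if len(parts) < 4 or parts[-3] != "Alphabet":
            continue
        writer = WRITER_RE.match(parts[-4])
        letter = LETTER_RE.match(parts[-2])
        if writer and letter:
            records.append(
                (int(writer.group(1)), int(letter.group(1)), letter.group(2), path)
            )
    return records


def summarize(records):
    """Count crops, distinct writers, distinct letters, and crops per letter index."""
    return {
        "crops": len(records),
        "writers": len({r[0] for r in records}),
        "letters": len({r[1] for r in records}),
        "per_letter": Counter(r[1] for r in records),
    }
```

If the description above is accurate, `summarize(index_crops(root))` on the full archive should report 18 writers, 22 letters, and 3,960 crops (18 × 22 × 10). Note that the `*.bmp` glob is case-sensitive on most Linux filesystems, so files with an upper-case extension would be skipped.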

License: CC BY 4.0 (permissive, attribution required), confirmed via the Figshare API.

Why it didn't land in the sister repo:
HeOCR/public-domain-hand-written-hebrew-scans is a page-level HTR corpus, so character-level crops are out of scope there. The source was evaluated for that repository and explicitly rejected, with a pointer to this one.

Direct download: https://ndownloader.figshare.com/files/20364123 (~6.4 MB zip)
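
For reference, fetching and unpacking the archive could look roughly like this. It is a sketch using only the Python standard library; the URL is the one given above, while the helper names `fetch_bytes` and `extract_archive` and the destination folder in the usage note are hypothetical. It assumes the endpoint serves the zip directly to a plain HTTP client, which has not been verified here.

```python
import io
import urllib.request
import zipfile
from pathlib import Path

# URL taken from the issue text; everything else below is illustrative.
DOWNLOAD_URL = "https://ndownloader.figshare.com/files/20364123"


def fetch_bytes(url=DOWNLOAD_URL, timeout=60):
    """Download the archive into memory (about 6.4 MB according to the issue)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def extract_archive(data, dest):
    """Unpack zip bytes into dest and return the list of member names."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        zf.extractall(dest)
        return zf.namelist()
```

Usage would be along the lines of `extract_archive(fetch_bytes(), "tau_hebrew")`, where `tau_hebrew` is a placeholder directory name.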

Attribution: cite as the TAU Modern Hebrew Baseline Dataset, Figshare DOI 10.6084/m9.figshare.11423352.v1.
