Summary
Add Franz Kafka (1883–1924) as a writer using his Hebrew language-exercise notebook, scanned from the Max Brod estate and available as a single PDM-1.0 page in the upstream repo. Kafka's Hebrew was that of an adult learner (he began seriously studying the language c. 1920); the resulting handwriting is careful and non-fluent — an underrepresented style that adds meaningful distributional breadth to the HTR training corpus.
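Writer record
| Field | Value |
| --- | --- |
| `writer_id` | `franz_kafka` |
| `display_name` | |
| `also_known_as` | |
| `born/died` | |
| `scripts_written` | |
| `languages_written` | |
| `period` | |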
Upstream source
| entry_id | license | Notes |
| --- | --- | --- |
| `commons__kafka_hebrew_writings__p0001` | PDM-1.0 | Open-notebook spread of Hebrew language exercises, Max Brod estate scan |

`rights_basis: public_domain`, `attribution_required: false`.
Known quality constraints
Kafka's Hebrew was learner-level, not fluent. Expect:
- Non-fluent letter forms: careful block-printing rather than cursive; some letters may be atypical or over-deliberate
- Low variety per page: exercises may repeat high-frequency letters and skip rare ones
- Mixed legibility: some exercises clear, others cramped or uncertain
- Many crops will likely be `usable_for_syngen: false` (non-fluent forms are inappropriate as style exemplars for synthetic generation)
- `quality.notes` on each entry must document the learner-hand caveat explicitly
Despite these caveats, learner-hand samples are a valid and underrepresented data point for HTR generalisation. Ingest all crops rated at least `legibility: medium`; include `legibility: low` crops only if they are unambiguously identifiable as a specific letter.
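The ingestion rule above can be sketched as a small predicate. This is only an illustration: `should_ingest`, the `unambiguous_letter` flag, and the `high` legibility level are assumptions made for the example, not names taken from the repo's schema.

```python
# Hypothetical sketch of the ingestion rule; the real entry schema may differ.
# "low" and "medium" come from this issue; "high" is an assumed third level.
LEGIBILITY_RANK = {"low": 0, "medium": 1, "high": 2}


def should_ingest(legibility: str, unambiguous_letter: bool = False) -> bool:
    """Return True if a crop passes the legibility rule described above."""
    rank = LEGIBILITY_RANK[legibility]
    if rank >= LEGIBILITY_RANK["medium"]:
        return True
    # Low-legibility crops are kept only when the letter is unambiguous.
    return unambiguous_letter


# Illustrative crops (ids and flags are made up for the example).
crops = [
    {"id": "c1", "legibility": "medium", "unambiguous_letter": False},
    {"id": "c2", "legibility": "low", "unambiguous_letter": True},
    {"id": "c3", "legibility": "low", "unambiguous_letter": False},
]
kept = [c["id"] for c in crops
        if should_ingest(c["legibility"], c["unambiguous_letter"])]
```

Keeping the rule in one function makes it easy to apply the same threshold to every crop and to record, in `quality.notes`, why a low-legibility crop was kept or skipped.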
Acceptance criteria
- `franz_kafka` writer row in `writers.jsonl` with `status: verified`
- Crops with `legibility >= medium` ingested; borderline crops documented with `usable_for_htr: false, usable_for_syngen: false` and a `quality.notes` explanation
- `letter.style` field reflects learner/block-print hand (not standard cursive Ashkenazi)
- `python3 scripts/validate_indexes.py --upstream-path <upstream>` passes
- `python3 scripts/generate_release_artifacts.py --check` passes
- `python3 -m pytest` passes
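As a rough local check of the first criterion, the sketch below scans JSONL text for the writer row and tests its status. It assumes each row is a JSON object with `writer_id` and `status` keys, as the criterion suggests; the helper names, the in-memory sample, and the `draft` status value are hypothetical, and the sketch is not a substitute for `scripts/validate_indexes.py`.

```python
import json


def find_writer(jsonl_text: str, writer_id: str):
    """Return the first JSONL row whose writer_id matches, or None."""
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        row = json.loads(line)
        if row.get("writer_id") == writer_id:
            return row
    return None


def writer_is_verified(jsonl_text: str, writer_id: str) -> bool:
    """True only if the writer row exists and has status 'verified'."""
    row = find_writer(jsonl_text, writer_id)
    return row is not None and row.get("status") == "verified"


# Illustrative stand-in for writers.jsonl; real rows carry more fields,
# and "draft" is an assumed status value.
sample = (
    '{"writer_id": "franz_kafka", "status": "verified"}\n'
    '{"writer_id": "someone_else", "status": "draft"}\n'
)
ok = writer_is_verified(sample, "franz_kafka")
```

In practice the text would be read from `writers.jsonl` in the repo rather than from an inline string.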