C06: Add Franz Kafka — Hebrew language exercises (1 page, PDM-1.0) #9

@shaypal5

Summary

Add Franz Kafka (1883–1924) as a writer, sourced from his Hebrew language-exercise notebook, which was scanned from the Max Brod estate and is available as a single PDM-1.0 page in the upstream repo. Kafka's Hebrew was that of an adult learner (he began seriously studying the language c. 1920), so the handwriting is careful and non-fluent. This is an underrepresented style that adds meaningful distributional breadth to the HTR training corpus.

Writer record

| Field | Value |
|---|---|
| writer_id | franz_kafka |
| display_name | Franz Kafka |
| also_known_as | פרנץ קפקא, František Kafka |
| born / died | 1883 / 1924 (public domain in all life+70 jurisdictions) |
| scripts_written | Hebr, Latn |
| languages_written | he (learner), de, cs |
| period | 1920–1924 (Hebrew study period) |
| VIAF | https://viaf.org/viaf/4927658/ |
| Wikipedia | https://en.wikipedia.org/wiki/Franz_Kafka |
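A possible shape for the new writers.jsonl row is sketched below, assuming one JSON object per line with keys named after the table fields. Splitting born/died into two keys, the list types, and the helper `public_domain_under_life_plus` are illustrative assumptions, not the repository's confirmed schema. The helper only makes the life+70 arithmetic explicit: for a 1924 death, protection runs to the end of 1994, so the work is public domain from 1 January 1995.

```python
import json

# Hypothetical sketch of the franz_kafka row for writers.jsonl.
# Keys mirror the writer-record table; the exact schema (separate
# born/died keys, list-valued fields) is an assumption.
writer = {
    "writer_id": "franz_kafka",
    "display_name": "Franz Kafka",
    "also_known_as": ["פרנץ קפקא", "František Kafka"],
    "born": 1883,
    "died": 1924,
    "scripts_written": ["Hebr", "Latn"],
    "languages_written": ["he", "de", "cs"],  # "he" at learner level
    "period": "1920–1924",
    "viaf": "https://viaf.org/viaf/4927658/",
    "wikipedia": "https://en.wikipedia.org/wiki/Franz_Kafka",
    "status": "verified",  # required by the acceptance criteria
}


def to_jsonl_line(record: dict) -> str:
    # ensure_ascii=False keeps the Hebrew and Czech names human-readable;
    # json.dumps emits no newlines by default, so this is one JSONL line.
    return json.dumps(record, ensure_ascii=False, sort_keys=True)


def public_domain_under_life_plus(died: int, term: int, year: int) -> bool:
    # Life+N terms usually run to the end of the calendar year, so the work
    # enters the public domain on 1 January of (died + term + 1).
    return year >= died + term + 1
```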

Upstream source

| entry_id | license | Notes |
|---|---|---|
| commons__kafka_hebrew_writings__p0001 | PDM-1.0 | Open-notebook spread of Hebrew language exercises, Max Brod estate scan |

rights_basis: public_domain, attribution_required: false.
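The rights metadata for the single upstream entry could be sanity-checked with something like the sketch below. The `UpstreamEntry` dataclass and its `rights_consistent` rule are hypothetical; only the field names and values come from this issue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpstreamEntry:
    # Hypothetical container; field names follow the issue text.
    entry_id: str
    license: str
    rights_basis: str
    attribution_required: bool

    def rights_consistent(self) -> bool:
        # This sketch only validates PDM-1.0 entries: they must be recorded
        # as public domain with no attribution requirement.
        return (
            self.license == "PDM-1.0"
            and self.rights_basis == "public_domain"
            and not self.attribution_required
        )


entry = UpstreamEntry(
    entry_id="commons__kafka_hebrew_writings__p0001",
    license="PDM-1.0",
    rights_basis="public_domain",
    attribution_required=False,
)
```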

Known quality constraints

Kafka's Hebrew was learner-level, not fluent. Expect:

  • Non-fluent letter forms: careful block-printing rather than cursive; some letters may be atypical or over-deliberate
  • Low variety per page: exercises may repeat high-frequency letters and skip rare ones
  • Mixed legibility: some exercises clear, others cramped or uncertain
  • Many crops will likely be usable_for_syngen: false (non-fluent forms are inappropriate as style exemplars for synthetic generation)
  • quality.notes on each entry must document the learner-hand caveat explicitly
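A learner-hand crop record that satisfies these constraints might look like the sketch below. The record layout, the `crop_id` scheme, the `learner_block_print` style label, and the `high` legibility level are assumptions for illustration. The issue itself only fixes the `usable_for_htr`/`usable_for_syngen` flags, `quality.notes`, and `letter.style`.

```python
# Hypothetical per-crop record builder for Kafka learner-hand crops.
# Layout, id scheme and style label are illustrative assumptions.
LEARNER_HAND_NOTE = (
    "Learner-level Hebrew (adult learner, non-fluent): careful block-printed "
    "forms, some atypical or over-deliberate letters."
)


def make_crop_record(entry_id, crop_index, letter, legibility, notes_extra=""):
    # Every crop carries the learner-hand caveat; extra notes are appended.
    notes = LEARNER_HAND_NOTE + (" " + notes_extra if notes_extra else "")
    return {
        "entry_id": entry_id,
        "crop_id": f"{entry_id}__c{crop_index:03d}",  # hypothetical id scheme
        "writer_id": "franz_kafka",
        "letter": {"char": letter, "style": "learner_block_print"},
        "quality": {"legibility": legibility, "notes": notes},
        # Low-legibility (borderline) crops are not used for HTR training.
        "usable_for_htr": legibility in ("medium", "high"),
        # Non-fluent forms are poor style exemplars, so default to False.
        "usable_for_syngen": False,
    }
```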

Despite these caveats, learner-hand samples are a valid and underrepresented data point for HTR generalisation. Ingest every crop rated at least legibility: medium; include legibility: low crops only if they are unambiguously identifiable as a specific letter.
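The ingestion rule above reduces to a small predicate. This sketch assumes an ordinal scale `low < medium < high` (only `low` and `medium` are named in the issue) and a hypothetical per-crop `unambiguous_letter` judgement recorded by the annotator.

```python
# Assumed ordinal legibility scale; "high" is not named in the issue.
LEGIBILITY_RANK = {"low": 0, "medium": 1, "high": 2}


def should_ingest(legibility: str, unambiguous_letter: bool) -> bool:
    # Anything rated medium or better is always ingested.
    if LEGIBILITY_RANK[legibility] >= LEGIBILITY_RANK["medium"]:
        return True
    # Low-legibility crops are kept only when the letter identity is certain.
    return unambiguous_letter
```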

Acceptance criteria

  • New franz_kafka writer row in writers.jsonl with status: verified
  • All crops with legibility >= medium ingested; borderline crops documented with usable_for_htr: false, usable_for_syngen: false and a quality.notes explanation
  • letter.style field reflects learner/block-print hand (not standard cursive Ashkenazi)
  • All entries pass python3 scripts/validate_indexes.py --upstream-path <upstream>
  • python3 scripts/generate_release_artifacts.py --check passes
  • python3 -m pytest passes
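To check the last three criteria locally before opening a PR, the commands can be chained in a small runner like the one below. It is a sketch: `acceptance_commands` and `run_all` are hypothetical helpers, and the upstream path is whatever local checkout `<upstream>` refers to.

```python
import subprocess


def acceptance_commands(upstream_path: str) -> list[list[str]]:
    # The three command-line checks from the acceptance criteria, in order.
    return [
        ["python3", "scripts/validate_indexes.py", "--upstream-path", upstream_path],
        ["python3", "scripts/generate_release_artifacts.py", "--check"],
        ["python3", "-m", "pytest"],
    ]


def run_all(upstream_path: str) -> int:
    for cmd in acceptance_commands(upstream_path):
        print("$", " ".join(cmd))
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return result.returncode  # stop at the first failing check
    return 0
```

Calling `run_all("/path/to/upstream")` from the repository root returns 0 only when all three checks pass.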
