
Improve README presentation #38

Merged
shaypal5 merged 1 commit into main from codex/improve-hash-readme-presentation
May 31, 2026

Conversation

@shaypal5

Contributor

Summary

Improve the GitHub landing-page presentation for HASH so a first-time reader can see representative handwriting scans immediately and understand the dataset's rights posture before digging into the JSONL indexes.

Changes

  • Add CI, metadata-license, and scan-count badges at the top of the README.
  • Add the required creator attribution near the top and the current Credits block at the bottom.
  • Add a representative sample-scan grid at docs/assets/hash-sample-grid.jpg, built from records whose rights in the scan index are public-domain, Public Domain Mark, or Israel-public-domain.
  • Add an "At a Glance" table with current scan/source counts, corpus size, rights-policy link, and canonical index link.
  • Keep the existing generated status block intact.
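
The rights-based selection behind the sample grid could be sketched roughly as follows. This is an illustration only, not the script used in this PR: the `rights` and `id` field names, the exact label strings, and the `pick_sample_records` helper are all assumptions about how the JSONL scan index might look.

```python
import json

# Hypothetical rights labels; the real scan index may spell these differently.
OPEN_RIGHTS = {"public-domain", "public-domain-mark", "israel-public-domain"}


def pick_sample_records(jsonl_lines, limit=6):
    """Return up to `limit` records whose rights label is in OPEN_RIGHTS."""
    picked = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the index
        record = json.loads(line)
        # "rights" is an assumed field name, not confirmed by this PR.
        if record.get("rights") in OPEN_RIGHTS:
            picked.append(record)
            if len(picked) >= limit:
                break
    return picked
```

Passing an open file handle for the index works as well as a list of strings, since the function only iterates over lines.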

GitHub metadata

Updated directly on the repository card with gh repo edit:

  • Description: Rights-clean dataset of scanned handwritten Hebrew-script documents for OCR/HTR research.
  • Topics: hebrew, ocr, htr, handwriting, dataset, cc0
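
For reproducibility, the metadata update above can be expressed as a small helper that builds the `gh repo edit` command line without running it. The `gh_repo_edit_args` helper is hypothetical; it relies only on the `--description` and `--add-topic` flags of `gh repo edit`, and the exact invocation used for this PR is not recorded here.

```python
import shlex


def gh_repo_edit_args(description, topics):
    """Build the argv for a `gh repo edit` call; nothing is executed."""
    args = ["gh", "repo", "edit", "--description", description]
    for topic in topics:
        # --add-topic may be repeated, once per topic.
        args += ["--add-topic", topic]
    return args


args = gh_repo_edit_args(
    "Rights-clean dataset of scanned handwritten Hebrew-script documents "
    "for OCR/HTR research.",
    ["hebrew", "ocr", "htr", "handwriting", "dataset", "cc0"],
)
# Print a shell-quoted form that can be reviewed before running by hand.
print(shlex.join(args))
```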

Validation

Passed:

  • git diff --check -- README.md docs/assets/hash-sample-grid.jpg
  • README/index/sample-grid smoke check: exactly two creator-credit occurrences, Credits at the bottom, 198 indexed entries, sample grid linked, canonical index linked, sample grid is 1260x840
  • python3 scripts/validate_indexes.py: ok: 111 sources, 198 entries, 228 files verified, recipe ok
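
The README/index/sample-grid smoke check listed above might look something like the sketch below. It is a reconstruction under stated assumptions, not the check that was actually run: `smoke_check` is a hypothetical helper, the credit string and canonical index path are passed in because the PR text does not spell them out, and "Credits at the bottom" is interpreted as "the last Markdown heading is Credits".

```python
GRID_PATH = "docs/assets/hash-sample-grid.jpg"


def smoke_check(readme, credit, index_path, entry_count, grid_size):
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    if readme.count(credit) != 2:
        problems.append("creator credit should appear exactly twice")
    # Assumes '#'-style headings and no '#' comment lines inside code fences.
    headings = [
        line.lstrip("#").strip()
        for line in readme.splitlines()
        if line.startswith("#")
    ]
    if not headings or headings[-1] != "Credits":
        problems.append("Credits should be the last section")
    if entry_count != 198:
        problems.append("expected 198 indexed entries")
    if GRID_PATH not in readme:
        problems.append("sample grid is not linked")
    if index_path not in readme:
        problems.append("canonical index is not linked")
    if grid_size != (1260, 840):
        problems.append("sample grid should be 1260x840")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to report every failed condition in a single run.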

Full-suite note:

  • Initial python3 -m pytest could not collect because pyarrow was missing locally.
  • After python3 -m pip install -r requirements-dev.txt, python3 -m pytest ran and reported 74 passed / 6 failed.
  • The failures are pre-existing data-artifact/release-fixture issues on this origin/main base, not introduced by this README/image-only PR:
    • stale exports/entries.csv
    • stale datapackage.json
    • test_builder_picks_original_among_multiple_roles
    • test_attribution_gate_is_license_driven

Base-history note

The local main checkout has four unpushed HTR/transcript commits. This branch was intentionally created from origin/main to keep this PR limited to presentation changes.

@shaypal5 added this to the v0.1.0 milestone on May 30, 2026
@shaypal5 added the documentation, area:docs, area:license, and size:S labels on May 30, 2026
@shaypal5 merged commit e8b188c into main on May 31, 2026
1 check failed
@shaypal5 deleted the codex/improve-hash-readme-presentation branch on May 31, 2026 05:06
