Merged
CITATION.cff: 2 changes (1 addition, 1 deletion)

```diff
@@ -3,7 +3,7 @@ cff-version: 1.2.0
 message: Please cite this dataset using the metadata below.
 type: dataset
 title: HASH — Hebrew Archive of Scanned Handwriting
-abstract: A small, agent-friendly dataset of public-domain (or permissively licensed) scans of handwritten Hebrew documents, paired with per-scan rights evidence. The index is line-oriented JSON (JSONL); rights are recorded at both source and scan level instead of relying on collection-level claims. Release 0.1.0-rc contains 373 scan entries drawn from 77 verified sources (9 CC-BY-SA-3.0, 2 CC-BY-SA-4.0, 279 LicenseRef-Public-Domain-Israel, 2 LicenseRef-Public-Domain-Ukraine, 81 PDM-1.0).
+abstract: A small, agent-friendly dataset of public-domain (or permissively licensed) scans of handwritten Hebrew documents, paired with per-scan rights evidence. The index is line-oriented JSON (JSONL); rights are recorded at both source and scan level instead of relying on collection-level claims. Release 0.1.0-rc contains 198 scan entries drawn from 48 verified sources (9 CC-BY-SA-3.0, 144 LicenseRef-Public-Domain-Israel, 1 LicenseRef-Public-Domain-Ukraine, 44 PDM-1.0).
 authors:
 - name: Shay Palachy-Affek
 version: 0.1.0-rc
```
NOTICE.md: 1,271 changes (42 additions, 1,229 deletions); large diff not rendered.

README.md: 11 changes (5 additions, 6 deletions)

```diff
@@ -63,15 +63,14 @@ make release
 <!-- begin:status -->
 ## Current Status
 
-The corpus currently contains 373 ingested scans drawn from 77 verified sources, totalling ~382.29 MiB on disk. The source-level index also tracks 15 candidate leads still being researched and 17 source records kept for provenance after being rejected as out of scope.
+The corpus currently contains 198 ingested scans drawn from 48 verified sources, totalling ~283.00 MiB on disk. The source-level index also tracks 15 candidate leads still being researched and 46 source records kept for provenance after being rejected as out of scope.
 
-License breakdown across the 373 entries:
+License breakdown across the 198 entries:
 
-- 279 `LicenseRef-Public-Domain-Israel` (Public Domain (Israel; life + 70))
-- 81 `PDM-1.0` (Public Domain Mark 1.0)
+- 144 `LicenseRef-Public-Domain-Israel` (Public Domain (Israel; life + 70))
+- 44 `PDM-1.0` (Public Domain Mark 1.0)
 - 9 `CC-BY-SA-3.0` (Creative Commons Attribution-ShareAlike 3.0 Unported)
-- 2 `CC-BY-SA-4.0` (Creative Commons Attribution-ShareAlike 4.0 International)
-- 2 `LicenseRef-Public-Domain-Ukraine` (Public Domain (Ukraine; life + 70))
+- 1 `LicenseRef-Public-Domain-Ukraine` (Public Domain (Ukraine; life + 70))
 <!-- end:status -->
 
 The repository uses a compound licensing model: repository-authored metadata
```
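The status block sits between `<!-- begin:status -->` and `<!-- end:status -->` markers, and the hunk context mentions `make release`, which suggests it is regenerated from the index rather than edited by hand. The sketch below shows one way such a breakdown could be derived from `data/index/entries.jsonl`. It assumes each JSONL line is an object carrying its SPDX identifier under a top-level `license` key; that field name is hypothetical, and the human-readable license names shown in the README are omitted.

```python
import json
from collections import Counter
from pathlib import Path

# Hypothetical field name: the real entries.jsonl schema may differ.
LICENSE_FIELD = "license"


def license_breakdown(lines):
    """Count index entries per SPDX identifier, skipping blank lines."""
    counts = Counter()
    for line in lines:
        if line.strip():
            counts[json.loads(line)[LICENSE_FIELD]] += 1
    return counts


def render_status(counts):
    """Render the counts as Markdown bullets, most frequent first."""
    total = sum(counts.values())
    lines = [f"License breakdown across the {total} entries:", ""]
    for license_id, n in counts.most_common():
        lines.append(f"- {n} `{license_id}`")
    return "\n".join(lines)


if __name__ == "__main__":
    path = Path("data/index/entries.jsonl")
    if path.exists():  # only run against a checkout of the repository
        with path.open(encoding="utf-8") as fh:
            print(render_status(license_breakdown(fh)))
```

If the field name matches, the post-merge counts should sum to 144 + 44 + 9 + 1 = 198, the same total stated in both the README and the CITATION.cff abstract.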
data/index/entries.jsonl: 185 changes (5 additions, 180 deletions); large diff not rendered.

data/index/sources.jsonl: 58 changes (29 additions, 29 deletions); large diff not rendered.
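Per the README status counts, this change moves the source index from 77 verified and 17 rejected records to 48 verified and 46 rejected (a shift of 29, matching the 29 modified lines in `sources.jsonl`), while the entry count drops from 373 to 198. A check along the following lines could confirm that no remaining scan entry still points at a source that is no longer verified. This is a sketch only: the `id`, `status` and `source_id` field names and the `"verified"` status value are hypothetical and would need to match the real schema.

```python
import json
from collections import Counter
from pathlib import Path


def load_jsonl(lines):
    """Parse JSON Lines into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def status_counts(sources):
    """Tally source records by their (hypothetical) "status" field."""
    return Counter(src.get("status", "unknown") for src in sources)


def orphaned_entries(entries, sources, ok_status="verified"):
    """Return entries whose source record is missing or not in ok_status."""
    ok_ids = {src["id"] for src in sources if src.get("status") == ok_status}
    return [e for e in entries if e.get("source_id") not in ok_ids]


if __name__ == "__main__":
    index = Path("data/index")
    src_path, ent_path = index / "sources.jsonl", index / "entries.jsonl"
    if src_path.exists() and ent_path.exists():
        sources = load_jsonl(src_path.read_text(encoding="utf-8").splitlines())
        entries = load_jsonl(ent_path.read_text(encoding="utf-8").splitlines())
        print(dict(status_counts(sources)))
        orphans = orphaned_entries(entries, sources)
        print(f"{len(orphans)} entries reference a non-verified source")
```

Treating a missing source record the same as a rejected one keeps the check conservative: any entry that cannot be traced to a verified source is reported.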

data/review/audit_decisions.json: 1 change (1 addition, 0 deletions)

```diff
@@ -0,0 +1 @@
+{}
```