Linear A Decipherment Workbench

A computational research environment for the Linear A corpus — the undeciphered Bronze Age Minoan script (~1800–1450 BCE). Built as a zero-dependency-at-runtime browser SPA that you can use online, run locally, or fork and extend.

Status: experimental research tool. Not authoritative. See Caveats below.


What this is

About 1,700 Linear A inscriptions survive — on clay tablets, sealings, and libation tables, from sites across Crete and the Aegean. We can read most of the sounds (the script shares ~60% of its signs with the deciphered Linear B), but not the language. The corpus is small, fragmentary, and mostly administrative.

This workbench gives you 29 interactive modules for analyzing that corpus: searching, statistics, sign concordances, cross-language phonetic alignment, hypothesis testing, annotation, mapping, comparison. Everything is keyboard-accessible, mostly works offline, and persists your work to the browser.

Highlights

  • Cross-linguistic alignment matrix: Visual phoneme-by-phoneme comparison of any Linear A word against eight reference languages (Akkadian, Hittite, Luwian, Hurrian, Ugaritic, Pre-Greek, Proto-Indo-European, Egyptian). Color-coded by match quality.
  • Statistical collocation: PMI, log-likelihood (G²), Yates-corrected χ² with p-values, Bonferroni correction, on-demand Fisher's exact. Significance-only filter.
  • KWIC concordance: Keyword-in-context view with sortable left/right context columns, configurable window size, dispersion plot across the corpus.
  • Stem families: Heuristic lemmatization that clusters words sharing a stem and differing only by productive suffixes. Candidate morphological paradigms.
  • Scribe comparison: Per-scribe sign-frequency profile and pairwise comparison (Jaccard overlap, log-ratio of distinctive signs). Deep-links to SigLA for actual sign-shape paleography.
  • Annotation notebook: Attach proposed meanings + confidence + notes to any word, inscription, or sign. Persisted in localStorage, exportable as JSON, surfaces inline throughout the workbench.
  • Sound shift hypothesis testing: Edit any sign's phonetic value, watch the change propagate to all cross-language matches. Save snapshots with per-sign reasoning and compare them side-by-side.
  • Compound query builder: Stackable filters across inscription metadata (site, scribe, dating period) and word features (prefix, suffix, syllable count, contains-sign, co-occurs-with). Saved queries persist.
  • Co-occurrence network graph: Force-directed visualization of word collocation by PMI. Drag nodes, focus neighborhoods.
  • Findspot map: Interactive geographic map with zoom, pan, minimap, and progressive label disclosure. Click a site, jump there, see all its inscriptions.
  • Side-by-side compare: Up to four inscriptions in parallel columns with shared multi-sign words auto-highlighted in matching colors.
  • Similarity clustering: Token-level or consonant-skeleton Levenshtein over inscription word sequences. Surfaces fragmentary copies and morphological cousins.
  • Sign inventory: Every sign with its Unicode glyph, GORILA label, Linear B value (where shared), and example words. Empirically derived from corpus alignment.
  • Full glyph rendering: Real Unicode Linear A characters via Noto Sans Linear A, alongside transliteration and editorial English glosses.
  • Facsimile + photograph + commentary: Per-inscription scholarly commentary (mirrored from lineara.xyz) plus tablet imagery, loaded from local mirror or upstream CDN.
  • Comprehensive in-app help: 35+ sections with searchable highlights, clickable navigation to every module, workflow recipes, and full keyboard-shortcut reference.
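
To make the KWIC concordance idea concrete, here is a minimal sketch of a keyword-in-context lookup with a configurable window and a sort on the right-hand context. It is not the workbench's actual code: the Inscription shape, the function names, and the toy data are all assumptions made for illustration.

```typescript
// Hypothetical input shape: an inscription as an ordered list of transliterated words.
interface Inscription {
  id: string;
  words: string[];
}

// One concordance row: the matched word plus its left and right context.
interface KwicLine {
  inscriptionId: string;
  left: string[];
  keyword: string;
  right: string[];
}

// Collect every occurrence of `keyword` with up to `windowSize` words on each side.
function kwic(corpus: Inscription[], keyword: string, windowSize = 3): KwicLine[] {
  const rows: KwicLine[] = [];
  for (const inscription of corpus) {
    inscription.words.forEach((word, i) => {
      if (word !== keyword) return;
      rows.push({
        inscriptionId: inscription.id,
        left: inscription.words.slice(Math.max(0, i - windowSize), i),
        keyword: word,
        right: inscription.words.slice(i + 1, i + 1 + windowSize),
      });
    });
  }
  return rows;
}

// Sort rows by the word immediately to the right of the keyword (empty context sorts first).
function sortByRightContext(rows: KwicLine[]): KwicLine[] {
  return [...rows].sort((a, b) => (a.right[0] ?? "").localeCompare(b.right[0] ?? ""));
}

// Invented toy data, not real inscriptions.
const toyCorpus: Inscription[] = [
  { id: "toy-1", words: ["A-B", "KU-RO", "C-D"] },
  { id: "toy-2", words: ["KU-RO", "A-B"] },
];
console.log(sortByRightContext(kwic(toyCorpus, "KU-RO", 1)));
```

Sorting on the nearest context word is what makes recurring formulae line up visually; a left-context sort would compare the last element of `left` instead.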

Try it

Live demo: https://ryanpavlicek.github.io/linearaworkbench/

Run locally:

git clone https://github.com/ryanpavlicek/linearaworkbench.git
cd linearaworkbench
npm install
npm run dev

Open http://localhost:5173.

Everything works out of the box. The repo ships with the full corpus (~262 KB) plus the entire upstream auxiliary mirror (~500 MB) — commentary HTML, facsimile images, GORILA PDFs — all of it. Search, sign inventory, network graphs, hypothesis testing, the map, every facsimile button, every Commentary ↗ link: zero external dependencies at runtime.

⚠️ Heads up: the repo is ~500 MB because of the bundled auxiliary mirror. The trade-off is that the GitHub Pages deployment is fully self-contained: it keeps working even if upstream sources go offline. See Saving repo size if you'd prefer a small repo that loads assets from upstream CDNs at runtime instead.

Saving repo size (optional)

If you'd rather keep the repo small (~5 MB), you can gitignore the 500 MB public/upstream/ mirror and have the app load commentary and facsimile images from upstream CDNs at runtime:

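# stop tracking the mirror (the files stay on disk)
echo "public/upstream/" >> .gitignore
git rm -r --cached public/upstream
# switch asset loading to the upstream CDNs
cp .env.example .env.local
# then uncomment the two VITE_ASSET_BASE / VITE_COMMENTARY_BASE lines in .env.local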

Tradeoff: you save ~500 MB but the deployed site now depends on mwenge/lineara.xyz staying online. The 29 analytical tools still work regardless — only the Commentary ↗ and Facsimile/Photograph buttons would break if the upstream went down.
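
The switch between the bundled mirror and an upstream CDN comes down to choosing a base URL. The sketch below shows one way that could work, assuming the two env vars are plain base URLs and that the bundled mirror is served from /upstream/. The helper names, the default path, and the example URLs are hypothetical; in a Vite app the env object would typically be import.meta.env.

```typescript
// Hypothetical shape of the two settings named in .env.example.
interface AssetEnv {
  VITE_ASSET_BASE?: string;
  VITE_COMMENTARY_BASE?: string;
}

// Assumed default: the bundled mirror, served from the app's own origin.
const LOCAL_MIRROR_BASE = "/upstream/";

// Join a base and a relative path with exactly one slash between them.
function joinUrl(base: string, path: string): string {
  return base.replace(/\/+$/, "") + "/" + path.replace(/^\/+/, "");
}

// Facsimile and photograph URLs: use the configured base, else the local mirror.
function assetUrl(path: string, env: AssetEnv): string {
  return joinUrl(env.VITE_ASSET_BASE || LOCAL_MIRROR_BASE, path);
}

// Commentary URLs follow the same rule with their own env var.
function commentaryUrl(path: string, env: AssetEnv): string {
  return joinUrl(env.VITE_COMMENTARY_BASE || LOCAL_MIRROR_BASE, path);
}

// With no env vars set, paths resolve against the bundled mirror.
console.log(assetUrl("images/example.jpg", {}));
// With a base set (placeholder host), the same path resolves against the CDN.
console.log(assetUrl("images/example.jpg", { VITE_ASSET_BASE: "https://cdn.example.org/mirror/" }));
```

Keeping the choice in one helper means the analytical modules never need to know which source is active.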

To regenerate the bundled mirror later (after the gitignore change is reverted):

npm run assets:fetch     # ~10–20 min, repopulates public/upstream/

Architecture

  • Stack: Vite + React 18 + TypeScript + Zustand. Zero non-essential runtime dependencies.
  • Code splitting: each of the 29 modules ships as its own lazy chunk (1–6 KB gzipped). Main shell is ~64 KB gzipped.
  • State: localStorage-backed for annotations, collections, saved queries, saved hypotheses, pins, display preferences. Namespaced under linear-a-workbench:.
  • Corpus: pre-built JSON in public/corpus/. Regenerated via npm run corpus:fetch from the upstream mwenge/lineara.xyz source.
  • Upstream mirror: pre-fetched copy of commentary HTML, facsimile images, and GORILA PDFs lives in public/upstream/ and is committed to the repo so deployments are fully self-contained. Regenerated via npm run assets:fetch.
  • Sign mapping: derived empirically by aligning the upstream's transliterations with its parsed glyph strings codepoint-by-codepoint. Confidence scores per sign are reported in the Sign Inventory module.
  • Glyphs: rendered via Noto Sans Linear A.
  • Asset paths: configurable via VITE_ASSET_BASE and VITE_COMMENTARY_BASE env vars; default to the bundled local mirror.
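
As an illustration of the namespaced localStorage state described in the list above, here is a minimal sketch. Only the linear-a-workbench: prefix comes from this README; the helper names, the annotation fields, and the in-memory stand-in (used so the example also runs outside a browser) are assumptions.

```typescript
const NAMESPACE = "linear-a-workbench:";

// The subset of the Web Storage API this sketch relies on.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

// In-memory stand-in; in a browser you would pass window.localStorage instead.
class MemoryStore implements KeyValueStore {
  private data = new Map<string, string>();
  getItem(key: string): string | null {
    return this.data.get(key) ?? null;
  }
  setItem(key: string, value: string): void {
    this.data.set(key, value);
  }
  removeItem(key: string): void {
    this.data.delete(key);
  }
}

// Serialize a value as JSON under the namespaced key.
function saveNamespaced<T>(store: KeyValueStore, key: string, value: T): void {
  store.setItem(NAMESPACE + key, JSON.stringify(value));
}

// Read a value back, returning `fallback` when the key is missing or unparsable.
function loadNamespaced<T>(store: KeyValueStore, key: string, fallback: T): T {
  const raw = store.getItem(NAMESPACE + key);
  if (raw === null) return fallback;
  try {
    return JSON.parse(raw) as T;
  } catch {
    return fallback; // corrupted entry: fall back rather than crash
  }
}

// Hypothetical annotation shape (proposed meaning + confidence + notes).
interface Annotation {
  target: string;
  meaning: string;
  confidence: number;
  notes: string;
}

const demoStore = new MemoryStore();
const demoAnnotation: Annotation = {
  target: "KU-RO",
  meaning: "placeholder gloss",
  confidence: 0.5,
  notes: "toy entry",
};
saveNamespaced(demoStore, "annotations", [demoAnnotation]);
console.log(loadNamespaced<Annotation[]>(demoStore, "annotations", []));
```

A shared prefix keeps the workbench's keys from colliding with other apps served from the same origin, which matters on a github.io host.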

See docs/METHODOLOGY.md for the math (phonetic distance formula, PMI, alignment derivation) and known limitations.
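
For readers who want a feel for the PMI statistic before opening the methodology doc, here is a minimal sketch. It assumes that two words co-occur when both appear in the same inscription, and it estimates probabilities as inscription-level relative frequencies, so PMI(a, b) = log2(P(a, b) / (P(a) P(b))). The function name and input shape are hypothetical, and the workbench's own definition (for example its co-occurrence window) may differ.

```typescript
// Pointwise mutual information of two words, counted per inscription.
// Returns null when the pair never co-occurs (PMI would be negative infinity).
function pmiByInscription(corpus: string[][], a: string, b: string): number | null {
  const n = corpus.length;
  let countA = 0;
  let countB = 0;
  let countAB = 0;
  for (const words of corpus) {
    const hasA = words.includes(a);
    const hasB = words.includes(b);
    if (hasA) countA++;
    if (hasB) countB++;
    if (hasA && hasB) countAB++;
  }
  if (countAB === 0) return null;
  // P(a,b) / (P(a) * P(b)) simplifies to (countAB * n) / (countA * countB).
  return Math.log2((countAB * n) / (countA * countB));
}

// Invented toy corpus: "x" and "y" always appear together, in half the inscriptions.
const pmiToyCorpus = [["x", "y"], ["x", "y"], ["z"], ["z"]];
console.log(pmiByInscription(pmiToyCorpus, "x", "y")); // → 1
```

Positive values mean the pair co-occurs more often than independence would predict. On a corpus this small, PMI is inflated for rare words, which is one reason to read it alongside a significance test.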

Keyboard

  • Ctrl + / — Corpus Search
  • Ctrl + K — Query Builder
  • Ctrl + Z — Undo last reversible action
  • ? or / — Open the in-app help
  • Esc — Close detail modal
  • Alt + ← / Alt + → — Step inscription navigator (inside detail)
  • On the Findspot Map (when focused): arrow keys pan, +/- zoom, 0 resets

Project layout

linearaworkbench/
├── public/
│   ├── corpus/             # Pre-built inscription + sign JSON (~262 KB)
│   └── upstream/           # Bundled commentary + images + papers (~500 MB)
├── scripts/
│   ├── build-corpus.mjs    # Normalize upstream corpus → JSON
│   ├── fetch-corpus.mjs    # Pull upstream + rebuild
│   └── fetch-assets.mjs    # Re-mirror upstream commentary + images + PDFs
├── src/
│   ├── components/         # Shared UI (TopBar, Sidebar, DetailModal, ...)
│   ├── data/               # Sign data, language wordlists, site coords
│   ├── lib/                # Algorithms, helpers, types, persistence
│   ├── modules/            # The 29 analysis panels (lazy-loaded)
│   └── store/              # Zustand workbench store
├── docs/
│   └── METHODOLOGY.md      # Technical detail on the analytical methods
├── .github/
│   ├── workflows/          # CI + Pages auto-deploy
│   └── ISSUE_TEMPLATE/     # Bug, feature, data correction templates
└── .env.example            # How to swap bundled assets for upstream CDN

Citations

If you use this workbench in academic work, cite the underlying corpus sources rather than this tool; the primary ones are listed under Related work.

This workbench is exploratory infrastructure on top of those sources. The analytical claims in your paper should reference the primary scholarship.

Caveats

  • No editorial authority. We make no claims about what Linear A actually means. All comparisons, alignments, and statistics are exploratory tools.
  • Comparison wordlists are illustrative. The eight reference-language wordlists in src/data/languages.ts are short editorial collections; they are not exhaustive and have not been peer-reviewed by specialists.
  • Glyph mapping is empirical, not paleographic. We use idealized Unicode characters. For per-scribe variant analysis, use SigLA or similar paleographic resources.
  • Sign mapping confidence < 100%. The corpus has some misaligned or uncertain readings; see the confidence column in the Sign Inventory.
  • Cross-language phonetic distance is heuristic. The weighted Levenshtein formula reflects general typological intuitions, not a trained model. See methodology doc.
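
To show what a weighted Levenshtein distance looks like in general, here is a minimal sketch in which substitution cost depends on the pair of segments being swapped. The weights are invented for illustration (vowel-for-vowel swaps cost half as much as any other substitution) and are not the workbench's formula, which is described in the methodology doc.

```typescript
// Cost of substituting one segment for another; 0 means identical.
type SubstitutionCost = (a: string, b: string) => number;

const VOWELS = new Set(["a", "e", "i", "o", "u"]);

// Hypothetical toy weights: vowel-for-vowel swaps are cheaper than other swaps.
const toyCost: SubstitutionCost = (a, b) => {
  if (a === b) return 0;
  return VOWELS.has(a) && VOWELS.has(b) ? 0.5 : 1;
};

// Dynamic-programming edit distance over segment arrays, keeping two rows in memory.
function weightedLevenshtein(
  s: string[],
  t: string[],
  sub: SubstitutionCost,
  indel = 1,
): number {
  // Row 0: turning an empty prefix of s into each prefix of t takes j insertions.
  let prev: number[] = [];
  for (let j = 0; j <= t.length; j++) prev.push(j * indel);

  for (let i = 1; i <= s.length; i++) {
    const curr: number[] = [i * indel]; // i deletions to reach an empty prefix of t
    for (let j = 1; j <= t.length; j++) {
      curr.push(
        Math.min(
          prev[j]! + indel, // delete s[i-1]
          curr[j - 1]! + indel, // insert t[j-1]
          prev[j - 1]! + sub(s[i - 1]!, t[j - 1]!), // substitute (or match)
        ),
      );
    }
    prev = curr;
  }
  return prev[t.length]!;
}

// Segments here are single letters; a real comparison would use phonemes.
console.log(weightedLevenshtein("kuro".split(""), "kiro".split(""), toyCost)); // → 0.5
```

Because the cost function is a parameter, the same routine can run on consonant skeletons or whole tokens by changing what counts as a segment.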

Contributing

See CONTRIBUTING.md. Bug reports, data corrections, and new analysis modules all welcome.

License

MIT. The MIT terms apply to the code and bundled corpus JSON. Facsimile images and GORILA PDFs hosted via the upstream remain © École Française d'Athènes and are loaded for academic reference only.

Related work

  • John Younger's Linear A Database — the canonical scholarly online reference. Every inscription detail in the workbench provides a direct Commentary ↗ link.
  • mwenge/lineara.xyz — visual catalog with tablet imagery and zoom. We bundle their corpus transcription and commentary mirror; complementary tool overall.
  • SigLA — paleographic database of Linear A signs by scribe. Use this for sign-variant analysis.
  • DAMOS — the Mycenaean (Linear B) corpus at Oslo; sister-script database.
  • GORILA — Godart, L. & Olivier, J.-P. (1976–1985). Recueil des inscriptions en linéaire A (École Française d'Athènes). The printed scholarly edition all digital projects derive from.

Acknowledgements

This workbench would not exist without the volunteer labor of mwenge, whose transcription of the GORILA corpus into structured JSON is the data foundation here. John Younger's decades of scholarly editorial work is the secondary literature source. The École Française d'Athènes holds the rights to the facsimile imagery mirrored from the upstream repository.