Linear A Decipherment Workbench

A computational research environment for the Linear A corpus, the undeciphered Bronze Age Minoan script (~1800–1450 BCE). Built as a browser SPA that needs no external services at runtime; you can use it online, run it locally, or fork and extend it.

Status: experimental research tool. Not authoritative. See Caveats below.

What this is

About 1,700 Linear A inscriptions survive — on clay tablets, sealings, and libation tables, from sites across Crete and the Aegean. We can read most of the sounds (the script shares ~60% of its signs with the deciphered Linear B), but not the language. The corpus is small, fragmentary, and mostly administrative.

This workbench gives you 29 interactive modules for analyzing that corpus: searching, statistics, sign concordances, cross-language phonetic alignment, hypothesis testing, annotation, mapping, comparison. Everything is keyboard-accessible, mostly works offline, and persists your work to the browser.

Highlights


Cross-linguistic alignment matrix	Visual phoneme-by-phoneme comparison of any Linear A word against eight reference languages (Akkadian, Hittite, Luwian, Hurrian, Ugaritic, Pre-Greek, Proto-Indo-European, Egyptian). Color-coded by match quality.
Statistical collocation	PMI, log-likelihood (G²), Yates-corrected χ² with p-values, Bonferroni correction, on-demand Fisher's exact. Significance-only filter.
KWIC concordance	Keyword-in-context view with sortable left/right context columns, configurable window size, dispersion plot across the corpus.
Stem families	Heuristic lemmatization: clusters words that share a stem and differ only by productive suffixes. Candidate morphological paradigms.
Scribe comparison	Per-scribe sign-frequency profile and pairwise comparison (Jaccard overlap, log-ratio of distinctive signs). Deep-links to SigLA for actual sign-shape paleography.
Annotation notebook	Attach proposed meanings + confidence + notes to any word, inscription, or sign. Persisted in localStorage, exportable as JSON, surfaces inline throughout the workbench.
Sound shift hypothesis testing	Edit any sign's phonetic value, watch the change propagate to all cross-language matches. Save snapshots with per-sign reasoning and compare them side-by-side.
Compound query builder	Stackable filters across inscription metadata (site, scribe, dating period) and word features (prefix, suffix, syllable count, contains-sign, co-occurs-with). Saved queries persist.
Co-occurrence network graph	Force-directed visualization of word collocation by PMI. Drag nodes, focus neighborhoods.
Findspot map	Interactive geographic map with zoom, pan, minimap, and progressive label disclosure. Click a site, jump there, see all its inscriptions.
Side-by-side compare	Up to four inscriptions in parallel columns with shared multi-sign words auto-highlighted in matching colors.
Similarity clustering	Token-level or consonant-skeleton Levenshtein over inscription word sequences. Surfaces fragmentary copies and morphological cousins.
Sign inventory	Every sign with its Unicode glyph, GORILA label, Linear B value (where shared), and example words. Empirically derived from corpus alignment.
Full glyph rendering	Real Unicode Linear A characters via Noto Sans Linear A, alongside transliteration and editorial English glosses.
Facsimile + photograph + commentary	Per-inscription scholarly commentary (mirrored from lineara.xyz) plus tablet imagery, loaded from local mirror or upstream CDN.
Comprehensive in-app help	35+ sections with searchable highlights, clickable navigation to every module, workflow recipes, and full keyboard-shortcut reference.
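
The collocation measures listed above (PMI and log-likelihood G²) can be illustrated with a minimal TypeScript sketch. It assumes a 2×2 contingency table of co-occurrence counts; the `Contingency` type and the function names `pmi` and `logLikelihoodG2` are hypothetical and not taken from the workbench source, whose actual formulas are documented in docs/METHODOLOGY.md.

```typescript
// Hypothetical 2x2 contingency table for a word pair (a, b):
//   both:  contexts containing a and b
//   aOnly: contexts containing a but not b
//   bOnly: contexts containing b but not a
//   none:  contexts containing neither
interface Contingency {
  both: number;
  aOnly: number;
  bOnly: number;
  none: number;
}

// Pointwise mutual information in bits: log2( P(a,b) / (P(a) * P(b)) ).
// Returns -Infinity when the pair never co-occurs.
function pmi(t: Contingency): number {
  const n = t.both + t.aOnly + t.bOnly + t.none;
  const pAB = t.both / n;
  const pA = (t.both + t.aOnly) / n;
  const pB = (t.both + t.bOnly) / n;
  return Math.log2(pAB / (pA * pB));
}

// Log-likelihood ratio: G2 = 2 * sum( O * ln(O / E) ) over the four cells,
// where E is the count expected if a and b were independent.
function logLikelihoodG2(t: Contingency): number {
  const n = t.both + t.aOnly + t.bOnly + t.none;
  const rowA = t.both + t.aOnly;
  const rowNotA = t.bOnly + t.none;
  const colB = t.both + t.bOnly;
  const colNotB = t.aOnly + t.none;
  const cells: Array<[number, number]> = [
    [t.both, (rowA * colB) / n],
    [t.aOnly, (rowA * colNotB) / n],
    [t.bOnly, (rowNotA * colB) / n],
    [t.none, (rowNotA * colNotB) / n],
  ];
  let sum = 0;
  for (const [observed, expected] of cells) {
    if (observed > 0) sum += observed * Math.log(observed / expected); // empty cells contribute 0
  }
  return 2 * sum;
}

// Made-up counts for illustration only.
const independent: Contingency = { both: 25, aOnly: 25, bOnly: 25, none: 25 };
const alwaysTogether: Contingency = { both: 10, aOnly: 0, bOnly: 0, none: 90 };
```

With a corpus this small, PMI tends to overrate rare pairs, which is presumably why a significance test is reported alongside it.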

Try it

Live demo: https://ryanpavlicek.github.io/linearaworkbench/

Run locally:

git clone https://github.com/ryanpavlicek/linearaworkbench.git
cd linearaworkbench
npm install
npm run dev

Open http://localhost:5173.

Everything works out of the box. The repo ships with the full corpus (~262 KB) plus the entire upstream auxiliary mirror (~500 MB): commentary HTML, facsimile images, and GORILA PDFs. Search, sign inventory, network graphs, hypothesis testing, the map, every facsimile button, and every Commentary ↗ link work with zero external dependencies at runtime.

⚠️ Heads up: the repo is ~500 MB because of the bundled auxiliary mirror. The trade-off is that the GitHub Pages deployment is fully self-contained and keeps working even if upstream sources go offline. See Saving repo size if you'd prefer a small repo that loads these assets from upstream CDNs at runtime instead.

Saving repo size (optional)

If you'd rather keep the repo small (~5 MB), you can gitignore the 500 MB public/upstream/ mirror and have the app load commentary and facsimile images from upstream CDNs at runtime:

echo "public/upstream/" >> .gitignore
git rm -r --cached public/upstream
cp .env.example .env.local
# uncomment the two VITE_ASSET_BASE / VITE_COMMENTARY_BASE lines

Trade-off: you save ~500 MB, but the deployed site then depends on mwenge/lineara.xyz staying online. The 29 analytical tools work regardless; only the Commentary ↗ and Facsimile/Photograph buttons break if the upstream goes down.
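
One way such a base-path switch could work is sketched below. Only the variable names VITE_ASSET_BASE and VITE_COMMENTARY_BASE and the public/upstream/ directory come from this README; the `resolveBases` helper, the `/upstream/` default path, and the example CDN URL are assumptions for illustration, not the workbench's actual code.

```typescript
// Hypothetical shape of the two env vars (in a Vite app they would be read from import.meta.env).
interface AssetEnv {
  VITE_ASSET_BASE?: string;
  VITE_COMMENTARY_BASE?: string;
}

// Assumed default: the bundled mirror in public/upstream/, served at /upstream/.
const LOCAL_MIRROR = "/upstream/";

function withTrailingSlash(base: string): string {
  return base.endsWith("/") ? base : base + "/";
}

// Use the env override when set and non-empty, otherwise the bundled mirror.
function resolveBases(env: AssetEnv): { assets: string; commentary: string } {
  return {
    assets: withTrailingSlash(env.VITE_ASSET_BASE || LOCAL_MIRROR),
    commentary: withTrailingSlash(env.VITE_COMMENTARY_BASE || LOCAL_MIRROR),
  };
}

// Placeholder URL, not a real upstream address.
const bundled = resolveBases({});
const remote = resolveBases({ VITE_ASSET_BASE: "https://cdn.example.org/lineara" });
```

Keeping the two bases independent means commentary and imagery can be switched to the CDN separately.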

To regenerate the bundled mirror later (after the gitignore change is reverted):

npm run assets:fetch     # ~10–20 min, repopulates public/upstream/

Architecture

Stack: Vite + React 18 + TypeScript + Zustand. Zero non-essential runtime dependencies.
Code splitting: each of the 29 modules ships as its own lazy chunk (1–6 KB gzipped). Main shell is ~64 KB gzipped.
State: localStorage-backed for annotations, collections, saved queries, saved hypotheses, pins, display preferences. Namespaced under linear-a-workbench:.
Corpus: pre-built JSON in public/corpus/. Regenerated via npm run corpus:fetch from the upstream mwenge/lineara.xyz source.
Upstream mirror: pre-fetched copy of commentary HTML, facsimile images, and GORILA PDFs lives in public/upstream/ and is committed to the repo so deployments are fully self-contained. Regenerated via npm run assets:fetch.
Sign mapping: derived empirically by aligning the upstream's transliterations with its parsed glyph strings codepoint-by-codepoint. Confidence scores per sign are reported in the Sign Inventory module.
Glyphs: rendered via Noto Sans Linear A.
Asset paths: configurable via VITE_ASSET_BASE and VITE_COMMENTARY_BASE env vars; default to the bundled local mirror.
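
The namespaced localStorage persistence described in the State item can be sketched as follows. The `linear-a-workbench:` prefix is the one named above; the `save` and `load` helpers, the `KeyValueStore` interface, and the annotation shape are hypothetical. An in-memory stand-in replaces `window.localStorage` so the sketch also runs outside a browser.

```typescript
// Minimal subset of the Web Storage API used by this sketch.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// In-memory stand-in for window.localStorage.
class MemoryStore implements KeyValueStore {
  private data = new Map<string, string>();
  getItem(key: string): string | null {
    const value = this.data.get(key);
    return value === undefined ? null : value;
  }
  setItem(key: string, value: string): void {
    this.data.set(key, value);
  }
}

const NAMESPACE = "linear-a-workbench:"; // prefix named in this README

function save<T>(store: KeyValueStore, key: string, value: T): void {
  store.setItem(NAMESPACE + key, JSON.stringify(value));
}

function load<T>(store: KeyValueStore, key: string, fallback: T): T {
  const raw = store.getItem(NAMESPACE + key);
  if (raw === null) return fallback;
  try {
    return JSON.parse(raw) as T;
  } catch {
    return fallback; // corrupted entry: fall back rather than crash
  }
}

// Illustrative annotation record; the field names are assumptions.
interface Annotation {
  target: string;
  meaning: string;
  confidence: number;
}

const store = new MemoryStore();
save<Annotation[]>(store, "annotations", [{ target: "KU-RO", meaning: "total", confidence: 0.8 }]);
const annotations = load<Annotation[]>(store, "annotations", []);
```

In a browser, passing `window.localStorage` as the store should work unchanged, since it satisfies the same two-method interface.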

See docs/METHODOLOGY.md for the math (phonetic distance formula, PMI, alignment derivation) and known limitations.
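
As a rough illustration of the sign-mapping derivation mentioned in the list above (aligning transliterations with glyph strings codepoint-by-codepoint), here is a minimal sketch. It assumes transliterations are hyphen-separated and that confidence is the share of alignments agreeing with the most frequent value; both assumptions, and the names `alignWord` and `bestValue`, are hypothetical. The glyphs are arbitrary code points from the Unicode Linear A block paired with made-up values.

```typescript
// For each glyph, count how often it lines up with each transliteration value.
type Tally = Map<string, Map<string, number>>;

function alignWord(glyphs: string, translit: string, tally: Tally): boolean {
  const signs = Array.from(glyphs); // split by code point, not by UTF-16 unit
  const values = translit.split("-");
  if (signs.length !== values.length) return false; // misaligned reading: skip it
  signs.forEach((sign, i) => {
    const counts = tally.get(sign) ?? new Map<string, number>();
    counts.set(values[i], (counts.get(values[i]) ?? 0) + 1);
    tally.set(sign, counts);
  });
  return true;
}

// Most frequent value for a glyph, with confidence = agreeing alignments / all alignments.
function bestValue(tally: Tally, sign: string): { value: string; confidence: number } | null {
  const counts = tally.get(sign);
  if (!counts) return null;
  let total = 0;
  let best = "";
  let bestCount = 0;
  counts.forEach((count, value) => {
    total += count;
    if (count > bestCount) {
      best = value;
      bestCount = count;
    }
  });
  return { value: best, confidence: bestCount / total };
}

// Placeholder glyphs and values, for illustration only.
const glyphA = String.fromCodePoint(0x10600);
const glyphB = String.fromCodePoint(0x10601);
const tally: Tally = new Map();
alignWord(glyphA + glyphB, "x-y", tally);
alignWord(glyphA + glyphB, "x-y", tally);
alignWord(glyphA, "z", tally); // a conflicting reading lowers confidence for glyphA
const skipped = alignWord(glyphA + glyphB, "x", tally); // length mismatch, not counted
const resultA = bestValue(tally, glyphA);
const resultB = bestValue(tally, glyphB);
```

Splitting with `Array.from` matters here because Linear A code points lie outside the Basic Multilingual Plane and occupy two UTF-16 units each.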

Keyboard

Ctrl + / — Corpus Search
Ctrl + K — Query Builder
Ctrl + Z — Undo last reversible action
? or / — Open the in-app help
Esc — Close detail modal
Alt + ← / Alt + → — Step inscription navigator (inside detail)
On the Findspot Map (when focused): arrow keys pan, +/- zoom, 0 resets

Project layout

linearaworkbench/
├── public/
│   ├── corpus/             # Pre-built inscription + sign JSON (~262 KB)
│   └── upstream/           # Bundled commentary + images + papers (~500 MB)
├── scripts/
│   ├── build-corpus.mjs    # Normalize upstream corpus → JSON
│   ├── fetch-corpus.mjs    # Pull upstream + rebuild
│   └── fetch-assets.mjs    # Re-mirror upstream commentary + images + PDFs
├── src/
│   ├── components/         # Shared UI (TopBar, Sidebar, DetailModal, ...)
│   ├── data/               # Sign data, language wordlists, site coords
│   ├── lib/                # Algorithms, helpers, types, persistence
│   ├── modules/            # The 29 analysis panels (lazy-loaded)
│   └── store/              # Zustand workbench store
├── docs/
│   └── METHODOLOGY.md      # Technical detail on the analytical methods
├── .github/
│   ├── workflows/          # CI + Pages auto-deploy
│   └── ISSUE_TEMPLATE/     # Bug, feature, data correction templates
└── .env.example            # How to swap bundled assets for upstream CDN

Citations

If you use this workbench in academic work, cite the underlying corpus sources, not this tool:

GORILA: Godart, L. & Olivier, J.-P. (1976–1985). Recueil des inscriptions en linéaire A. École Française d'Athènes.
mwenge/lineara.xyz: the digital transcription of the corpus this tool builds on, at https://github.com/mwenge/lineara.xyz.
John Younger's Linear A Database: scholarly commentary referenced throughout, at http://people.ku.edu/~jyounger/LinearA/.

This workbench is exploratory infrastructure on top of those sources. The analytical claims in your paper should reference the primary scholarship.

Caveats

No editorial authority. We make no claims about what Linear A actually means. All comparisons, alignments, and statistics are exploratory tools.
Comparison wordlists are illustrative. The eight reference-language wordlists in src/data/languages.ts are short editorial collections; they are not exhaustive and have not been peer-reviewed by specialists.
Glyph mapping is empirical, not paleographic. We use idealized Unicode characters. For per-scribe variant analysis, use SigLA or similar paleographic resources.
Sign mapping confidence < 100%. The corpus has some misaligned or uncertain readings; see the confidence column in the Sign Inventory.
Cross-language phonetic distance is heuristic. The weighted Levenshtein formula reflects general typological intuitions, not a trained model. See docs/METHODOLOGY.md.
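
To make the last caveat concrete, a weighted Levenshtein distance can be sketched as below. The two-class vowel/consonant split and the 0.5 within-class substitution cost are assumptions chosen for illustration; they are not the weights the workbench uses, which are described in docs/METHODOLOGY.md.

```typescript
// Illustrative substitution cost: cheaper within a sound class, full cost across classes.
const VOWELS = new Set(["a", "e", "i", "o", "u"]);

function substitutionCost(a: string, b: string): number {
  if (a === b) return 0;
  const sameClass = VOWELS.has(a) === VOWELS.has(b);
  return sameClass ? 0.5 : 1;
}

// Standard two-row dynamic-programming edit distance with weighted substitutions.
function weightedLevenshtein(s: string, t: string): number {
  const a = Array.from(s);
  const b = Array.from(t);
  // prev[j] = distance between the processed prefix of a and the first j symbols of b
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      curr.push(
        Math.min(
          prev[j] + 1, // deletion
          curr[j - 1] + 1, // insertion
          prev[j - 1] + substitutionCost(a[i - 1], b[j - 1]) // substitution or match
        )
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

const same = weightedLevenshtein("kuro", "kuro");
const vowelSwap = weightedLevenshtein("kuro", "kura");
const crossClass = weightedLevenshtein("a", "k");
```

Because the weights are hand-picked, distances from a scheme like this are only comparable to each other, not interpretable as probabilities of relatedness.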

Contributing

See CONTRIBUTING.md. Bug reports, data corrections, and new analysis modules all welcome.

License

MIT. The MIT terms apply to the code and bundled corpus JSON. Facsimile images and GORILA PDFs hosted via the upstream remain © École Française d'Athènes and are loaded for academic reference only.

Related work

John Younger's Linear A Database — the canonical scholarly online reference. Every inscription detail in the workbench provides a direct Commentary ↗ link.
mwenge/lineara.xyz — visual catalog with tablet imagery and zoom. We bundle its corpus transcription and commentary mirror; the two tools are complementary.
SigLA — paleographic database of Linear A signs by scribe. Use this for sign-variant analysis.
DAMOS — the Mycenaean (Linear B) corpus at Oslo; sister-script database.
GORILA — Godart, L. & Olivier, J.-P. (1976–1985). Recueil des inscriptions en linéaire A (École Française d'Athènes). The printed scholarly edition all digital projects derive from.

Acknowledgements

This workbench would not exist without the volunteer labor of mwenge, whose transcription of the GORILA corpus into structured JSON is the data foundation here. John Younger's decades of scholarly editorial work provide the secondary literature this tool draws on. The École Française d'Athènes holds the rights to the facsimile imagery mirrored from the upstream repository.
