From 55b9775387264eaf08f60b601a602dc904bf882e Mon Sep 17 00:00:00 2001 From: "Jonathan D.A. Jewell" <6759885+hyperpolymath@users.noreply.github.com> Date: Fri, 13 Mar 2026 17:17:58 +0000 Subject: [PATCH] chore: add v3 Lithoglyph migration roadmap and update project state MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit ADR-006: Lithoglyph replaces ArangoDB as primary data store. ArangoDB is retained for graph edges only during Phase 2 and fully removed in Phase 3 when Factor GQL supports graph traversals. - ROADMAP.adoc: add v1.2.0 (Lithoglyph migration), restructure Phase 3 into 3A (graph edges), 3B (GQL-DT queries), 3C (production launch), add database migration path table - ARCHITECTURE.md: update status to Phase 2, add migration architecture diagrams (current → Phase 2 → Phase 3), document why Lithoglyph - META.scm: add ADR-006, mark ADR-001 as superseded, add new principle - ECOSYSTEM.scm: add Lithoglyph as primary-dependency, Docudactyl as upstream, update tech stack - STATE.scm: update to Phase 2 at 20%, mark ArangoDB as deprecated, add migration tasks to route-to-mvp - CLAUDE.md: fix stale project structure, update key decisions table - .claude/CLAUDE.md: expand with bofig-specific build/test/module docs, migration status, architecture rules, query patterns - .github/CODEOWNERS: create missing RSR standard file Co-Authored-By: Claude Opus 4.6 --- .claude/CLAUDE.md | 143 ++++++++++++++++++++++---------- .github/CODEOWNERS | 19 +++++ .machine_readable/ECOSYSTEM.scm | 43 +++++++--- .machine_readable/META.scm | 24 ++++-- .machine_readable/STATE.scm | 69 ++++++++------- ARCHITECTURE.md | 54 +++++++++++- CLAUDE.md | 38 +++++---- ROADMAP.adoc | 92 +++++++++++++++++--- 8 files changed, 360 insertions(+), 122 deletions(-) create mode 100644 .github/CODEOWNERS diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md index 1f18a05..073bf35 100644 --- a/.claude/CLAUDE.md +++ b/.claude/CLAUDE.md @@ -11,7 +11,101 @@ The following files in `.machine_readable/` contain structured project metadata: --- -# CLAUDE.md - AI Assistant Instructions +# CLAUDE.md - Bofig Project Instructions + +## Current Phase: Phase 2 — Lithoglyph Migration + +**ADR-006 (2026-03-13):** Lithoglyph replaces ArangoDB as primary data store. + +### Pipeline +``` +Docudactyl (extraction) → Lithoglyph (storage + provenance) ← bofig (queries + visualisation) +``` + +### Migration Status +- Evidence reads/writes: **migrating** from ArangoDB → Lithoglyph GQL +- Entities: **migrating** from ArangoDB → Lithoglyph +- Claims: **pending** migration to Lithoglyph +- Relationships (graph edges): **ArangoDB** (kept until Lithoglyph Factor GQL gets graph traversals) +- User auth: **PostgreSQL** (permanent, phx.gen.auth) + +### Critical Architecture Rules +1. **New domain data features MUST target Lithoglyph**, not ArangoDB +2. **ArangoDB is deprecated for domain data** — only `relationships` edge collection remains +3. **PostgreSQL is for user auth ONLY** — never store domain data there +4. **PROMPT scores** have exactly 6 dimensions (Provenance, Replicability, Objective, Methodology, Publication, Transparency) +5. **Audience types** are: researcher, policymaker, skeptic, activist, affected_person, journalist + +## Build & Test + +```bash +# Compile (0 warnings required) +mix compile --warnings-as-errors + +# Tests (require PostgreSQL + ArangoDB running) +mix test # 257 tests + +# NER extractor tests only (no DB needed) +MIX_ENV=test mix run --no-start -e ' + ExUnit.start(autorun: false) + Code.require_file("test/evidence_graph/lithoglyph/ner_extractor_test.exs") + ExUnit.run() +' + +# Credo lint +mix credo --strict + +# Start dev server +mix phx.server +``` + +## Key Modules + +| Module | Purpose | DB | +|--------|---------|-----| +| `EvidenceGraph.Lithoglyph.Client` | Req HTTP client for Lithoglyph API | Lithoglyph | +| `EvidenceGraph.Lithoglyph.Importer` | GenServer batch import with NER | Lithoglyph → ArangoDB | +| `EvidenceGraph.Lithoglyph.NERExtractor` | Regex NER extraction from content | None (pure) | +| `EvidenceGraph.Entities` | Entity resolution, fuzzy match, merge | ArangoDB (migrating) | +| `EvidenceGraph.Claims` | Claim CRUD + PROMPT scoring | ArangoDB (migrating) | +| `EvidenceGraph.Evidence` | Evidence CRUD + metadata | ArangoDB (migrating) | +| `EvidenceGraph.Relationships` | Graph edges, traversals, contradictions | ArangoDB (kept Phase 2) | +| `EvidenceGraph.ArangoDB` | ArangoDB driver wrapper | ArangoDB | + +## ArangoDB Query Patterns + +```elixir +# Read query (no write transaction) +ArangoDB.query_read(aql, %{bind_var: value}) + +# Write query (transactional) +ArangoDB.query(aql, %{bind_var: value}) + +# Insert document +ArangoDB.insert("collection_name", %{field: value}) + +# Edge document format +%{ + _from: "evidence/evidence_123", + _to: "entities/entity_456", + relationship_type: "mentions", + weight: 1.0, + confidence: 0.9 +} +``` + +## Lithoglyph Client Patterns + +```elixir +# Query evidence from Lithoglyph +LithClient.query("SELECT * FROM evidence WHERE investigation_id = @id", %{id: inv_id}) + +# Insert with provenance (mandatory) +LithClient.insert("evidence", document, actor: "user:123", rationale: "Import from Docudactyl") + +# Dedup check +LithClient.exists_by_hash?("evidence", sha256_hash) +``` ## Language Policy (Hyperpolymath Standard) @@ -19,19 +113,14 @@ The following files in `.machine_readable/` contain structured project metadata: | Language/Tool | Use Case | Notes | |---------------|----------|-------| +| **Elixir** | This project's primary language | Phoenix, LiveView, Absinthe | | **ReScript** | Primary application code | Compiles to JS, type-safe | | **Deno** | Runtime & package management | Replaces Node/npm/bun | | **Rust** | Performance-critical, systems, WASM | Preferred for CLI tools | -| **Tauri 2.0+** | Mobile apps (iOS/Android) | Rust backend + web UI | -| **Dioxus** | Mobile apps (native UI) | Pure Rust, React-like | | **Gleam** | Backend services | Runs on BEAM or compiles to JS | | **Bash/POSIX Shell** | Scripts, automation | Keep minimal | -| **JavaScript** | Only where ReScript cannot | MCP protocol glue, Deno APIs | -| **Nickel** | Configuration language | For complex configs | +| **JavaScript** | Only where ReScript cannot | D3.js hooks in this project | | **Guile Scheme** | State/meta files | STATE.scm, META.scm, ECOSYSTEM.scm | -| **Julia** | Batch scripts, data processing | Per RSR | -| **OCaml** | AffineScript compiler | Language-specific | -| **Ada** | Safety-critical systems | Where required | ### BANNED - Do Not Use @@ -39,45 +128,15 @@ The following files in `.machine_readable/` contain structured project metadata: |--------|-------------| | TypeScript | ReScript | | Node.js | Deno | -| npm | Deno | -| Bun | Deno | -| pnpm/yarn | Deno | +| npm/Bun/pnpm/yarn | Deno | | Go | Rust | | Python | Julia/Rust/ReScript | -| Java/Kotlin | Rust/Tauri/Dioxus | -| Swift | Tauri/Dioxus | -| React Native | Tauri/Dioxus | -| Flutter/Dart | Tauri/Dioxus | - -### Mobile Development - -**No exceptions for Kotlin/Swift** - use Rust-first approach: - -1. **Tauri 2.0+** - Web UI (ReScript) + Rust backend, MIT/Apache-2.0 -2. **Dioxus** - Pure Rust native UI, MIT/Apache-2.0 - -Both are FOSS with independent governance (no Big Tech). - -### Enforcement Rules - -1. **No new TypeScript files** - Convert existing TS to ReScript -2. **No package.json for runtime deps** - Use deno.json imports -3. **No node_modules in production** - Deno caches deps automatically -4. **No Go code** - Use Rust instead -5. **No Python anywhere** - Use Julia for data/batch, Rust for systems, ReScript for apps -6. **No Kotlin/Swift for mobile** - Use Tauri 2.0+ or Dioxus - -### Package Management - -- **Primary**: Guix (guix.scm) -- **Fallback**: Nix (flake.nix) -- **JS deps**: Deno (deno.json imports) +| Java/Kotlin | Rust | ### Security Requirements - No MD5/SHA1 for security (use SHA256+) - HTTPS only (no HTTP URLs) - No hardcoded secrets -- SHA-pinned dependencies -- SPDX license headers on all files - +- SHA-pinned dependencies in workflows +- SPDX license headers on all files (`PMPL-1.0-or-later`) diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS new file mode 100644 index 0000000..aa63131 --- /dev/null +++ b/.github/CODEOWNERS @@ -0,0 +1,19 @@ +# SPDX-License-Identifier: PMPL-1.0-or-later +# CODEOWNERS - Evidence Graph for Investigative Journalism (bofig) + +# Default owner for all files +* @hyperpolymath + +# Core domain logic +/lib/evidence_graph/ @hyperpolymath + +# Lithoglyph integration (Phase 2) +/lib/evidence_graph/lithoglyph/ @hyperpolymath + +# Security-sensitive files +/config/ @hyperpolymath +/.github/workflows/ @hyperpolymath +/SECURITY.md @hyperpolymath + +# Machine-readable state +/.machine_readable/ @hyperpolymath diff --git a/.machine_readable/ECOSYSTEM.scm b/.machine_readable/ECOSYSTEM.scm index 1816d44..d948fe1 100644 --- a/.machine_readable/ECOSYSTEM.scm +++ b/.machine_readable/ECOSYSTEM.scm @@ -1,10 +1,10 @@ ; SPDX-License-Identifier: PMPL-1.0-or-later -; Copyright (c) 2026 Jonathan D.A. Jewell (hyperpolymath) +; Copyright (c) 2026 Jonathan D.A. Jewell (hyperpolymath) (ecosystem (metadata - (version "1.0.0") - (last-updated "2026-02-21") + (version "1.1.0") + (last-updated "2026-03-13") (format "ECOSYSTEM.scm v1")) (project @@ -17,18 +17,35 @@ (position-in-ecosystem (domain "investigative-journalism-tools") (novelty "First system combining PROMPT scoring, boundary objects, and i-docs navigation") - (academic-context "PhD thesis: practical infrastructure for pragmatic epistemology")) + (academic-context "PhD thesis: practical infrastructure for pragmatic epistemology") + (pipeline-position "Docudactyl (extraction) -> Lithoglyph (storage+provenance) -> bofig (navigation+visualisation)")) (related-projects - (project "formdb-debugger" (relationship "sibling-standard")) - (project "formbase" (relationship "sibling-standard")) - (project "hypothesis" (relationship "inspiration") (url "https://hypothes.is/")) - (project "zotero" (relationship "integration-target") (url "https://www.zotero.org/"))) + (project "lithoglyph" + (relationship "primary-dependency") + (role "Evidence store, provenance layer, GQL/GQL-DT query engine") + (notes "Phase 2: migrating all domain data from ArangoDB to Lithoglyph")) + (project "docudactyl" + (relationship "upstream-dependency") + (role "Multi-format HPC document extraction, feeds evidence into Lithoglyph")) + (project "verisimdb" + (relationship "sibling-standard") + (role "Octad database with VQL, shares GQL patterns with Lithoglyph")) + (project "zotero" + (relationship "integration-target") + (url "https://www.zotero.org/") + (role "Reference management, two-way sync for evidence import/export")) + (project "hypothesis" + (relationship "inspiration") + (url "https://hypothes.is/") + (role "Web annotation model"))) (technology-stack - (runtime "BEAM/OTP 26+") - (language "Elixir 1.16+") - (framework "Phoenix 1.7+ with LiveView") - (database "ArangoDB 3.11+") - (api "Absinthe GraphQL") + (runtime "BEAM/OTP 27") + (language "Elixir 1.18+") + (framework "Phoenix 1.8+ with LiveView") + (evidence-store "Lithoglyph (GQL/GQL-DT, Phase 2+)") + (graph-db "ArangoDB 3.11+ (relationships only, Phase 2; removed Phase 3)") + (auth-db "PostgreSQL 16") + (api "Absinthe GraphQL + REST") (visualization "D3.js v7"))) diff --git a/.machine_readable/META.scm b/.machine_readable/META.scm index e56496b..15613c2 100644 --- a/.machine_readable/META.scm +++ b/.machine_readable/META.scm @@ -1,18 +1,19 @@ ; SPDX-License-Identifier: PMPL-1.0-or-later -; Copyright (c) 2026 Jonathan D.A. Jewell (hyperpolymath) +; Copyright (c) 2026 Jonathan D.A. Jewell (hyperpolymath) (meta (metadata - (version "1.0.0") - (last-updated "2026-02-21") + (version "1.1.0") + (last-updated "2026-03-13") (format "META.scm v1")) (architecture-decisions (adr "001" (title "ArangoDB for primary data store") - (status "accepted") + (status "superseded") + (superseded-by "006") (decision "Use ArangoDB 3.11+ with Arangox Elixir driver via MintClient") - (rationale "Production-proven, multi-model (document + graph), managed hosting available")) + (rationale "Production-proven, multi-model (document + graph), managed hosting available. Superseded: Lithoglyph now provides these capabilities with additional provenance and type safety.")) (adr "002" (title "Phoenix LiveView over React SPA") @@ -36,9 +37,18 @@ (title "Ecto without SQL for domain models") (status "accepted") (decision "Use Ecto schemas and changesets for validation only, not persistence") - (rationale "Leverage Ecto's validation without coupling to SQL"))) + (rationale "Leverage Ecto's validation without coupling to SQL")) + + (adr "006" + (title "Lithoglyph replaces ArangoDB as primary data store") + (status "accepted") + (date "2026-03-13") + (supersedes "001") + (decision "Migrate domain data (evidence, claims, entities) from ArangoDB to Lithoglyph. ArangoDB retained for graph edges only during Phase 2, fully removed in Phase 3.") + (rationale "Eliminates data duplication (Docudactyl→Lithoglyph→ArangoDB was a copy pipeline). Lithoglyph provides mandatory provenance, WAL audit trail, GQL-DT dependent types for PROMPT scores, and compile-time verification. ArangoDB was correct for Phase 1 prototyping but duplicates what Lithoglyph already stores."))) (design-rationale (principle "Infrastructure for pragmatic epistemology") (principle "Navigation over narration") - (principle "Coordination without consensus"))) + (principle "Coordination without consensus") + (principle "Query the source, don't copy the data"))) diff --git a/.machine_readable/STATE.scm b/.machine_readable/STATE.scm index 91b2b55..cc7a1da 100644 --- a/.machine_readable/STATE.scm +++ b/.machine_readable/STATE.scm @@ -1,23 +1,23 @@ ; SPDX-License-Identifier: PMPL-1.0-or-later -; Copyright (c) 2026 Jonathan D.A. Jewell (hyperpolymath) +; Copyright (c) 2026 Jonathan D.A. Jewell (hyperpolymath) (state (metadata - (version "1.0.0") + (version "1.1.0") (last-updated "2026-03-13") (format "STATE.scm v1")) (project-context (name "bofig") (full-name "Evidence Graph for Investigative Journalism") - (phase "phase-1-poc") - (status "released")) + (phase "phase-2-lithoglyph-migration") + (status "in-progress")) (current-position - (milestone "Phase 2 - Lithoglyph Integration") + (milestone "Phase 2 - Lithoglyph Migration") (completion-percentage 100) - (phase-2-completion 10) - (focus "Phase 1 complete (v1.0.0, 257 tests). Phase 2 started: Lithoglyph integration, evidence schema extended with sha256_hash.")) + (phase-2-completion 20) + (focus "Migrating domain data from ArangoDB to Lithoglyph. NER entity extraction wired into import pipeline. ADR-006 accepted: Lithoglyph replaces ArangoDB.")) (components (component "elixir-backend" @@ -31,8 +31,8 @@ (status "complete") (completion 100) (notes "Force graph + radar chart hooks implemented")) (component "arangodb-integration" - (status "complete") (completion 100) - (notes "7 claims, 30 evidence, 38 relationships, 6 nav paths seeded")) + (status "deprecated") (completion 100) + (notes "ADR-006: superseded by Lithoglyph. Retained for relationships edge collection only during Phase 2.")) (component "zotero-integration" (status "complete") (completion 100) (notes "API client, mapper, sync, REST endpoints (import/export/batch/sync-status)")) @@ -55,12 +55,13 @@ (status "complete") (completion 100) (notes "A2ML v2.1 Cyberwar-Ready Trustfile with all sections")) (component "lithoglyph-integration" - (status "in-progress") (completion 15) - (notes "Phase 2: Lithoglyph client (Req HTTP), importer GenServer, evidence schema extended with sha256_hash") - (new-files + (status "in-progress") (completion 20) + (notes "Phase 2: HTTP client, importer GenServer, NER extractor, entity resolution, mentions edges. Next: migrate reads/writes from ArangoDB to Lithoglyph GQL.") + (files "lib/evidence_graph/lithoglyph/client.ex" - "lib/evidence_graph/lithoglyph/importer.ex") - (new-endpoints + "lib/evidence_graph/lithoglyph/importer.ex" + "lib/evidence_graph/lithoglyph/ner_extractor.ex") + (endpoints "POST /api/evidence/lithoglyph-import" "GET /api/evidence/lithoglyph-import/status"))) @@ -72,18 +73,35 @@ (route-to-mvp (remaining-tasks + (task "Migrate evidence reads to Lithoglyph GQL" (priority "high")) + (task "Migrate evidence writes to Lithoglyph GQL-DT" (priority "high")) (task "Deploy to Hetzner Cloud" (priority "high")) (task "NUJ participant recruitment" (priority "high")) + (task "Migrate entity/claim collections to Lithoglyph" (priority "medium")) (task "Month 3 decision point" (priority "high")) (task "Zotero browser extension" (priority "medium")))) (critical-next-actions - (action "Complete Lithoglyph integration: entity resolution, financial graph") + (action "Migrate Evidence.create_evidence to write via Lithoglyph GQL-DT instead of ArangoDB") + (action "Migrate evidence queries to read from Lithoglyph GQL instead of ArangoDB AQL") (action "Deploy v1.0.0 to Hetzner Cloud for NUJ testing") (action "Recruit 25 NUJ journalists for user testing") (action "Month 3 decision point: continue or pivot")) (session-history + (session "2026-03-13b" + (completed "Wired NER entity extraction into Lithoglyph importer pipeline") + (completed "Added NERExtractor module (3 strategies: titles, orgs, capitalised sequences)") + (completed "Extended Relationship schema with :entity type and :mentions edges") + (completed "Updated graph traversal helpers for entity nodes") + (completed "Added 13 NER extractor unit tests (all passing)") + (completed "Merged PR #32: feature/entity-resolution-wiring") + (completed "ADR-006: Lithoglyph replaces ArangoDB as primary data store") + (completed "Updated ROADMAP.adoc with v3 Lithoglyph migration plan") + (completed "Updated ARCHITECTURE.md with migration architecture diagrams") + (completed "Updated META.scm, ECOSYSTEM.scm, STATE.scm") + (completed "Fixed stale references in CLAUDE.md") + (completed "Created .github/CODEOWNERS")) (session "2026-03-13" (completed "Phase 2 started: Lithoglyph integration") (completed "Added {:req, ~> 0.5} dependency for Lithoglyph HTTP client") @@ -93,23 +111,12 @@ (completed "Added POST /api/evidence/lithoglyph-import endpoint") (completed "Added GET /api/evidence/lithoglyph-import/status endpoint") (completed "Added sha256_hash index to ArangoDB create_indexes") - (completed "Updated CLAUDE.md: FormDB/FormBase references → Lithoglyph/Docudactyl") + (completed "Updated CLAUDE.md: FormDB/FormBase references -> Lithoglyph/Docudactyl") (completed "Cross-repo integration plan docs created")) (session "2026-02-21" (completed "v1.0.0 release preparation") - (completed "Deleted duplicate files: CHANGELOG.adoc, CONTRIBUTING.md, MAINTAINERS.adoc, LICENSE.txt, rescript.json, Podmanfile.md, docker-compose.yml") + (completed "Deleted duplicate files") (completed "Rewrote Justfile with comprehensive Podman-based recipes") - (completed "Expanded Mustfile to 6 mandatory checks") - (completed "Created bofig.trustfile.a2ml (full A2ML v2.1)") - (completed "Created podman-compose.yml with health checks") - (completed "Created .containerignore for build efficiency") - (completed "Added /api/health endpoint (HealthController)") - (completed "Updated Containerfile to Elixir 1.18/OTP 27") - (completed "Updated all contractiles (Mustfile, Dustfile, Intentfile, Trustfile)") - (completed "Updated ROADMAP.adoc with actual 18-month plan") - (completed "Updated SECURITY.md (auth implemented, resolved gaps)") - (completed "Updated README.adoc for v1.0.0") - (completed "Updated CHANGELOG.md with v1.0.0 entry") - (completed "Updated STATE.scm to 100% Phase 1") - (completed "Bumped mix.exs version to 1.0.0") - (completed "Created .github/workflows/trustfile.yml")))) + (completed "Created bofig.trustfile.a2ml, podman-compose.yml, .containerignore") + (completed "Updated all contractiles and documentation") + (completed "Phase 1 completion: 257 tests, 0 failures")))) diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md index 32fb6b2..46949e5 100644 --- a/ARCHITECTURE.md +++ b/ARCHITECTURE.md @@ -561,5 +561,55 @@ Virtuoso.import_turtle(jsonld_to_turtle(data)) --- -**Last Updated:** 2025-11-22 -**Status:** Phase 0 (Architecture) → Phase 1 (PoC) beginning +**Last Updated:** 2026-03-13 +**Status:** Phase 1 complete (v1.0.0). Phase 2 in progress: migrating to Lithoglyph as primary evidence store. + +## Lithoglyph Migration (Phase 2→3) + +### Current Architecture (Phase 1) +``` +Browser → nginx → Phoenix (LiveView + REST + GraphQL) + ↓ ↓ + PostgreSQL ArangoDB 3.11+ + (user auth) (evidence, claims, entities, + relationships, navigation) +``` + +### Target Architecture (Phase 2, in progress) +``` +Browser → nginx → Phoenix (LiveView + REST + GraphQL) + ↓ ↓ ↓ + PostgreSQL Lithoglyph ArangoDB + (user auth) (evidence, (relationships + claims, edge collection + entities) only) +``` + +### Final Architecture (Phase 3) +``` +Browser → nginx → Phoenix (LiveView + REST + GraphQL) + ↓ ↓ + PostgreSQL Lithoglyph + (user auth) (all domain data: + evidence, claims, + entities, relationships) +``` + +### Why Lithoglyph + +ArangoDB was the right choice for Phase 1 (quick to prototype, multi-model). +Lithoglyph is the right choice long-term because: + +1. **Provenance is mandatory** — every mutation is a story event with actor + rationale +2. **PROMPT scores as first-class types** — `BoundedNat 0 100` verified at compile time +3. **GQL-DT dependent types** — type-safe queries with proof obligations +4. **Audit-grade WAL** — every write is journaled, reversible +5. **No data duplication** — Docudactyl → Lithoglyph → bofig queries directly (no import step) + +### Migration Strategy + +Each collection migrates independently: +1. **Evidence** (first) — highest value, most data, Lithoglyph already stores it +2. **Entities** — NER-resolved entities with aliases and merge history +3. **Claims** — with PROMPT scores as GQL-DT types +4. **Relationships** (last) — requires Factor GQL graph traversal support diff --git a/CLAUDE.md b/CLAUDE.md index 7818244..8f52667 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -25,27 +25,32 @@ Combining: ``` bofig/ -├── .git/ # Git repository ├── CLAUDE.md # This file ├── ARCHITECTURE.md # Core data model, database design, API specs -├── ROADMAP.md # 18-month implementation plan +├── ROADMAP.adoc # 18-month implementation plan (3 phases) +├── TOPOLOGY.md # ASCII architecture diagram + completion dashboard +├── 0-AI-MANIFEST.a2ml # AI agent entry point +├── .machine_readable/ # STATE.scm, META.scm, ECOSYSTEM.scm ├── docs/ │ ├── database-evaluation.md # ArangoDB vs SurrealDB vs Virtuoso -│ └── zotero-integration.md # Two-way sync design -├── lib/ # Legacy Zotero extension (2017, to be updated) -│ └── exporter.js -├── config/ # Elixir config (to be created) -├── lib/evidence_graph/ # Elixir backend -│ ├── claims.ex -│ ├── evidence.ex -│ ├── arango.ex +│ ├── zotero-integration.md # Two-way sync design +│ └── testing/ # NUJ user testing protocols +├── config/ # Elixir config (dev, test, prod, runtime) +├── lib/evidence_graph/ # Elixir backend (43 modules) +│ ├── claims.ex # Claim CRUD + PROMPT scoring +│ ├── evidence.ex # Evidence CRUD + metadata +│ ├── entities.ex # Entity resolution + NER co-reference +│ ├── relationships.ex # Graph edges (supports/contradicts/mentions) +│ ├── arango.ex # ArangoDB client (being superseded by Lithoglyph) │ └── lithoglyph/ │ ├── client.ex # Lithoglyph HTTP client (Req) -│ └── importer.ex # GenServer for batch Lithoglyph import +│ ├── importer.ex # GenServer for batch Lithoglyph import +│ └── ner_extractor.ex # Regex-based NER extraction ├── lib/evidence_graph_web/ # Phoenix web layer │ ├── schema.ex # Absinthe GraphQL schema -│ └── live/ # LiveView UIs -├── test/ # ExUnit tests +│ ├── live/ # LiveView UIs (7 pages) +│ └── plugs/ # API key auth, authorization +├── test/ # ExUnit tests (257 tests) ├── assets/ # Frontend (D3.js visualizations) └── priv/ ├── repo/seeds.exs # UK Inflation 2023 test data @@ -134,7 +139,8 @@ See [ARCHITECTURE.md](ARCHITECTURE.md) for full data model. | Decision | Rationale | |----------|-----------| -| ArangoDB over SurrealDB | Production-proven, strong Elixir support, managed hosting | +| Lithoglyph over ArangoDB (ADR-006) | Mandatory provenance, GQL-DT dependent types, no data duplication | +| ArangoDB for graph edges (Phase 2) | Retained temporarily for relationship traversals only | | Elixir over Node/Python | Concurrency, fault tolerance, LiveView for real-time | | LiveView over React | Progressive enhancement, less JavaScript | | Optional PROMPT scoring | Reduce adoption friction initially | @@ -448,9 +454,9 @@ New API endpoints for Lithoglyph evidence import: Lithoglyph client (`EvidenceGraph.Lithoglyph.Client`) uses `Req` to communicate with the Lithoglyph HTTP API. The importer GenServer (`EvidenceGraph.Lithoglyph.Importer`) manages background batch imports with progress tracking via PubSub. **Last Updated:** 2026-03-13 -**Current Phase:** Phase 2 — Lithoglyph integration started +**Current Phase:** Phase 2 — Lithoglyph migration (ADR-006: ArangoDB superseded) **Maintained By:** @Hyperpolymath -**Status:** Architecture complete, moving to implementation +**Status:** Phase 1 complete (v1.0.0). Phase 2: migrating domain data to Lithoglyph. ## Questions or Issues? diff --git a/ROADMAP.adoc b/ROADMAP.adoc index 46fa033..9d52395 100644 --- a/ROADMAP.adoc +++ b/ROADMAP.adoc @@ -23,7 +23,7 @@ * D3.js force-directed graph visualisation * RSR compliance: LICENSE, SECURITY.md, CONTRIBUTING, CHANGELOG -=== v1.0.0 - Phase 1 PoC [THIS RELEASE] +=== v1.0.0 - Phase 1 PoC [RELEASED] *Released: 2026-02-21* @@ -54,19 +54,39 @@ * *Pivot:* If core assumptions about evidence navigation are wrong * *Metrics:* Task completion rate, time-on-task, System Usability Scale (SUS) -== Phase 2: Platform (Months 7-12) +== Phase 2: Platform + Lithoglyph Integration (Months 7-12) + +=== v1.2.0 - Lithoglyph as Evidence Store [IN PROGRESS] + +*Started: 2026-03-13* + +The critical architectural shift: Lithoglyph becomes the primary evidence store, +replacing ArangoDB for domain data. ArangoDB is retained only for graph edges +(relationships) until Lithoglyph's Factor GQL runtime supports graph traversals. + +* Lithoglyph HTTP client (Req) for evidence CRUD -- *DONE* +* Batch importer GenServer with progress tracking via PubSub -- *DONE* +* NER entity extraction wired into import pipeline -- *DONE* +* Entity resolution (exact + fuzzy Jaro-Winkler + auto-create) -- *DONE* +* Evidence→entity `:mentions` relationship edges -- *DONE* +* **Migrate evidence reads to Lithoglyph GQL queries** (stop reading from ArangoDB `evidence` collection) +* **Migrate evidence writes to Lithoglyph** (insert via GQL-DT with provenance, not ArangoDB) +* **Migrate entity reads/writes to Lithoglyph** (entities collection) +* **Migrate claims to Lithoglyph** (claims collection with PROMPT as first-class types) +* **Keep ArangoDB for `relationships` edge collection only** (graph traversals) +* PROMPT scores stored as GQL-DT `BoundedNat 0 100` types (compile-time verified) === v2.0.0 - Multi-Investigation Platform * Multi-investigation dashboard with cross-referencing * Real-time collaborative editing via Phoenix PubSub * Advanced D3.js visualisations (timeline, heatmap, Sankey diagrams) -* Full-text search across evidence corpus +* Full-text search across evidence corpus (via Lithoglyph Factor GQL) * Role-based access control (admin, journalist, reviewer, reader) * API rate limiting and key management * IPFS provenance integration for tamper-evident storage * Deployment to Hetzner Cloud (EU data sovereignty) -* ArangoDB Oasis managed database +* Lithoglyph managed instance (replaces ArangoDB Oasis) === Decision Point: Month 9 @@ -75,15 +95,36 @@ * *Scale:* If 50+ active users and stable infrastructure * *Consolidate:* Focus on reliability before new features -== Phase 3: Production Ecosystem (Months 13-18) +== Phase 3: Lithoglyph-Native + Production (Months 13-18) + +=== v3.0.0 - Full Lithoglyph Migration + +The long-term architecture: bofig is a pure Lithoglyph client. ArangoDB is +fully removed. All data lives in Lithoglyph with GQL/GQL-DT queries. + +==== 3A: Graph Edges → Lithoglyph (v3.0.0-alpha) + +* Migrate `relationships` edge collection from ArangoDB to Lithoglyph +* Implement graph traversals (evidence chains, shortest path, contradiction detection) in Factor GQL +* Requires: Lithoglyph Factor GQL runtime gains graph query support (TRAVERSE, SHORTEST_PATH) +* **ArangoDB fully removed** -- no more multi-database architecture +* PostgreSQL retained for user auth only (phx.gen.auth) + +==== 3B: GQL-DT Query Integration (v3.0.0-beta) + +* Replace raw GQL queries with GQL-DT (dependently typed) queries +* PROMPT scores verified at compile-time: `BoundedNat 0 100` instead of runtime range checks +* Evidence insertions require `Rationale` (NonEmptyString) -- enforced by Lean 4 type checker +* Proof certificates for evidence integrity (SHA-256 hash verification via GQL-DT) +* Type-safe evidence chain traversals with proof obligations -=== v3.0.0 - Production Launch +==== 3C: Production Launch (v3.0.0) -* Public API with documentation and SDK +* Public API with documentation and SDK (GQL endpoint, not REST) * Formal security audit and penetration testing -* LEAN4 formal proofs for cryptographic protocols -* Integration with academic citation databases -* Cross-investigation evidence linking +* Lean 4 formal proofs for PROMPT score calculation correctness +* Integration with academic citation databases (via Lithoglyph provenance) +* Cross-investigation evidence linking (Lithoglyph multi-tenancy) * EU funding application * Academic paper on boundary objects in journalism tooling * NUJ partnership formalisation @@ -96,6 +137,33 @@ * *Sustain:* If grant funding secured, dedicated development * *Archive:* If insufficient adoption, document lessons learned +== Database Migration Path + +[cols="1,2,2,2"] +|=== +| Phase | Evidence/Claims/Entities | Relationships (Graph) | Auth + +| Phase 1 (v1.0.0) +| ArangoDB +| ArangoDB (edge collection) +| PostgreSQL + +| Phase 2 (v1.2.0–v2.0.0) +| **Lithoglyph** (primary), ArangoDB (read fallback during migration) +| ArangoDB (edge collection) +| PostgreSQL + +| Phase 3 (v3.0.0) +| **Lithoglyph** (sole store) +| **Lithoglyph** (Factor GQL graph queries) +| PostgreSQL + +|=== + +The migration is incremental: each collection moves independently. Evidence +moves first (highest value), then entities, then claims, then graph edges last +(requires Lithoglyph graph query support). + == Success Metrics [cols="1,2,1"] @@ -118,8 +186,10 @@ == Technology Stack * *Backend:* Elixir 1.18+ / Phoenix 1.8+ / OTP 27 -* *Graph DB:* ArangoDB 3.11+ (multi-model) +* *Evidence Store:* Lithoglyph (GQL/GQL-DT, narrative-first, audit-grade) +* *Graph DB:* ArangoDB 3.11+ (relationships only, Phase 2; removed in Phase 3) * *Auth DB:* PostgreSQL 16 (user accounts only) +* *Query Language:* GQL (user tier) / GQL-DT (admin tier, Lean 4 verified) * *API:* GraphQL (Absinthe) + REST (Zotero) * *Frontend:* Phoenix LiveView + D3.js * *Container:* Podman + podman-compose