diff --git a/README.md b/README.md index c261256..4db97e3 100644 --- a/README.md +++ b/README.md @@ -16,9 +16,9 @@ plain-English **["What is this?"](docs/what-is-this.md)**. ## ▶ See it move -There's no hosted click-to-try demo yet _(coming soon)_. But you can run the playground on your own -machine in about a minute — see **[Try it in 60 seconds](#-try-it-in-60-seconds)** below. It's a little -sandbox where you load a character, click its animations, and type to make it talk. +There's no hosted click-to-try demo yet, but you can run the playground on your own machine in about a +minute: see **[Try it in 60 seconds](#-try-it-in-60-seconds)** below. It's a little sandbox where you +load a character, click its animations, and type to make it talk. ![Genie moving and talking in the MASH playground, rendered entirely in the browser](assets/gifs/genie-speaking.gif) @@ -159,9 +159,11 @@ authentic voice is a clearly-labeled upgrade. - **`services/voice-server`** — the Dockerized voice helper. - **`apps/mash`** — the playground demo you ran above. -Full developer docs are on the way: [`docs/developers/overview.md`](docs/developers/overview.md) and a -copy-paste [`docs/developers/quickstart.md`](docs/developers/quickstart.md) _(coming soon)_. The -architecture is documented today in [`docs/architecture.md`](docs/architecture.md), and the build plan in +Full developer docs are here: the **[overview](docs/developers/overview.md)** (the openness pitch + +architecture), a copy-paste **[quickstart](docs/developers/quickstart.md)** (a talking character in about +ten lines), the **[API reference](docs/developers/api.md)**, **[TTS providers](docs/developers/providers.md)**, +and **[character bundles](docs/developers/bundles.md)**. The architecture is also written up in +[`docs/architecture.md`](docs/architecture.md), and the full build history in [`docs/roadmap.md`](docs/roadmap.md). 
--- diff --git a/assets/screenshots/genie-speaking.png b/assets/screenshots/genie-speaking.png deleted file mode 100644 index a7a7058..0000000 Binary files a/assets/screenshots/genie-speaking.png and /dev/null differ diff --git a/docs/characters.md b/docs/characters.md index a765203..a7a21a7 100644 --- a/docs/characters.md +++ b/docs/characters.md @@ -7,6 +7,10 @@ any other character anyone ever made. If it's a `.acs` file, vivify aims to run _The gallery grows as characters are captured. Here's Genie — load any `.acs` to meet the rest._ +![Genie performing his Greet animation in the browser](../assets/gifs/genie-animation.gif) + +_…and in motion — Genie's "Greet", played straight from his original animation set._ + ## How to get your own `.acs` files vivify ships **no** character files — they're Microsoft's, and you supply your own. The diff --git a/docs/cycles/cycle-19-doc-drift.md b/docs/cycles/cycle-19-doc-drift.md new file mode 100644 index 0000000..fdb912d --- /dev/null +++ b/docs/cycles/cycle-19-doc-drift.md @@ -0,0 +1,69 @@ +# Cycle 19 — doc drift + stale-marker correctness pass + +## Goal +A quick-win **correctness** pass on the docs: fix content that is now factually wrong because the work it +describes has shipped. No new page bodies — just make the DONE things stop claiming they're "coming." The +audit ([the cycle-18 follow-up](../../README.md) findings) surfaced three stale spots; this cycle fixes +exactly those and nothing else. + +**Docs only — no code; CI stays green.** The genuinely-still-stub pages (getting-started, faq, +troubleshooting, voice/overview, voice/setup, voice/sourcing-components) get real content in **later +cycles** and are NOT touched here — their "(coming soon)" signposts are accurate and stay. + +## What this cycle fixes + +1. **README — stale developer-docs marker.** Lines ~162–165 say the developer docs / quickstart are + "_(coming soon)_" — **false**: the five `docs/developers/*` pages shipped in Cycle 17 (merged, PR #20). 
+ Rewrite to point at the live pages (overview, quickstart, api, providers, bundles). + +2. **README — hosted-demo line (~19).** "There's no hosted click-to-try demo yet _(coming soon)_" is + accurate (there is still no hosted demo) but the `_(coming soon)_` tag reads like an unfinished-doc / + broken-link marker. Reword to state the fact cleanly without the placeholder tag. (No hosted demo is + being claimed as done — the statement stays true.) + +3. **Images on main + the two spare assets.** PR #22 is merged, so all five `assets/…/genie-*` files are + on main and every doc image ref resolves. Two captured assets were produced-but-unreferenced: + - `assets/gifs/genie-animation.gif` (the Greet animation) → **wire into `docs/characters.md`** (it + strengthens that thin page by showing a character actually in motion). + - `assets/screenshots/genie-speaking.png` (a static mid-speech still) → **remove as redundant**: the + speaking **GIF** (`genie-speaking.gif`) conveys strictly more and is already used on README, + what-is-this, and developers/quickstart. (Removing the committed file only; the capture script is not + touched — out of scope — so a future run still produces it locally.) + +4. **`docs/roadmap.md` — the worst drift.** The table stopped at Cycle 12, labelled Cycles 8–11 "in + progress / not merged" though they are long merged, still carried the stale per-voice-cycle "operator + validation pending" hedges (the authentic voice + lip-sync is now confirmed working — Cycle 18 + captured real TruVoice Genie speech), and omitted Cycles 13–18 entirely. Rewrite it to the true merged + history: every cycle 0–18 shown as shipped (with its PR + cycle-doc/ADR refs), plus an honest + "in progress / planned" section for the remaining doc work (the 6 stub pages + thin-page polish). 
+ +## What is explicitly NOT touched (accurate signposts to real stubs) +- README `:112` (getting-started), `:125` (voice/overview), `:181` (voice/sourcing-components), + `:191–193` (faq, troubleshooting) — all point at genuine stubs; "(coming soon)" is correct. +- The three install pages' footer links to `faq.md` / `troubleshooting.md` `_(coming soon)_` — correct. +- The six stub pages themselves — untouched (content lands in later cycles). + +## Acceptance check +- No page that is actually DONE still says "(coming soon)" / "on the way": specifically the README + developer-docs lines point at the live `docs/developers/*` pages. +- `docs/roadmap.md` reflects the real merged history through Cycle 18 (no cycle mislabelled "not merged"; + 13–18 present), and an honest "planned" section for what's left. +- `genie-animation.gif` is referenced (characters.md) and renders; `genie-speaking.png` is removed and no + doc references it. +- Every relative `.md` link still resolves; every `![](…)` image target exists on main. +- `pnpm -r typecheck && pnpm -r test && pnpm lint && pnpm format` green (docs only; Markdown + prettier-ignored). + +## Verification +- `grep -rn "coming soon" README.md docs/ | grep -vE 'cycles/|decisions/'` → only the accurate + stub-signposts remain (getting-started, voice/*, faq, troubleshooting); no developer-docs hit. +- `git ls-tree -r main -- assets/` vs the doc image refs → all resolve; `genie-speaking.png` no longer + referenced. +- Read `roadmap.md` against `git log --merges` (PRs #1–#22) — statuses match. +- `doc-auditor` / `code-reviewer` confirms: roadmap matches merged history, no DONE page claims "coming," + links resolve, images render. + +## Non-goals +Writing the 6 stub pages (getting-started, faq, troubleshooting, voice/*) — later cycles. Thin-page +expansion (characters gallery beyond the wired GIF, architecture page) — later/optional. No code changes. +No merge — open a PR (base `main`) and stop. 
diff --git a/docs/roadmap.md b/docs/roadmap.md index f65a538..cfd8ee5 100644 --- a/docs/roadmap.md +++ b/docs/roadmap.md @@ -1,24 +1,49 @@ # Roadmap -Risk is concentrated in two spikes — ACS image/animation decode (Cycle 1) and the authentic voice service (Cycle 5). Both are front-loaded as go/no-go gates. Everything else is comparatively mechanical assembly. Cycles 1 and 5 are deliberately **not** merged into their neighbors: you want the go/no-go answer before building on top of it. - -| # | Cycle | The point | Acceptance (go/no-go where noted) | -|---|-------|-----------|-----------------------------------| -| 0 | Repo + contracts | Nail the seams before building across them | Types compile (strict); stub agent loads & no-ops; bundle schema + validator exist | -| 1 | **ACS spike** — one character's pixels | Prove we can correctly decode images + animation table from raw `.acs` | **GO/NO-GO (met):** Genie + Merlin animation names match **Microsoft's published lists exactly** (76/76, 73/73); pixel decode confirmed structurally/visually (palette, transparency, dimensions, composited Greet). 
Byte-exact unique-image count + per-pixel grading **moved to Cycle 2** (gates `acs2bundle`) — see ADR-0009 | -| 2 | Full parser + sounds + `acs2bundle` | Generalize to the whole format; emit web-ready bundles | **MET (PR stacked on Cycles 0/1):** Genie/Merlin/Peedy/Robby convert to valid bundles (manifests pass the zod validator); exact unique-image count + lossless sprite-sheet round-trip (pixel-for-pixel, 0 mismatches across 2,775 images); synthetic fixtures exercise the full parser in CI | -| 3 | Core renderer (silent) | The browser engine: compositing, timing, branching, queue, authentic **balloon** (text only) | **MET (PR stacked on Cycles 0–2):** Genie/Merlin load and play — frames composited at offsets w/ transparency, frame-timed, probabilistic + exit branches resolve to rest, state→animation map honored; balloon styled per character shows word-wrapped typed text; full action queue + show/hide/play/speak/moveTo/gestureAt/stopCurrent/stop work. Silent — TtsProvider seam wired but defaults to StubTtsProvider (audio is Cycles 5/6). CI covers playback/queue/branching/wrap/states/compositor via synthetic-IR + fake-clock tests; the on-screen render (Genie/Merlin play, balloon) is **verified via the local harness** (gitignored — Cycle 4 is the committed app) | -| 4 | MASH demo (silent) | Showcase + dogfood the public API early | **MET (PR off main):** `apps/mash` — committed, deployable browser MASH-clone built **only** on `@vivify/core`'s public API (`createAgent` + Agent control), no `@vivify/acs` or internals (vanilla TS + Vite, no UI framework — ADR-0013). Character picker with `.acs` upload (file input + drag/drop → `createAgent(arrayBuffer)`) and optional local built-in bundles (`createAgent({manifestUrl})`); full animation list, click to play; type-to-balloon (engine-rendered styled balloon, **silent** — StubTtsProvider); stop/hide/show/replay; graceful errors on bad uploads. 
Ships **no** `.acs`/MS assets — built-ins are gitignored local-only; the committed app is fully usable via upload. CI unit-tests the pure UI helpers (16 tests); canvas/`createAgent` integration confirmed visually. Verified: **builds** a deployable `dist` (no IP) and dev server starts — deploying to a host is the operator's step | -| 5 | **Voice spike** — Wine + SAPI4 + TruVoice | Authentic voice + mouth timing out of the real engine | **GO/NO-GO — spike built, real-env validation pending:** `services/voice-server` ships a Node HTTP server (`POST /tts {text,voice}` → `{audioWavBase64, mouthTimeline, format}`, `GET /health`) that spawns a C++ SAPI4 bridge (`bridge/sapi4-mouth.cpp`) under Wine; the bridge captures both a WAV and a mouth/viseme timeline via the SAPI4 `ITTSNotifySinkW::Visual`/`TTSMOUTH` callback; a Dockerfile (Wine32 + Xvfb + Node) installs the user-supplied runtime and compiles the bridge. **Verified in CI (no Wine):** full Node HTTP layer end-to-end against a fake-bridge double — 35 tests (voice-args mapping, timeline parsing, routes/validation/errors/timeout); typecheck/lint/format green. **NOT yet verified (the actual GO/NO-GO):** authentic TruVoice audio + a real audio-aligned viseme timeline — needs the Docker+Wine image built/run with the user's vendor binaries. See ADR-0014 (Node HTTP + Wine SAPI4 bridge) and ADR-0015 (mouth timeline via the SAPI4 notify sink); IP gate held (no MS/L&H binaries committed; `vendor/`, the Wine prefix, and the compiled bridge gitignored) — see `docs/cycles/cycle-5-voice.md` | -| 6 | Lip-sync + audio integration | Wire the authentic provider; drive mouth overlays from the timeline; word-sync the balloon; add Web Speech fallback | **MERGED (PR #8 → main); on-screen audio/sync = operator validation:** `@vivify/voice-truvoice` ships a real `TruVoiceProvider` (POSTs to the Cycle 5 server, decodes the WAV + `mouthTimeline`, honors an `AbortSignal`) plus a small audible `WebSpeechProvider` fallback. 
On `speak()`, `@vivify/core` plays the WAV via an injectable `AudioSink` (default Web Audio) and drives lip-sync from **audio playback time** — composites a mouth-overlay image (chosen from the active Speaking frame's `mouth.overlays` by viseme `shape`) and reveals balloon words by progress; `stop()`/`stopCurrent()` now interrupt in-flight synthesis + audio + animation (carried-forward gap closed); silent fallback intact. New public API: `createAgentFromModel`, `AudioSink`, lipsync helpers. Structured mouth modeling lands (ADR-0010's Cycle 6 part): `MouthOverlay { overlays: FrameMouthOverlay[] }` — parser + bundle schema + `InSync` guards updated. voice-server gets permissive CORS; `apps/mash` gains a "Voice server URL" field routing speech through `TruVoiceProvider`. **Verified in CI (synthetic fixtures, no `.acs` / no engine / no browser/audio):** provider against a fake HTTP server (decode + abort), lipsync pure-fn tests, and a speak/stop-interrupts integration suite (`FakeClock` + fake `AudioSink` + fake provider). **NOT yet operator-validated:** the on-screen authentic voice + synced mouth + word-advancing balloon (needs the running Cycle 5 service + a browser); the `shape`→overlay mapping is a tunable heuristic that may need visual calibration. See ADR-0016 and `docs/cycles/cycle-6-lipsync.md` | -| 7 | **Authentic mouth density** — real-time-audio bridge | Replace the Cycle 6 interim interpolation with dense per-phoneme mouth events | **MERGED (PR #9 → main); on-screen audio/sync = operator validation:** rewrite the SAPI4 bridge to render via `CLSID_MMAudioDest` (real-time multimedia audio) so the engine emits the dense per-phoneme `Visual`/`TTSMOUTH` stream that file-audio mode (`CLSID_AudioDestFile`) suppressed. Because MMAudioDest can't tee PCM, the bridge runs **two synthesis passes** — pass A (MMAudioDest) for the dense events, pass B (AudioDestFile) for the WAV (same engine/voice/text ⇒ aligned timestamps). 
A **PulseAudio null-sink dummy audio device** under headless Wine lets MMAudioDest initialize (no sound card otherwise). Per-viseme timing is each callback's `GetTickCount()` arrival relative to playback start (not `qTimeStamp`, which doesn't advance per viseme). HTTP contract + bridge CLI unchanged. **Operator-verified:** dense, varied, audio-aligned mouth shapes confirmed on-screen with the authentic voice. See ADR-0019 (and ADR-0017 for the deferral) and `docs/cycles/cycle-7-realtime-audio.md` | -| 8 | **Animation return-to-rest** | Stop animations freezing on a non-neutral frame / hard-cutting to the next | **In progress (branch `cycle-8-return-to-rest`, PR → main, not merged):** core-only (`@vivify/core`) fix — a finished gesture HOLDS its end pose (so actions stack); the return path runs only when a *different* animation starts from a non-rest pose (transition through rest, no hard cut) or on explicit `stop()`. `computeExitPath` honors `transitionType`: `1` = exit-branch walk, `0` = named return animation, `2` = none. Folds in two sibling fixes the hold-model exposed: `speak()` preserves the held pose and composites lip-sync onto an overlay-bearing frame, and the balloon is shown when audio actually starts (not during synthesis). **CI-verified** (unlike the voice cycles, not operator-gated): pure `computeExitPath` + engine behavior via synthetic models + `FakeClock`. On-screen check: `DoMagic2` holds its pose, a new animation transitions through rest, `GestureRight`→`Speak` keeps pointing with lip-sync, balloon appears with audio, Stop relaxes to rest; browser-refresh to test, no container rebuild. 
See ADR-0020 and `docs/cycles/cycle-8-return-to-rest.md` | -| 9 | **Dockerized demo** | Collapse running MASH to one command — static container + auto-wired voice URL | **Built + Docker-validated (branch `cycle-9-dockerize-demo`, PR → main, not merged):** `apps/mash` ships as a two-stage static container (Vite build → `nginx:alpine`) listening on **8090**, built from the repo root (it imports the workspace packages). The "Voice server URL" field pre-fills with `http://localhost:8080` (build-arg `VITE_VOICE_SERVER_URL` overridable, runtime-editable, clearing it goes silent) so sound just works when the separate voice container is up; the MASH image ships **no** Wine/SAPI4/MS-IP. Repo-root `docker-compose.yml` runs MASH + voice (`docker compose up mash` skips the voice binaries). **CI green** (`resolveVoiceServerUrl` unit tests join the MASH pure-helper suite; no Docker build in CI, matching the voice-server model). **Docker-validated (implementer):** image builds from the repo root, serves HTTP 200 on 8090, the `-p 9000:8090` host-port override works, and the default voice URL is baked into the bundle. **On-screen (operator): pending** — `docker compose up`, upload a `.acs`, Speak → authentic voice + lip-sync with no manual URL paste. See ADR-0021 and `docs/cycles/cycle-9-dockerize-demo.md` | -| 10 | **Voice latency — measure + warm** | Find where the ~2–3s Speak delay goes, then trim the low-risk part | **Built + CI-green for the server side (branch `cycle-10-latency`, PR → main, not merged):** adds per-stage timing instrumentation — the bridge logs a machine-readable `[timing]` line (engine `initMs`, Pass A/B ttfb + total, `writeMs`, total) and the server logs a combined `[tts-timing]` breakdown (bridge stages + `bridgeWall`/`read`/`encode`/`total`). Warms the engine: persistent Xvfb (`:99`) + `wineserver` + startup warmup synth (`entrypoint.sh`), so the per-request command drops `xvfb-run` cold-start to a plain `wine …`. 
Pass A and Pass B stay **serial** — `passB_totalMs` is measured and reported as a candidate future saving, not forced parallel. **Verified in CI (no Wine):** `parseBridgeTiming` unit tests (valid line → struct; missing/garbled → null; tolerant of extra fields) + a server test driving the fake bridge asserting `onTiming` receives the parsed bridge + server stages; typecheck/test/lint/format green. **Operator-measured (no Wine/audio in CI):** real latency numbers + the cold-vs-warm `total=` delta come from a container rebuild + curl/Speak and are recorded in the cycle doc's table. See ADR-0022 and `docs/cycles/cycle-10-latency.md` | -| 11 | **Voice latency — single-pass** | Kill the duplicate synthesis pass and close the structural gap | **Built + CI-green for the pure helpers (branch `cycle-11-latency-singlepass`, PR → main, not merged):** removes Pass B — the bridge is now a **single real-time pass** that plays to the PulseAudio null sink and emits the dense mouth events, while the server records that sink's `.monitor` with `parec` concurrently (injectable `VIVIFY_CAPTURE` / `VIVIFY_CAPTURE_GRACE_MS`; null-sink format pinned in `pulse-null.pa`). The server collects raw PCM, wraps it into a WAV, and **trims leading silence** to align with the timeline; empty/silent capture → honest **500** (never a faked WAV). Locates + closes the ~2170ms gap: the bridge prints `[boot]` first in `main()` (server times the first stderr byte ⇒ `wineLoad`) and does a fast `_Exit` (skips COM/DLL unload + device drain) to close `teardown`; `[tts-timing]` drops `passB` and surfaces `wineLoad`/`capture`/`teardown` + the bridge sub-parts. `wineLoad` is named as the residual (a persistent-engine daemon would remove it — deferred). 
**Verified in CI (no Wine/PulseAudio):** `wrapPcmToWav` + `trimLeadingSilence` unit tests and a server test with an injected fake `captureCommand` (leading-silence + tone PCM) + timeline-only fake bridge → valid RIFF/WAVE built from the capture, aligned timeline, and the empty-capture → 500 path; typecheck/test/lint/format green. **Operator-measured (no Wine/audio in CI):** real before/after latency, WAV validity, and on-screen lip-sync alignment come from a container rebuild + curl/Speak and are recorded in the cycle doc's table. See ADR-0023 and `docs/cycles/cycle-11-latency-singlepass.md` | -| 12 | Packaging + docs | Make it adoptable | `npm i`, drop component, talking character in <10 lines; README + live demo | - -Note: Cycle 4 (demo) is pulled *before* voice so there's a visible, shippable artifact at the halfway mark and the public API gets exercised early. Voice (5/6) then upgrades it from silent to full. +The build was sequenced around two go/no-go spikes — ACS image/animation decode (Cycle 1) and the +authentic voice service (Cycle 5) — front-loaded so the risky answers came first. **Both passed**, and the +engine is now feature-complete: it loads real `.acs` characters, renders and animates them in the browser, +and speaks in the authentic L&H TruVoice voice with dense, audio-aligned lip-sync (confirmed end-to-end in +Cycle 18, which captured real Genie speech from the running stack). Everything from Cycle 13 on is +documentation and packaging polish. + +Each row links its build-cycle doc (`docs/cycles/`) and any load-bearing decision (`docs/decisions/`). + +## Shipped (all merged to `main`) + +| # | Cycle | The point | Status | +| --- | ------------------------------ | ------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------- | +| 0 | Repo + contracts | Nail the seams before building across them | **Merged** (PR #1). 
Strict types compile; stub agent loads & no-ops; bundle schema + validator | +| 1 | **ACS spike** (go/no-go) | Decode one character's images + animation table from raw `.acs` | **Merged — GATE PASSED.** Genie/Merlin animation names match Microsoft's lists exactly (76/76, 73/73); pixel decode confirmed. See `cycle-1-acs-spike.md`, ADR-0009 | +| 2 | Full parser + `acs2bundle` | Generalize to the whole format; emit web-ready bundles | **Merged.** Genie/Merlin/Peedy/Robby → valid bundles (zod-validated); lossless sprite-sheet round-trip (0 mismatches across 2,775 images). `cycle-2-converter.md` | +| 3 | Core renderer (silent) | The browser engine: compositing, timing, branching, action queue, balloon | **Merged** (PR #5). Compositing at offsets w/ transparency, frame timing, probabilistic + exit branches, state→animation map, full action queue, styled balloon. Silent (StubTtsProvider). `cycle-3-renderer.md` | +| 4 | MASH demo | Showcase + dogfood the public API early | **Merged** (PR #6). Committed browser playground built only on `@vivify/core`'s public API (vanilla TS + Vite). Ships no `.acs`/MS assets. `cycle-4-mash.md`, ADR-0013 | +| 5 | **Voice spike** (go/no-go) | Authentic voice + mouth timing out of the real engine | **Merged — GATE PASSED** (PR #7). `services/voice-server`: Node HTTP → C++ SAPI4 bridge under Wine; `POST /tts` → `{audio, mouthTimeline}`. `cycle-5-voice.md`, ADR-0014/0015 | +| 6 | Lip-sync + audio integration | Wire the authentic provider; drive mouth from the timeline; Web Speech fallback | **Merged** (PR #8). `@vivify/voice-truvoice` (`TruVoiceProvider` + audible `WebSpeechProvider` fallback); engine plays the WAV and drives lip-sync + word-synced balloon. `cycle-6-lipsync.md`, ADR-0016 | +| 7 | **Authentic mouth density** | Replace interim interpolation with dense per-phoneme mouth events | **Merged** (PR #9). Real-time-audio bridge (`CLSID_MMAudioDest`) emits the dense viseme stream; PulseAudio null-sink dummy device under headless Wine. 
`cycle-7-realtime-audio.md`, ADR-0019 | +| 8 | Animation return-to-rest | Stop animations freezing on a non-neutral frame / hard-cutting | **Merged** (PR #10). Finished gestures hold their end pose (actions stack); transition through rest only on a new animation or `stop()`. `cycle-8-return-to-rest.md`, ADR-0020 | +| 9 | Dockerized demo | Collapse running MASH to one command — static container + auto-wired voice URL | **Merged** (PR #11). `apps/mash` ships as an nginx static container on :8090; voice URL pre-filled; repo-root `docker-compose.yml`. `cycle-9-dockerize-demo.md`, ADR-0021 | +| 10 | Voice latency — measure + warm | Find where the Speak delay goes; warm the engine at startup | **Merged** (PR #12). Per-stage `[tts-timing]` instrumentation; persistent Xvfb + `wineserver` + startup warmup synth. `cycle-10-latency.md`, ADR-0022 | +| 11 | Voice latency — single-pass | Kill the duplicate synthesis pass and close the structural gap | **Merged** (PR #13). Single real-time pass captured off the null-sink monitor; honest 500 on empty capture; fast `_Exit` closes the teardown gap. `cycle-11-latency-singlepass.md`, ADR-0023 | +| 12 | Disk-persistent TTS cache | Make every repeated phrase instant | **Merged** (PR #14). Response cache keyed by `hash(text+voice)`, served before the synthesis mutex, persisted on a Docker volume. `cycle-12-tts-cache.md`, ADR-0024 | +| 13 | Repo shine | Make the front door shine for a zero-assumptions visitor | **Merged** (PR #15). Banner + rewritten README + GitHub metadata. `cycle-13-repo-shine.md`, ADR-0025 | +| 14 | Docs skeleton + landing | Build the `docs/` page map so front-door links resolve | **Merged** (PR #16). `docs/README.md` landing + canonical page map; real no-dependency pages, stubs signposted elsewhere. `cycle-14-docs-skeleton.md`, ADR-0026 | +| 15 | Voice in one `docker compose up` | Authentic voice with a single command (no host Node/pnpm) | **Merged** (PR #17). 
Voice image compiles its own `dist/` in-image from the repo root; `speech.h` stays user-supplied. `cycle-15-voice-one-command.md`, ADR-0027 | +| 16 | Per-platform install pages | Hand-held Windows / macOS / Linux setup, zero assumptions | **Merged** (PR #18). Tier 1 (browser voice) + Tier 2 (authentic voice) per platform. `cycle-16-install-pages.md` | +| 17 | Developer documentation | Get a competent developer productive fast | **Merged** (PR #20). The five `docs/developers/*` pages: overview, quickstart, API, providers, bundles. `cycle-17-developer-page.md` | +| 18 | Screenshots + GIFs | Real images of the running app — including Genie talking with the mouth moving | **Merged** (PRs #21 tooling, #22 assets). Playwright capture tooling + committed screenshots/GIFs of the running app (authentic TruVoice lip-sync captured). `cycle-18-screenshots.md`, ADR-0028 | + +## In progress / planned + +| # | Cycle | The point | Status | +| --- | --------------------------- | --------------------------------------------------------------------- | ------------------------------------------------------------------ | +| 19 | Doc drift correctness pass | Fix stale "(coming soon)" markers + bring this roadmap current | **In progress** (this PR). `cycle-19-doc-drift.md` | +| — | Help pages | Write the remaining stubs: getting-started, FAQ, troubleshooting | Planned | +| — | Voice docs | Write `voice/overview.md`, `voice/setup.md`, `voice/sourcing-components.md` | Planned (the install pages already cover the Tier-2 walkthrough — consolidate, don't duplicate) | +| — | Thin-page polish (optional) | Flesh out the characters gallery; round out `architecture.md` | Optional | ## Known long tail (not defeatism, just honesty) -"Any old `.acs` runs" is the goal, but expect quirky characters — gestures-at-point, heavy multi-image frame compositing, unusual branching, non-English voice configs — to need iteration after the common case works. 
Track these as fixtures with expected output as we find them. + +"Any old `.acs` runs" is the goal, but expect quirky characters — gestures-at-point, heavy multi-image +frame compositing, unusual branching, non-English voice configs — to need iteration after the common case +works. We track these as fixtures with expected output as we find them.