diff --git a/README.md b/README.md index 4db97e3..d2e3064 100644 --- a/README.md +++ b/README.md @@ -122,7 +122,7 @@ That voice came from old Windows speech software that can't run in a browser dir a small background helper. It's a more involved, opt-in setup — completely separate from the quick playground above — and there's a dedicated, hand-held walkthrough for it: -➡️ **[The authentic voice — what it is and how to set it up](docs/voice/overview.md)** _(coming soon)_ +➡️ **[The authentic voice — what it is and how to set it up](docs/voice/overview.md)** > Don't need the original voice? The playground still talks using your browser's built-in voice, so you > never hit a dead end. @@ -179,7 +179,7 @@ clean and legal, _you_ bring two things yourself: software that you install into the voice helper. It's all explained, with exactly where to get each piece, in -**[`docs/legal-and-assets.md`](docs/legal-and-assets.md)** (and a friendlier consumer guide is coming in +**[`docs/legal-and-assets.md`](docs/legal-and-assets.md)** (with a friendlier consumer guide in [`docs/voice/sourcing-components.md`](docs/voice/sourcing-components.md)). > In short: the _engine_ is ours and open. The _characters and voices_ are their owners', and you supply diff --git a/docs/cycles/cycle-20-voice-docs.md b/docs/cycles/cycle-20-voice-docs.md new file mode 100644 index 0000000..be7952c --- /dev/null +++ b/docs/cycles/cycle-20-voice-docs.md @@ -0,0 +1,94 @@ +# Cycle 20 — the voice/\* cluster (authentic-voice docs) + +## Goal +Fill the three empty `docs/voice/*` stubs at their canonical paths so the entire "want the real voice?" +CTA chain (README + docs landing + developer pages + glossary) stops dead-ending in a placeholder: + +- `docs/voice/overview.md` — what the authentic voice IS and why it needs a helper (the hub). +- `docs/voice/setup.md` — the **conceptual** setup hub (architecture), linking out to the OS steps. 
+- `docs/voice/sourcing-components.md` — the friendly front door for the three supplied files. + +Audience spans curious non-technical users **and** developers. Voice/tone follows the parked +`vision-and-docs-spec.md` (PO pasted it this session): warm, second-person, plain, nostalgia only in +clearly set-off asides. **Docs only — no code; CI stays green.** + +## Canonical-home split (the explicit no-duplication / no-contradiction decision) +There are three overlapping things; each gets exactly ONE canonical home, and the voice pages link rather +than re-paste: + +| Topic | Canonical home | What the voice pages do | +| --- | --- | --- | +| **Per-platform step-by-step** (install Docker, drop files, run command) | `docs/install/{windows,mac,linux}.md` (Tier 2) — already written | `voice/setup.md` explains the setup **conceptually** and **links** to these; it does NOT re-paste the steps | +| **Detailed sourcing list + IP/legal posture** (the actual places to get the files, ADR-0006/0027) | `docs/legal-and-assets.md` — already canonical | `voice/sourcing-components.md` is the friendly summary that **links into** legal-and-assets §2; it adds **no** direct proprietary download links | +| **What/why the authentic voice is** (conceptual) | NEW: `voice/overview.md` (hub) + `voice/setup.md` (how it fits together) | these are the new canonical conceptual pages | + +So: install pages own the *do-it-on-my-OS* steps; legal-and-assets owns the *where-to-get-it* source list + +IP posture; the voice cluster owns the *understand-what-this-is-and-how-it-fits* concepts and routes the +reader to the other two. No step-by-step is duplicated; no sourcing link is duplicated; nothing contradicts +legal-and-assets. + +## Verified facts the pages use (cross-checked against the repo) +- **Architecture:** `services/voice-server` = Dockerized **Wine + SAPI4 + L&H TruVoice** behind a thin Node + HTTP API. `POST /tts {text, voice}` → `{audioWavBase64, mouthTimeline, format}`; `GET /health`. 
Port + **8080**. (`services/voice-server/README.md`, ADR-0014.) +- **Why a backend:** the TruVoice voice is closed 1990s Win32 software (SAPI4) that can't run in a browser + → it runs in the Wine service; the browser calls it. (ADR-0004.) +- **The 3 user-supplied files**, all under `services/voice-server/vendor/`: `spchapi.exe` (SAPI4 runtime), + `tv_enua.exe` (L&H TruVoice voice), `sdk/include/speech.h` (SAPI4 SDK header, build-time; + `vendor/sdk/include/speech.h`). Build fails loudly naming the exact `speech.h` path if missing. +- **One command:** `docker compose up` runs MASH (:8090) + voice (:8080); the image compiles its own + `dist/` in-image, so **Docker is the only host tool** (Cycle 15 / ADR-0027). `docker compose up mash` = + demo only, silent (no voice binaries needed). +- **How MASH connects:** the voice URL field pre-fills to `http://localhost:8080` (build arg + `VITE_VOICE_SERVER_URL`); the **browser** makes the call (not container-to-container); clearing the field + goes silent. (`apps/mash/src/{app,characters}.ts`, `docker-compose.yml`.) +- **Cache:** repeats are instant — disk cache keyed by `hash(text+voice)`, persisted on the + `vivify-tts-cache` volume (Cycle 12 / ADR-0024). +- **First-utterance note:** the server warms the whole pipeline at startup; a brand-new line may clip its + very first instant slightly, a repeat won't (a cache hit can't clip). +- **Fallback vs authentic:** `WebSpeechProvider` (browser voice, zero backend) vs `TruVoiceProvider` + (authentic, needs the server). (`packages/voice-truvoice/src/index.ts`.) + +## README accuracy fixes (these pages become real, so two CTAs are now stale) +- `README.md:~125` — "The authentic voice … `_(coming soon)_`" → drop the marker (overview.md is now real). +- `README.md:~182` — "a friendlier consumer guide **is coming in** `voice/sourcing-components.md`" → + "**is in** …" (the page now exists). +- `docs/README.md` (lines 47–48) already links the three pages cleanly — no change. 
Glossary + developer + links already point in without a "coming soon" — no change. (No path churn.) + +## Pages — shape +Each page: warm intro → the content below → a "Where to next" nav → `← Back to the documentation home` +footer (matching the existing convention). +- **overview.md** (hub): the two voices and the tradeoff (browser fallback = instant, not original; + authentic = the real TruVoice + lip-sync, needs the helper); what TruVoice is; why a helper is needed + (closed Win32, ADR-0004); high-level "what you need" → route to setup + sourcing + install. +- **setup.md** (conceptual): how the pieces fit (MASH in the browser :8090 ↔ voice helper :8080, the call + happens in the browser, pre-filled URL, clear-to-silence); the 3 files + where they live (concept, link + sourcing for where-to-get + install for drop-in); one command `docker compose up` (Docker-only, ADR-0027) + vs `up mash`; the cache (repeats instant); the honest first-utterance clip note; then **hand off** to the + per-platform install guides for the actual steps. +- **sourcing-components.md** (front door): the 3 files in plain language + why you supply them (IP posture, + friendly); where they go; **no direct proprietary links** — link to `legal-and-assets.md` §2 for the + authoritative source list. Cross-link setup + install. + +## Acceptance check +- All three pages have real content (no "🚧 Coming soon"), in the spec voice, and cross-link each other + + the docs landing. +- **No install step-by-step is duplicated** (voice/setup links to install pages for OS steps). +- **No contradiction with `legal-and-assets.md`** and **no direct proprietary download links** anywhere in + `voice/*` (sourcing summarizes + links). +- Every documented fact matches the repo (ports, file paths, endpoint shape, one-command flow). +- README CTAs no longer say "(coming soon)" / "is coming" for these now-real pages; every relative link + resolves. 
+- `pnpm -r typecheck && pnpm -r test && pnpm lint && pnpm format` green (docs only; Markdown + prettier-ignored). + +## Verification +- `code-reviewer`: verifies no install-step duplication, no legal-and-assets contradiction, no proprietary + links, every fact matches the repo, all links resolve. +- `grep -rn "coming soon" README.md docs/ | grep -vE 'cycles/|decisions/'` → no hit points at a voice page. + +## Non-goals +The help cluster (getting-started, faq, troubleshooting) — next cycle. No edits to the install pages or +`legal-and-assets.md` content (only link/reference accuracy in README). No code. No merge — open a PR +(base `main`) and stop. diff --git a/docs/voice/overview.md b/docs/voice/overview.md index efa6662..b061316 100644 --- a/docs/voice/overview.md +++ b/docs/voice/overview.md @@ -1,14 +1,65 @@ # The authentic voice — overview -> 🚧 **Coming soon.** This page lands in **a later cycle**. It's a placeholder for now, so links -> pointing here already work — no dead ends. +Your character can talk two ways. Out of the box, it uses your **browser's** built-in voice — that works +instantly, with nothing to install. But vivify can also give it its **real, original voice**: the actual +late-1990s synthesizer the Microsoft Agent characters spoke with. This page explains what that is, and why +it takes a little extra setup. -**What it'll cover:** what the "authentic voice" is, in plain language, and why it runs in a small background helper. +> 💾 **Remember when…** every program had _that_ slightly robotic voice reading text aloud? That voice was +> real software — and it still works. We just had to coax it into the modern web. (Skip this box; it +> changes no instruction.) -In the meantime: +## The two voices -- New to all of this? Start with **[What is this?](../what-is-this.md)**. -- Want to try it right now? See the **[main README](../../README.md)**. 
+| | **Browser voice** (default) | **Authentic voice** (TruVoice) | +| --- | --- | --- | +| Sounds like | your computer's modern built-in voice | the **original** 1990s character voice | +| Lip-sync | approximate (a best guess) | exact — driven by the real engine's mouth data | +| Setup | none — it just works | a small one-time setup (Docker + three free files you supply) | +| Best for | "I just want to hear it talk" | "I want the _real_ thing" | + +Neither is wrong. The browser voice means you **never hit a dead end** — a character can always speak. The +authentic voice is the enthusiast upgrade, and it's the one that sounds like you remember. + +## What "the authentic voice" actually is + +It's **L&H TruVoice**, the text-to-speech voice that shipped with Microsoft Agent, driven by **SAPI 4** +(Microsoft's Speech API, version 4). That's the genuine article — not a modern soundalike. When Genie +speaks in his real voice, you're hearing the same engine people heard in 1998. + +## Why it needs a small "helper" + +Here's the catch: TruVoice and SAPI 4 are **closed 1990s Windows programs**. They were never meant to run +inside a web browser, and they can't — a browser has no way to load that old Windows software directly. + +So vivify does the next best thing: it runs that original software in a **small background helper** — a +little service on your own machine that knows how to speak in the real voice. When your character talks, +your browser quietly asks the helper for the audio (and the precise mouth movements for lip-sync), and +plays it back. You never interact with the helper directly; it just sits there and does the voice. + +This "the authentic voice lives in a backend service" decision is recorded in +[ADR-0004](../decisions/0004-authentic-voice-requires-backend.md), for the curious. + +## What you'll need + +Three things, all free, and only for this authentic-voice path: + +1. 
**Docker** — the one tool that runs the helper for you (it's also what runs the playground). +2. **Three small files you supply yourself** — the original speech software. vivify ships **none** of it + (it's not ours to give away), so you bring your own copies, once. +3. About a minute of setup. + +That's it. The playground and the browser voice need none of this. + +## Where to next + +- **How does it all fit together?** → **[Setting up the authentic voice](setup.md)** — the concepts and + the one-command flow. +- **Where do those three files come from?** → **[Where to get the voice components](sourcing-components.md)**. +- **Just give me the steps for my computer.** → the platform guides: + **[Windows](../install/windows.md)** · **[macOS](../install/mac.md)** · **[Linux](../install/linux.md)** + (each has an optional "Tier 2 — authentic voice" section). +- **What's a `.acs`? SAPI? lip-sync?** → the **[Glossary](../glossary.md)**, every term in plain English. --- diff --git a/docs/voice/setup.md b/docs/voice/setup.md index 078d0c8..2da581a 100644 --- a/docs/voice/setup.md +++ b/docs/voice/setup.md @@ -1,14 +1,79 @@ # Setting up the authentic voice -> 🚧 **Coming soon.** This page lands in **a later cycle**. It's a placeholder for now, so links -> pointing here already work — no dead ends. +This page explains **how the authentic voice fits together** — the handful of pieces and how they talk to +each other — so the actual setup makes sense. When you're ready for the click-by-click steps on _your_ +computer, this page hands you off to your platform's guide; it doesn't repeat them here. -**What it'll cover:** the full, hand-held walkthrough for the TruVoice + Wine + Docker voice service, per platform. +New to the whole idea? Start with **[The authentic voice — overview](overview.md)** first. -In the meantime: +## How the pieces fit together -- New to all of this? Start with **[What is this?](../what-is-this.md)**. -- Want to try it right now? 
See the **[main README](../../README.md)**. +There are just two moving parts, and they both run on your own machine via **Docker**: + +- **The playground (MASH)** — the web app you load in your browser at **`http://localhost:8090`**. +- **The voice helper** — a small background service at **`http://localhost:8080`** that runs the original + speech software and knows how to speak in the real voice. + +When you click **Speak**, the magic is quietly ordinary: **your browser** sends the line of text to the +voice helper, and the helper sends back the audio plus the exact mouth movements for lip-sync. The +character plays it. (The call goes browser → helper, not container-to-container — which is why both just +publish their address on your machine.) + +You don't configure any of this by hand. The playground's **"Voice server URL"** field is **pre-filled** +with `http://localhost:8080`, so sound just works the moment the helper is running. Clear that field and +the character goes **silent** — a handy escape hatch, never a dead end. + +## What the helper needs: three files you supply + +The voice helper runs genuine closed 1990s speech software, and **vivify ships none of it**. You drop +**three** files into one folder — `services/voice-server/vendor/` — once: + +| File | What it is | Goes at | +| --- | --- | --- | +| `spchapi.exe` | the SAPI 4 speech runtime | `services/voice-server/vendor/spchapi.exe` | +| `tv_enua.exe` | the L&H TruVoice voice (Genie & friends) | `services/voice-server/vendor/tv_enua.exe` | +| `speech.h` | the SAPI 4 SDK header the helper compiles against | `services/voice-server/vendor/sdk/include/speech.h` | + +Where these come from is its own page: **[Where to get the voice components](sourcing-components.md)**. +(If the build ever stops complaining that `speech.h` is missing, it's that third file in the +`sdk/include/` sub-folder — the build message names the exact spot.) 
+ +## One command runs it all + +Once the three files are in place, the whole thing — playground **and** voice — comes up with a single +command from the project folder: + +```bash +docker compose up +``` + +**Docker is the only tool you need on your computer.** The voice helper compiles itself _inside_ its own +Docker image, so there's no programming toolchain to install (that's [ADR-0027](../decisions/0027-voice-one-command-build.md)). +The first build is slower because it sets up the speech engine; after that it's cached and quick. To run +**just** the silent playground without any of the voice files, use `docker compose up mash` instead. + +## Two things that are normal (so they don't surprise you) + +- **Repeats are instant.** The helper remembers every line it has spoken (a disk cache, kept between + restarts), so saying the same sentence again comes back immediately — no waiting. +- **A brand-new line may clip its very first instant.** The first time the helper speaks a sentence it has + never said, the opening moment can be ever-so-slightly clipped. It's minor, it won't happen when you + repeat that line, and the helper warms itself up at startup to keep it small. Totally normal. + +## Now do it on your computer + +The step-by-step — install Docker, get the project, drop the files, run it — lives in your platform's +install guide, under its **"Tier 2 — the authentic voice"** section: + +- 🪟 **[Install on Windows](../install/windows.md)** +- 🍎 **[Install on macOS](../install/mac.md)** +- 🐧 **[Install on Linux](../install/linux.md)** + +## Where to next + +- **What is this voice, really?** → **[The authentic voice — overview](overview.md)**. +- **Where do the three files come from?** → **[Where to get the voice components](sourcing-components.md)**. +- **The legal/IP details** → **[Legal & assets](../legal-and-assets.md)**. 
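For developers curious what the browser → helper call might look like, here is a rough sketch. The endpoint (`POST /tts` with `{text, voice}`, answering `{audioWavBase64, mouthTimeline, format}`) and the default URL are taken from the cycle notes in this PR. Everything else is an assumption for illustration: the function names, the JSON `Content-Type` header, the field types, and the voice identifier you pass in. It is not the playground's actual code.

```typescript
// Response fields as listed in the cycle notes; the field types are assumptions.
interface TtsResponse {
  audioWavBase64: string;
  mouthTimeline: unknown; // shape not specified in this document
  format: string;
}

const DEFAULT_VOICE_SERVER_URL = "http://localhost:8080";

// Builds the request without sending it, so it can be inspected on its own.
function buildTtsRequest(
  text: string,
  voice: string,
  baseUrl: string = DEFAULT_VOICE_SERVER_URL,
) {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/tts`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" }, // assumed encoding
      body: JSON.stringify({ text, voice }),
    },
  };
}

// Runtime check that a parsed JSON value carries the three expected fields.
function isTtsResponse(value: unknown): value is TtsResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.audioWavBase64 === "string" &&
    "mouthTimeline" in v &&
    typeof v.format === "string"
  );
}

// Sends the line to the helper. An empty URL means "stay silent" and returns null,
// mirroring the cleared "Voice server URL" field described above.
async function speak(
  text: string,
  voice: string,
  baseUrl: string = DEFAULT_VOICE_SERVER_URL,
): Promise<TtsResponse | null> {
  if (baseUrl.trim() === "") return null;
  const { url, init } = buildTtsRequest(text, voice, baseUrl);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`voice helper answered ${res.status}`);
  const data: unknown = await res.json();
  if (!isTtsResponse(data)) throw new Error("unexpected /tts response shape");
  return data;
}
```

Keeping the request builder separate from the network call is a design choice of this sketch: it lets you check the URL and body without a running helper.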
--- diff --git a/docs/voice/sourcing-components.md b/docs/voice/sourcing-components.md index b2d4413..da1f50d 100644 --- a/docs/voice/sourcing-components.md +++ b/docs/voice/sourcing-components.md @@ -1,14 +1,55 @@ # Where to get the voice components -> 🚧 **Coming soon.** This page lands in **a later cycle**. It's a placeholder for now, so links -> pointing here already work — no dead ends. +The authentic voice needs three small files that **you supply yourself**. This is the friendly tour of +_what_ they are and _why_ you bring your own. For the exact, up-to-date list of where to download each one, +this page points you to **[Legal & assets](../legal-and-assets.md)**, which is the single source of truth +for sourcing. -**What it'll cover:** a friendly, consumer-focused guide to the free speech components you supply yourself. +New here? The **[authentic voice overview](overview.md)** explains the big picture first. -In the meantime: +## Why _you_ supply them (and we don't) -- New to all of this? Start with **[What is this?](../what-is-this.md)**. -- Want to try it right now? See the **[main README](../../README.md)**. +The original character voice is **closed 1990s Microsoft / Lernout & Hauspie speech software**. It isn't +ours to give away — so vivify ships none of it, never bundles it, and never auto-downloads it. You bring +your own copies, once. That's also what keeps vivify itself free, open, and clean to share. (The full +posture is [ADR-0006](../decisions/0006-permissive-license-no-bundled-ip.md) and +[ADR-0027](../decisions/0027-voice-one-command-build.md).) + +It's a one-time thing, and the files are free and findable — they're old, archived software, not something +you buy. 
+ +## The three files + +| File | In plain terms | What it's for | +| --- | --- | --- | +| **`spchapi.exe`** | the SAPI 4 speech runtime | the engine that turns text into speech | +| **`tv_enua.exe`** | the L&H TruVoice voice (American English) | the actual voice — Genie's "voice", and friends | +| **`speech.h`** | the SAPI 4 SDK header | a small build-time file the voice helper compiles against | + +The first two are the speech engine and its voice. The third, `speech.h`, is a developer header file used +only while the helper builds itself — it carries Microsoft's copyright, so it gets the **same treatment** +as the binaries: user-supplied, never committed, never auto-fetched. + +They all go in one place — `services/voice-server/vendor/` — with `speech.h` in a `sdk/include/` +sub-folder. The **[setup page](setup.md)** shows how they fit in; your **install guide** shows exactly +where to drop them. + +## Where to actually download them + +We don't link the proprietary files directly here, on purpose. Instead, +**[Legal & assets → §2 "Speech runtime"](../legal-and-assets.md)** lists the community sources that are +known to work (and which file comes from where). Start there — it's kept current and explains each option. + +> One handy fact: you do **not** need the old Microsoft Agent program itself (`msagent.exe`) — vivify +> reimplements the character engine. Only these _speech_ pieces are needed, and only for the authentic +> voice. + +## Where to next + +- **The authoritative source list** → **[Legal & assets](../legal-and-assets.md)**. +- **How the voice setup fits together** → **[Setting up the authentic voice](setup.md)**. +- **Do it on your computer** → **[Windows](../install/windows.md)** · **[macOS](../install/mac.md)** · + **[Linux](../install/linux.md)**. ---