4 changes: 2 additions & 2 deletions README.md
@@ -122,7 +122,7 @@ That voice came from old Windows speech software that can't run in a browser dir
a small background helper. It's a more involved, opt-in setup — completely separate from the quick
playground above — and there's a dedicated, hand-held walkthrough for it:

-➡️ **[The authentic voice — what it is and how to set it up](docs/voice/overview.md)** _(coming soon)_
+➡️ **[The authentic voice — what it is and how to set it up](docs/voice/overview.md)**

> Don't need the original voice? The playground still talks using your browser's built-in voice, so you
> never hit a dead end.
@@ -179,7 +179,7 @@ clean and legal, _you_ bring two things yourself:
software that you install into the voice helper.

It's all explained, with exactly where to get each piece, in
-**[`docs/legal-and-assets.md`](docs/legal-and-assets.md)** (and a friendlier consumer guide is coming in
+**[`docs/legal-and-assets.md`](docs/legal-and-assets.md)** (with a friendlier consumer guide in
[`docs/voice/sourcing-components.md`](docs/voice/sourcing-components.md)).

> In short: the _engine_ is ours and open. The _characters and voices_ are their owners', and you supply
94 changes: 94 additions & 0 deletions docs/cycles/cycle-20-voice-docs.md
@@ -0,0 +1,94 @@
# Cycle 20 — the voice/\* cluster (authentic-voice docs)

## Goal
Fill the three empty `docs/voice/*` stubs at their canonical paths so the entire "want the real voice?"
CTA chain (README + docs landing + developer pages + glossary) stops dead-ending in a placeholder:

- `docs/voice/overview.md` — what the authentic voice IS and why it needs a helper (the hub).
- `docs/voice/setup.md` — the **conceptual** setup hub (architecture), linking out to the OS steps.
- `docs/voice/sourcing-components.md` — the friendly front door for the three supplied files.

Audience spans curious non-technical users **and** developers. Voice/tone follows the parked
`vision-and-docs-spec.md` (PO pasted it this session): warm, second-person, plain, nostalgia only in
clearly set-off asides. **Docs only — no code; CI stays green.**

## Canonical-home split (the explicit no-duplication / no-contradiction decision)
There are three overlapping things; each gets exactly ONE canonical home, and the voice pages link rather
than re-paste:

| Topic | Canonical home | What the voice pages do |
| --- | --- | --- |
| **Per-platform step-by-step** (install Docker, drop files, run command) | `docs/install/{windows,mac,linux}.md` (Tier 2) — already written | `voice/setup.md` explains the setup **conceptually** and **links** to these; it does NOT re-paste the steps |
| **Detailed sourcing list + IP/legal posture** (the actual places to get the files, ADR-0006/0027) | `docs/legal-and-assets.md` — already canonical | `voice/sourcing-components.md` is the friendly summary that **links into** legal-and-assets §2; it adds **no** direct proprietary download links |
| **What/why the authentic voice is** (conceptual) | NEW: `voice/overview.md` (hub) + `voice/setup.md` (how it fits together) | these are the new canonical conceptual pages |

So: install pages own the *do-it-on-my-OS* steps; legal-and-assets owns the *where-to-get-it* source list +
IP posture; the voice cluster owns the *understand-what-this-is-and-how-it-fits* concepts and routes the
reader to the other two. No step-by-step is duplicated; no sourcing link is duplicated; nothing contradicts
legal-and-assets.

## Verified facts the pages use (cross-checked against the repo)
- **Architecture:** `services/voice-server` = Dockerized **Wine + SAPI4 + L&H TruVoice** behind a thin Node
HTTP API. `POST /tts {text, voice}` → `{audioWavBase64, mouthTimeline, format}`; `GET /health`. Port
**8080**. (`services/voice-server/README.md`, ADR-0014.)
- **Why a backend:** the TruVoice voice is closed 1990s Win32 software (SAPI4) that can't run in a browser
→ it runs in the Wine service; the browser calls it. (ADR-0004.)
- **The 3 user-supplied files**, all under `services/voice-server/vendor/`: `spchapi.exe` (SAPI4 runtime),
`tv_enua.exe` (L&H TruVoice voice), `sdk/include/speech.h` (SAPI4 SDK header, build-time;
`vendor/sdk/include/speech.h`). Build fails loudly naming the exact `speech.h` path if missing.
- **One command:** `docker compose up` runs MASH (:8090) + voice (:8080); the image compiles its own
`dist/` in-image, so **Docker is the only host tool** (Cycle 15 / ADR-0027). `docker compose up mash` =
demo only, silent (no voice binaries needed).
- **How MASH connects:** the voice URL field pre-fills to `http://localhost:8080` (build arg
`VITE_VOICE_SERVER_URL`); the **browser** makes the call (not container-to-container); clearing the field
goes silent. (`apps/mash/src/{app,characters}.ts`, `docker-compose.yml`.)
- **Cache:** repeats are instant — disk cache keyed by `hash(text+voice)`, persisted on the
`vivify-tts-cache` volume (Cycle 12 / ADR-0024).
- **First-utterance note:** the server warms the whole pipeline at startup; a brand-new line may clip its
very first instant slightly, a repeat won't (a cache hit can't clip).
- **Fallback vs authentic:** `WebSpeechProvider` (browser voice, zero backend) vs `TruVoiceProvider`
(authentic, needs the server). (`packages/voice-truvoice/src/index.ts`.)
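
A minimal TypeScript sketch of the call shape listed above, for reviewers cross-checking the pages. Only the `/tts` path and the field names (`text`, `voice`, `audioWavBase64`, `mouthTimeline`, `format`) come from the facts in this section; the type names, `requestSpeech`, `isTtsResponse`, the injected `fetchImpl`, and the error handling are illustrative assumptions, not the code in `packages/voice-truvoice`.

```typescript
// Sketch only: field names follow the endpoint shape noted above.
// All identifiers below are hypothetical.

interface TtsRequest {
  text: string;
  voice: string;
}

interface TtsResponse {
  audioWavBase64: string;
  mouthTimeline: unknown; // shape is not specified in this document
  format: string;
}

// Minimal fetch-like signature, so the sketch can be exercised with a fake.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Runtime check that a parsed JSON body has the three documented fields.
function isTtsResponse(value: unknown): value is TtsResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v["audioWavBase64"] === "string" &&
    typeof v["format"] === "string" &&
    "mouthTimeline" in v
  );
}

// POSTs {text, voice} to <baseUrl>/tts and returns the validated response.
async function requestSpeech(
  baseUrl: string,
  req: TtsRequest,
  fetchImpl: FetchLike,
): Promise<TtsResponse> {
  const url = `${baseUrl.replace(/\/+$/, "")}/tts`;
  const res = await fetchImpl(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(req),
  });
  if (!res.ok) throw new Error(`voice helper returned HTTP ${res.status}`);
  const body = await res.json();
  if (!isTtsResponse(body)) throw new Error("unexpected /tts response shape");
  return body;
}
```

In a browser, the global `fetch` could plausibly be passed as `fetchImpl`, with `http://localhost:8080` as `baseUrl`; the voice name to send is whatever the server accepts and is not stated here.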

## README accuracy fixes (these pages become real, so two CTAs are now stale)
- `README.md:~125` — "The authentic voice … `_(coming soon)_`" → drop the marker (overview.md is now real).
- `README.md:~182` — "a friendlier consumer guide **is coming in** `voice/sourcing-components.md`" →
"with a friendlier consumer guide **in** …" (the page now exists).
- `docs/README.md` (lines 47–48) already links the three pages cleanly — no change. Glossary + developer
links already point in without a "coming soon" — no change. (No path churn.)

## Pages — shape
Each page: warm intro → the content below → a "Where to next" nav → `← Back to the documentation home`
footer (matching the existing convention).
- **overview.md** (hub): the two voices and the tradeoff (browser fallback = instant, not original;
authentic = the real TruVoice + lip-sync, needs the helper); what TruVoice is; why a helper is needed
(closed Win32, ADR-0004); high-level "what you need" → route to setup + sourcing + install.
- **setup.md** (conceptual): how the pieces fit (MASH in the browser :8090 ↔ voice helper :8080, the call
happens in the browser, pre-filled URL, clear-to-silence); the 3 files + where they live (concept, link
sourcing for where-to-get + install for drop-in); one command `docker compose up` (Docker-only, ADR-0027)
vs `up mash`; the cache (repeats instant); the honest first-utterance clip note; then **hand off** to the
per-platform install guides for the actual steps.
- **sourcing-components.md** (front door): the 3 files in plain language + why you supply them (IP posture,
friendly); where they go; **no direct proprietary links** — link to `legal-and-assets.md` §2 for the
authoritative source list. Cross-link setup + install.

## Acceptance check
- All three pages have real content (no "🚧 Coming soon"), in the spec voice, and cross-link each other +
the docs landing.
- **No install step-by-step is duplicated** (voice/setup links to install pages for OS steps).
- **No contradiction with `legal-and-assets.md`** and **no direct proprietary download links** anywhere in
`voice/*` (sourcing summarizes + links).
- Every documented fact matches the repo (ports, file paths, endpoint shape, one-command flow).
- README CTAs no longer say "(coming soon)" / "is coming" for these now-real pages; every relative link
resolves.
- `pnpm -r typecheck && pnpm -r test && pnpm lint && pnpm format` green (docs only; Markdown
prettier-ignored).

## Verification
- `code-reviewer`: verifies no install-step duplication, no legal-and-assets contradiction, no proprietary
links, every fact matches the repo, all links resolve.
- `grep -rn "coming soon" README.md docs/ | grep -vE 'cycles/|decisions/'` → no hit points at a voice page.

## Non-goals
The help cluster (getting-started, faq, troubleshooting) — next cycle. No edits to the install pages or
`legal-and-assets.md` content (only link/reference accuracy in README). No code. No merge — open a PR
(base `main`) and stop.
63 changes: 57 additions & 6 deletions docs/voice/overview.md
@@ -1,14 +1,65 @@
# The authentic voice — overview

> 🚧 **Coming soon.** This page lands in **a later cycle**. It's a placeholder for now, so links
> pointing here already work — no dead ends.
Your character can talk two ways. Out of the box, it uses your **browser's** built-in voice — that works
instantly, with nothing to install. But vivify can also give it its **real, original voice**: the actual
late-1990s synthesizer the Microsoft Agent characters spoke with. This page explains what that is, and why
it takes a little extra setup.

**What it'll cover:** what the "authentic voice" is, in plain language, and why it runs in a small background helper.
> 💾 **Remember when…** every program had _that_ slightly robotic voice reading text aloud? That voice was
> real software, and it still works. We just had to coax it into the modern web. (Feel free to skip this
> box; it doesn't change any of the instructions.)

In the meantime:
## The two voices

- New to all of this? Start with **[What is this?](../what-is-this.md)**.
- Want to try it right now? See the **[main README](../../README.md)**.
| | **Browser voice** (default) | **Authentic voice** (TruVoice) |
| --- | --- | --- |
| Sounds like | your computer's modern built-in voice | the **original** 1990s character voice |
| Lip-sync | approximate (a best guess) | exact — driven by the real engine's mouth data |
| Setup | none — it just works | a small one-time setup (Docker + three free files you supply) |
| Best for | "I just want to hear it talk" | "I want the _real_ thing" |

Neither is wrong. The browser voice means you **never hit a dead end** — a character can always speak. The
authentic voice is the enthusiast upgrade, and it's the one that sounds like you remember.

## What "the authentic voice" actually is

It's **L&H TruVoice**, the text-to-speech voice that shipped with Microsoft Agent, driven by **SAPI 4**
(Microsoft's Speech API, version 4). That's the genuine article — not a modern soundalike. When Genie
speaks in his real voice, you're hearing the same engine people heard in 1998.

## Why it needs a small "helper"

Here's the catch: TruVoice and SAPI 4 are **closed 1990s Windows programs**. They were never meant to run
inside a web browser, and they can't — a browser has no way to load that old Windows software directly.

So vivify does the next best thing: it runs that original software in a **small background helper** — a
little service on your own machine that knows how to speak in the real voice. When your character talks,
your browser quietly asks the helper for the audio (and the precise mouth movements for lip-sync), and
plays it back. You never interact with the helper directly; it just sits there and does the voice.

This "the authentic voice lives in a backend service" decision is recorded in
[ADR-0004](../decisions/0004-authentic-voice-requires-backend.md), for the curious.

## What you'll need

Three things, all free, and only for this authentic-voice path:

1. **Docker** — the one tool that runs the helper for you (it's also what runs the playground).
2. **Three small files you supply yourself** — the original speech software. vivify ships **none** of it
(it's not ours to give away), so you bring your own copies, once.
3. About a minute of setup.

That's it. The playground and the browser voice need none of this.

## Where to next

- **How does it all fit together?** → **[Setting up the authentic voice](setup.md)** — the concepts and
the one-command flow.
- **Where do those three files come from?** → **[Where to get the voice components](sourcing-components.md)**.
- **Just give me the steps for my computer.** → the platform guides:
**[Windows](../install/windows.md)** · **[macOS](../install/mac.md)** · **[Linux](../install/linux.md)**
(each has an optional "Tier 2 — authentic voice" section).
- **What's a `.acs`? SAPI? lip-sync?** → the **[Glossary](../glossary.md)**, every term in plain English.

---

77 changes: 71 additions & 6 deletions docs/voice/setup.md
@@ -1,14 +1,79 @@
# Setting up the authentic voice

> 🚧 **Coming soon.** This page lands in **a later cycle**. It's a placeholder for now, so links
> pointing here already work — no dead ends.
This page explains **how the authentic voice fits together** — the handful of pieces and how they talk to
each other — so the actual setup makes sense. When you're ready for the click-by-click steps on _your_
computer, this page hands you off to your platform's guide; it doesn't repeat them here.

**What it'll cover:** the full, hand-held walkthrough for the TruVoice + Wine + Docker voice service, per platform.
New to the whole idea? Start with **[The authentic voice — overview](overview.md)** first.

In the meantime:
## How the pieces fit together

- New to all of this? Start with **[What is this?](../what-is-this.md)**.
- Want to try it right now? See the **[main README](../../README.md)**.
There are just two moving parts, and they both run on your own machine via **Docker**:

- **The playground (MASH)** — the web app you load in your browser at **`http://localhost:8090`**.
- **The voice helper** — a small background service at **`http://localhost:8080`** that runs the original
speech software and knows how to speak in the real voice.

When you click **Speak**, the magic is quietly ordinary: **your browser** sends the line of text to the
voice helper, and the helper sends back the audio plus the exact mouth movements for lip-sync. The
character plays it. (The call goes browser → helper, not container-to-container, which is why both
services publish a port on your own machine: 8090 for the playground, 8080 for the helper.)

You don't configure any of this by hand. The playground's **"Voice server URL"** field is **pre-filled**
with `http://localhost:8080`, so sound just works the moment the helper is running. Clear that field and
the character goes **silent** — a handy escape hatch, never a dead end.
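
If it helps to see that rule as code, here's a tiny TypeScript sketch. It's an illustration under assumptions, not vivify's actual source: `resolveVoiceTarget` and `VoiceTarget` are made-up names, `/tts` is assumed to be the helper's speech endpoint, and treating a whitespace-only field as empty is a guess.

```typescript
// The pre-filled value described on this page.
const DEFAULT_VOICE_URL = "http://localhost:8080";

// Hypothetical result type: either no voice at all, or a helper to call.
type VoiceTarget =
  | { kind: "silent" }
  | { kind: "helper"; ttsUrl: string };

// Maps the "Voice server URL" field to what the playground would do.
function resolveVoiceTarget(fieldValue: string): VoiceTarget {
  const trimmed = fieldValue.trim();
  // Cleared field: the character goes silent (assumption: blanks count as cleared).
  if (trimmed === "") return { kind: "silent" };
  // Otherwise the browser would call the helper's speech endpoint directly.
  return { kind: "helper", ttsUrl: `${trimmed.replace(/\/+$/, "")}/tts` };
}
```

With the pre-filled default, `resolveVoiceTarget(DEFAULT_VOICE_URL)` points at the helper on port 8080; with an empty string it reports `silent`.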

## What the helper needs: three files you supply

The voice helper runs genuine closed 1990s speech software, and **vivify ships none of it**. You drop
**three** files into one folder — `services/voice-server/vendor/` — once:

| File | What it is | Goes at |
| --- | --- | --- |
| `spchapi.exe` | the SAPI 4 speech runtime | `services/voice-server/vendor/spchapi.exe` |
| `tv_enua.exe` | the L&H TruVoice voice (Genie & friends) | `services/voice-server/vendor/tv_enua.exe` |
| `speech.h` | the SAPI 4 SDK header the helper compiles against | `services/voice-server/vendor/sdk/include/speech.h` |

Where these come from is its own page: **[Where to get the voice components](sourcing-components.md)**.
(If the build ever stops and complains that `speech.h` is missing, it's that third file in the
`sdk/include/` sub-folder; the build message names the exact spot.)
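
For the code-minded, the "are all three files in place?" question can be sketched in a few lines of TypeScript. The three paths are the ones from the table above; the helper name `missingVendorFiles` and the idea of passing in a list of paths you've already found are assumptions made for this example, not part of vivify's build.

```typescript
// The three user-supplied files, relative to the project folder (from the table above).
const REQUIRED_VENDOR_FILES = [
  "services/voice-server/vendor/spchapi.exe",
  "services/voice-server/vendor/tv_enua.exe",
  "services/voice-server/vendor/sdk/include/speech.h",
] as const;

// Hypothetical helper: given the paths that exist, return the required ones still missing.
function missingVendorFiles(presentPaths: readonly string[]): string[] {
  const present = new Set(presentPaths);
  return REQUIRED_VENDOR_FILES.filter((path) => !present.has(path));
}
```

An empty result means all three files are where the helper expects them; anything else is the list of files still to drop in.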

## One command runs it all

Once the three files are in place, the whole thing — playground **and** voice — comes up with a single
command from the project folder:

```bash
docker compose up
```

**Docker is the only tool you need on your computer.** The voice helper compiles itself _inside_ its own
Docker image, so there's no programming toolchain to install (that's [ADR-0027](../decisions/0027-voice-one-command-build.md)).
The first build is slower because it sets up the speech engine; after that it's cached and quick. To run
**just** the silent playground without any of the voice files, use `docker compose up mash` instead.

## Two things that are normal (so they don't surprise you)

- **Repeats are instant.** The helper remembers every line it has spoken (a disk cache, kept between
restarts), so saying the same sentence again comes back immediately — no waiting.
- **A brand-new line may clip its very first instant.** The first time the helper speaks a sentence it has
never said, the opening moment can be ever-so-slightly clipped. It's minor, it won't happen when you
repeat that line, and the helper warms itself up at startup to keep it small. Totally normal.
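
Why are repeats instant? Here's the idea as a small TypeScript sketch. It's a stand-in, not the helper's real cache: `SpeechCache` is a made-up name, it keeps entries in memory rather than on disk, and building the key from the voice plus the text is an assumption of this example.

```typescript
// Hypothetical in-memory stand-in for the helper's disk cache (illustration only).
class SpeechCache<T> {
  private readonly entries = new Map<string, T>();

  // Assumption: one entry per (voice, text) pair. A JSON-encoded pair keeps
  // the sketch short; a real cache would more likely hash the two values.
  private keyFor(text: string, voice: string): string {
    return JSON.stringify([voice, text]);
  }

  // Returns the cached value, or calls `synthesize` once and remembers the result.
  getOrCreate(text: string, voice: string, synthesize: () => T): T {
    const key = this.keyFor(text, voice);
    const hit = this.entries.get(key);
    if (hit !== undefined) return hit;
    const fresh = synthesize();
    this.entries.set(key, fresh);
    return fresh;
  }
}
```

The first request for a line pays for synthesis; asking for the same line in the same voice again returns the stored result without calling `synthesize` at all.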

## Now do it on your computer

The step-by-step — install Docker, get the project, drop the files, run it — lives in your platform's
install guide, under its **"Tier 2 — the authentic voice"** section:

- 🪟 **[Install on Windows](../install/windows.md)**
- 🍎 **[Install on macOS](../install/mac.md)**
- 🐧 **[Install on Linux](../install/linux.md)**

## Where to next

- **What is this voice, really?** → **[The authentic voice — overview](overview.md)**.
- **Where do the three files come from?** → **[Where to get the voice components](sourcing-components.md)**.
- **The legal/IP details** → **[Legal & assets](../legal-and-assets.md)**.

---

53 changes: 47 additions & 6 deletions docs/voice/sourcing-components.md
@@ -1,14 +1,55 @@
# Where to get the voice components

> 🚧 **Coming soon.** This page lands in **a later cycle**. It's a placeholder for now, so links
> pointing here already work — no dead ends.
The authentic voice needs three small files that **you supply yourself**. This is the friendly tour of
_what_ they are and _why_ you bring your own. For the exact, up-to-date list of where to download each one,
this page points you to **[Legal & assets](../legal-and-assets.md)**, which is the single source of truth
for sourcing.

**What it'll cover:** a friendly, consumer-focused guide to the free speech components you supply yourself.
New here? The **[authentic voice overview](overview.md)** explains the big picture first.

In the meantime:
## Why _you_ supply them (and we don't)

- New to all of this? Start with **[What is this?](../what-is-this.md)**.
- Want to try it right now? See the **[main README](../../README.md)**.
The original character voice is **closed 1990s Microsoft / Lernout & Hauspie speech software**. It isn't
ours to give away — so vivify ships none of it, never bundles it, and never auto-downloads it. You bring
your own copies, once. That's also what keeps vivify itself free, open, and clean to share. (The full
posture is [ADR-0006](../decisions/0006-permissive-license-no-bundled-ip.md) and
[ADR-0027](../decisions/0027-voice-one-command-build.md).)

It's a one-time thing, and the files are free and findable — they're old, archived software, not something
you buy.

## The three files

| File | In plain terms | What it's for |
| --- | --- | --- |
| **`spchapi.exe`** | the SAPI 4 speech runtime | the engine that turns text into speech |
| **`tv_enua.exe`** | the L&H TruVoice voice (American English) | the actual voice that Genie and friends speak with |
| **`speech.h`** | the SAPI 4 SDK header | a small build-time file the voice helper compiles against |

The first two are the speech engine and its voice. The third, `speech.h`, is a developer header file used
only while the helper builds itself — it carries Microsoft's copyright, so it gets the **same treatment**
as the binaries: user-supplied, never committed, never auto-fetched.

They all go in one place — `services/voice-server/vendor/` — with `speech.h` in a `sdk/include/`
sub-folder. The **[setup page](setup.md)** shows how they fit in; your **install guide** shows exactly
where to drop them.

## Where to actually download them

We don't link the proprietary files directly here, on purpose. Instead,
**[Legal & assets → §2 "Speech runtime"](../legal-and-assets.md)** lists the community sources that are
known to work (and which file comes from where). Start there — it's kept current and explains each option.

> One handy fact: you do **not** need the old Microsoft Agent program itself (`msagent.exe`) — vivify
> reimplements the character engine. Only these _speech_ pieces are needed, and only for the authentic
> voice.

## Where to next

- **The authoritative source list** → **[Legal & assets](../legal-and-assets.md)**.
- **How the voice setup fits together** → **[Setting up the authentic voice](setup.md)**.
- **Do it on your computer** → **[Windows](../install/windows.md)** · **[macOS](../install/mac.md)** ·
**[Linux](../install/linux.md)**.

---
