diff --git a/docker-compose.yml b/docker-compose.yml
index 47efc93..5f14ca5 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -8,8 +8,9 @@
 # handled by the voice server reflecting the request Origin.)
 #
 # `docker compose up mash` runs just the demo (no voice binaries required). The `voice`
-# service needs your user-supplied engine files in services/voice-server/vendor/ and a
-# prebuilt dist (`pnpm --filter @vivify/voice-server typecheck`) — see that service's README.
+# service needs your user-supplied engine files in services/voice-server/vendor/
+# (spchapi.exe, tv_enua.exe, sdk/include/speech.h). It compiles its own dist/ in-image
+# (Cycle 15), so no host Node/pnpm is needed — just `docker compose up`. See that service's README.
 services:
   mash:
     build:
@@ -25,7 +26,13 @@ services:
 
   voice:
     build:
-      context: services/voice-server
+      # Cycle 15: context is the REPO ROOT so the image can compile the server's dist/ from the
+      # pnpm workspace itself (no host Node/pnpm, no prebuilt dist). The Dockerfile's own ignore
+      # (services/voice-server/Dockerfile.dockerignore) lets this build read the gitignored
+      # vendor/ (user-supplied SAPI4/TruVoice + speech.h) while the root .dockerignore keeps
+      # vendor out of the MASH image.
+      context: .
+      dockerfile: services/voice-server/Dockerfile
     image: vivify-voice
     ports:
       - '8080:8080'
diff --git a/docs/cycles/cycle-15-voice-one-command.md b/docs/cycles/cycle-15-voice-one-command.md
new file mode 100644
index 0000000..4147fdc
--- /dev/null
+++ b/docs/cycles/cycle-15-voice-one-command.md
@@ -0,0 +1,82 @@
+# Cycle 15 — authentic voice in one `docker compose up`
+
+## Goal
+Make the authentic TruVoice voice run with a single `docker compose up` once the user has dropped in their
+supplied files — **no host Node/pnpm, no manual `dist` build**. Before this cycle the voice image only
+`COPY`'d a host-prebuilt `dist/`, so a user had to install Node 20 + pnpm, `pnpm install`, and run a
+typecheck to emit `dist/` before building. This cycle moves that build **into the image**. Ports (MASH
+8090 / voice 8080), the TTS cache, and its named volume are unchanged. Code cycle — the operator rebuilds
++ tests the full Wine path.
+
+## The change
+
+### 1. In-image `dist` build (multi-stage Dockerfile)
+`services/voice-server/Dockerfile` gains a first stage that compiles the server itself, mirroring the
+proven `apps/mash/Dockerfile`:
+
+```dockerfile
+FROM node:20-slim AS build
+RUN corepack enable # pnpm@9.15.0 (pinned in root package.json)
+WORKDIR /repo
+COPY . .
+RUN pnpm install --frozen-lockfile
+RUN pnpm --filter @vivify/voice-server run build # tsc --build → dist/
+```
+
+The runtime (Debian + Wine + SAPI4) stage then does `COPY --from=build
+/repo/services/voice-server/dist /opt/vivify/dist/` instead of `COPY dist/`. A new
+`"build": "tsc --build"` script in `services/voice-server/package.json` builds **only** the emit (the
+`@vivify/types` project reference first, then the server) — not the test typecheck. `@vivify/types` imports
+are type-only (all `import type`), so nothing from the workspace ships at runtime. The final image keeps
+the Node **runtime** but **no pnpm / TypeScript toolchain** (those live only in the discarded build stage).
+
+### 2. Build context → repo root
+The build now needs the pnpm workspace (lockfile, `packages/types`), so `docker-compose.yml`'s `voice`
+service switches to `build: { context: ., dockerfile: services/voice-server/Dockerfile }` — exactly how
+`mash` builds. Every runtime-stage `COPY` source gets the `services/voice-server/` prefix (`vendor/`,
+`bridge/`, `pulse-null.pa`, `entrypoint.sh`).
+
+### 3. Per-Dockerfile ignore so the voice build can read `vendor/`
+The root `.dockerignore` excludes `services/voice-server/vendor/` so the proprietary engine can never enter
+the **MASH** image (which also builds from the root and does `COPY . .`). But the **voice** image must read
+`vendor/` at build time. Solution: a Dockerfile-specific ignore,
+`services/voice-server/Dockerfile.dockerignore`, which BuildKit uses **instead of** the root ignore for the
+voice build. It mirrors the root ignore **except** it allows `vendor/`. The root ignore is unchanged, so
+MASH's posture is untouched.
+
+### 4. `speech.h` stays user-supplied (license decision)
+The SAPI4 SDK header carries _"Copyright 1994-1998 Microsoft Corporation. All rights reserved."_ with no
+redistribution grant. Auto-fetching it (even at build time from a third-party mirror) would make our build
+reproduce Microsoft IP with no license, violating
+[ADR-0006](../decisions/0006-permissive-license-no-bundled-ip.md) / the zero-bundled-IP rule. So it stays
+**user-supplied** under the gitignored `services/voice-server/vendor/sdk/include/speech.h`; the build
+**fails loudly** with the exact drop path + a pointer to `docs/legal-and-assets.md` if it's missing. (A
+future clean-room header could remove it entirely — out of scope; see ADR-0027.)
+
+## What is verified where
+- **CI (this repo):** `pnpm --filter @vivify/voice-server run build` emits `dist/`; `pnpm -r typecheck &&
+  pnpm -r test && pnpm lint && pnpm format` green (the compose YAML is prettier-clean; no `src`/test
+  change).
+- **Docker, in this sandbox (verified, not assumed):** `docker build --target build -f
+  services/voice-server/Dockerfile .` ran `pnpm install` + `tsc --build` **inside** the image and emitted
+  `/repo/services/voice-server/dist/main.js` — proving the host needs no toolchain. Running that stage
+  confirmed the Dockerfile-specific ignore lets the build read `services/voice-server/vendor/`
+  (`spchapi.exe`, `tv_enua.exe`, `sdk/include/speech.h`) and the other runtime COPY sources (`bridge/`,
+  `pulse-null.pa`, `entrypoint.sh`), while bridge build artifacts stay excluded.
+- **Operator (the acceptance — full Wine path can't run in the sandbox):** from a clean checkout, drop the
+  **3** files into `services/voice-server/vendor/`, then `docker compose build --no-cache && docker compose
+  up` → both containers up, MASH on 8090, voice on 8080, upload a `.acs` → Speak → authentic Genie (first
+  synthesis ~3–4s; repeats instant via the cache). No host Node/pnpm. The Debian/Wine/SAPI4 install steps
+  are environment-specific and remain operator-validated (the same boundary as every voice cycle).
+
+### Final minimal steps (after this cycle)
+- **Was:** install Node + pnpm → `pnpm install` → `pnpm --filter @vivify/voice-server typecheck` (build
+  dist) → drop 3 files → `docker compose up`.
+- **Now:** drop 3 user-supplied files into `services/voice-server/vendor/` — `spchapi.exe`, `tv_enua.exe`,
+  `sdk/include/speech.h` (sources in [`docs/legal-and-assets.md`](../legal-and-assets.md)) → `docker
+  compose up`. **Docker is the only host tool.**
+
+## Non-goals
+The full per-platform install-page rewrite (`docs/install/*`) is the deferred docs cycle — this cycle only
+updates the voice-server README to the one-command flow. No `@vivify/core`/browser change. Removing
+`speech.h` via a clean-room header is a possible future cycle, not this one. See ADR-0027.
diff --git a/docs/decisions/0027-voice-one-command-build.md b/docs/decisions/0027-voice-one-command-build.md
new file mode 100644
index 0000000..e2b9540
--- /dev/null
+++ b/docs/decisions/0027-voice-one-command-build.md
@@ -0,0 +1,32 @@
+# ADR-0027: authentic voice in one `docker compose up` — compile the server's dist/ inside the image, build from the repo root, keep speech.h user-supplied
+Status: Accepted · Date: 2026-06-21
+
+## Context
+Running the authentic TruVoice voice secretly required a **host toolchain**. The voice image only `COPY`'d a host-prebuilt `dist/`, so a user had to install Node 20 + pnpm, run `pnpm install`, and run a typecheck to emit `dist/` **before** `docker compose up`. The goal of Cycle 15 was one `docker compose up` once the user drops in their supplied files — **no host Node/pnpm, no manual dist build; Docker as the only host tool**.
+
+This is a code cycle (Dockerfile + compose). It is Tier-2 / authentic-voice context — the zero-bundled-IP rule ([ADR-0006](0006-permissive-license-no-bundled-ip.md)) is binding, and the full Debian + Wine + SAPI4 path cannot run in vivify's sandbox.
+
+## Decision
+
+**1. Compile the server's `dist/` INSIDE the image (multi-stage).**
+A `node:20-slim` `build` stage runs corepack (pnpm@9.15.0, pinned in the root `package.json`) + `pnpm install --frozen-lockfile` + `pnpm --filter @vivify/voice-server run build` — a new `"build": "tsc --build"` script that builds the `@vivify/types` project reference first, then emits the server. The Wine/SAPI4 runtime stage then does `COPY --from=build /repo/services/voice-server/dist /opt/vivify/dist/` instead of `COPY dist/`. WHY: this removes the host Node/pnpm + manual-dist prerequisite entirely. `@vivify/types` imports are type-only (`import type`), so nothing from the workspace ships at runtime; and because the build stage is discarded, the final image keeps only the Node **runtime** — no pnpm, no TypeScript toolchain.
+
+**2. The build context becomes the repo root.**
+`docker-compose.yml`'s `voice` service uses `context: .` + `dockerfile: services/voice-server/Dockerfile` (mirroring `apps/mash`), and every runtime-stage `COPY` source is prefixed `services/voice-server/` (`vendor/`, `bridge/`, `pulse-null.pa`, `entrypoint.sh`). WHY: the in-image build needs the whole pnpm workspace — the lockfile and `packages/types` — which live **above** the service directory, so the context can no longer be the service dir.
+
+**3. A Dockerfile-specific ignore lets the voice build read `vendor/`, while the root `.dockerignore` keeps `vendor/` out of the MASH image.**
+`services/voice-server/Dockerfile.dockerignore` mirrors the root ignore **except** it deliberately allows `services/voice-server/vendor/`; BuildKit uses this per-Dockerfile ignore instead of the root one for the voice build. WHY: both images now build from the repo root. The root ignore must keep excluding `vendor/` so the proprietary engine never enters the **MASH** image (which does `COPY . .`), but the **voice** image legitimately needs `vendor/` at build time to install the SAPI4/TruVoice runtime. This was verified in-sandbox: building the `build` stage showed `vendor/` present in the voice build context (`spchapi.exe`, `tv_enua.exe`, `sdk/include/speech.h`).
+
+**4. `speech.h` stays user-supplied — do NOT auto-fetch it.**
+The SAPI4 SDK header carries _"Copyright 1994-1998 Microsoft Corporation. All rights reserved."_ with no redistribution grant. It stays gitignored + user-supplied at `services/voice-server/vendor/sdk/include/speech.h`, and the build **fails loudly** with the exact drop path (and a pointer to `docs/legal-and-assets.md`) if it's missing. WHY: legal safety over convenience. Auto-fetching it — even at build time from a third-party mirror — would make our build reproduce Microsoft IP with no license, violating [ADR-0006](0006-permissive-license-no-bundled-ip.md) / zero-bundled-IP. The PO confirmed this path. The alternative considered and **deferred**: a clean-room minimal SAPI4 header that would remove the IP entirely — substantial, ABI-sensitive, a possible future cycle.
+
+## Consequences
+- **New minimal user flow.** Drop **3** user-supplied files into `services/voice-server/vendor/` — `spchapi.exe`, `tv_enua.exe`, `sdk/include/speech.h` (sources in `docs/legal-and-assets.md`) — then `docker compose up`. **Docker is the only host tool.** (Was: install Node + pnpm → `pnpm install` → typecheck to build `dist/` → drop 3 files → `docker compose up`.)
+- **Verification boundary (CI/sandbox vs operator).** Verified in-sandbox: the in-image `node` `build` stage was built and emitted `/repo/services/voice-server/dist/main.js` (proving the host needs no toolchain), and the Dockerfile-specific ignore was confirmed to let the build read `services/voice-server/vendor/`. The **operator** validates what the sandbox cannot: the full Debian + Wine + SAPI4 runtime image and end-to-end authentic voice — Wine isn't reproducible in vivify's sandbox, the same boundary as every voice cycle.
+- **IP posture preserved.** No binaries, `.acs`, or `speech.h` are committed; `vendor/` stays gitignored; the MASH image still excludes `vendor/` via the unchanged root `.dockerignore`. [ADR-0006](0006-permissive-license-no-bundled-ip.md) is intact.
+- **Runtime unchanged.** Ports (MASH 8090 / voice 8080), the TTS cache, and its named volume are untouched by this cycle.
+
+## Related
+- [ADR-0006](0006-permissive-license-no-bundled-ip.md) — MIT, zero bundled third-party IP; the binding rule behind decision 4 (and why `vendor/`/`speech.h` stay user-supplied).
+- [ADR-0014](0014-voice-server-architecture.md) — the voice-server architecture this cycle repackages (it changes how `dist/` is built, not the Wine/SAPI4 service).
+- `docs/cycles/cycle-15-voice-one-command.md` — the cycle this ADR records, with the full Dockerfile/compose detail and the verified-where breakdown.
diff --git a/services/voice-server/Dockerfile b/services/voice-server/Dockerfile
index 4bc48c9..99848ba 100644
--- a/services/voice-server/Dockerfile
+++ b/services/voice-server/Dockerfile
@@ -10,7 +10,28 @@
 # steps were derived by inspecting the actual CAB installers + their INF scripts, but
 # whether advpack/regsvr32 fully complete *under Wine* is the remaining unverified step;
 # the verification RUN below fails the build loudly (with diagnostics) if they don't.
+#
+# BUILD CONTEXT IS THE REPO ROOT (Cycle 15): the server's `dist/` is now compiled INSIDE
+# this image (the `build` stage below) instead of on the host, so a user needs only Docker
+# (no host Node/pnpm). Build via `docker compose up` (compose sets context: .), or by hand:
+# docker build -f services/voice-server/Dockerfile -t vivify-voice .
+# The user still supplies the gitignored proprietary files under services/voice-server/vendor/
+# (spchapi.exe, tv_enua.exe, sdk/include/speech.h) — see docs/legal-and-assets.md. A
+# Dockerfile-specific ignore (services/voice-server/Dockerfile.dockerignore) lets THIS build
+# read vendor/ while the root .dockerignore keeps it out of the MASH image.
+
+# --- stage 1: compile the Node server's dist/ from the pnpm workspace (no host toolchain) ---
+# Mirrors apps/mash/Dockerfile: copy the monorepo, install with the pinned pnpm, build only
+# the voice-server (its `build` = `tsc --build`, which builds its @vivify/types project
+# reference first, then emits services/voice-server/dist). @vivify/types imports are type-only.
+FROM node:20-slim AS build
+RUN corepack enable
+WORKDIR /repo
+COPY . .
+RUN pnpm install --frozen-lockfile
+RUN pnpm --filter @vivify/voice-server run build
 
+# --- stage 2: the Wine + SAPI4 + TruVoice runtime image ---
 FROM debian:bookworm-slim
 
 # 32-bit Wine + Xvfb (SAPI4/TruVoice are 32-bit; the bridge needs a display) + build + Node.
@@ -44,7 +65,7 @@ WORKDIR /opt/vivify
 # CLSID {D67C0280-C743-11cd-80E5-00AA003E4B50} -> Speech.dll via AddReg.
 # tv_enua.exe -> tv_enua.dll (engine) + tvenuax.dll + msvcp50/msvcirt runtimes;
 # tv_enua.inf self-registers them with regsvr32.
-COPY vendor/ /opt/vendor/
+COPY services/voice-server/vendor/ /opt/vendor/
 RUN set -eux; \
     wineboot --init; wineserver -w; \
     # Cycle 7: point Wine's audio at the PulseAudio driver (the null sink is provided at \
@@ -108,27 +129,27 @@ RUN set -eu; \
 # to open the exe with c0000135 (STATUS_DLL_NOT_FOUND) before the engine is even reached.
 # Build ANSI (no -municode): the SAPI4 interface macros resolve to the *A* forms the code
 # targets. `set -e` + the trailing `test -f` make a failed compile abort the build.
-COPY bridge/ /opt/vivify/bridge/
+COPY services/voice-server/bridge/ /opt/vivify/bridge/
 RUN set -eux; cd /opt/vivify/bridge; \
     test -f "$SAPI4_SDK/include/speech.h" \
-      || { echo "FATAL: speech.h missing — drop the SAPI4 SDK header at services/voice-server/vendor/sdk/include/speech.h (see bridge/README.md)"; exit 1; }; \
+      || { echo "FATAL: speech.h missing — drop the user-supplied SAPI4 SDK header at services/voice-server/vendor/sdk/include/speech.h (see docs/legal-and-assets.md). It is gitignored and never shipped."; exit 1; }; \
     i686-w64-mingw32-g++ -O2 -static -static-libgcc -static-libstdc++ \
       -o sapi4-mouth.exe sapi4-mouth.cpp \
       -I"$SAPI4_SDK/include" -lole32 -loleaut32 -luuid -lwinmm; \
     test -f sapi4-mouth.exe
 
 # --- the Node HTTP server ---
-# The server uses ONLY Node built-ins at runtime (its @vivify/types imports are
-# type-only and erased at compile), so we just copy the prebuilt dist and run it.
-# Build dist first on the host: `pnpm --filter @vivify/voice-server typecheck`
-# (emits dist/ via tsc --build), then `docker build services/voice-server`.
-COPY dist/ /opt/vivify/dist/
+# The server uses ONLY Node built-ins at runtime (its @vivify/types imports are type-only and
+# erased at compile), so we just copy the dist built in stage 1 and run it with the runtime Node
+# (no pnpm / TypeScript toolchain ships in this image). Cycle 15: dist is compiled IN-IMAGE (the
+# `build` stage), so the host no longer needs Node/pnpm and no prebuilt dist/ is required.
+COPY --from=build /repo/services/voice-server/dist /opt/vivify/dist/
 
 # --- dummy audio device (Cycle 7): PulseAudio null sink for Wine's MMAudioDest ---
 # entrypoint.sh starts the null sink before the server; the bridge (spawned per request)
 # inherits PULSE_SERVER. See docs/cycles/cycle-7-realtime-audio.md.
-COPY pulse-null.pa /etc/pulse/vivify-null.pa
-COPY entrypoint.sh /opt/vivify/entrypoint.sh
+COPY services/voice-server/pulse-null.pa /etc/pulse/vivify-null.pa
+COPY services/voice-server/entrypoint.sh /opt/vivify/entrypoint.sh
 RUN chmod +x /opt/vivify/entrypoint.sh
 
 # Cycle 10 (warm engine): DISPLAY points at the persistent Xvfb that entrypoint.sh starts,
diff --git a/services/voice-server/Dockerfile.dockerignore b/services/voice-server/Dockerfile.dockerignore
new file mode 100644
index 0000000..fb2a65c
--- /dev/null
+++ b/services/voice-server/Dockerfile.dockerignore
@@ -0,0 +1,42 @@
+# Build-context ignore for services/voice-server/Dockerfile (Cycle 15).
+#
+# BuildKit uses THIS file instead of the repo-root .dockerignore for the voice build (compose
+# sets context: . + dockerfile: services/voice-server/Dockerfile). It mirrors the root ignore
+# EXCEPT it deliberately ALLOWS services/voice-server/vendor/ — the voice image installs the
+# user-supplied SAPI4/TruVoice runtime + speech.h from vendor/ at build time. vendor/ stays
+# gitignored (never committed), and the root .dockerignore still excludes it, so it can never
+# enter the MASH image.
+
+# Dependencies + build outputs (reinstalled / rebuilt inside the image)
+**/node_modules
+**/dist
+**/build
+**/coverage
+**/*.tsbuildinfo
+.turbo
+
+# VCS / editor / CI noise
+.git
+.github
+.gitignore
+**/.DS_Store
+
+# Other engine/IP artifacts we still never want in the build context
+services/voice-server/.wine/
+services/voice-server/prefix/
+services/voice-server/bridge/*.exe
+services/voice-server/bridge/*.dll
+services/voice-server/bridge/*.o
+**/*.acs
+**/*.acf
+**/*.aca
+**/*.acd
+
+# MASH built-in bundles are local-only, user-supplied
+apps/mash/public/characters/
+
+# Local-only Claude config
+.claude/settings.local.json
+
+# NOTE: services/voice-server/vendor/ is intentionally NOT listed here — the voice image needs
+# the user-supplied SAPI4/TruVoice installers + speech.h from it. It stays gitignored.
diff --git a/services/voice-server/README.md b/services/voice-server/README.md
index d43d5b8..343cf36 100644
--- a/services/voice-server/README.md
+++ b/services/voice-server/README.md
@@ -10,19 +10,30 @@ Full design: `../../docs/cycles/cycle-5-voice.md`.
 > or run** in vivify's sandbox (no Wine there). The GO/NO-GO is proven only by
 > running the curl test below in a real Docker/Wine environment.
 
-## 1. Supply the proprietary runtime (never committed — gitignored `vendor/`)
-Drop into `services/voice-server/vendor/` (see `../../docs/legal-and-assets.md`):
+## 1. Drop in the three proprietary files (never committed — gitignored `vendor/`)
+Drop into `services/voice-server/vendor/` (sourcing is in `../../docs/legal-and-assets.md`):
 - `spchapi.exe` — Microsoft Speech API 4.0 runtime.
 - `tv_enua.exe` — L&H TruVoice American English (Genie's voice).
+- `sdk/include/speech.h` — the SAPI4 SDK header the bridge compiles against, i.e.
+  `services/voice-server/vendor/sdk/include/speech.h`.
 
-Also supply the **SAPI4 SDK headers/libs** (to compile the bridge) where the
-Dockerfile's `$SAPI4_SDK` expects them. (Sources are listed in legal-and-assets.md;
-TETYYS/SAPI4 documents a working Wine install of exactly these.)
+`speech.h` stays user-supplied because it's Microsoft-copyrighted ("All rights reserved")
+with no redistribution grant, so we never ship it (see ADR-0027 / ADR-0006). The build
+**fails loudly** with the exact drop path if it's missing.
 
-## 2. Build dist, then the image
+That's the only host setup. **Docker is the only host tool** — no Node, no pnpm, no manual
+`dist` build. The image compiles the server's `dist/` itself in a `node:20-slim` build stage
+(`pnpm install` + `tsc --build`).
+
+## 2. Build the image
+The build context is the **repo root** (the build reads the pnpm workspace). From the repo root,
+either let compose do it:
+```
+docker compose up # compose sets the context (.) and dockerfile
+```
+or build by hand:
 ```
-pnpm --filter @vivify/voice-server typecheck # emits dist/ (server is pure Node built-ins at runtime)
-docker build -t vivify-voice services/voice-server
+docker build -f services/voice-server/Dockerfile -t vivify-voice .
 ```
 
 ## 3. Run
@@ -30,6 +41,7 @@ docker build -t vivify-voice services/voice-server
 docker run --rm -p 8080:8080 vivify-voice
 curl localhost:8080/health # -> {"ok":true}
 ```
+This needs an image built with the three `vendor/` files above (they're baked in at build time).
 
 ## 4. GO/NO-GO test
 ```
@@ -173,6 +185,8 @@ evict the oldest entries by mtime on write.
 `[cache] N entries, M on disk`.
 
 ## Local dev without Wine
-The HTTP layer can be exercised against a fake bridge:
+The HTTP layer can be exercised against a fake bridge. Build `dist/` once on the host
+(`pnpm --filter @vivify/voice-server run build`), then:
 `VIVIFY_SAPI4_BRIDGE="node test/fake-bridge.mjs" node dist/main.js` — returns a canned WAV +
 timeline so you can hit `/tts` without the engine. (This proves plumbing, NOT the voice.)
+This host build is only for local dev; the Docker image builds its own `dist/` in-image.
diff --git a/services/voice-server/package.json b/services/voice-server/package.json
index cc59ea7..97ce971 100644
--- a/services/voice-server/package.json
+++ b/services/voice-server/package.json
@@ -15,6 +15,7 @@
     "dist"
   ],
   "scripts": {
+    "build": "tsc --build",
     "typecheck": "tsc --build --pretty && tsc -p tsconfig.test.json",
     "test": "vitest run --passWithNoTests",
     "start": "node dist/main.js",