13 changes: 10 additions & 3 deletions docker-compose.yml
@@ -8,8 +8,9 @@
# handled by the voice server reflecting the request Origin.)
#
# `docker compose up mash` runs just the demo (no voice binaries required). The `voice`
-# service needs your user-supplied engine files in services/voice-server/vendor/ and a
-# prebuilt dist (`pnpm --filter @vivify/voice-server typecheck`) — see that service's README.
+# service needs your user-supplied engine files in services/voice-server/vendor/
+# (spchapi.exe, tv_enua.exe, sdk/include/speech.h). It compiles its own dist/ in-image
+# (Cycle 15), so no host Node/pnpm is needed — just `docker compose up`. See that service's README.
services:
mash:
build:
@@ -25,7 +26,13 @@ services:

voice:
build:
-context: services/voice-server
+# Cycle 15: context is the REPO ROOT so the image can compile the server's dist/ from the
+# pnpm workspace itself (no host Node/pnpm, no prebuilt dist). The Dockerfile's own ignore
+# (services/voice-server/Dockerfile.dockerignore) lets this build read the gitignored
+# vendor/ (user-supplied SAPI4/TruVoice + speech.h) while the root .dockerignore keeps
+# vendor out of the MASH image.
+context: .
+dockerfile: services/voice-server/Dockerfile
image: vivify-voice
ports:
- '8080:8080'
82 changes: 82 additions & 0 deletions docs/cycles/cycle-15-voice-one-command.md
@@ -0,0 +1,82 @@
# Cycle 15 — authentic voice in one `docker compose up`

## Goal
Make the authentic TruVoice voice run with a single `docker compose up` once the user has dropped in their
supplied files — **no host Node/pnpm, no manual `dist` build**. Before this cycle the voice image only
`COPY`'d a host-prebuilt `dist/`, so a user had to install Node 20 + pnpm, `pnpm install`, and run a
typecheck to emit `dist/` before building. This cycle moves that build **into the image**. Ports (MASH
8090 / voice 8080), the TTS cache, and its named volume are unchanged. This is a code cycle: the operator
rebuilds and tests the full Wine path.

## The change

### 1. In-image `dist` build (multi-stage Dockerfile)
`services/voice-server/Dockerfile` gains a first stage that compiles the server itself, mirroring the
proven `apps/mash/Dockerfile`:

```dockerfile
FROM node:20-slim AS build
# pnpm@9.15.0 (pinned in root package.json)
RUN corepack enable
WORKDIR /repo
COPY . .
RUN pnpm install --frozen-lockfile
# tsc --build → dist/
RUN pnpm --filter @vivify/voice-server run build
```

The runtime (Debian + Wine + SAPI4) stage then does `COPY --from=build
/repo/services/voice-server/dist /opt/vivify/dist/` instead of `COPY dist/`. A new
`"build": "tsc --build"` script in `services/voice-server/package.json` runs **only** the emit build (the
`@vivify/types` project reference first, then the server), not the test typecheck. `@vivify/types` imports
are type-only (all `import type`), so nothing from the workspace ships at runtime. The final image keeps
the Node **runtime** but **no pnpm / TypeScript toolchain** (those live only in the discarded build stage).

### 2. Build context → repo root
The build now needs the pnpm workspace (lockfile, `packages/types`), so `docker-compose.yml`'s `voice`
service switches to `build: { context: ., dockerfile: services/voice-server/Dockerfile }` — exactly how
`mash` builds. Every runtime-stage `COPY` source gets the `services/voice-server/` prefix (`vendor/`,
`bridge/`, `pulse-null.pa`, `entrypoint.sh`).

### 3. Per-Dockerfile ignore so the voice build can read `vendor/`
The root `.dockerignore` excludes `services/voice-server/vendor/` so the proprietary engine can never enter
the **MASH** image (which also builds from the root and does `COPY . .`). But the **voice** image must read
`vendor/` at build time. Solution: a Dockerfile-specific ignore,
`services/voice-server/Dockerfile.dockerignore`, which BuildKit uses **instead of** the root ignore for the
voice build. It mirrors the root ignore **except** it allows `vendor/`. The root ignore is unchanged, so
MASH's posture is untouched.
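
The lookup order described above can be sketched in shell. This is only an illustration of the documented BuildKit behaviour, not BuildKit's own code and not a script in this repo; the function name `resolve_ignore` is hypothetical. It assumes BuildKit prefers `<Dockerfile>.dockerignore` next to the Dockerfile and otherwise falls back to `.dockerignore` at the context root.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repo): print which ignore file a
# BuildKit build would pick for a given build context and Dockerfile path.
# Assumption: <Dockerfile>.dockerignore wins over <context>/.dockerignore.
resolve_ignore() {
  context="$1"      # build context directory, e.g. "."
  dockerfile="$2"   # Dockerfile path relative to the context
  if [ -f "$context/$dockerfile.dockerignore" ]; then
    printf '%s\n' "$dockerfile.dockerignore"
  elif [ -f "$context/.dockerignore" ]; then
    printf '%s\n' ".dockerignore"
  else
    printf '%s\n' "(none)"
  fi
}
```

Run from the repo root, `resolve_ignore . services/voice-server/Dockerfile` should name the per-Dockerfile ignore, while `resolve_ignore . apps/mash/Dockerfile` should fall back to the root `.dockerignore`, provided no `apps/mash/Dockerfile.dockerignore` exists.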

### 4. `speech.h` stays user-supplied (license decision)
The SAPI4 SDK header carries _"Copyright 1994-1998 Microsoft Corporation. All rights reserved."_ with no
redistribution grant. Auto-fetching it (even at build time from a third-party mirror) would make our build
reproduce Microsoft IP with no license, violating
[ADR-0006](../decisions/0006-permissive-license-no-bundled-ip.md) / the zero-bundled-IP rule. So it stays
**user-supplied** under the gitignored `services/voice-server/vendor/sdk/include/speech.h`; the build
**fails loudly** with the exact drop path + a pointer to `docs/legal-and-assets.md` if it's missing. (A
future clean-room header could remove it entirely — out of scope; see ADR-0027.)
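
The in-image check only fires partway through the build. A host-side preflight could report a missing file before Docker starts. The sketch below is an assumption about how an operator might do that, not a script shipped by this cycle; `check_vendor` is a hypothetical name, and the three relative paths are the ones listed in this document.

```shell
#!/bin/sh
# Hypothetical preflight (not part of this repo): verify the three
# user-supplied files exist under the vendor directory before building.
check_vendor() {
  vendor_dir="${1:-services/voice-server/vendor}"
  missing=0
  for f in spchapi.exe tv_enua.exe sdk/include/speech.h; do
    if [ ! -f "$vendor_dir/$f" ]; then
      # Report every missing file, not just the first one.
      echo "missing: $vendor_dir/$f (see docs/legal-and-assets.md)" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A possible use is `check_vendor && docker compose up`, which starts the build only when all three files are present.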

## What is verified where
- **CI (this repo):** `pnpm --filter @vivify/voice-server run build` emits `dist/`; `pnpm -r typecheck &&
pnpm -r test && pnpm lint && pnpm format` green (the compose YAML is prettier-clean; no `src`/test
change).
- **Docker, in this sandbox (verified, not assumed):** `docker build --target build -f
services/voice-server/Dockerfile .` ran `pnpm install` + `tsc --build` **inside** the image and emitted
`/repo/services/voice-server/dist/main.js` — proving the host needs no toolchain. Running that stage
confirmed the Dockerfile-specific ignore lets the build read `services/voice-server/vendor/`
(`spchapi.exe`, `tv_enua.exe`, `sdk/include/speech.h`) and the other runtime COPY sources (`bridge/`,
`pulse-null.pa`, `entrypoint.sh`), while bridge build artifacts stay excluded.
- **Operator (the acceptance — full Wine path can't run in the sandbox):** from a clean checkout, drop the
**3** files into `services/voice-server/vendor/`, then `docker compose build --no-cache && docker compose
up` → both containers up, MASH on 8090, voice on 8080, upload a `.acs` → Speak → authentic Genie (first
synthesis ~3–4s; repeats instant via the cache). No host Node/pnpm. The Debian/Wine/SAPI4 install steps
are environment-specific and remain operator-validated (the same boundary as every voice cycle).

### Final minimal steps (after this cycle)
- **Was:** install Node + pnpm → `pnpm install` → `pnpm --filter @vivify/voice-server typecheck` (build
dist) → drop 3 files → `docker compose up`.
- **Now:** drop 3 user-supplied files into `services/voice-server/vendor/` — `spchapi.exe`, `tv_enua.exe`,
`sdk/include/speech.h` (sources in [`docs/legal-and-assets.md`](../legal-and-assets.md)) → `docker
compose up`. **Docker is the only host tool.**

## Non-goals
The full per-platform install-page rewrite (`docs/install/*`) is the deferred docs cycle — this cycle only
updates the voice-server README to the one-command flow. No `@vivify/core`/browser change. Removing
`speech.h` via a clean-room header is a possible future cycle, not this one. See ADR-0027.
32 changes: 32 additions & 0 deletions docs/decisions/0027-voice-one-command-build.md
@@ -0,0 +1,32 @@
# ADR-0027: authentic voice in one `docker compose up` — compile the server's dist/ inside the image, build from the repo root, keep speech.h user-supplied
Status: Accepted · Date: 2026-06-21

## Context
Running the authentic TruVoice voice implicitly required a **host toolchain**. The voice image only `COPY`'d a host-prebuilt `dist/`, so a user had to install Node 20 + pnpm, run `pnpm install`, and run a typecheck to emit `dist/` **before** `docker compose up`. The goal of Cycle 15 was one `docker compose up` once the user drops in their supplied files: **no host Node/pnpm, no manual dist build; Docker as the only host tool**.

This is a code cycle (Dockerfile + compose). It is Tier-2 / authentic-voice context — the zero-bundled-IP rule ([ADR-0006](0006-permissive-license-no-bundled-ip.md)) is binding, and the full Debian + Wine + SAPI4 path cannot run in vivify's sandbox.

## Decision

**1. Compile the server's `dist/` INSIDE the image (multi-stage).**
A `node:20-slim` `build` stage runs corepack (pnpm@9.15.0, pinned in the root `package.json`) + `pnpm install --frozen-lockfile` + `pnpm --filter @vivify/voice-server run build` — a new `"build": "tsc --build"` script that builds the `@vivify/types` project reference first, then emits the server. The Wine/SAPI4 runtime stage then does `COPY --from=build /repo/services/voice-server/dist /opt/vivify/dist/` instead of `COPY dist/`. WHY: this removes the host Node/pnpm + manual-dist prerequisite entirely. `@vivify/types` imports are type-only (`import type`), so nothing from the workspace ships at runtime; and because the build stage is discarded, the final image keeps only the Node **runtime** — no pnpm, no TypeScript toolchain.

**2. The build context becomes the repo root.**
`docker-compose.yml`'s `voice` service uses `context: .` + `dockerfile: services/voice-server/Dockerfile` (mirroring `apps/mash`), and every runtime-stage `COPY` source is prefixed `services/voice-server/` (`vendor/`, `bridge/`, `pulse-null.pa`, `entrypoint.sh`). WHY: the in-image build needs the whole pnpm workspace — the lockfile and `packages/types` — which live **above** the service directory, so the context can no longer be the service dir.

**3. A Dockerfile-specific ignore lets the voice build read `vendor/`, while the root `.dockerignore` keeps `vendor/` out of the MASH image.**
`services/voice-server/Dockerfile.dockerignore` mirrors the root ignore **except** it deliberately allows `services/voice-server/vendor/`; BuildKit uses this per-Dockerfile ignore instead of the root one for the voice build. WHY: both images now build from the repo root. The root ignore must keep excluding `vendor/` so the proprietary engine never enters the **MASH** image (which does `COPY . .`), but the **voice** image legitimately needs `vendor/` at build time to install the SAPI4/TruVoice runtime. This was verified in-sandbox: building the `build` stage showed `vendor/` present in the voice build context (`spchapi.exe`, `tv_enua.exe`, `sdk/include/speech.h`).

**4. `speech.h` stays user-supplied — do NOT auto-fetch it.**
The SAPI4 SDK header carries _"Copyright 1994-1998 Microsoft Corporation. All rights reserved."_ with no redistribution grant. It stays gitignored + user-supplied at `services/voice-server/vendor/sdk/include/speech.h`, and the build **fails loudly** with the exact drop path (and a pointer to `docs/legal-and-assets.md`) if it's missing. WHY: legal safety over convenience. Auto-fetching it — even at build time from a third-party mirror — would make our build reproduce Microsoft IP with no license, violating [ADR-0006](0006-permissive-license-no-bundled-ip.md) / zero-bundled-IP. The PO confirmed this path. The alternative considered and **deferred**: a clean-room minimal SAPI4 header that would remove the IP entirely — substantial, ABI-sensitive, a possible future cycle.

## Consequences
- **New minimal user flow.** Drop **3** user-supplied files into `services/voice-server/vendor/` — `spchapi.exe`, `tv_enua.exe`, `sdk/include/speech.h` (sources in `docs/legal-and-assets.md`) — then `docker compose up`. **Docker is the only host tool.** (Was: install Node + pnpm → `pnpm install` → typecheck to build `dist/` → drop 3 files → `docker compose up`.)
- **Verification boundary (CI/sandbox vs operator).** Verified in-sandbox: the in-image `node` `build` stage was built and emitted `/repo/services/voice-server/dist/main.js` (proving the host needs no toolchain), and the Dockerfile-specific ignore was confirmed to let the build read `services/voice-server/vendor/`. The **operator** validates what the sandbox cannot: the full Debian + Wine + SAPI4 runtime image and end-to-end authentic voice — Wine isn't reproducible in vivify's sandbox, the same boundary as every voice cycle.
- **IP posture preserved.** No binaries, `.acs`, or `speech.h` are committed; `vendor/` stays gitignored; the MASH image still excludes `vendor/` via the unchanged root `.dockerignore`. [ADR-0006](0006-permissive-license-no-bundled-ip.md) is intact.
- **Runtime unchanged.** Ports (MASH 8090 / voice 8080), the TTS cache, and its named volume are untouched by this cycle.

## Related
- [ADR-0006](0006-permissive-license-no-bundled-ip.md) — MIT, zero bundled third-party IP; the binding rule behind decision 4 (and why `vendor/`/`speech.h` stay user-supplied).
- [ADR-0014](0014-voice-server-architecture.md) — the voice-server architecture this cycle repackages (it changes how `dist/` is built, not the Wine/SAPI4 service).
- `docs/cycles/cycle-15-voice-one-command.md` — the cycle this ADR records, with the full Dockerfile/compose detail and the verified-where breakdown.
41 changes: 31 additions & 10 deletions services/voice-server/Dockerfile
@@ -10,7 +10,28 @@
# steps were derived by inspecting the actual CAB installers + their INF scripts, but
# whether advpack/regsvr32 fully complete *under Wine* is the remaining unverified step;
# the verification RUN below fails the build loudly (with diagnostics) if they don't.
+#
+# BUILD CONTEXT IS THE REPO ROOT (Cycle 15): the server's `dist/` is now compiled INSIDE
+# this image (the `build` stage below) instead of on the host, so a user needs only Docker
+# (no host Node/pnpm). Build via `docker compose up` (compose sets context: .), or by hand:
+# docker build -f services/voice-server/Dockerfile -t vivify-voice .
+# The user still supplies the gitignored proprietary files under services/voice-server/vendor/
+# (spchapi.exe, tv_enua.exe, sdk/include/speech.h) — see docs/legal-and-assets.md. A
+# Dockerfile-specific ignore (services/voice-server/Dockerfile.dockerignore) lets THIS build
+# read vendor/ while the root .dockerignore keeps it out of the MASH image.
+
+# --- stage 1: compile the Node server's dist/ from the pnpm workspace (no host toolchain) ---
+# Mirrors apps/mash/Dockerfile: copy the monorepo, install with the pinned pnpm, build only
+# the voice-server (its `build` = `tsc --build`, which builds its @vivify/types project
+# reference first, then emits services/voice-server/dist). @vivify/types imports are type-only.
+FROM node:20-slim AS build
+RUN corepack enable
+WORKDIR /repo
+COPY . .
+RUN pnpm install --frozen-lockfile
+RUN pnpm --filter @vivify/voice-server run build
+
# --- stage 2: the Wine + SAPI4 + TruVoice runtime image ---
FROM debian:bookworm-slim

# 32-bit Wine + Xvfb (SAPI4/TruVoice are 32-bit; the bridge needs a display) + build + Node.
@@ -44,7 +65,7 @@ WORKDIR /opt/vivify
# CLSID {D67C0280-C743-11cd-80E5-00AA003E4B50} -> Speech.dll via AddReg.
# tv_enua.exe -> tv_enua.dll (engine) + tvenuax.dll + msvcp50/msvcirt runtimes;
# tv_enua.inf self-registers them with regsvr32.
-COPY vendor/ /opt/vendor/
+COPY services/voice-server/vendor/ /opt/vendor/
RUN set -eux; \
wineboot --init; wineserver -w; \
# Cycle 7: point Wine's audio at the PulseAudio driver (the null sink is provided at \
@@ -108,27 +129,27 @@ RUN set -eu; \
# to open the exe with c0000135 (STATUS_DLL_NOT_FOUND) before the engine is even reached.
# Build ANSI (no -municode): the SAPI4 interface macros resolve to the *A* forms the code
# targets. `set -e` + the trailing `test -f` make a failed compile abort the build.
-COPY bridge/ /opt/vivify/bridge/
+COPY services/voice-server/bridge/ /opt/vivify/bridge/
RUN set -eux; cd /opt/vivify/bridge; \
test -f "$SAPI4_SDK/include/speech.h" \
-|| { echo "FATAL: speech.h missing — drop the SAPI4 SDK header at services/voice-server/vendor/sdk/include/speech.h (see bridge/README.md)"; exit 1; }; \
+|| { echo "FATAL: speech.h missing — drop the user-supplied SAPI4 SDK header at services/voice-server/vendor/sdk/include/speech.h (see docs/legal-and-assets.md). It is gitignored and never shipped."; exit 1; }; \
i686-w64-mingw32-g++ -O2 -static -static-libgcc -static-libstdc++ \
-o sapi4-mouth.exe sapi4-mouth.cpp \
-I"$SAPI4_SDK/include" -lole32 -loleaut32 -luuid -lwinmm; \
test -f sapi4-mouth.exe

# --- the Node HTTP server ---
-# The server uses ONLY Node built-ins at runtime (its @vivify/types imports are
-# type-only and erased at compile), so we just copy the prebuilt dist and run it.
-# Build dist first on the host: `pnpm --filter @vivify/voice-server typecheck`
-# (emits dist/ via tsc --build), then `docker build services/voice-server`.
-COPY dist/ /opt/vivify/dist/
+# The server uses ONLY Node built-ins at runtime (its @vivify/types imports are type-only and
+# erased at compile), so we just copy the dist built in stage 1 and run it with the runtime Node
+# (no pnpm / TypeScript toolchain ships in this image). Cycle 15: dist is compiled IN-IMAGE (the
+# `build` stage), so the host no longer needs Node/pnpm and no prebuilt dist/ is required.
+COPY --from=build /repo/services/voice-server/dist /opt/vivify/dist/

# --- dummy audio device (Cycle 7): PulseAudio null sink for Wine's MMAudioDest ---
# entrypoint.sh starts the null sink before the server; the bridge (spawned per request)
# inherits PULSE_SERVER. See docs/cycles/cycle-7-realtime-audio.md.
-COPY pulse-null.pa /etc/pulse/vivify-null.pa
-COPY entrypoint.sh /opt/vivify/entrypoint.sh
+COPY services/voice-server/pulse-null.pa /etc/pulse/vivify-null.pa
+COPY services/voice-server/entrypoint.sh /opt/vivify/entrypoint.sh
RUN chmod +x /opt/vivify/entrypoint.sh

# Cycle 10 (warm engine): DISPLAY points at the persistent Xvfb that entrypoint.sh starts,
42 changes: 42 additions & 0 deletions services/voice-server/Dockerfile.dockerignore
@@ -0,0 +1,42 @@
# Build-context ignore for services/voice-server/Dockerfile (Cycle 15).
#
# BuildKit uses THIS file instead of the repo-root .dockerignore for the voice build (compose
# sets context: . + dockerfile: services/voice-server/Dockerfile). It mirrors the root ignore
# EXCEPT it deliberately ALLOWS services/voice-server/vendor/ — the voice image installs the
# user-supplied SAPI4/TruVoice runtime + speech.h from vendor/ at build time. vendor/ stays
# gitignored (never committed), and the root .dockerignore still excludes it, so it can never
# enter the MASH image.

# Dependencies + build outputs (reinstalled / rebuilt inside the image)
**/node_modules
**/dist
**/build
**/coverage
**/*.tsbuildinfo
.turbo

# VCS / editor / CI noise
.git
.github
.gitignore
**/.DS_Store

# Other engine/IP artifacts we still never want in the build context
services/voice-server/.wine/
services/voice-server/prefix/
services/voice-server/bridge/*.exe
services/voice-server/bridge/*.dll
services/voice-server/bridge/*.o
**/*.acs
**/*.acf
**/*.aca
**/*.acd

# MASH built-in bundles are local-only, user-supplied
apps/mash/public/characters/

# Local-only Claude config
.claude/settings.local.json

# NOTE: services/voice-server/vendor/ is intentionally NOT listed here — the voice image needs
# the user-supplied SAPI4/TruVoice installers + speech.h from it. It stays gitignored.