Spectra turns talk into a spec. A non-technical founder talks; an AI voice agent builds a validated PRD live while running an underlying workflow state machine. The voice tool calls ARE the structured extraction — PRD blocks fill a live dashboard, a studio-lit 3D avatar lip-syncs to the voice, and the whole session is durable in SQLite (kill the server mid-call, restart, reload → state resumes).
The one sentence the demo proves: "Voice + an agent that maintains structured state turns a founder's rambling into a validated PRD artifact in real time."
Vapi web call ──tool call──> POST /vapi/webhook ──> mutate PRD ──> SQLite ──> WebSocket ──> dashboard
cd ~/voice-prd
python3 -m venv .venv && ./.venv/bin/pip install -r requirements.txt
./.venv/bin/python -m uvicorn main:app --reload --port 8000
Open http://localhost:8000/?s=demo. (The ?s= value is the session id; keep it as demo.)
Smoke-test with no voice needed:
curl -s -X POST localhost:8000/vapi/webhook -H 'Content-Type: application/json' \
-d '{"message":{"type":"tool-calls","call":{"id":"demo"},
"toolCallList":[{"id":"1","function":{"name":"add_requirement",
"arguments":{"title":"Bill upload"}}}]}}'
A card should pop into the open browser tab. GET /export/demo.md returns the artifact.
The Vapi assistant POSTs tool calls to a Server URL. It must be a public HTTPS URL that's up whenever you take a call — pick one of:
Render (hosted): a fixed URL that never changes, so the webhook can't go stale (the #1 cause of "the call
captured nothing / dropped mid-sentence"). render.yaml + runtime.txt are included.
- Push this repo to GitHub.
- Render → New + → Blueprint → pick the repo (it reads render.yaml).
- Set env vars: VAPI_PUBLIC_KEY, VAPI_ASSISTANT_ID (and optional RPM_AVATAR_URL). The server serves these to the browser via /config.js, so no keys are committed. Add OPENAI_API_KEY to enable Generate PRD (extracts the whole PRD from the call transcript with OpenAI gpt-4o, a config-proof fallback when live tool calls don't fire).
- Deploy → copy the live URL, e.g. https://spectra.onrender.com.

- Dashboard: https://<app>.onrender.com/?s=demo
- Vapi tool Server URL: https://<app>.onrender.com/vapi/webhook?s=demo (the ?s=demo routes every tool call into the demo session deterministically).
- Health check: open https://<app>.onrender.com/debug, which shows the active session and per-session item counts, so you can confirm Vapi is actually writing.
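The ?s=demo routing described above can be sketched in a few lines of Python. This is a minimal illustration, not the code in main.py: the function name `resolve_session_id` and the fallback order (query parameter first, then the call id from the payload, then a default) are assumptions inferred from the URLs and the smoke-test payload in this README.

```python
from urllib.parse import urlparse, parse_qs


def resolve_session_id(request_url, payload, default="demo"):
    """Pick the session a tool call belongs to.

    Assumed precedence (hypothetical, not confirmed against main.py):
    1. the ?s= query parameter on the webhook URL,
    2. the call id inside the webhook payload,
    3. a fixed default.
    """
    query = parse_qs(urlparse(request_url).query)
    if query.get("s"):
        return query["s"][0]
    call = (payload.get("message") or {}).get("call") or {}
    return call.get("id") or default
```

Pinning the session in the URL means the dashboard tab and the webhook agree on a session even though each real call carries a different call id.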
Free-tier caveats: the service spins down after ~15 min idle (~50s cold start — open the
URL once right before a live demo), and SQLite is ephemeral (state resets on redeploy; add
a Render persistent disk or a hosted DB later if you need durability across deploys). Railway
works the same way with startCommand: uvicorn main:app --host 0.0.0.0 --port $PORT and stays
warm if you want no cold starts.
ngrok http 8000
Copy the https://…ngrok…app URL; your webhook is that URL + /vapi/webhook?s=demo.
In the Vapi dashboard:
- Assistant → Model: pick a fast model (temperature ~0.5). Paste the block below as the whole system prompt, and set the First message field to its last line. Do NOT paste the demo script (§4) or any tool code into the prompt: if the prompt contains literal add_requirement(...)-style code, the model reads it out loud ("to equals add requirement, title…"). Describe tools in words; let Vapi's Functions do the calling.

      # Identity
      You are Naina, a warm, sharp product manager at Spectra. You interview a founder by voice and turn what they say into a structured Product Requirements Document in real time. You sound human — natural spoken English, short sentences, one thought at a time. You're curious and encouraging, never robotic.

      # Your job
      Through a relaxed conversation, draw out everything a good PRD needs and quietly record each piece as you go. A live dashboard and an exportable spec hold all the detail — you never read it back. Cover these, roughly in order, but follow the founder's lead:
      1. Big picture — what they're building and the goal. Record this as the product overview: a one or two sentence introduction plus the objectives/targets.
      2. Who it's for — target users, buyers, and anyone with a stake (regulators, ops, support). Record each as a stakeholder.
      3. A real example — walk through one or two concrete users and what they do with it. Record each as a use-case story with the person's name.
      4. Features — anything the product must, should, or could do. Record each as a requirement, give it a category (Functionality, Design, UX, Performance, Regulations…) and a priority of must, should, or could.
      5. Data — what information the system stores. Record each as a data model with its fields; note row-level security if the data is sensitive.
      6. Integrations — any outside service (Stripe, Twilio, email, fax, maps…). Record each as an integration with its purpose.
      7. Compliance — the moment they mention medical, financial, or personal data, flag a compliance gate, briefly say why, and ask if they want it added; if they agree, mark it accepted.
      8. Timeline — any dates, phases, or launch targets. Record each as a milestone.
      9. Unknowns — anything they're unsure about or want to revisit. Record each as an open question.

      # How you talk
      - One short, natural sentence at a time. Ask ONE question, then listen.
      - After you capture something, give a quick human confirmation — "Got it, adding bill upload." — then move on.
      - If they jump around, follow them; fill gaps later with a gentle nudge ("Any sensitive data involved?", "What's your rough timeline?", "Anything still up in the air?").
      - If they ask you a question, just answer it in one sentence.

      # Hard rules
      - NEVER say tool names, function names, parameter names, JSON, braces, or any code out loud. Record everything silently in the background — the founder only hears natural talk.
      - Never narrate the recording itself; just confirm the idea in plain words.
      - Only record something when the founder actually states it; don't invent details.
      - Keep it moving and friendly — this is a chat, not an interrogation.
      - When the spec feels complete, tell them they can export it as a PRD whenever they're ready.

      # First message
      "Hi, I'm Naina — I'll turn your idea into a product spec as we talk. So, what are you building, and who's it for?"

- Assistant → Tools (Functions): add the nine custom tools below; these map 1:1 to the PRD sections in the export. Set the Server URL (assistant-level or per-tool) to your reachable webhook from §2, including the session, e.g. https://<app>.onrender.com/vapi/webhook?s=demo (or the ngrok equivalent). If this URL is wrong/stale, tool calls capture nothing and the call can drop mid-sentence; check /debug to confirm writes are landing.
Tool definitions (paste each)
- Put your Public Key and Assistant ID in a gitignored static/config.js (so keys never hit the public repo):

      cp static/config.example.js static/config.js
      # then edit static/config.js:
      #   window.VAPI_PUBLIC_KEY = "pk_...";
      #   window.VAPI_ASSISTANT_ID = "your-assistant-id";

  index.html reads window.VAPI_* at runtime; the server returns empty JS if config.js is absent, so fresh clones get no 404 and still run the fallback panel. Use the Public key (client-side); never put the Private key in the browser.
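For the hosted case, where the server builds /config.js from environment variables, the idea can be sketched as a pure function. This is an illustration under assumptions, not the code in main.py: `render_config_js` and `CONFIG_KEYS` are hypothetical names, and how the real server chooses between a local static/config.js file and env vars is not shown here.

```python
import json
import os

# Keys the browser is allowed to see. The private key is deliberately absent.
CONFIG_KEYS = ("VAPI_PUBLIC_KEY", "VAPI_ASSISTANT_ID", "RPM_AVATAR_URL")


def render_config_js(env=None):
    """Return the JavaScript body for /config.js.

    Each configured key becomes a `window.<KEY> = "<value>";` line.
    With nothing configured the result is an empty string, which is
    still valid JavaScript, so the page loads without a 404.
    """
    env = os.environ if env is None else env
    lines = []
    for key in CONFIG_KEYS:
        value = env.get(key)
        if value:
            # json.dumps yields a correctly quoted and escaped JS string literal.
            lines.append(f"window.{key} = {json.dumps(value)};")
    return "\n".join(lines)
```

Using an explicit allow-list rather than dumping every VAPI_* variable keeps a private key from leaking to the browser by accident.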
@vapi-ai/web is an ESM module, so index.html imports it via jsDelivr +esm and
exposes it as window.Vapi:
<script type="module">
import Vapi from "https://cdn.jsdelivr.net/npm/@vapi-ai/web@latest/+esm";
window.Vapi = Vapi;
</script>
Then use new Vapi(PUBLIC_KEY) + vapi.start(ASSISTANT_ID) and the events call-start,
call-end, speech-start, speech-end, volume-level, message (confirmed against the
Vapi docs). The module load + typeof window.Vapi === "object" are verified by verify.mjs.
Still do the live mic test (hear the agent + mic dot turns red) before wiring the script —
that's the one thing only a real call with your keys can prove. This is your H0–1.5 gate.
node verify.mjs drives real Chrome (via puppeteer-core + your installed Google Chrome),
runs the full fallback script through the UI, asserts the cards render + WebGL is live + no
console errors, and writes verify.png. Run it after any frontend change:
./.venv/bin/python -m uvicorn main:app --port 8000 & # server must be up
node verify.mjs && open verify.png
The webhook payload parsing (parse_tool_calls in main.py) is already defensive about
toolCallList vs toolCalls and string-vs-object arguments — but eyeball one real payload
in the uvicorn logs and adjust if Vapi's shape changed.
Say these lines; each deterministically drives a tool. Rehearse verbatim.
| You say | Expect on screen |
|---|---|
| "I need an app where patients upload their medical bills." | MedicalBill data model + a "Bill upload" requirement; agent flags HIPAA gate → say "yes" → ✅ |
| "We need to extract the price from each bill." | "Price extraction" requirement |
| "If a bill is over $500, alert a lawyer." | requirement (MUST) + a notification integration |
| "Add row-level security so a patient only sees their own bills." | RLS note on MedicalBill (the Wayco/Postgres beat) |
| (click Export .md) | clean PRD downloads; (say) "and it pastes straight into Notion" |
| (refresh the page) | everything resumes from SQLite — the durability beat |
Closing line: "That's voice maintaining durable, structured workflow state — not a chat wrapper."
Re-mentioning a data model (the RLS line) updates it instead of duplicating:
add_data_model upserts by name.
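Upsert-by-name can be sketched as follows. This illustrates the technique rather than the project's handler: `upsert_data_model`, the dict layout, and the case-insensitive name match are all assumptions.

```python
def upsert_data_model(models, name, fields=None, rls=None):
    """Insert a data model, or update the existing one with the same name.

    `models` is a plain list of dicts. Matching is case-insensitive
    (an assumption), new fields are merged without duplicates, and
    `rls` is only overwritten when a value is supplied.
    """
    key = name.strip().lower()
    for model in models:
        if model["name"].strip().lower() == key:
            for field in fields or []:
                if field not in model["fields"]:
                    model["fields"].append(field)
            if rls is not None:
                model["rls"] = rls
            return model
    model = {"name": name.strip(), "fields": list(fields or []), "rls": rls}
    models.append(model)
    return model
```

With this shape, the first script line creates MedicalBill and the later RLS line only sets its `rls` note, so the dashboard keeps a single card.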
Press the ` (backtick) key or click ⌨ fallback (bottom-left) to open a hidden
panel with one button per script line, plus ▶ Run full MedLegal script. These post to the
same /vapi/webhook, so the dashboard and 3D core react identically — you can narrate
over a keyboard-driven run if voice fails and the backup video isn't enough. The panel is
unobtrusive and off by default, so it won't show during a clean voice demo.
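Because the fallback buttons post to the same /vapi/webhook, a run can also be driven from a short Python script. The sketch below mirrors the payload shape of the curl smoke test; `build_tool_call_payload` and `post_tool_call` are hypothetical helpers, not part of the repo.

```python
import json
import urllib.request


def build_tool_call_payload(session_id, tool_name, arguments, tool_call_id="1"):
    """Build a webhook body shaped like the curl smoke test in this README."""
    return {
        "message": {
            "type": "tool-calls",
            "call": {"id": session_id},
            "toolCallList": [
                {
                    "id": tool_call_id,
                    "function": {"name": tool_name, "arguments": arguments},
                }
            ],
        }
    }


def post_tool_call(payload, base_url="http://localhost:8000"):
    """POST the payload to /vapi/webhook and return the raw response body."""
    request = urllib.request.Request(
        f"{base_url}/vapi/webhook",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")

# Usage, with the server already running on port 8000:
#   body = build_tool_call_payload("demo", "add_requirement", {"title": "Bill upload"})
#   print(post_tool_call(body))
```

Only the standard library is used, so the script runs in the same virtualenv without extra dependencies.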
- curl to reset between takes: curl -X POST localhost:8000/api/reset/demo
- Run the full script 5–10× until it's muscle memory.
- Record a clean screen+audio capture of a perfect run. Conference wifi/mic failure is the #1 demo killer; if anything dies live, you play the video and narrate.
- Have the browser already open to ?s=demo, server + ngrok already running, mic permission granted, volume up, notifications silenced.
Blender/GLTF (post-hackathon ionous upgrade) · Twilio · Temporal · live Notion API · auth ·
a second transcript→JSON pipeline. The 3D is procedural; the durability is a 30-line SQLite
mirror (state.py); the structured extraction is the tool calls themselves.
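The "in-memory cache + SQLite mirror" idea can be sketched like this. It is an illustration of the pattern, not the contents of state.py: the `SessionStore` class, the `sessions` table, and storing each session as one JSON document are assumptions.

```python
import json
import sqlite3


class SessionStore:
    """In-memory cache of PRD documents, mirrored to SQLite on every save."""

    def __init__(self, db_path):
        self._cache = {}
        self._db = sqlite3.connect(db_path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS sessions (sid TEXT PRIMARY KEY, doc TEXT NOT NULL)"
        )
        self._db.commit()

    def get(self, sid):
        """Return the session document, loading it from SQLite on a cache miss."""
        if sid not in self._cache:
            row = self._db.execute(
                "SELECT doc FROM sessions WHERE sid = ?", (sid,)
            ).fetchone()
            self._cache[sid] = json.loads(row[0]) if row else {"requirements": []}
        return self._cache[sid]

    def save(self, sid):
        """Write the cached document through to SQLite."""
        self._db.execute(
            "INSERT OR REPLACE INTO sessions (sid, doc) VALUES (?, ?)",
            (sid, json.dumps(self.get(sid))),
        )
        self._db.commit()
```

Calling `save` after each tool call is what makes the restart beat work: a fresh process starts with an empty cache and repopulates it from the table on first access.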
- schema.py: Pydantic PRD models + to_markdown()
- state.py: in-memory cache + SQLite mirror (the durability story)
- main.py: FastAPI app exposing /vapi/webhook, /ws/{sid}, /export/{sid}.md, /api/reset/{sid}
- static/index.html: transcript rail + "N captured" counter + recording visual + client-side markdown export/copy + formatted PDF export (print-to-PDF) + Generate PRD (transcript → PRD via OpenAI gpt-4o, enabled by OPENAI_API_KEY) + Vapi wiring
- static/scene.js: loads a realistic Ready Player Me GLB avatar (static/avatar.glb), studio-lit, that idles/blinks and lip-syncs to Vapi volume; swap via RPM_AVATAR_URL in config.js
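To make the models-plus-to_markdown() split concrete, here is a reduced sketch. It uses standard-library dataclasses instead of Pydantic so it runs without dependencies, and the class names, fields, and heading layout are assumptions rather than the real schema.py.

```python
from dataclasses import dataclass, field


@dataclass
class Requirement:
    title: str
    category: str = "Functionality"
    priority: str = "should"  # one of: must, should, could


@dataclass
class PRD:
    overview: str = ""
    requirements: list = field(default_factory=list)
    open_questions: list = field(default_factory=list)

    def to_markdown(self):
        """Render the populated sections as a markdown document."""
        lines = ["# Product Requirements Document", ""]
        if self.overview:
            lines += ["## Overview", "", self.overview, ""]
        if self.requirements:
            lines += ["## Requirements", ""]
            lines += [
                f"- [{r.priority.upper()}] {r.title} ({r.category})"
                for r in self.requirements
            ]
            lines.append("")
        if self.open_questions:
            lines += ["## Open questions", ""]
            lines += [f"- {q}" for q in self.open_questions]
            lines.append("")
        return "\n".join(lines).rstrip() + "\n"
```

Empty sections are skipped, so an export taken early in a call contains only what has been captured so far.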