A kid-safe, expressive robot platform combining real-time motor control, an animated TFT face, and an optional networked AI planner.
Two ESP32-S3 microcontrollers handle the deterministic, safety-critical work: one drives the motors with PID control and enforces safety limits (esp32-reflex), the other renders an animated face on a 320x240 TFT touch display (esp32-face). A Raspberry Pi 5 orchestrates everything at 50 Hz: reading sensors, running the state machine, applying layered safety policies, and streaming telemetry to a browser UI. An optional AI planner server on a separate machine (3090 Ti) generates expressive behavior plans via a local LLM.
Reflexes are local and deterministic. Planner is remote and optional.
| Component | Hardware | Role |
|---|---|---|
| Supervisor | Raspberry Pi 5 | 50 Hz orchestration, safety policy, HTTP/WS API |
| Face MCU | ESP32-S3 (ES3C28P) | 320x240 TFT face renderer + touch/buttons telemetry |
| Reflex MCU | ESP32-S3 WROOM | Differential drive, PID, encoders, IMU, ultrasonic, safety |
| AI Server | PC with 3090 Ti (off-robot) | Planner/conversation LLM + TTS on LAN (`LLM_BACKEND=ollama` or `vllm`) |
| Motor Driver | TB6612FNG | Dual H-bridge for differential drive |
| Power | 2S LiPo | Split into dirty (motors) and clean 5V regulated rails |

    robot-buddy/
    ├── supervisor/         # Python supervisor (Raspberry Pi 5, process-isolated workers)
    │   ├── core/           # 50 Hz tick loop, state machine, safety, behavior engine
    │   ├── devices/        # MCU clients (reflex, face), protocol, expressions
    │   ├── io/             # Serial transport, COBS framing, CRC
    │   ├── workers/        # Process-isolated workers (TTS, vision, AI)
    │   ├── messages/       # NDJSON envelope, event/action types
    │   ├── api/            # FastAPI HTTP/WebSocket server, param registry
    │   ├── mock/           # Mock Reflex MCU for testing (PTY-based)
    │   ├── tests/          # pytest test suite
    │   └── pyproject.toml  # Package metadata, deps
    ├── server/             # AI planner server (3090 Ti, FastAPI + backend switch)
    │   ├── app/            # FastAPI app, LLM/STT/TTS backends, prompts, schemas
    │   ├── tests/          # pytest test suite
    │   ├── Modelfile       # Legacy Ollama model config
    │   └── pyproject.toml  # Package metadata, deps
    ├── esp32-face/         # Face MCU firmware (ESP32-S3, C/C++, ESP-IDF)
    │   └── main/           # TFT face rendering + touch/buttons + USB protocol
    ├── esp32-reflex/       # Reflex MCU firmware (ESP32-S3, C/C++, ESP-IDF)
    │   └── main/           # Differential drive, PID, IMU, safety, encoders
    ├── dashboard/          # React dashboard (Vite + TypeScript + Biome)
    │   └── src/            # Components, hooks, stores, tabs
    ├── specs/              # Completed specifications (immutable reference)
    ├── docs/               # TODO, architecture, protocols, wiring, power, research
    ├── deploy/             # Deployment (systemd service, install/update scripts)
    ├── tools/              # Dev utilities (face sim V3, parity check)
    └── training/           # Wake word model training


    ┌──────────────────────────────────────────────────────┐
    │                       On Robot                       │
    │                                                      │
    │  ┌────────────────────────────────────────────────┐  │
    │  │  Raspberry Pi 5 (Supervisor)                   │  │
    │  │                                                │  │
    │  │  50 Hz tick loop:                              │  │
    │  │  read telemetry → state machine → safety       │  │
    │  │  policies → send commands → broadcast          │  │
    │  │                                                │  │
    │  │  HTTP API (:8080)   WebSocket (:8080/ws)       │  │
    │  │  Vision process (separate OS process, 10-20 Hz)│  │
    │  └──────┬──────────────────────┬──────────────────┘  │
    │         │ USB serial (COBS)    │ USB serial (COBS)   │
    │  ┌──────▼──────┐         ┌─────▼───────┐             │
    │  │ Reflex MCU  │         │  Face MCU   │             │
    │  │ ESP32-S3    │         │  ESP32-S3   │             │
    │  │             │         │             │             │
    │  │ Motors, PID │         │ 320x240 TFT │             │
    │  │ Encoders    │         │ Face +      │             │
    │  │ IMU, Range  │         │ Touch UI    │             │
    │  │ Safety      │         │             │             │
    │  └─────────────┘         └─────────────┘             │
    └──────────────────────────────────────────────────────┘
                      │
                      │ HTTP (LAN, optional)
                      │
            ┌─────────▼──────────────────────────┐
            │ AI Server (3090 Ti PC)             │
            │                                    │
            │ FastAPI planner server             │
            │ LLM backend: ollama | vllm         │
            │ TTS: Orpheus (vLLM) + espeak shed  │
            │ POST /plan / WS /converse /tts     │
            └────────────────────────────────────┘

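The 50 Hz tick loop in the diagram can be pictured as a fixed-rate scheduler. The following is a minimal synchronous sketch, not the supervisor's actual loop (which the stack table lists as asyncio-based). The function name `run_tick_loop`, its parameters, and the overrun handling are all assumptions made for illustration.

```python
import time
from typing import Callable


def run_tick_loop(
    step: Callable[[int], None],
    hz: float = 50.0,
    max_ticks: int = 5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call step(tick_index) at a fixed rate and return the number of overruns."""
    period = 1.0 / hz
    next_deadline = clock()
    overruns = 0
    for tick in range(max_ticks):
        # One tick: read telemetry, run the state machine, apply safety, send commands.
        step(tick)
        next_deadline += period
        remaining = next_deadline - clock()
        if remaining > 0:
            sleep(remaining)
        else:
            # The tick ran long: count it and resynchronise instead of bursting to catch up.
            overruns += 1
            next_deadline = clock()
    return overruns


if __name__ == "__main__":
    missed = run_tick_loop(lambda i: print(f"tick {i}"), hz=50.0, max_ticks=3)
    print(f"overruns: {missed}")
```

Scheduling against absolute deadlines (rather than sleeping a fixed 20 ms after each tick) keeps the average rate at 50 Hz even when individual ticks vary in duration.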
BOOT → IDLE → TELEOP / WANDER → ERROR

- BOOT → IDLE: automatic when Reflex MCU connects with no faults
- IDLE → TELEOP/WANDER: via `set_mode` command
- Any → ERROR: on disconnect, ESTOP, TILT, or BROWNOUT
- ERROR → IDLE: via `clear_error()` when Reflex is healthy

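The transitions listed above can be sketched as a pure function. This is an illustration only: the `Mode` enum, the `next_mode` signature, and the choice to keep BOOT waiting until the Reflex MCU first connects are assumptions, and only the transitions named in the list are modeled.

```python
from enum import Enum
from typing import Iterable, Optional


class Mode(Enum):
    BOOT = "BOOT"
    IDLE = "IDLE"
    TELEOP = "TELEOP"
    WANDER = "WANDER"
    ERROR = "ERROR"


# Faults that force ERROR, taken from the transition list.
ERROR_FAULTS = {"ESTOP", "TILT", "BROWNOUT"}


def next_mode(
    mode: Mode,
    *,
    reflex_connected: bool,
    faults: Iterable[str] = (),
    requested: Optional[Mode] = None,
    clear_error: bool = False,
) -> Mode:
    """Return the mode for the next tick given the current inputs."""
    faults = set(faults)
    critical = bool(faults & ERROR_FAULTS)

    if mode is Mode.BOOT:
        if not reflex_connected:
            return Mode.BOOT  # assumption: keep waiting for the first connection
        if critical:
            return Mode.ERROR
        return Mode.IDLE if not faults else Mode.BOOT

    # Any -> ERROR: a disconnect or a listed fault always wins.
    if not reflex_connected or critical:
        return Mode.ERROR

    if mode is Mode.ERROR:
        # ERROR -> IDLE only on an explicit clear while the Reflex MCU is healthy.
        return Mode.IDLE if clear_error else Mode.ERROR

    if mode is Mode.IDLE and requested in (Mode.TELEOP, Mode.WANDER):
        return requested

    return mode
```

Keeping the function free of I/O makes each transition easy to unit-test against the mock Reflex MCU or plain fixtures.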
- Mode gate: no motion outside TELEOP/WANDER
- Fault gate: any fault → zero twist
- Reflex disconnect → zero twist
- Ultrasonic range scaling (hard stop at 250 mm, 50% at 500 mm)
- Stale range fallback (50% cap)
- Vision confidence scaling
- Stale vision timeout (500 ms)
Safety-critical enforcement also runs on the Reflex MCU itself (acceleration limits, command TTL, hard stop). The supervisor applies its additional caps on top of these.
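As a rough illustration of how these layers compose, the sketch below applies the mode gate, fault gate, disconnect gate, and ultrasonic range caps to a commanded twist. The `Twist` fields and their units, the step-wise (non-interpolated) scaling, the inclusive thresholds, and scaling only the linear component are all assumptions. Vision confidence scaling and the stale vision timeout are omitted.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

HARD_STOP_MM = 250      # hard stop at or below this range
HALF_SPEED_MM = 500     # 50% cap at or below this range
STALE_RANGE_CAP = 0.5   # 50% cap when the range reading is stale


@dataclass
class Twist:
    v_mm_s: float    # linear velocity (hypothetical unit)
    w_mrad_s: float  # angular velocity (hypothetical unit)


def range_scale(range_mm: Optional[float], range_stale: bool) -> float:
    """Return the speed multiplier implied by the ultrasonic reading."""
    if range_stale or range_mm is None:
        return STALE_RANGE_CAP
    if range_mm <= HARD_STOP_MM:
        return 0.0
    if range_mm <= HALF_SPEED_MM:
        return 0.5
    return 1.0


def apply_safety(
    twist: Twist,
    *,
    mode: str,
    faults: Iterable[str],
    reflex_connected: bool,
    range_mm: Optional[float],
    range_stale: bool = False,
) -> Twist:
    """Apply the gates first, then the range cap, and return the allowed twist."""
    if mode not in ("TELEOP", "WANDER") or list(faults) or not reflex_connected:
        return Twist(0.0, 0.0)
    scale = range_scale(range_mm, range_stale)
    # Assumption: the forward-facing range sensor only limits linear motion.
    return Twist(twist.v_mm_s * scale, twist.w_mrad_s)
```

Ordering the gates before the scaling means a fault or wrong mode always yields zero twist, regardless of what the sensors report.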
Binary packets over USB serial with COBS framing:
[type:u8][seq:u8][payload:N][crc16:u16-LE]
For esp32-face, this protocol carries face state/gesture/system/talking commands and touch/button/status telemetry only. Audio transport is supervisor-side USB audio.
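To make the packet layout concrete, here is a small sketch that builds and parses one frame. Several details are assumptions rather than facts from docs/protocols.md: the CRC variant (CRC-16/CCITT-FALSE here), the CRC covering type, seq, and payload, the trailing `0x00` frame delimiter, and the packet type value used in the usage example.

```python
import struct


def crc16_ccitt(data: bytes, crc: int = 0xFFFF) -> int:
    """CRC-16/CCITT-FALSE (poly 0x1021, init 0xFFFF); the real variant may differ."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
    return crc


def cobs_encode(data: bytes) -> bytes:
    """Encode data so the result contains no zero bytes."""
    out = bytearray()
    block = bytearray()
    for byte in data:
        if byte == 0:
            out.append(len(block) + 1)
            out += block
            block.clear()
        else:
            block.append(byte)
            if len(block) == 254:
                out.append(0xFF)
                out += block
                block.clear()
    out.append(len(block) + 1)
    out += block
    return bytes(out)


def cobs_decode(encoded: bytes) -> bytes:
    """Reverse cobs_encode; raises ValueError on malformed input."""
    out = bytearray()
    i = 0
    while i < len(encoded):
        code = encoded[i]
        block = encoded[i + 1 : i + code]
        if code == 0 or len(block) != code - 1 or 0 in block:
            raise ValueError("malformed COBS data")
        out += block
        i += code
        if code != 0xFF and i < len(encoded):
            out.append(0)  # a short block implies a zero that was removed
    return bytes(out)


def build_frame(pkt_type: int, seq: int, payload: bytes) -> bytes:
    """[type:u8][seq:u8][payload:N][crc16:u16-LE], COBS-encoded, 0x00-terminated."""
    body = bytes([pkt_type & 0xFF, seq & 0xFF]) + payload
    raw = body + struct.pack("<H", crc16_ccitt(body))
    return cobs_encode(raw) + b"\x00"


def parse_frame(frame: bytes) -> tuple[int, int, bytes]:
    """Return (type, seq, payload) or raise ValueError on a bad frame."""
    raw = cobs_decode(frame[:-1] if frame.endswith(b"\x00") else frame)
    if len(raw) < 4:
        raise ValueError("frame too short")
    body, (crc,) = raw[:-2], struct.unpack("<H", raw[-2:])
    if crc16_ccitt(body) != crc:
        raise ValueError("CRC mismatch")
    return body[0], body[1], body[2:]


if __name__ == "__main__":
    frame = build_frame(0x10, 7, b"\x00\x01\x00")  # 0x10 is a made-up packet type
    print(frame.hex(), parse_frame(frame))
```

COBS guarantees the encoded body contains no zero bytes, so a receiver can resynchronise on the next `0x00` after a corrupted or partial frame.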
Auto-reconnect with exponential backoff (0.5 s to 5 s). See docs/protocols.md for packet definitions.
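The reconnect schedule can be sketched as a generator of delays. The 0.5 s floor and 5 s ceiling come from the text above. The doubling factor and the name `backoff_delays` are assumptions.

```python
import itertools
from typing import Iterator


def backoff_delays(initial_s: float = 0.5, max_s: float = 5.0, factor: float = 2.0) -> Iterator[float]:
    """Yield reconnect delays that grow geometrically and saturate at max_s."""
    delay = initial_s
    while True:
        yield min(delay, max_s)
        delay = min(delay * factor, max_s)


if __name__ == "__main__":
    # First six delays under the assumed doubling factor.
    print(list(itertools.islice(backoff_delays(), 6)))
```

A caller would typically create a fresh generator after each successful connection so the next outage starts again from the shortest delay.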
| Component | Stack |
|---|---|
| Supervisor | Python 3.11+, asyncio, FastAPI, uvicorn, pyserial, OpenCV |
| AI Server | Python 3.11+, FastAPI, httpx, Pydantic, Ollama (compat) + vLLM (migration target) |
| ESP32 Firmware | C/C++, ESP-IDF (FreeRTOS), CMake |
| Build (Python) | Hatchling via pyproject.toml, uv for dependency management |
| Build (ESP32) | idf.py build (CMake), source ~/esp/esp-idf/export.sh |
| Dashboard | React 19, Vite, TypeScript, Zustand, TanStack Query |
| Tests | pytest, pytest-asyncio, Vitest |
| Linting | ruff (Python), clang-format + cppcheck (C++), Biome (TypeScript) |

Supervisor:

    cd supervisor
    uv sync --group dev

    # Run with mock hardware (no physical robot needed)
    just run-mock

    # Run with real hardware
    just run

    # Other options
    uv run python -m supervisor --no-vision # Disable vision worker
    uv run python -m supervisor --http-port 8080 # Custom HTTP port
    uv run python -m supervisor --planner-api http://10.0.0.20:8100 --robot-id robot-1

AI Server:

    # Install and run the server
    cd server
    uv sync --extra dev --extra llm --extra stt --extra tts

    # Recommended testing profile (vLLM planner + CPU STT + espeak)
    LLM_BACKEND=vllm STT_DEVICE=cpu TTS_BACKEND=espeak \
    uv run --extra llm --extra stt --extra tts python -m app.main

The server starts on port 8100. See server/README.md for full API docs and configuration.

ESP32 firmware (requires the ESP-IDF toolchain):

    cd esp32-face # or esp32-reflex
    idf.py build
    idf.py flash
    idf.py monitor

Dashboard:

    just run-dashboard # dev server with hot reload
    just build-dashboard # production build → supervisor/static/

All commands are available via `just` (see `justfile`):

    just test-all # run all tests (supervisor, server, dashboard)
    just lint # check Python + C++ + dashboard
    just lint-fix # auto-fix formatting
    just preflight # full pre-commit check (lint + tests + parity)
    just sim # run face simulator V3
    just check-parity # verify sim↔MCU constant alignment

The supervisor includes a PTY-based mock Reflex MCU (supervisor/mock/mock_reflex.py) that simulates serial communication, telemetry, and fault injection. Use `just run-mock` to run the full supervisor stack without any hardware.

When the supervisor is running, open http://<robot_ip>:8080 in a browser for:
- Live telemetry display with diagnostic tree
- Mode control (IDLE, TELEOP, WANDER)
- E-STOP button
- Face control (moods, gestures, talking, conversation state)
- Parameter tuning sliders (PID gains, speed limits, safety thresholds)
- Monitor tab (device health, comms, power, sensors, faults, workers)
- MJPEG video stream (if vision enabled)
Supervisor (YAML config file, schema in supervisor/config.py):

- Sections: serial, control, safety, network, logging, vision
- Default serial paths: `/dev/robot_reflex`, `/dev/robot_face` (via udev symlinks)

AI Server (environment variables):

- `LLM_BACKEND`, `VLLM_MODEL_NAME`, `LLM_MAX_INFLIGHT`, `PERFORMANCE_MODE`
- Legacy compatibility: `OLLAMA_URL`, `MODEL_NAME`, `PLAN_TIMEOUT_S`, `TEMPERATURE`, `NUM_CTX`
- See `server/README.md` for the full table

ESP32: sdkconfig.defaults + config.h constants

| Endpoint | Method | Description |
|---|---|---|
| `/status` | GET | Current robot state (JSON) |
| `/params` | GET | Full parameter registry |
| `/params` | POST | Transactional parameter updates |
| `/actions` | POST | RPC: set_mode, e_stop, clear_e_stop |
| `/video` | GET | MJPEG stream (if vision enabled) |
| `/debug/devices` | GET | Device connection state |
| `/debug/planner` | GET | Planner state |
| `/debug/mcu_benchmark` | GET | MCU benchmark run status |
| `/ws` | WS | Telemetry stream (20 Hz, JSON) |
| `/ws/logs` | WS | Live log stream |

| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Server + selected LLM backend status |
| `/plan` | POST | Accept world state + robot_id/seq/monotonic_ts_ms, return plan + plan_id echo metadata |
| `/converse` | WS | Conversation stream (single active session per robot_id) |
| `/tts` | POST | Direct TTS with optional metadata (robot_id, seq, monotonic_ts_ms) |

Plan actions: say(text), emote(name, intensity), gesture(name, params), skill(name). The planner proposes intent and the supervisor executes deterministic skills.

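One way to read "planner proposes, supervisor executes" is that the supervisor only accepts the four action types and only runs skills it already knows. The sketch below illustrates that idea. The JSON shape (an `"action"` key plus arguments), the required/optional split of fields, and the function name are assumptions, not the server's schema.

```python
from typing import Any

# Allowed fields per action type. The first field is treated as required and
# the rest as optional (an assumption made for this sketch).
ACTION_FIELDS: dict[str, tuple[str, ...]] = {
    "say": ("text",),
    "emote": ("name", "intensity"),
    "gesture": ("name", "params"),
    "skill": ("name",),
}


def filter_plan_actions(
    actions: list[dict[str, Any]], known_skills: set[str]
) -> list[dict[str, Any]]:
    """Keep only well-formed actions; drop unknown types and unknown skills."""
    accepted = []
    for action in actions:
        kind = action.get("action")
        fields = ACTION_FIELDS.get(kind) if isinstance(kind, str) else None
        if fields is None:
            continue  # the planner may not invent new action types
        args = {k: v for k, v in action.items() if k != "action"}
        if fields[0] not in args or not set(args) <= set(fields):
            continue  # missing the required field or carrying unexpected ones
        if kind == "skill" and args["name"] not in known_skills:
            continue  # only deterministic skills the supervisor implements
        accepted.append(action)
    return accepted


if __name__ == "__main__":
    plan = [
        {"action": "say", "text": "hi"},
        {"action": "skill", "name": "patrol_drift"},
        {"action": "set_motor_pwm", "left": 255},  # rejected: not a plan action
    ]
    print(filter_plan_actions(plan, {"patrol_drift", "avoid_obstacle"}))
```

Because motion is reachable only through named skills, a malformed or adversarial plan cannot command the motors directly.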
| Failure condition | Immediate supervisor action | Motion policy | Face policy | Speech policy |
|---|---|---|---|---|
| `/plan` unreachable / non-200 | Mark planner disconnected; skip remote plan apply | Local deterministic only (patrol_drift/avoid_obstacle/safe stop) | `confused` gesture with cooldown | Cancel queued planner speech |
| `/converse` TTS fails mid-turn | Stop playback and clear talking flag | No change to motion authority | Show thinking briefly then restore previous mood | Attempt fallback backend once; if unavailable, skip speech |

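The speech policy in the second row ("attempt fallback backend once; if unavailable, skip speech") can be sketched as follows. The backends are passed in as plain callables, and the name `speak_with_fallback` and the bytes return type are assumptions for illustration.

```python
from typing import Callable, Optional

TtsBackend = Callable[[str], bytes]


def speak_with_fallback(
    text: str, primary: TtsBackend, fallback: TtsBackend
) -> Optional[bytes]:
    """Try the primary backend, then the fallback exactly once; None means skip speech."""
    for backend in (primary, fallback):
        try:
            return backend(text)
        except Exception:
            # A failed backend is not retried; move on to the next one.
            continue
    return None
```

Returning `None` rather than raising keeps a TTS failure from affecting motion authority, which the table says must stay unchanged.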
- Supervisor: 50 Hz control loop, state machine, safety policies, conversation state, mood choreography
- Supervisor: serial transport with COBS framing, CRC, auto-reconnect, protocol v2 (timestamps + seq)
- Supervisor: FastAPI HTTP/WebSocket API with telemetry streaming
- Supervisor: process-isolated workers (TTS, vision, AI, ear)
- Supervisor: mock Reflex MCU for hardware-free development
- AI Server: FastAPI + vLLM planner (Qwen), conversation, TTS (Orpheus + espeak)
- ESP32 Face: TFT face rendering, 13 moods, 13 gestures, conversation border, touch/button telemetry
- ESP32 Reflex: motor control, PID, encoders, IMU, safety enforcement
- Dashboard: React 19, live telemetry, face control, monitor tab, mode control
- Voice pipeline: wake word ("hey buddy") + VAD → STT → LLM → TTS → face animation
- WANDER mode driven by deterministic skills + planner intent
- Deterministic telemetry: timestamps, sequence numbers, clock sync, raw packet logging
- Reflex MCU hardware commissioning (breadboard bring-up)
- Personality engine implementation (spec complete, implementation pending)
- Conversation memory / interaction history
- Wake word model improvements (recall 42% → 80%+)
- Additional modes: LINE_FOLLOW, BALL, CRANE, CHARGING
- Home Assistant integration
- Voice ID / speaker identification
See docs/TODO.md for the detailed backlog and specs/ for design specifications.