0mdb/robot-buddy

Robot Buddy

A kid-safe, expressive robot platform combining real-time motor control, an animated TFT face, and an optional networked AI planner.

How It Works

Two ESP32-S3 microcontrollers handle the deterministic, safety-critical work: one (esp32-reflex) drives the motors with PID control and enforces safety limits; the other (esp32-face) renders an animated face on a 320x240 TFT touch display. A Raspberry Pi 5 orchestrates everything at 50 Hz: reading sensors, running the state machine, applying layered safety policies, and streaming telemetry to a browser UI. An optional AI planner server on a separate machine (3090 Ti) generates expressive behavior plans via a local LLM.

Reflexes are local and deterministic. Planner is remote and optional.
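
As a rough illustration of the fixed-rate orchestration described above, the sketch below calls a step function at 50 Hz and schedules each tick against an absolute deadline, so small sleep errors do not accumulate. The names `run_ticks` and `step` are hypothetical and not taken from the supervisor code; the actual loop lives in supervisor/core/ and may be structured differently.

```python
import asyncio

TICK_HZ = 50
TICK_PERIOD_S = 1.0 / TICK_HZ  # 20 ms per tick


async def run_ticks(step, n_ticks, period_s=TICK_PERIOD_S):
    """Call step(tick_index) n_ticks times at a fixed rate.

    `step` stands in for one pass of: read telemetry -> state machine ->
    safety policies -> send commands -> broadcast (hypothetical callback).
    """
    loop = asyncio.get_running_loop()
    next_deadline = loop.time()
    for i in range(n_ticks):
        step(i)
        next_deadline += period_s
        delay = next_deadline - loop.time()
        if delay > 0:
            await asyncio.sleep(delay)
        else:
            # The step overran its budget: resynchronize instead of
            # firing a burst of late ticks to catch up.
            next_deadline = loop.time()


if __name__ == "__main__":
    seen = []
    asyncio.run(run_ticks(seen.append, 5))
    print(seen)
```

Scheduling against deadlines rather than sleeping a fixed 20 ms after each step keeps the average rate close to 50 Hz even when individual steps take a variable amount of time.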

Hardware

| Component | Hardware | Role |
|---|---|---|
| Supervisor | Raspberry Pi 5 | 50 Hz orchestration, safety policy, HTTP/WS API |
| Face MCU | ESP32-S3 (ES3C28P) | 320x240 TFT face renderer + touch/buttons telemetry |
| Reflex MCU | ESP32-S3 WROOM | Differential drive, PID, encoders, IMU, ultrasonic, safety |
| AI Server | PC with 3090 Ti (off-robot) | Planner/conversation LLM + TTS on LAN (`LLM_BACKEND=ollama |
| Motor Driver | TB6612FNG | Dual H-bridge for differential drive |
| Power | 2S LiPo | Split into dirty (motors) and clean 5V regulated rails |

Repository Layout

robot-buddy/
├── supervisor/          # Python supervisor (Raspberry Pi 5, process-isolated workers)
│   ├── core/            # 50 Hz tick loop, state machine, safety, behavior engine
│   ├── devices/         # MCU clients (reflex, face), protocol, expressions
│   ├── io/              # Serial transport, COBS framing, CRC
│   ├── workers/         # Process-isolated workers (TTS, vision, AI)
│   ├── messages/        # NDJSON envelope, event/action types
│   ├── api/             # FastAPI HTTP/WebSocket server, param registry
│   ├── mock/            # Mock Reflex MCU for testing (PTY-based)
│   ├── tests/           # pytest test suite
│   └── pyproject.toml   # Package metadata, deps
├── server/              # AI planner server (3090 Ti, FastAPI + backend switch)
│   ├── app/             # FastAPI app, LLM/STT/TTS backends, prompts, schemas
│   ├── tests/           # pytest test suite
│   ├── Modelfile        # Legacy Ollama model config
│   └── pyproject.toml   # Package metadata, deps
├── esp32-face/          # Face MCU firmware (ESP32-S3, C/C++, ESP-IDF)
│   └── main/            # TFT face rendering + touch/buttons + USB protocol
├── esp32-reflex/        # Reflex MCU firmware (ESP32-S3, C/C++, ESP-IDF)
│   └── main/            # Differential drive, PID, IMU, safety, encoders
├── dashboard/           # React dashboard (Vite + TypeScript + Biome)
│   └── src/             # Components, hooks, stores, tabs
├── specs/               # Completed specifications (immutable reference)
├── docs/                # TODO, architecture, protocols, wiring, power, research
├── deploy/              # Deployment (systemd service, install/update scripts)
├── tools/               # Dev utilities (face sim V3, parity check)
└── training/            # Wake word model training

Architecture

┌──────────────────────────────────────────────────────┐
│ On Robot                                             │
│                                                      │
│  ┌────────────────────────────────────────────────┐  │
│  │ Raspberry Pi 5 - Supervisor                    │  │
│  │                                                │  │
│  │  50 Hz tick loop:                              │  │
│  │    read telemetry → state machine → safety     │  │
│  │    policies → send commands → broadcast        │  │
│  │                                                │  │
│  │  HTTP API (:8080)  WebSocket (:8080/ws)        │  │
│  │  Vision process (separate OS process, 10-20Hz) │  │
│  └──────┬──────────────────────┬──────────────────┘  │
│         │ USB serial (COBS)    │ USB serial (COBS)   │
│  ┌──────▼──────┐        ┌──────▼──────┐              │
│  │ Reflex MCU  │        │  Face MCU   │              │
│  │ ESP32-S3    │        │  ESP32-S3   │              │
│  │             │        │             │              │
│  │ Motors, PID │        │ 320x240 TFT │              │
│  │ Encoders    │        │ Face +      │              │
│  │ IMU, Range  │        │ Touch UI    │              │
│  │ Safety      │        │             │              │
│  └─────────────┘        └─────────────┘              │
└──────────────────────────────────────────────────────┘
         │
         │ HTTP (LAN, optional)
         │
┌────────▼───────────────────────────┐
│ AI Server (3090 Ti PC)             │
│                                    │
│ FastAPI planner server             │
│ LLM backend: ollama | vllm         │
│ TTS: Orpheus (vLLM) + espeak shed  │
│ POST /plan / WS /converse /tts     │
└────────────────────────────────────┘

State Machine

BOOT → IDLE → TELEOP / WANDER → ERROR

  • BOOT → IDLE: automatic when Reflex MCU connects with no faults
  • IDLE → TELEOP/WANDER: via set_mode command
  • Any → ERROR: on disconnect, ESTOP, TILT, or BROWNOUT
  • ERROR → IDLE: via clear_error() when Reflex is healthy
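
The transitions listed above can be sketched as a single pure function. This is an illustrative reading of the list, not the supervisor's actual implementation: `Mode`, `next_mode`, and its parameters are hypothetical names, and two behaviors are assumptions (BOOT simply waits until the Reflex MCU is connected and fault-free, and leaving TELEOP/WANDER for IDLE is not modeled because the list does not describe it).

```python
from enum import Enum


class Mode(Enum):
    BOOT = "BOOT"
    IDLE = "IDLE"
    TELEOP = "TELEOP"
    WANDER = "WANDER"
    ERROR = "ERROR"


def next_mode(mode, reflex_connected, faults, requested=None, clear_error=False):
    """Return the next mode given current health and any pending request.

    faults: set of active fault names, e.g. {"ESTOP", "TILT", "BROWNOUT"}.
    requested: target of a set_mode command, or None.
    clear_error: True if a clear_error() request is pending.
    """
    healthy = reflex_connected and not faults

    if mode is Mode.BOOT:
        # Assumption: BOOT waits (rather than erroring) until healthy.
        return Mode.IDLE if healthy else Mode.BOOT
    if mode is Mode.ERROR:
        # ERROR is sticky: it needs an explicit clear and a healthy Reflex.
        return Mode.IDLE if (clear_error and healthy) else Mode.ERROR
    if not healthy:
        # Any -> ERROR on disconnect or fault.
        return Mode.ERROR
    if mode is Mode.IDLE and requested in (Mode.TELEOP, Mode.WANDER):
        return requested
    return mode


if __name__ == "__main__":
    print(next_mode(Mode.IDLE, True, set(), requested=Mode.WANDER))
```

Keeping the transition logic free of I/O makes it straightforward to unit-test each rule in isolation.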

Safety Policies (Defense in Depth)

  1. Mode gate: no motion outside TELEOP/WANDER
  2. Fault gate: any fault → zero twist
  3. Reflex disconnect → zero twist
  4. Ultrasonic range scaling (hard stop at 250 mm, 50% at 500 mm)
  5. Stale range fallback (50% cap)
  6. Vision confidence scaling
  7. Stale vision timeout (500 ms)

Safety-critical enforcement also runs on the Reflex MCU itself (acceleration limits, command TTL, hard stop). The supervisor applies additional caps above this.
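
A minimal sketch of how the first five supervisor-side policies might compose into one speed scale factor. It is not the project's implementation: `speed_scale` and `apply_scale` are hypothetical names, the range rule is modeled as a simple step function (0% at or below 250 mm, 50% at or below 500 mm, 100% beyond), and the vision policies are omitted because the README does not describe their formulas.

```python
MOTION_MODES = ("TELEOP", "WANDER")
HARD_STOP_MM = 250      # from the policy list above
SLOW_MM = 500           # from the policy list above
STALE_RANGE_CAP = 0.5   # from the policy list above


def speed_scale(mode, faults, reflex_connected, range_mm, range_stale):
    """Return a factor in [0, 1] to apply to the commanded twist."""
    if mode not in MOTION_MODES:      # 1. mode gate
        return 0.0
    if faults:                        # 2. fault gate
        return 0.0
    if not reflex_connected:          # 3. reflex disconnect
        return 0.0
    if range_stale:                   # 5. stale range fallback
        return STALE_RANGE_CAP
    if range_mm <= HARD_STOP_MM:      # 4. ultrasonic hard stop
        return 0.0
    if range_mm <= SLOW_MM:           # 4. ultrasonic slow zone (step is an assumption)
        return 0.5
    return 1.0


def apply_scale(twist, scale):
    """Scale a (linear, angular) twist tuple by the same factor."""
    linear, angular = twist
    return (linear * scale, angular * scale)


if __name__ == "__main__":
    s = speed_scale("TELEOP", set(), True, range_mm=400, range_stale=False)
    print(apply_scale((0.4, 1.0), s))
```

Each gate can only reduce the scale, so adding a further policy cannot make the robot faster than the earlier ones allow.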

Serial Protocol

Binary packets over USB serial with COBS framing:

[type:u8][seq:u8][payload:N][crc16:u16-LE]

For esp32-face, this protocol carries face state/gesture/system/talking commands and touch/button/status telemetry only. Audio transport is supervisor-side USB audio.
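
To make the frame layout concrete, here is a hedged sketch of encoding and decoding one packet. Several details are assumptions because this README does not state them: the CRC variant (CRC-16/CCITT-FALSE via `binascii.crc_hqx` is used here), the scope of the CRC (type, seq, and payload), and the trailing `0x00` frame delimiter. The function names are hypothetical; docs/protocols.md is the authoritative reference.

```python
import binascii
import struct


def cobs_encode(data: bytes) -> bytes:
    """COBS-encode data so the result contains no zero bytes."""
    out = bytearray()
    block = bytearray()
    for b in data:
        if b == 0:
            out.append(len(block) + 1)
            out += block
            block.clear()
        else:
            block.append(b)
            if len(block) == 254:   # maximum run without a zero
                out.append(0xFF)
                out += block
                block.clear()
    out.append(len(block) + 1)
    out += block
    return bytes(out)


def cobs_decode(enc: bytes) -> bytes:
    """Inverse of cobs_encode; raises ValueError on malformed input."""
    out = bytearray()
    i = 0
    while i < len(enc):
        code = enc[i]
        if code == 0:
            raise ValueError("zero byte inside COBS frame")
        block = enc[i + 1:i + code]
        if len(block) != code - 1:
            raise ValueError("truncated COBS block")
        out += block
        i += code
        if code != 0xFF and i < len(enc):
            out.append(0)
    return bytes(out)


def encode_packet(ptype: int, seq: int, payload: bytes) -> bytes:
    """Build [type][seq][payload][crc16-LE], COBS-encode, append delimiter."""
    body = bytes([ptype & 0xFF, seq & 0xFF]) + payload
    crc = binascii.crc_hqx(body, 0xFFFF)  # assumed CRC variant
    return cobs_encode(body + struct.pack("<H", crc)) + b"\x00"


def decode_packet(frame: bytes):
    """Return (type, seq, payload) or raise ValueError on a bad frame."""
    raw = cobs_decode(frame.rstrip(b"\x00"))
    if len(raw) < 4:
        raise ValueError("frame too short")
    body = raw[:-2]
    (crc,) = struct.unpack("<H", raw[-2:])
    if crc != binascii.crc_hqx(body, 0xFFFF):
        raise ValueError("CRC mismatch")
    return body[0], body[1], body[2:]


if __name__ == "__main__":
    frame = encode_packet(0x10, 7, b"\x00\x01\x02")
    print(frame.hex(), decode_packet(frame))
```

Because COBS removes every zero byte from the encoded body, a receiver can resynchronize after a dropped byte by discarding input up to the next delimiter.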

Auto-reconnect with exponential backoff (0.5s–5s). See docs/protocols.md for packet definitions.
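
The reconnect schedule can be pictured with a small generator. The 0.5 s floor and 5 s ceiling come from the text above; the doubling factor and the name `backoff_delays` are assumptions for illustration only.

```python
from itertools import islice


def backoff_delays(initial_s=0.5, max_s=5.0, factor=2.0):
    """Yield reconnect delays that grow geometrically up to max_s."""
    delay = initial_s
    while True:
        yield delay
        delay = min(delay * factor, max_s)


if __name__ == "__main__":
    # First six delays between reconnect attempts.
    print(list(islice(backoff_delays(), 6)))  # [0.5, 1.0, 2.0, 4.0, 5.0, 5.0]
```

A caller would typically create a fresh generator after each successful connection so the next outage starts again from the shortest delay.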

Tech Stack

| Component | Stack |
|---|---|
| Supervisor | Python 3.11+, asyncio, FastAPI, uvicorn, pyserial, OpenCV |
| AI Server | Python 3.11+, FastAPI, httpx, Pydantic, Ollama (compat) + vLLM (migration target) |
| ESP32 Firmware | C/C++, ESP-IDF (FreeRTOS), CMake |
| Build (Python) | Hatchling via pyproject.toml, uv for dependency management |
| Build (ESP32) | idf.py build (CMake), source ~/esp/esp-idf/export.sh |
| Dashboard | React 19, Vite, TypeScript, Zustand, TanStack Query |
| Tests | pytest, pytest-asyncio, Vitest |
| Linting | ruff (Python), clang-format + cppcheck (C++), Biome (TypeScript) |

Getting Started

Supervisor (Raspberry Pi 5)

cd supervisor
uv sync --group dev

# Run with mock hardware (no physical robot needed)
just run-mock

# Run with real hardware
just run

# Other options
uv run python -m supervisor --no-vision         # Disable vision worker
uv run python -m supervisor --http-port 8080    # Custom HTTP port
uv run python -m supervisor --planner-api http://10.0.0.20:8100 --robot-id robot-1

AI Planner Server (3090 Ti PC)

# Install and run the server
cd server
uv sync --extra dev --extra llm --extra stt --extra tts

# Recommended testing profile (vLLM planner + CPU STT + espeak)
LLM_BACKEND=vllm STT_DEVICE=cpu TTS_BACKEND=espeak \
uv run --extra llm --extra stt --extra tts python -m app.main

The server starts on port 8100. See server/README.md for full API docs and configuration.

ESP32 Firmware

Requires ESP-IDF toolchain.

cd esp32-face   # or esp32-reflex
idf.py build
idf.py flash
idf.py monitor

Dashboard

just run-dashboard         # dev server with hot reload
just build-dashboard       # production build → supervisor/static/

Development

All commands are available via just (see justfile):

just test-all              # run all tests (supervisor, server, dashboard)
just lint                  # check Python + C++ + dashboard
just lint-fix              # auto-fix formatting
just preflight             # full pre-commit check (lint + tests + parity)
just sim                   # run face simulator V3
just check-parity          # verify sim↔MCU constant alignment

Mock Mode

The supervisor includes a PTY-based mock Reflex MCU (supervisor/mock/mock_reflex.py) that simulates serial communication, telemetry, and fault injection. Use just run-mock to run the full supervisor stack without any hardware.

Dashboard

When the supervisor is running, open http://<robot_ip>:8080 in a browser for:

  • Live telemetry display with diagnostic tree
  • Mode control (IDLE, TELEOP, WANDER)
  • E-STOP button
  • Face control (moods, gestures, talking, conversation state)
  • Parameter tuning sliders (PID gains, speed limits, safety thresholds)
  • Monitor tab (device health, comms, power, sensors, faults, workers)
  • MJPEG video stream (if vision enabled)

Configuration

Supervisor: YAML config file (schema in supervisor/config.py):

  • Sections: serial, control, safety, network, logging, vision
  • Default serial paths: /dev/robot_reflex, /dev/robot_face (via udev symlinks)

AI Server: environment variables:

  • LLM_BACKEND, VLLM_MODEL_NAME, LLM_MAX_INFLIGHT, PERFORMANCE_MODE
  • legacy compatibility: OLLAMA_URL, MODEL_NAME, PLAN_TIMEOUT_S, TEMPERATURE, NUM_CTX
  • See server/README.md for the full table

ESP32: sdkconfig.defaults + config.h constants

Supervisor API

| Endpoint | Method | Description |
|---|---|---|
| /status | GET | Current robot state (JSON) |
| /params | GET | Full parameter registry |
| /params | POST | Transactional parameter updates |
| /actions | POST | RPC: set_mode, e_stop, clear_e_stop |
| /video | GET | MJPEG stream (if vision enabled) |
| /debug/devices | GET | Device connection state |
| /debug/planner | GET | Planner state |
| /debug/mcu_benchmark | GET | MCU benchmark run status |
| /ws | WS | Telemetry stream (20 Hz, JSON) |
| /ws/logs | WS | Live log stream |

AI Server API

| Endpoint | Method | Description |
|---|---|---|
| /health | GET | Server + selected LLM backend status |
| /plan | POST | Accept world state + robot_id/seq/monotonic_ts_ms, return plan + plan_id echo metadata |
| /converse | WS | Conversation stream (single active session per robot_id) |
| /tts | POST | Direct TTS with optional metadata (robot_id, seq, monotonic_ts_ms) |

Plan actions: say(text), emote(name, intensity), gesture(name, params), skill(name). The planner proposes intent and the supervisor executes deterministic skills.
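
Since the supervisor, not the planner, decides what actually runs, it is natural to validate each proposed action before acting on it. The sketch below checks the four action kinds listed above against their expected fields. The JSON shape (an `"action"` key plus one key per argument) and the name `validate_action` are assumptions for illustration; the real schemas live in server/app/ and may differ.

```python
# Required fields per action kind, taken from the signatures listed above.
REQUIRED_FIELDS = {
    "say": ("text",),
    "emote": ("name", "intensity"),
    "gesture": ("name", "params"),
    "skill": ("name",),
}


def validate_action(action: dict) -> list:
    """Return a list of problems; an empty list means the action looks valid.

    Assumes a hypothetical wire format such as
    {"action": "emote", "name": "happy", "intensity": 0.8}.
    """
    kind = action.get("action")
    if kind not in REQUIRED_FIELDS:
        return [f"unknown action: {kind!r}"]
    return [
        f"{kind}: missing field {field!r}"
        for field in REQUIRED_FIELDS[kind]
        if field not in action
    ]


if __name__ == "__main__":
    plan = [
        {"action": "say", "text": "Hello!"},
        {"action": "emote", "name": "happy"},   # missing intensity
        {"action": "drive", "speed": 1.0},      # not a known action kind
    ]
    for item in plan:
        print(item["action"], validate_action(item))
```

Rejecting unknown action kinds outright keeps the planner from introducing behavior that the supervisor has no deterministic skill for.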

Supervisor Fallback Policy

| Failure condition | Immediate supervisor action | Motion policy | Face policy | Speech policy |
|---|---|---|---|---|
| /plan unreachable / non-200 | Mark planner disconnected; skip remote plan apply | Local deterministic only (patrol_drift/avoid_obstacle/safe stop) | confused gesture with cooldown | Cancel queued planner speech |
| /converse TTS fails mid-turn | Stop playback and clear talking flag | No change to motion authority | Show thinking briefly then restore previous mood | Attempt fallback backend once; if unavailable, skip speech |

Project Status

Working

  • Supervisor: 50 Hz control loop, state machine, safety policies, conversation state, mood choreography
  • Supervisor: serial transport with COBS framing, CRC, auto-reconnect, protocol v2 (timestamps + seq)
  • Supervisor: FastAPI HTTP/WebSocket API with telemetry streaming
  • Supervisor: process-isolated workers (TTS, vision, AI, ear)
  • Supervisor: mock Reflex MCU for hardware-free development
  • AI Server: FastAPI + vLLM planner (Qwen), conversation, TTS (Orpheus + espeak)
  • ESP32 Face: TFT face rendering, 13 moods, 13 gestures, conversation border, touch/button telemetry
  • ESP32 Reflex: motor control, PID, encoders, IMU, safety enforcement
  • Dashboard: React 19, live telemetry, face control, monitor tab, mode control
  • Voice pipeline: wake word ("hey buddy") + VAD → STT → LLM → TTS → face animation
  • WANDER mode driven by deterministic skills + planner intent
  • Deterministic telemetry: timestamps, sequence numbers, clock sync, raw packet logging

In Progress

  • Reflex MCU hardware commissioning (breadboard bring-up)
  • Personality engine implementation (spec complete, implementation pending)

Future

  • Conversation memory / interaction history
  • Wake word model improvements (recall 42% → 80%+)
  • Additional modes: LINE_FOLLOW, BALL, CRANE, CHARGING
  • Home Assistant integration
  • Voice ID / speaker identification

See docs/TODO.md for the detailed backlog and specs/ for design specifications.
