
🎙️ AI Voice Kiosk — Offline NLP for Point of Sale


Developed by Fayna Digital — Author: Volodymyr Shevchenko


Interactive voice assistant for Karkandaki Armenian restaurant — a production-grade offline kiosk built by Fayna Digital.

Deployed at: ul. Kolejowa 41, Ostrów Wielkopolski, Poland
Contact: 530 324 239


Overview

An autonomous voice-driven information kiosk designed for use at trade fairs, restaurants, and events. The system listens for Polish speech, understands natural questions about the menu, prices, and promotions, and responds with a neural female voice — all 100% offline, without cloud APIs or ongoing subscription costs.

Key Features

| Feature | Implementation |
| --- | --- |
| Offline Speech Recognition | Vosk with Polish language model |
| Offline Neural Text-to-Speech | Piper TTS, VITS model pl_PL-gosia-medium, no internet required |
| NLP Engine | Deterministic rule-based keyword matcher (zero hallucinations) |
| UI | Tkinter fullscreen kiosk mode (no browser dependency) |
| Business Rules | Hardcoded allergen data, age validation, forbidden topics; the AI cannot invent answers |
| Promo Loop | Background thread plays random promotions between interactions |
| Continuous Dialog | Multi-turn conversation: the user can ask several questions in one session |

Architecture

┌─────────────────────────────────────────────────┐
│                  KarkandakiKiosk                 │
│  ┌──────────┐  ┌────────────┐  ┌─────────────┐  │
│  │ STTEngine│  │NLPProcessor│  │  TTSEngine  │  │
│  │  (Vosk)  │→ │ (keywords) │→ │  (Piper)    │  │
│  └──────────┘  └────────────┘  └─────────────┘  │
│                                                  │
│  Modes: PROMO ←→ DIALOG                          │
│  PROMO: background promo loop (TTS every 15s)    │
│  DIALOG: listen → match → speak → loop           │
└─────────────────────────────────────────────────┘
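The PROMO mode in the diagram can be pictured as a background thread that voices a random promotion at a fixed interval and stays silent while a dialog is running. The block below is a minimal sketch under assumptions, not the repository's code: `PromoLoop`, the `speak` callable, and the `dialog_active` event are hypothetical names, and only the 15-second interval comes from the diagram.

```python
import random
import threading


class PromoLoop:
    """Voices a random promo at a fixed interval unless a dialog is active (hypothetical helper)."""

    def __init__(self, promos, speak, interval_s=15.0):
        self.promos = list(promos)
        self.speak = speak                      # callable that voices a string (assumed TTS wrapper)
        self.interval_s = interval_s
        self.dialog_active = threading.Event()  # set while the kiosk is in DIALOG mode
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _run(self):
        # Event.wait doubles as an interruptible sleep: it returns True once stop() is called.
        while not self._stop.wait(self.interval_s):
            if not self.dialog_active.is_set():
                self.speak(random.choice(self.promos))
```

Using an `Event` for the pause rather than `time.sleep` lets `stop()` end the thread immediately instead of waiting out the current interval.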

Dialog flow:

  1. User presses START button
  2. Kiosk says: "Słucham, w czym mogę pomóc?"
  3. STT listens → Vosk transcribes Polish speech
  4. NLP matches keywords → deterministic response (no LLM)
  5. TTS speaks response via Piper → afplay (macOS) / aplay (Linux)
  6. Loop continues until: goodbye phrase detected, 15s inactivity, or STOP pressed
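The six steps above amount to a small loop with three exit conditions. The following sketch shows one way to express it; `run_dialog` and its four callables are hypothetical stand-ins for the STT, NLP, and TTS components, and the goodbye phrases are the two examples listed for the Goodbye intent in this README.

```python
GREETING = "Słucham, w czym mogę pomóc?"
GOODBYE_PHRASES = ("dziękuję", "do widzenia")  # examples from the Goodbye intent


def run_dialog(listen, respond, speak, stop_requested, timeout_s=15.0):
    """Run one DIALOG session and return the reason it ended.

    listen(timeout_s) -> str or None  (None means no speech within the timeout)
    respond(text) -> str              (deterministic answer for the transcript)
    speak(text) -> None               (blocking TTS playback)
    stop_requested() -> bool          (True once STOP has been pressed)
    """
    speak(GREETING)
    while True:
        if stop_requested():
            return "stop"
        text = listen(timeout_s)
        if text is None:
            return "timeout"
        # Answer first, so a goodbye still gets its farewell before the session ends.
        speak(respond(text))
        if any(phrase in text.lower() for phrase in GOODBYE_PHRASES):
            return "goodbye"
```

Passing the components in as callables keeps the loop testable without a microphone or speaker attached.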

Project Structure

ai-kiosk/
├── src/
│   ├── main.py                  # Entry point — KarkandakiKiosk Tkinter app
│   ├── config/
│   │   ├── settings.py          # Business rules, allergens, audio config
│   │   └── knowledge.py         # Menu data, QA knowledge base, system prompt
│   ├── nlp/
│   │   └── processor.py         # Keyword-based NLP — 15+ intent categories
│   ├── stt/
│   │   └── engine.py            # Vosk offline STT engine (Polish)
│   ├── tts/
│   │   └── engine.py            # Piper TTS neural voice engine
│   ├── kiosk/
│   │   └── kiosk_mode.py        # Chromium kiosk mode manager (Linux deploy)
│   └── assets/
│       ├── images/              # UI images
│       ├── models/vosk/         # Vosk STT model (not in git — see Setup)
│       └── models/piper/        # Piper TTS model .onnx (not in git — see Setup)
├── tests/
│   ├── test_stt.py              # STT integration test (10s mic recording)
│   └── test_tts.py              # TTS smoke test
├── scripts/
│   └── install-kiosk.sh         # systemd service installer (Linux/Raspberry Pi)
├── docs/
│   └── 01_client_constraints_adr.md  # Architecture Decision Record
├── data/
│   ├── menu.json                # Menu data (reference)
│   └── qa.json                  # Q&A pairs (reference)
├── index.html                   # Web menu page (static, no server needed)
└── requirements.txt

Setup

Requirements

  • Python 3.11 (required — piper-tts depends on onnxruntime which has no Python 3.13+ wheels yet)
  • macOS (uses afplay) or Linux (uses aplay from alsa-utils)
  • Microphone

Install

git clone https://github.com/VladSh77/ai-kiosk.git
cd ai-kiosk

# macOS: install Python 3.11 if needed
brew install python@3.11 espeak-ng

python3.11 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Download Vosk Polish Model (STT)

mkdir -p src/assets/models
curl -L https://alphacephei.com/vosk/models/vosk-model-small-pl-0.22.zip -o vosk.zip
unzip vosk.zip -d src/assets/models/
mv src/assets/models/vosk-model-small-pl-0.22 src/assets/models/vosk-model-pl
rm vosk.zip
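Once the model is unpacked, a quick way to check it is to transcribe a short recording. The sketch below uses the public Vosk Python API (`Model`, `KaldiRecognizer`, `AcceptWaveform`, `FinalResult`); the helper names and the default model path are assumptions based on the commands above, not code taken from `src/stt/engine.py`.

```python
import json
import wave


def extract_text(result_json):
    """Return the transcript from a Vosk result string such as '{"text": "..."}'."""
    return json.loads(result_json).get("text", "").strip()


def transcribe_wav(wav_path, model_dir="src/assets/models/vosk-model-pl"):
    """Transcribe a 16-bit mono PCM WAV file with the downloaded model."""
    # Imported lazily so extract_text stays usable on machines without vosk installed.
    from vosk import Model, KaldiRecognizer

    with wave.open(wav_path, "rb") as wav:
        recognizer = KaldiRecognizer(Model(model_dir), wav.getframerate())
        while True:
            chunk = wav.readframes(4000)
            if not chunk:
                break
            recognizer.AcceptWaveform(chunk)
        return extract_text(recognizer.FinalResult())
```

Vosk returns its results as JSON strings, which is why the transcript is pulled out of the `text` field rather than read directly.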

Download Piper Polish Voice Model (TTS)

mkdir -p src/assets/models/piper
curl -L -o src/assets/models/piper/pl_PL-gosia-medium.onnx \
  "https://huggingface.co/rhasspy/piper-voices/resolve/main/pl/pl_PL/gosia/medium/pl_PL-gosia-medium.onnx"
curl -L -o src/assets/models/piper/pl_PL-gosia-medium.onnx.json \
  "https://huggingface.co/rhasspy/piper-voices/resolve/main/pl/pl_PL/gosia/medium/pl_PL-gosia-medium.onnx.json"

Model size: ~60 MB. Alternative voice (male): replace gosia with mc_speech.
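With the voice files in place, synthesis and playback can be chained through two subprocess calls. This is a hedged sketch rather than the project's `TTSEngine`: it assumes the `piper` command-line tool installed by piper-tts is on the PATH, and `piper_command`, `player_command`, `speak`, and the temporary WAV path are hypothetical names chosen for illustration.

```python
import platform
import subprocess

MODEL_PATH = "src/assets/models/piper/pl_PL-gosia-medium.onnx"


def piper_command(wav_path, model_path=MODEL_PATH):
    """Build the Piper CLI call; the text to speak is supplied on stdin."""
    return ["piper", "--model", model_path, "--output_file", wav_path]


def player_command(wav_path, system=None):
    """Pick the player named in this README: afplay on macOS, aplay on Linux."""
    system = system or platform.system()
    return ["afplay" if system == "Darwin" else "aplay", wav_path]


def speak(text, wav_path="/tmp/kiosk_tts.wav"):
    """Synthesize text to a WAV file, then play it (blocking)."""
    subprocess.run(piper_command(wav_path), input=text.encode("utf-8"), check=True)
    subprocess.run(player_command(wav_path), check=True)
```

For example, `speak("Słucham, w czym mogę pomóc?")` would write the greeting to the WAV file and play it through the platform's player.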

Run

source venv/bin/activate
python3 src/main.py

Configuration

All business-critical settings are in src/config/settings.py:

FULLSCREEN_MODE = True      # Lock to fullscreen (kiosk mode)
HIDE_CURSOR = True          # Hide mouse cursor
NOISE_GATE_THRESHOLD = 500  # Adjust for venue ambient noise
ENABLE_BARGE_IN = True      # User can interrupt TTS playback
MIN_AGE_RECORD = 16         # Karkandakowy Rekord age restriction (hardcoded)
UNKNOWN_RESPONSE = "Nie znam odpowiedzi, zapytaj operatora."
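How `NOISE_GATE_THRESHOLD` is applied is not spelled out here. One common interpretation, shown below as an assumption, is to compare the root-mean-square amplitude of each 16-bit PCM chunk against the threshold and drop quieter chunks before they reach the recognizer. The function names `rms` and `passes_noise_gate` are illustrative.

```python
import array
import math

NOISE_GATE_THRESHOLD = 500  # same value as in settings.py


def rms(pcm_bytes):
    """Root-mean-square amplitude of native-endian signed 16-bit PCM samples."""
    samples = array.array("h", pcm_bytes)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def passes_noise_gate(pcm_bytes, threshold=NOISE_GATE_THRESHOLD):
    """True when the chunk is loud enough to be forwarded to the recognizer."""
    return rms(pcm_bytes) >= threshold
```

Under this reading, raising the threshold in a loud venue filters out more background chatter at the cost of missing quiet speakers.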

Allergen data is hardcoded (not AI-generated) to prevent hallucinations:

ALLERGENS_DB = {
    "karkandak_slodki_nutella": ["orzechy", "mleko", "soja", "gluten"],
    "karkandak_wytrawny_mieso": ["gluten", "jaja"],
    ...
}
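A lookup over this table can refuse to guess when an item is missing, which is the point of hardcoding the data. The sketch below copies only the two entries shown above; `allergen_answer` and the "Alergeny:" reply wording are assumptions made for illustration.

```python
UNKNOWN_RESPONSE = "Nie znam odpowiedzi, zapytaj operatora."

# Two entries copied from the excerpt above; the real table has more.
ALLERGENS_DB = {
    "karkandak_slodki_nutella": ["orzechy", "mleko", "soja", "gluten"],
    "karkandak_wytrawny_mieso": ["gluten", "jaja"],
}


def allergen_answer(item_key):
    """Answer only from the hardcoded table; never guess for unknown items."""
    allergens = ALLERGENS_DB.get(item_key)
    if allergens is None:
        return UNKNOWN_RESPONSE
    return "Alergeny: " + ", ".join(allergens) + "."
```

An unknown key falls through to `UNKNOWN_RESPONSE`, so the customer is sent to a staff member instead of receiving an invented answer.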

NLP — Supported Intents

The keyword engine covers 15+ intent categories:

| Intent | Example query | Response |
| --- | --- | --- |
| Greeting | cześć, hej | Welcome message |
| Menu list | co macie, jakie smaki | Full menu with price |
| Price | ile kosztuje, cena | "8 zł per piece" |
| What is karkandak | co to jest | Product description |
| Recommendation | co polecasz, co wziąć | Taste-based suggestion |
| Spicy/mild | ostre, pikantne | Spice level guide |
| Children | dla dzieci, dziecko | Safe recommendation |
| Allergens | alergen, gluten, orzechy | Hardcoded safe answer |
| Opening hours | godziny, otwarte | 8:00–22:00 daily |
| Address | gdzie, adres, ulica | ul. Kolejowa 41 |
| Delivery | dowóz, dostawa | 10 zł delivery info |
| Challenge | rekord, wyzwanie | Rules + age warning |
| Goodbye | dziękuję, do widzenia | Farewell + session end |
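A deterministic matcher of this kind can be as simple as an ordered list of keyword groups checked against the lowercased transcript. The sketch below covers three of the intents above and is not the code in `src/nlp/processor.py`; `INTENTS`, `RESPONSES`, `match_intent`, and `respond` are hypothetical names, and the replies are the table's English summaries rather than the kiosk's spoken Polish text.

```python
UNKNOWN_RESPONSE = "Nie znam odpowiedzi, zapytaj operatora."

# (intent, keywords) pairs checked in order; keywords come from the table above.
INTENTS = [
    ("price", ("ile kosztuje", "cena")),
    ("hours", ("godziny", "otwarte")),
    ("address", ("gdzie", "adres", "ulica")),
]

# Reply summaries quoted from the table; further intents would be added the same way.
RESPONSES = {
    "price": "8 zł per piece",
    "hours": "8:00–22:00 daily",
    "address": "ul. Kolejowa 41",
}


def match_intent(text):
    """Return the first intent whose keyword occurs in the lowercased text, else None."""
    lowered = text.lower()
    for intent, keywords in INTENTS:
        if any(keyword in lowered for keyword in keywords):
            return intent
    return None


def respond(text):
    """Map a transcript to a fixed reply; unmatched input gets UNKNOWN_RESPONSE."""
    return RESPONSES.get(match_intent(text), UNKNOWN_RESPONSE)
```

Plain substring matching is easy to audit but can fire on unrelated words that contain a keyword (for example "scena" contains "cena"), so a real matcher may need word-boundary checks.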

Deployment (Linux / Raspberry Pi)

sudo bash scripts/install-kiosk.sh
sudo systemctl status ai-kiosk
journalctl -u ai-kiosk -f

The service auto-restarts on failure and starts at boot.


Architecture Decisions

See docs/01_client_constraints_adr.md for the full ADR covering:

  • Why offline STT (Vosk) over cloud (Google/Whisper)
  • Why deterministic NLP over LLM
  • Why Piper TTS over edge-tts (offline VITS vs cloud-dependent Neural)
  • Noise gate strategy for trade fair environments
  • GDPR-compliant lead collection design

Business Context

Client: Karkandaki restaurant — Armenian snack bar at trade fairs in Poland
Use case: Hands-free customer service at busy market stands
Problem solved: Staff cannot simultaneously serve customers and answer repetitive questions
ROI: Handles 100% of FAQ traffic autonomously, frees staff for upselling


Built by

Fayna Digital — Systems architecture & AI automation agency
fayna.agency · github.com/VladSh77

Core Tech: Python 3.11 · Vosk · Piper TTS · Tkinter · 100% offline architecture

About

AI Sales Kiosk based on Voiceflow: an automated AI salesperson for trade shows and points of sale
