isx9/voice2query

Voice2Query

A voice-driven natural language interface for querying a tourism database. Ask a question in English or Italian — by voice or text — and get back a SQL query, a results table, and a chart. No SQL knowledge required.

Built for the Data Management course at the University of Naples Federico II (2026).

Demo

Voice query: ▶ Watch the demo

Text query: ▶ Watch the demo

Both demos follow the same pipeline: the question (spoken or typed) is transcribed if needed, corrected for ASR errors, translated into SQL by WrenAI, run against campania_tourism, and rendered as a table and a chart.

Architecture

Voice / Text
    |
    v
Whisper (ASR)
    |
    v
ASR Error Correction
    |
    v
WrenAI (Text-to-SQL) — GPT-4o-mini
    |
    v
PostgreSQL (campania_tourism)
    |
    v
Results + Chart
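
The stages in the diagram can be read as a chain of functions. The sketch below is illustrative only: `run_pipeline`, `PipelineResult`, and the four stage callables are hypothetical names, not the project's actual code (the real pipeline lives in pipeline/voice2query_pipeline.ipynb). Passing the stages in as callables keeps the example independent of Whisper, WrenAI, and PostgreSQL.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class PipelineResult:
    question: str          # question after ASR error correction
    sql: str               # SQL produced by the text-to-SQL stage
    rows: Sequence[tuple]  # rows returned by the database


def run_pipeline(
    text: Optional[str],
    audio_path: Optional[str],
    transcribe: Callable[[str], str],           # e.g. a Whisper wrapper
    correct: Callable[[str], str],              # ASR error correction
    to_sql: Callable[[str], str],               # e.g. a WrenAI client
    execute: Callable[[str], Sequence[tuple]],  # runs SQL on PostgreSQL
) -> PipelineResult:
    """Chain the stages in diagram order; ASR is skipped for typed input."""
    if text is None:
        if audio_path is None:
            raise ValueError("provide either text or audio_path")
        text = transcribe(audio_path)
    question = correct(text)
    sql = to_sql(question)
    return PipelineResult(question, sql, execute(sql))
```

Rendering the table and chart would consume `PipelineResult.rows`; that step is left out here.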

Project Structure

voice2query/
├── asr/                          — Task 1: Speech-to-Text (Whisper)
│   ├── task1_speech_to_text.ipynb
│   └── README.md
├── databases/                    — PostgreSQL schema and data
│   └── database_postgresql/
│       ├── 01_schema1.sql
│       ├── 02_seed_data1.sql
│       ├── 03_queries1.sql
│       ├── 04_init_db1.py
│       └── README.md
├── text2sql/                     — Task 2: Text-to-SQL (WrenAI)
│   ├── docker/
│   │   ├── docker-compose.yaml
│   │   ├── config.yaml
│   │   ├── .env.example
│   │   └── README.md
│   └── task2_text_to_sql.ipynb
├── pipeline/                     — Full end-to-end pipeline
│   ├── voice2query_pipeline.ipynb
│   └── README.md
├── dashboard/                    — Streamlit web interface
│   ├── app.py
│   └── README.md
├── docs/                         — Architecture and design documentation
├── start.bat                     — One-click startup script (Windows)
└── README.md

Database

The campania_tourism PostgreSQL database covers the main tourist destinations of the Campania region, including cities, attractions, hotels, restaurants, events, users, bookings and reviews.

| Table | Rows |
|---|---|
| cities | 10 |
| attractions | 20 |
| hotels | 19 |
| restaurants | 15 |
| events | 12 |
| users | 10 |
| bookings | 13 |
| reviews | 15 |
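
After initialising the database, the row counts above can be checked programmatically. This is a minimal sketch, not part of the repository: `check_row_counts` and `EXPECTED_ROWS` are hypothetical names, and the function assumes any DB-API connection (for example one opened with psycopg2).

```python
# Expected counts copied from the table above.
EXPECTED_ROWS = {
    "cities": 10, "attractions": 20, "hotels": 19, "restaurants": 15,
    "events": 12, "users": 10, "bookings": 13, "reviews": 15,
}


def check_row_counts(conn, expected=EXPECTED_ROWS):
    """Return {table: (expected, actual)} for every table whose count differs."""
    mismatches = {}
    cur = conn.cursor()
    for table, want in expected.items():
        # Table names must come from a trusted dict like the one above,
        # never from user input, because they are interpolated into the SQL.
        cur.execute(f"SELECT COUNT(*) FROM {table}")
        got = cur.fetchone()[0]
        if got != want:
            mismatches[table] = (want, got)
    return mismatches
```

With the local container from the setup steps, a connection could be opened with `psycopg2.connect(dbname="campania_tourism", user="postgres", password="postgres", host="localhost", port=5432)`; an empty dict from `check_row_counts` means every table matches.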

Technology Stack

| Component | Technology |
|---|---|
| Speech-to-Text | OpenAI Whisper (turbo) |
| ASR Correction | Custom edit-distance module |
| Text-to-SQL | WrenAI 0.29 |
| LLM | GPT-4o-mini (OpenAI API) |
| Embeddings | text-embedding-3-small |
| Vector store | Qdrant |
| Database | PostgreSQL 16 |
| Web interface | Streamlit |
| Visualisation | Plotly |
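
The table describes the ASR corrector only as a custom edit-distance module. One common way to build such a corrector is to snap each transcribed word to the nearest entry in a domain vocabulary using Levenshtein distance. The sketch below shows that general technique, not the project's implementation: the vocabulary, the function names, and the distance threshold are all assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free on a match)
            ))
        prev = curr
    return prev[-1]


# Hypothetical domain vocabulary; the project's real keyword list may differ.
VOCABULARY = ["napoli", "pompei", "sorrento", "amalfi", "capri"]


def correct_word(word: str, vocabulary=VOCABULARY, max_distance: int = 1) -> str:
    """Snap a word to the closest vocabulary entry if it is near enough."""
    lower = word.lower()
    best = min(vocabulary, key=lambda v: levenshtein(lower, v))
    return best if levenshtein(lower, best) <= max_distance else word


def correct_text(text: str) -> str:
    """Apply correct_word to every whitespace-separated token."""
    return " ".join(correct_word(w) for w in text.split())
```

The threshold is a trade-off: at `max_distance=2`, an ordinary word such as "april" would be rewritten to "capri", so a tight threshold (or one scaled to word length) is the safer default.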

How to Run

Prerequisites

  • Docker Desktop 4.17+
  • Python 3.11+
  • An OpenAI API key (platform.openai.com) — each user needs their own key

Note: start.bat is Windows-only. On macOS/Linux, run the Docker and database steps manually using the commands below — there is no start.sh equivalent yet (contributions welcome).

First-time setup

1. Create the PostgreSQL container (only once per machine):

docker run --name campania-pg -e POSTGRES_USER=postgres -e POSTGRES_PASSWORD=postgres -p 5432:5432 -d postgres:16

⚠️ postgres/postgres is a local development default only — don't reuse it for anything beyond running this project on your own machine.

2. Configure your OpenAI API key:

cp text2sql/docker/.env.example text2sql/docker/.env

Then open text2sql/docker/.env and fill in the key: OPENAI_API_KEY=sk-...
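
A quick way to confirm that the key was filled in before starting the containers is to parse the file and look for the placeholder. This is a hedged sketch: `read_env` and `has_api_key` are hypothetical helpers, the parser only handles plain KEY=VALUE lines, and the `sk-` prefix check is an assumption based on the example value above.

```python
from pathlib import Path


def read_env(path: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def has_api_key(path: str = "text2sql/docker/.env") -> bool:
    """True if OPENAI_API_KEY is set to something other than the placeholder."""
    key = read_env(path).get("OPENAI_API_KEY", "")
    # Assumption: real keys start with "sk-"; "sk-..." is the unfilled example.
    return key.startswith("sk-") and key != "sk-..."
```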

3. Install Python dependencies:

pip install openai-whisper streamlit pandas plotly sqlalchemy psycopg2-binary requests sounddevice scipy

Start everything (Windows)

.\start.bat

This will:

  1. Check that Docker is running
  2. Start all WrenAI Docker containers
  3. Start the PostgreSQL container
  4. Initialise the database
  5. Launch the dashboard at http://localhost:8501

WrenAI interface: http://localhost:3000

Start everything (macOS/Linux)

docker compose -f text2sql/docker/docker-compose.yaml up -d
docker start campania-pg
python databases/database_postgresql/04_init_db1.py
streamlit run dashboard/app.py
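
When these commands are run back to back, `docker start` may return before PostgreSQL is accepting connections, in which case the init script can fail on its first attempt. A small wait loop avoids that. The helper below is a sketch with a hypothetical name (`wait_for_port`); it only checks that the TCP port is open, which is a good hint but not proof that the server is ready for queries.

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 5432,
                  timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling `wait_for_port()` between the `docker start` and `python ... 04_init_db1.py` steps, and retrying the init script if it still fails, is usually enough for a local setup.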

Notebooks

| Notebook | Description |
|---|---|
| asr/task1_speech_to_text.ipynb | Whisper transcription and ASR correction |
| text2sql/task2_text_to_sql.ipynb | Text-to-SQL with 7 query tests |
| pipeline/voice2query_pipeline.ipynb | Full end-to-end pipeline |

Results

A quick snapshot — full details in docs/documentation.md:

  • Text-to-SQL: 7/7 test queries succeeded across JOIN, GROUP BY, HAVING, subqueries, CASE, and multi-table LEFT JOIN, in both English and Italian
  • ASR: Whisper turbo transcribed clean-audio queries correctly; the custom edit-distance corrector fixed 4/4 simulated domain-specific error types
  • Average end-to-end response time: ~20–25 seconds per query (WrenAI + GPT-4o-mini)

Languages Supported

English and Italian — these are the two languages the pipeline has been built and tested for, including the ASR correction module's number-word mapping and the domain query test set.
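
To make the number-word mapping concrete, here is a minimal sketch of how spoken number words in both languages could be normalised to digits before the question reaches the text-to-SQL stage. The mapping and the `replace_number_words` name are assumptions for illustration; the project's own module may cover more words and handle them differently.

```python
import re

# Hypothetical mapping; the project's real table may cover more words.
NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "uno": 1, "due": 2, "tre": 3, "quattro": 4, "cinque": 5,
}

# Whole-word match only, so "someone" is not rewritten to "some1".
_PATTERN = re.compile(r"\b(" + "|".join(NUMBER_WORDS) + r")\b", re.IGNORECASE)


def replace_number_words(text: str) -> str:
    """Replace whole-word English/Italian number words with digits."""
    return _PATTERN.sub(lambda m: str(NUMBER_WORDS[m.group(1).lower()]), text)
```

A shared table like this is deliberately naive: "one" and "uno" also act as pronouns or articles, and "due" is an ordinary English word, so a production version would need to know which language the question is in.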

Whisper's underlying turbo model supports 99+ languages out of the box, so the pipeline could in principle be extended to other languages. That would require updating the ASR correction module and re-validating accuracy for each new language — it hasn't been tested here.

Documentation

The docs/ folder contains:

  • documentation.md — full system architecture, methodology, LLM/DBMS selection process, results, and conclusions
  • related-works.md — analysis of the 8 papers that informed the project's design decisions
  • WrenAI.md — in-depth look at WrenAI's architecture, MDL, and known limitations

Limitations

  • Requires an active OpenAI API key and incurs a small per-query cost (~$0.002)
  • Average response time of 20–25 seconds is fine for a demo, not for production-grade interactivity
  • The ASR corrector relies on a fixed keyword list and doesn't generalise beyond the Campania tourism vocabulary
  • start.bat only supports Windows out of the box

Authors & Acknowledgments

Built by Isabella Di Lorenzi and Maria Pasconcino for the Data Management final project, University of Naples Federico II (2026).

Thanks to the authors of the papers referenced in docs/related-works.md, whose work on cascaded and end-to-end Speech-to-SQL pipelines, ASR error correction, and multilingual NLIDBs shaped some design decisions in this project.

License

This project is released under the MIT License — feel free to use, adapt, or build on it for your own work.
