A self-contained demo that translates natural-language questions into SQL. It uses an LLM (Azure OpenAI) by default with a rule-based fallback engine, and includes a Streamlit UI with chat history and auto-visualization, a Microsoft Teams bot, and optional golden-SQL evaluation.
Status: Demo / educational project – not for production use. All data is synthetic. Translations may be incomplete or inaccurate.
```
┌─────────────────────────┐
│     Microsoft Teams     │
│     (Bot Framework)     │
└────────────┬────────────┘
             │ HTTPS
┌────────────▼────────────┐
│    Azure App Service    │
│  (webapp.py + bot.py)   │
│  Managed Identity auth  │
└────────────┬────────────┘
             │
┌────────────▼───────────────────┐
│     Streamlit UI (app.py)      │
│                                │
│ Chat history + Auto-viz + Eval │
│ az login (local dev auth)      │
└───────────────┬────────────────┘
                │
       ┌────────┴────────┐
┌──────▼──────────┐ ┌────▼─────────┐
│   LLM Engine    │ │  Rule-based  │
│ (llm_engine.py) │ │ (engine.py)  │
└───┬──────┬──────┘ └──────┬───────┘
    │      │               │
    │ ┌────▼─────────────┐ │
    │ │   Azure OpenAI   │ │
    │ │     Service      │ │
    │ └──────────────────┘ │
    └──────────┬───────────┘
         ┌─────▼─────┐
         │  SQLite   │
         │  (db.py)  │
         └─────┬─────┘
         ┌─────▼─────┐
         │ visualize │
         │   .py     │
         └───────────┘
```
| Module | Purpose |
|---|---|
| nl2sql/schema.py | Defines tables, columns, aliases, and join relationships from the schema CSV. |
| nl2sql/db.py | Creates the SQLite database, runs DDL, and loads seed CSVs. Provides execute_sql() for safe read-only queries. |
| nl2sql/llm_engine.py | Sends NL + schema to the Azure OpenAI Chat Completions API. Supports both Entra ID (managed identity / az login) and API key auth. |
| nl2sql/engine.py | Rule-based NL-to-SQL translator using regex pattern matching. Fallback engine. |
| nl2sql/eval.py | Golden-SQL evaluation: loads reference pairs, runs the engine, compares results. |
| nl2sql/visualize.py | Suggests the best chart type (bar, line, pie, scatter, area) based on data shape and query intent. |
| app.py | Streamlit chat-style app with conversation history, auto-visualization, and evaluation sidebar. |
| webapp.py | aiohttp web server entry point for the Teams bot (deployed to Azure App Service). |
| bot.py | Bot Framework ActivityHandler that processes Teams messages, translates to SQL, and returns Adaptive Cards. |
| function_app/teams-manifest/ | Teams app manifest template. Replace {{MICROSOFT_APP_ID}} with your App ID before packaging. |
| Resource | Purpose | SKU / Tier |
|---|---|---|
| Azure OpenAI Service or Azure AI Foundry | LLM for NL-to-SQL translation | Any tier with a chat deployment (e.g. GPT-4o-mini). Models can be provisioned through Azure OpenAI directly or via an Azure AI Foundry project. |
| App Registration (Entra ID) | Bot identity for Bot Framework auth | Single-tenant or multi-tenant |
| Azure Bot | Routes messages between Teams and the App Service | Free (F0) or Standard (S1) |
| Azure App Service (Linux) | Hosts the bot web server (webapp.py) | B1 or higher (Python 3.11, Linux) |
| App Service Plan | Compute plan for the App Service | Basic B1 (Linux) |
| Principal | Role | Scope | When |
|---|---|---|---|
| App Service managed identity | Cognitive Services OpenAI User | Azure OpenAI resource | Using Azure OpenAI directly |
| App Service managed identity | Azure AI Developer | Azure AI Foundry project | Using a model deployed via AI Foundry |
| Your user account (local dev) | Cognitive Services OpenAI User | Azure OpenAI resource | Using Azure OpenAI directly |
| Your user account (local dev) | Azure AI Developer | Azure AI Foundry project | Using a model deployed via AI Foundry |
```bash
# 1. Clone and enter the repo
cd nl2sql-demo

# 2. Create a virtual environment
python -m venv .venv
# Windows
.venv\Scripts\activate
# macOS / Linux
# source .venv/bin/activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Configure the LLM
cp .env.example .env
# Edit .env and set AZURE_OPENAI_ENDPOINT and AZURE_OPENAI_DEPLOYMENT

# 5. Authenticate to Azure (for Entra ID auth)
az login

# 6. Launch the app
streamlit run app.py
```

The database is auto-created on first run. Click Initialize DB in the sidebar to reset it at any time.
| Mode | How it works | Requires |
|---|---|---|
| LLM (default) | Sends the NL question + schema to Azure OpenAI Chat Completions API | AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_DEPLOYMENT in .env + az login |
| Rule-based | Local pattern-matching, no network calls | Nothing — works offline |
The sidebar has a radio toggle to switch between engines. If Azure OpenAI configuration is missing, the app automatically falls back to rule-based mode.
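The selection-with-fallback behaviour can be sketched as follows; the function names here are illustrative stand-ins, not the actual functions in llm_engine.py and engine.py:

```python
import os

def translate_llm(question: str) -> str:
    # Stand-in for the Azure OpenAI call in nl2sql/llm_engine.py.
    return f"-- LLM translation of: {question}"

def translate_rules(question: str) -> str:
    # Stand-in for the regex engine in nl2sql/engine.py.
    return f"-- rule-based translation of: {question}"

def llm_configured() -> bool:
    # LLM mode needs both an endpoint and a deployment name.
    return bool(os.getenv("AZURE_OPENAI_ENDPOINT") and os.getenv("AZURE_OPENAI_DEPLOYMENT"))

def translate(question: str, prefer_llm: bool = True) -> str:
    # Fall back to the rule-based engine when Azure OpenAI is not configured.
    if prefer_llm and llm_configured():
        return translate_llm(question)
    return translate_rules(question)
```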
| Variable | Required | Default | Description |
|---|---|---|---|
| AZURE_OPENAI_ENDPOINT | Yes (LLM mode) | — | Azure OpenAI endpoint URL |
| AZURE_OPENAI_DEPLOYMENT | Yes (LLM mode) | — | Azure OpenAI deployment name |
| AZURE_OPENAI_API_VERSION | No | 2024-02-01 | Azure OpenAI API version |
| AZURE_OPENAI_API_KEY | No | — | API key (fallback if Entra ID is unavailable) |
| MICROSOFT_APP_ID | Yes (Teams bot) | — | Bot App Registration client ID |
| MICROSOFT_APP_PASSWORD | Yes (Teams bot) | — | Bot App Registration client secret |
| MICROSOFT_APP_TENANT_ID | Yes (Teams bot) | — | Azure AD tenant ID (for single-tenant bots) |
The Streamlit UI uses a conversational chat interface (st.chat_input + st.chat_message). All questions and responses are stored in st.session_state and persist across reruns within the same browser session. Use the Clear history button in the sidebar to reset.
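The history mechanics reduce to a small pattern. In this sketch a plain dict stands in for st.session_state so the logic is visible without running Streamlit; the function names are illustrative:

```python
# st.session_state behaves like a dict that persists across Streamlit reruns;
# a plain dict plays that role here.
def add_exchange(state: dict, question: str, answer: str) -> None:
    state.setdefault("messages", [])
    state["messages"].append({"role": "user", "content": question})
    state["messages"].append({"role": "assistant", "content": answer})

def clear_history(state: dict) -> None:
    # What the "Clear history" sidebar button does.
    state["messages"] = []
```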
When Auto-visualize results is enabled (default), the app analyzes each query result and automatically renders the most appropriate chart:
| Chart type | When it's used |
|---|---|
| Bar | Categorical grouping with numeric values (e.g. "total value by plant") |
| Pie | Small number of categories with distribution keywords (e.g. "breakdown by vendor") |
| Line | Time-series data with trend keywords (e.g. "orders over time") |
| Scatter | Two numeric columns with many rows |
| Area | Time-series with area-style keywords |
The logic lives in nl2sql/visualize.py and uses heuristics based on column types, row count, and NL query keywords.
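A rough sketch of such heuristics is shown below; the keyword lists and thresholds are invented for illustration and differ from visualize.py's actual rules:

```python
def suggest_chart(columns, rows, question):
    """Pick a chart type from column types, row count, and NL keywords."""
    q = question.lower()
    numeric = [c for i, c in enumerate(columns)
               if rows and isinstance(rows[0][i], (int, float))]
    categorical = [c for c in columns if c not in numeric]
    if any(k in q for k in ("over time", "trend", "per month")):
        return "line"
    if any(k in q for k in ("breakdown", "share", "distribution")) and len(rows) <= 8:
        return "pie"
    if len(numeric) >= 2 and len(rows) > 20:
        return "scatter"
    if categorical and numeric:
        return "bar"
    return "table"
```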
The bot is deployed as an Azure App Service running an aiohttp web server (webapp.py + bot.py). Users can chat with the NL2SQL agent directly in Microsoft Teams. The bot returns results as Adaptive Cards with SQL, data tables, and visualization suggestions.
| Command | Description |
|---|---|
| (any question) | Translates to SQL, executes, and returns results as an Adaptive Card |
| help | Shows available commands and example queries |
| history | Shows recent queries for the conversation |
| clear history | Clears conversation history |
- An App Registration in Entra ID (Azure AD) with a client secret
- An Azure Bot resource linked to the App Registration
- An Azure App Service (Linux, Python 3.11) with system-assigned managed identity
- The managed identity granted Cognitive Services OpenAI User on your Azure OpenAI resource
- Azure Portal → App registrations → New registration
- Name: e.g. NL2SQL Bot
- Supported account types: Single tenant (or multi-tenant)
- Click Register
- Note the Application (client) ID and Directory (tenant) ID
- Go to Certificates & secrets → New client secret → copy the Value
- Azure Portal → Create a resource → search Azure Bot
- Bot handle: e.g. nl2sql-teams-bot
- Type of App: match your App Registration (Single Tenant / Multi Tenant)
- Use existing app registration: paste your App ID
- Click Create
- Go to the bot → Channels → click Microsoft Teams → Apply
```bash
# Create an App Service Plan (Linux)
az appservice plan create \
  --name nl2sql-bot-plan \
  --resource-group <your-rg> \
  --location <region> \
  --sku B1 \
  --is-linux

# Create the Web App
az webapp create \
  --name <your-app-name> \
  --resource-group <your-rg> \
  --plan nl2sql-bot-plan \
  --runtime "PYTHON:3.11"

# Enable managed identity
az webapp identity assign \
  --name <your-app-name> \
  --resource-group <your-rg>

# Set the startup command
az webapp config set \
  --name <your-app-name> \
  --resource-group <your-rg> \
  --startup-file "python webapp.py"

# Set environment variables
az webapp config appsettings set \
  --name <your-app-name> \
  --resource-group <your-rg> \
  --settings \
    MICROSOFT_APP_ID=<your-app-id> \
    MICROSOFT_APP_PASSWORD=<your-client-secret> \
    MICROSOFT_APP_TENANT_ID=<your-tenant-id> \
    AZURE_OPENAI_ENDPOINT=<your-openai-endpoint> \
    AZURE_OPENAI_DEPLOYMENT=<your-deployment-name> \
    WEBSITES_PORT=8000 \
    WEBSITES_CONTAINER_START_TIME_LIMIT=600

# Deploy the code
az webapp up \
  --name <your-app-name> \
  --resource-group <your-rg> \
  --runtime "PYTHON:3.11"
```

In the Azure Portal, go to your Azure OpenAI resource → Access control (IAM) → Add role assignment:
- Role: Cognitive Services OpenAI User
- Assign access to: Managed identity → App Service → select your app
- Click Review + assign
In the Azure Bot resource → Configuration → set Messaging endpoint to:
https://<your-app-name>.azurewebsites.net/api/messages
- Copy function_app/teams-manifest/manifest.json and replace {{MICROSOFT_APP_ID}} with your actual App ID
- Add two icon files to the same folder:
  - color.png — 192×192 px full-color icon
  - outline.png — 32×32 px transparent-background white outline icon
- Zip the three files (manifest.json + color.png + outline.png) into a single ZIP — files must be at the root of the ZIP, not inside a subfolder
- In Teams → Apps → Manage your apps → Upload a custom app → select the ZIP file
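The three-file packaging step can also be scripted. This sketch assumes you have already replaced the placeholder App ID; the function name and output filename are illustrative:

```python
import zipfile
from pathlib import Path

def package_teams_app(src_dir: str, out_zip: str) -> None:
    """Zip manifest + icons with every file at the root of the archive."""
    names = ("manifest.json", "color.png", "outline.png")
    for name in names:
        if not (Path(src_dir) / name).exists():
            raise FileNotFoundError(name)
    with zipfile.ZipFile(out_zip, "w") as z:
        for name in names:
            # arcname=name keeps each file at the ZIP root (no subfolder),
            # which Teams requires.
            z.write(Path(src_dir) / name, arcname=name)
```

For example: package_teams_app("function_app/teams-manifest", "nl2sql-teams-app.zip").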
- Prepare the ZIP as described above
- Go to Teams Admin Center → Teams apps → Manage apps → Upload new app
- Upload the ZIP file
- Set app policies to control which users can access the bot
- Go to the Azure Bot resource → Test in Web Chat
- Type a query to verify the bot responds before deploying to Teams
```bash
pytest -v
```

Tests use a temporary SQLite database so they don't interfere with your main data/demo.sqlite.
| # | Natural language input |
|---|---|
| 1 | List all open purchase orders |
| 2 | Show all plants |
| 3 | Top 5 purchase orders by open value in USD |
| 4 | Count of purchase orders by vendor |
| 5 | Total open value by plant |
| 6 | Show purchase orders for vendor Honeywell Aerospace |
| 7 | Show plants in the US |
| 8 | Count of materials by segment |
| 9 | Show purchase orders with open quantity greater than 100 |
| 10 | Total open value by material group |
| 11 | Average lead time by plant |
| 12 | Top 3 plants by total open purchase order value |
| 13 | Show materials with lifecycle Active |
| 14 | Show purchase orders with exception message Expedite |
The schema is auto-loaded from data/snowflake_table_columns.csv (28 tables
from DEMO_ANALYTICS.DEMO_PUBLISHED). Key business tables:
- AIML_OPEN_PURCHASE_ORDERS (47 columns) - Open PO lines with vendor, plant, material, qty, value
- CORE_PLANT (23 columns) - Plant master data with location info
- EDW_INVENTORY_SEGMENTATION_SNAPSHOT (113 columns) - Inventory snapshots
- EDW_MATL_LOC_DEMAND_INFO (24 columns) - Material demand information
Plus 24 SAP raw tables (EKKO, EKPO, MARA, MARC, LFA1, etc.).
Seed data: 10 plants, 30 POs, 40 inventory snapshots, 25 demand records — all synthetic dummy values.
The LLM engine uses the openai Python package with the AzureOpenAI client.
Authentication order:
- If AZURE_OPENAI_API_KEY is set → uses API key auth
- Otherwise → uses DefaultAzureCredential (managed identity in Azure, az login locally)
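Sketched as code (the SDK calls are stubbed out here; the real engine builds an openai.AzureOpenAI client with either an api_key or an Entra ID token provider):

```python
import os

def choose_auth(env=None) -> str:
    """Return which auth mode the engine would pick, per the order above."""
    env = os.environ if env is None else env
    if env.get("AZURE_OPENAI_API_KEY"):
        return "api_key"
    # Otherwise DefaultAzureCredential is used: managed identity when running
    # in Azure, the `az login` CLI credential on a developer machine.
    return "entra_id"
```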
Edit the _SYSTEM_PROMPT in nl2sql/llm_engine.py to adjust the
instructions, add few-shot examples, or change the output format.
- Replace data/snowflake_table_columns.csv with an updated export.
- The schema is parsed automatically at import time — no code changes needed.
- Add join relationships in nl2sql/schema.py if applicable.
- Regenerate seed data: python generate_seed_data.py
- Re-initialise: click Initialize DB or call init_db(force=True).
- Open nl2sql/engine.py.
- Pattern matching happens in dedicated _detect_* functions (tables, aggregates, columns, filters, group-by, ordering).
- To handle a new phrase, add a regex or keyword check in the appropriate function and update the _QueryPlan.
- Add corresponding golden examples in data/golden/golden.sql.jsonl and run pytest to verify.
- Demo only – This is an educational prototype, not production software.
- LLM accuracy – The LLM engine generally produces better SQL than the rule-based engine, but may still hallucinate columns or misinterpret ambiguous queries. Always review generated SQL.
- Rule-based – The rule-based engine handles common patterns but will fail on complex or ambiguous queries.
- API costs – LLM mode makes API calls to Azure OpenAI. Each query costs a small amount of tokens.
- Synthetic data – All names, emails, and transactions are fake. No personally identifiable information (PII) is used.
- No authentication / authorisation – The Streamlit app exposes a raw SQL execution path (read-only). Do not deploy on an untrusted network without additional safeguards.
- SQL injection mitigation – Table and column names are strictly white-listed from the schema. User-supplied values in filters are embedded as literals (not parameterised) for simplicity; a production system should use parameterised queries throughout.
- Accuracy – Translations are best-effort. Always review generated SQL before using results for any decision-making.
This project is provided as-is for demonstration purposes.