A LangGraph-powered multi-agent system that answers complex business questions by intelligently orchestrating SQL generation, document retrieval, and answer synthesis — all through natural language.
Business users frequently ask questions that span both structured databases AND unstructured documents. For example:
"What were our top 5 products by revenue last quarter, and what did customer reviews say about each?"
Traditional chatbots handle either SQL OR document search — never both intelligently. MADA solves this by deploying specialized agents that collaborate to deliver unified, sourced answers.
┌─────────────────┐
│ User Query │
└────────┬────────┘
│
┌────────▼────────┐
│ ROUTER AGENT │
│ (Intent Classifier│
│ + Query Planner) │
└──┬─────┬─────┬──┘
│ │ │
┌────────────┘ │ └────────────┐
│ │ │
┌────────▼────────┐ ┌──────▼──────┐ ┌────────▼────────┐
│ SQL AGENT │ │ DOC AGENT │ │ HYBRID PATH │
│ │ │ │ │ (Both Agents │
│ • Schema Lookup │ │ • ChromaDB │ │ in parallel) │
│ • SQL Generation │ │ • Retrieval │ │ │
│ • SQL Validation │ │ • Re-ranking│ │ │
│ • Execution │ │ • Citation │ │ │
└────────┬────────┘ └──────┬──────┘ └────────┬────────┘
│ │ │
└──────────┬───────┘───────────────────┘
│
┌────────▼────────┐
│ SYNTHESIZER │
│ │
│ • Merge Results │
│ • Resolve Conflicts│
│ • Format Response│
│ • Add Citations │
└────────┬────────┘
│
┌────────▼────────┐
│ Final Answer │
│ (with sources) │
└─────────────────┘
- Intelligent Query Routing: Automatically classifies queries as SQL-only, document-only, or hybrid using LLM-based intent detection.
- SQL Agent with Guardrails: Generates, validates (via
sqlparse), and executes SQL with automatic retry on failure. Supports schema-aware generation. - Document Agent with Re-ranking: Retrieves from ChromaDB vector store with cross-encoder re-ranking for precision.
- Conflict Resolution: When SQL data and documents disagree, the synthesizer flags discrepancies with source attribution.
- Conversation Memory: Maintains context across follow-up questions using LangGraph's built-in state management.
- Observability: Full agent decision traces via LangSmith integration + structured JSON logging.
- Evaluation Suite: Built-in accuracy measurement for SQL generation and retrieval quality.
- Dual LLM Support: OpenAI GPT-4o-mini as primary, Ollama (local) as fallback for cost-conscious or offline usage.
mada/
├── README.md
├── PROJECT_SPEC.md # Detailed implementation spec
├── pyproject.toml # Package config + dependencies
├── .env.example # Environment variable template
├── Makefile # Common commands (setup, test, run)
│
├── src/
│ └── mada/
│ ├── __init__.py
│ ├── config.py # Settings, env vars, model configs
│ │
│ ├── agents/
│ │ ├── __init__.py
│ │ ├── router.py # Intent classification + query planning
│ │ ├── sql_agent.py # Text-to-SQL generation + execution
│ │ ├── doc_agent.py # Document retrieval + re-ranking
│ │ └── synthesizer.py # Result merging + conflict resolution
│ │
│ ├── graph/
│ │ ├── __init__.py
│ │ ├── state.py # LangGraph state schema definition
│ │ ├── nodes.py # Graph node functions
│ │ └── builder.py # Graph construction + compilation
│ │
│ ├── tools/
│ │ ├── __init__.py
│ │ ├── sql_tool.py # SQL execution tool (LangChain Tool)
│ │ ├── retriever_tool.py # ChromaDB retriever tool
│ │ └── validators.py # SQL validation, output parsing
│ │
│ ├── memory/
│ │ ├── __init__.py
│ │ └── conversation.py # Conversation state + memory management
│ │
│ ├── evaluation/
│ │ ├── __init__.py
│ │ ├── sql_eval.py # SQL accuracy evaluation
│ │ ├── retrieval_eval.py # Retrieval precision/recall metrics
│ │ └── e2e_eval.py # End-to-end system evaluation
│ │
│ └── utils/
│ ├── __init__.py
│ ├── logging_config.py # Structured logging setup
│ ├── callbacks.py # LangChain callbacks for tracing
│ └── prompts.py # All prompt templates (centralized)
│
├── data/
│ ├── ecommerce.db # SQLite e-commerce database (bundled)
│ ├── seed_database.py # Script to create + populate the DB
│ ├── documents/ # Sample unstructured documents
│ │ ├── product_reviews.md
│ │ ├── quarterly_report_q3.md
│ │ ├── return_policy.md
│ │ ├── shipping_guidelines.md
│ │ └── customer_complaints_summary.md
│ └── eval/
│ ├── sql_test_questions.json # Ground truth SQL Q&A pairs
│ ├── retrieval_test.json # Ground truth retrieval Q&A pairs
│ └── hybrid_test.json # Hybrid query test cases
│
├── notebooks/
│ ├── 01_data_exploration.ipynb # Explore DB schema + documents
│ ├── 02_sql_agent_dev.ipynb # Develop & test SQL agent standalone
│ ├── 03_doc_agent_dev.ipynb # Develop & test doc agent standalone
│ ├── 04_langgraph_orchestration.ipynb # Build + visualize the graph
│ ├── 05_evaluation_analysis.ipynb # Run evals, analyze results, charts
│ └── 06_demo_walkthrough.ipynb # End-to-end demo for presentations
│
├── api/
│ ├── __init__.py
│ ├── main.py # FastAPI application
│ ├── schemas.py # Pydantic request/response models
│ └── routes.py # API endpoints
│
├── streamlit_app/
│ └── app.py # Streamlit chat interface
│
├── tests/
│ ├── __init__.py
│ ├── conftest.py # Shared fixtures (DB, vector store)
│ ├── test_router.py
│ ├── test_sql_agent.py
│ ├── test_doc_agent.py
│ ├── test_synthesizer.py
│ ├── test_graph.py
│ └── test_api.py
│
├── docker/
│ ├── Dockerfile
│ └── docker-compose.yml # App + Ollama + ChromaDB
│
└── .github/
└── workflows/
├── test.yml # Run tests on PR
└── lint.yml # Ruff + mypy checks
The project ships with a realistic e-commerce SQLite database for immediate testing:
-- Core tables
CREATE TABLE customers (
customer_id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
email TEXT UNIQUE,
city TEXT,
state TEXT,
signup_date DATE,
customer_segment TEXT -- 'premium', 'regular', 'new'
);
CREATE TABLE products (
product_id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
category TEXT, -- 'Electronics', 'Clothing', 'Home', 'Sports', 'Books'
subcategory TEXT,
price DECIMAL(10,2),
cost DECIMAL(10,2), -- For margin calculations
brand TEXT,
stock_quantity INTEGER,
rating DECIMAL(2,1),
created_at DATE
);
CREATE TABLE orders (
order_id INTEGER PRIMARY KEY,
customer_id INTEGER REFERENCES customers(customer_id),
order_date DATE,
status TEXT, -- 'completed', 'returned', 'cancelled', 'processing'
total_amount DECIMAL(10,2),
discount_amount DECIMAL(10,2),
shipping_city TEXT,
shipping_state TEXT,
payment_method TEXT -- 'credit_card', 'upi', 'cod', 'wallet'
);
CREATE TABLE order_items (
item_id INTEGER PRIMARY KEY,
order_id INTEGER REFERENCES orders(order_id),
product_id INTEGER REFERENCES products(product_id),
quantity INTEGER,
unit_price DECIMAL(10,2),
discount_percent DECIMAL(5,2)
);
CREATE TABLE customer_reviews (
review_id INTEGER PRIMARY KEY,
product_id INTEGER REFERENCES products(product_id),
customer_id INTEGER REFERENCES customers(customer_id),
rating INTEGER CHECK(rating BETWEEN 1 AND 5),
review_text TEXT,
review_date DATE,
sentiment TEXT -- 'positive', 'negative', 'neutral'
);- 500 customers across 15 Indian cities
- 200 products across 5 categories
- 5,000 orders spanning 12 months (2024)
- 12,000 order items
- 2,000 customer reviews with sentiment labels
Purpose: Explore the database schema and document corpus before building agents.
- Connect to SQLite DB, list all tables, inspect schemas
- Run sample queries (revenue by category, top customers, monthly trends)
- Load and preview all markdown documents
- Identify query patterns that need SQL, documents, or both
- Output: Schema diagram, sample query results, document summaries
Purpose: Develop and iterate on the SQL agent in isolation.
- Schema serialization strategy (table descriptions, column metadata, sample rows)
- Prompt engineering for SQL generation (few-shot examples, chain-of-thought)
- SQL validation pipeline using
sqlparse - Error handling: retry with error context on SQL execution failure
- Test against
sql_test_questions.jsonground truth - Latency benchmarking: measure time per query
- Key Decision Points: Why schema-in-prompt vs. RAG-based schema retrieval, handling ambiguous column names
Purpose: Develop the document retrieval agent.
- Document chunking strategy (by section headers, 500 token chunks with 50 token overlap)
- ChromaDB collection creation with
sentence-transformers/all-MiniLM-L6-v2embeddings - Retrieval testing: top-k tuning (k=3 vs k=5 vs k=10)
- Cross-encoder re-ranking using
cross-encoder/ms-marco-MiniLM-L-6-v2 - Citation generation: map retrieved chunks back to source documents
- Test against
retrieval_test.json - Key Decision Points: Why ChromaDB over FAISS here, chunk size trade-offs, when re-ranking helps vs. hurts latency
Purpose: Build the LangGraph state machine that orchestrates all agents.
- Define the
GraphStateTypedDict (query, intent, sql_result, doc_result, final_answer, sources, error) - Build router node with intent classification
- Wire SQL agent, doc agent, and synthesizer as graph nodes
- Add conditional edges based on router output
- Visualize the graph using LangGraph's built-in
.get_graph().draw_mermaid() - Test with example queries from each category (SQL-only, doc-only, hybrid)
- Demonstrate conversation memory across follow-up questions
- Key Decision Points: Why LangGraph over AgentExecutor, state design choices, parallel vs. sequential execution
Purpose: Run comprehensive evaluation and generate metrics.
- SQL accuracy: exact match, execution match, result match
- Retrieval quality: precision@k, recall@k, MRR
- End-to-end: answer correctness scored by LLM-as-judge
- Latency analysis: p50, p95, p99 response times per query type
- Error analysis: categorize failure modes (wrong table, wrong join, empty retrieval, hallucination)
- Generate charts: accuracy by query complexity, latency distribution, error breakdown
- Output: Evaluation report with charts suitable for README and presentations
Purpose: A clean, presentation-ready notebook for live demos.
- 5 carefully chosen queries that showcase each routing path
- Query 1 (SQL-only): "What is the total revenue by product category for Q3 2024?"
- Query 2 (Doc-only): "What is our return policy for electronics?"
- Query 3 (Hybrid): "What are the top 3 products by revenue, and what do customer reviews say about them?"
- Query 4 (Follow-up): "How does that compare to Q2?" (tests memory)
- Query 5 (Edge case): "Show me the sales for AirPods" (product doesn't exist — tests graceful failure)
- Each query shows: routing decision → agent traces → final answer with sources
- Clear markdown cells explaining what's happening at each step
- Python 3.10+
- (Optional) Ollama installed locally for local LLM fallback
- (Optional) Docker for containerized deployment
# Clone the repository
git clone https://github.com/ajaymaurya/mada.git
cd mada
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # Linux/Mac
# .venv\Scripts\activate # Windows
# Install dependencies
pip install -e ".[dev]"
# Set up environment variables
cp .env.example .env
# Edit .env with your API keys
# Seed the database with sample data
python data/seed_database.py
# Run the Streamlit app
streamlit run streamlit_app/app.py
# OR run the FastAPI server
uvicorn api.main:app --reload# .env.example
OPENAI_API_KEY=sk-... # Required for GPT-4o-mini
LANGCHAIN_API_KEY=ls-... # Optional: LangSmith tracing
LANGCHAIN_TRACING_V2=true # Optional: Enable LangSmith
LANGCHAIN_PROJECT=mada # Optional: LangSmith project name
OLLAMA_BASE_URL=http://localhost:11434 # Optional: Local LLM fallback
DEFAULT_LLM_PROVIDER=openai # 'openai' or 'ollama'
DEFAULT_MODEL=gpt-4o-mini # Model name
CHROMA_PERSIST_DIR=./data/chroma_db # ChromaDB storage path
DATABASE_URL=sqlite:///./data/ecommerce.db # Database connection
LOG_LEVEL=INFO # Logging levelfrom mada.graph.builder import build_graph
# Build and compile the graph
graph = build_graph()
# Ask a question
result = graph.invoke({
"query": "What are the top 5 products by revenue and what do reviews say about them?",
"chat_history": []
})
print(result["final_answer"])
print(result["sources"]) # List of sources (SQL tables, document names)
print(result["intent"]) # 'sql_only', 'doc_only', or 'hybrid'
print(result["agent_trace"]) # Decision log for observability# Ask a question
POST /api/query
{
"query": "Total revenue by category this quarter",
"session_id": "user-123" # For conversation memory
}
# Get agent trace for last query
GET /api/trace/{session_id}
# Health check
GET /api/healthInteractive chat UI with:
- Real-time agent routing visualization (shows which agent is active)
- SQL query display with syntax highlighting
- Document source citations with expandable chunks
- Conversation history with follow-up support
- Toggle between OpenAI and Ollama
To be populated after running
05_evaluation_analysis.ipynb
| Metric | SQL Agent | Doc Agent | Hybrid (E2E) |
|---|---|---|---|
| Accuracy | -% | -% | -% |
| Avg Latency | -s | -s | -s |
| P95 Latency | -s | -s | -s |
| Route | Precision | Recall |
|---|---|---|
| SQL-only | -% | -% |
| Doc-only | -% | -% |
| Hybrid | -% | -% |
| Decision | Choice | Why |
|---|---|---|
| Orchestration | LangGraph over AgentExecutor | Need conditional routing + state persistence; AgentExecutor is too linear |
| Vector DB | ChromaDB | Lightweight, embedded, sufficient for document corpus of this size. FAISS lacks metadata filtering. |
| Embeddings | all-MiniLM-L6-v2 | Best speed/quality trade-off for retrieval. 384-dim keeps ChromaDB fast. |
| Re-ranker | ms-marco-MiniLM-L-6-v2 | Cross-encoder precision boost on top-k results without major latency hit |
| SQL Validation | sqlparse + dry-run | Catch syntax errors before execution; dry-run catches schema mismatches |
| LLM | GPT-4o-mini (primary) | Best cost/quality for structured output tasks. Ollama fallback for offline. |
| API | FastAPI | Async support, auto-docs, Pydantic integration — fills a resume gap |
| State Design | TypedDict over Pydantic | LangGraph native, simpler serialization, sufficient for this use case |
# Run all tests
pytest tests/ -v
# Run specific test suite
pytest tests/test_sql_agent.py -v
# Run with coverage
pytest tests/ --cov=src/mada --cov-report=html
# Lint
ruff check src/ tests/
mypy src/mada/# Build and run all services
docker-compose -f docker/docker-compose.yml up --build
# Services:
# - mada-api: FastAPI on port 8000
# - mada-ui: Streamlit on port 8501
# - ollama: Local LLM on port 11434 (optional)- Add PostgreSQL support alongside SQLite
- Integrate sql-eval framework for automated SQL benchmarking
- Add streaming responses via FastAPI WebSocket
- Implement caching layer (Redis) for repeated queries
- Add LangSmith evaluation dataset integration
- Support multi-database routing (query across multiple DBs)
- Add authentication + rate limiting for API
- Deploy to GCP Cloud Run
- sql-eval — Open-source Text-to-SQL evaluation framework (by the same author)
MIT License — see LICENSE for details.
Ajay Maurya — AI Engineer