🤖 Multi-Agent Data Analyst (MADA)

A LangGraph-powered multi-agent system that answers complex business questions by intelligently orchestrating SQL generation, document retrieval, and answer synthesis — all through natural language.

🎯 Problem Statement

Business users frequently ask questions that span both structured databases AND unstructured documents. For example:

"What were our top 5 products by revenue last quarter, and what did customer reviews say about each?"

Traditional chatbots handle either SQL OR document search — never both intelligently. MADA solves this by deploying specialized agents that collaborate to deliver unified, sourced answers.

🏗️ Architecture Overview

                        ┌─────────────────┐
                        │   User Query     │
                        └────────┬────────┘
                                 │
                        ┌────────▼────────┐
                        │  ROUTER AGENT    │
                        │ (Intent Classifier│
                        │  + Query Planner) │
                        └──┬─────┬─────┬──┘
                           │     │     │
              ┌────────────┘     │     └────────────┐
              │                  │                   │
     ┌────────▼────────┐ ┌──────▼──────┐  ┌────────▼────────┐
     │   SQL AGENT      │ │  DOC AGENT  │  │ HYBRID PATH     │
     │                  │ │             │  │ (Both Agents     │
     │ • Schema Lookup  │ │ • ChromaDB  │  │  in parallel)    │
     │ • SQL Generation │ │ • Retrieval │  │                  │
     │ • SQL Validation │ │ • Re-ranking│  │                  │
     │ • Execution      │ │ • Citation  │  │                  │
     └────────┬────────┘ └──────┬──────┘  └────────┬────────┘
              │                  │                   │
              └──────────┬───────┘───────────────────┘
                         │
                ┌────────▼────────┐
                │ SYNTHESIZER      │
                │                  │
                │ • Merge Results  │
                │ • Resolve Conflicts│
                │ • Format Response│
                │ • Add Citations  │
                └────────┬────────┘
                         │
                ┌────────▼────────┐
                │  Final Answer    │
                │  (with sources)  │
                └─────────────────┘

🔑 Key Features

Intelligent Query Routing: Automatically classifies queries as SQL-only, document-only, or hybrid using LLM-based intent detection.
SQL Agent with Guardrails: Generates, validates (via sqlparse), and executes SQL with automatic retry on failure. Supports schema-aware generation.
Document Agent with Re-ranking: Retrieves from ChromaDB vector store with cross-encoder re-ranking for precision.
Conflict Resolution: When SQL data and documents disagree, the synthesizer flags discrepancies with source attribution.
Conversation Memory: Maintains context across follow-up questions using LangGraph's built-in state management.
Observability: Full agent decision traces via LangSmith integration + structured JSON logging.
Evaluation Suite: Built-in accuracy measurement for SQL generation and retrieval quality.
Dual LLM Support: OpenAI GPT-4o-mini as primary, Ollama (local) as fallback for cost-conscious or offline usage.

📁 Project Structure

mada/
├── README.md
├── PROJECT_SPEC.md                  # Detailed implementation spec
├── pyproject.toml                   # Package config + dependencies
├── .env.example                     # Environment variable template
├── Makefile                         # Common commands (setup, test, run)
│
├── src/
│   └── mada/
│       ├── __init__.py
│       ├── config.py                # Settings, env vars, model configs
│       │
│       ├── agents/
│       │   ├── __init__.py
│       │   ├── router.py            # Intent classification + query planning
│       │   ├── sql_agent.py         # Text-to-SQL generation + execution
│       │   ├── doc_agent.py         # Document retrieval + re-ranking
│       │   └── synthesizer.py       # Result merging + conflict resolution
│       │
│       ├── graph/
│       │   ├── __init__.py
│       │   ├── state.py             # LangGraph state schema definition
│       │   ├── nodes.py             # Graph node functions
│       │   └── builder.py           # Graph construction + compilation
│       │
│       ├── tools/
│       │   ├── __init__.py
│       │   ├── sql_tool.py          # SQL execution tool (LangChain Tool)
│       │   ├── retriever_tool.py    # ChromaDB retriever tool
│       │   └── validators.py        # SQL validation, output parsing
│       │
│       ├── memory/
│       │   ├── __init__.py
│       │   └── conversation.py      # Conversation state + memory management
│       │
│       ├── evaluation/
│       │   ├── __init__.py
│       │   ├── sql_eval.py          # SQL accuracy evaluation
│       │   ├── retrieval_eval.py    # Retrieval precision/recall metrics
│       │   └── e2e_eval.py          # End-to-end system evaluation
│       │
│       └── utils/
│           ├── __init__.py
│           ├── logging_config.py    # Structured logging setup
│           ├── callbacks.py         # LangChain callbacks for tracing
│           └── prompts.py           # All prompt templates (centralized)
│
├── data/
│   ├── ecommerce.db                 # SQLite e-commerce database (bundled)
│   ├── seed_database.py             # Script to create + populate the DB
│   ├── documents/                   # Sample unstructured documents
│   │   ├── product_reviews.md
│   │   ├── quarterly_report_q3.md
│   │   ├── return_policy.md
│   │   ├── shipping_guidelines.md
│   │   └── customer_complaints_summary.md
│   └── eval/
│       ├── sql_test_questions.json  # Ground truth SQL Q&A pairs
│       ├── retrieval_test.json      # Ground truth retrieval Q&A pairs
│       └── hybrid_test.json         # Hybrid query test cases
│
├── notebooks/
│   ├── 01_data_exploration.ipynb    # Explore DB schema + documents
│   ├── 02_sql_agent_dev.ipynb       # Develop & test SQL agent standalone
│   ├── 03_doc_agent_dev.ipynb       # Develop & test doc agent standalone
│   ├── 04_langgraph_orchestration.ipynb  # Build + visualize the graph
│   ├── 05_evaluation_analysis.ipynb # Run evals, analyze results, charts
│   └── 06_demo_walkthrough.ipynb    # End-to-end demo for presentations
│
├── api/
│   ├── __init__.py
│   ├── main.py                      # FastAPI application
│   ├── schemas.py                   # Pydantic request/response models
│   └── routes.py                    # API endpoints
│
├── streamlit_app/
│   └── app.py                       # Streamlit chat interface
│
├── tests/
│   ├── __init__.py
│   ├── conftest.py                  # Shared fixtures (DB, vector store)
│   ├── test_router.py
│   ├── test_sql_agent.py
│   ├── test_doc_agent.py
│   ├── test_synthesizer.py
│   ├── test_graph.py
│   └── test_api.py
│
├── docker/
│   ├── Dockerfile
│   └── docker-compose.yml           # App + Ollama + ChromaDB
│
└── .github/
    └── workflows/
        ├── test.yml                 # Run tests on PR
        └── lint.yml                 # Ruff + mypy checks

📊 Bundled E-Commerce Dataset

The project ships with a realistic e-commerce SQLite database for immediate testing:

Database Schema

-- Core tables
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    email TEXT UNIQUE,
    city TEXT,
    state TEXT,
    signup_date DATE,
    customer_segment TEXT  -- 'premium', 'regular', 'new'
);

CREATE TABLE products (
    product_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    category TEXT,          -- 'Electronics', 'Clothing', 'Home', 'Sports', 'Books'
    subcategory TEXT,
    price DECIMAL(10,2),
    cost DECIMAL(10,2),     -- For margin calculations
    brand TEXT,
    stock_quantity INTEGER,
    rating DECIMAL(2,1),
    created_at DATE
);

CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),
    order_date DATE,
    status TEXT,            -- 'completed', 'returned', 'cancelled', 'processing'
    total_amount DECIMAL(10,2),
    discount_amount DECIMAL(10,2),
    shipping_city TEXT,
    shipping_state TEXT,
    payment_method TEXT     -- 'credit_card', 'upi', 'cod', 'wallet'
);

CREATE TABLE order_items (
    item_id INTEGER PRIMARY KEY,
    order_id INTEGER REFERENCES orders(order_id),
    product_id INTEGER REFERENCES products(product_id),
    quantity INTEGER,
    unit_price DECIMAL(10,2),
    discount_percent DECIMAL(5,2)
);

CREATE TABLE customer_reviews (
    review_id INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES products(product_id),
    customer_id INTEGER REFERENCES customers(customer_id),
    rating INTEGER CHECK(rating BETWEEN 1 AND 5),
    review_text TEXT,
    review_date DATE,
    sentiment TEXT          -- 'positive', 'negative', 'neutral'
);

Data Volume

500 customers across 15 Indian cities
200 products across 5 categories
5,000 orders spanning 12 months (2024)
12,000 order items
2,000 customer reviews with sentiment labels

📓 Notebook Descriptions

`01_data_exploration.ipynb` — Know Your Data

Purpose: Explore the database schema and document corpus before building agents.

Connect to SQLite DB, list all tables, inspect schemas
Run sample queries (revenue by category, top customers, monthly trends)
Load and preview all markdown documents
Identify query patterns that need SQL, documents, or both
Output: Schema diagram, sample query results, document summaries

`02_sql_agent_dev.ipynb` — Build the SQL Brain

Purpose: Develop and iterate on the SQL agent in isolation.

Schema serialization strategy (table descriptions, column metadata, sample rows)
Prompt engineering for SQL generation (few-shot examples, chain-of-thought)
SQL validation pipeline using sqlparse
Error handling: retry with error context on SQL execution failure
Test against sql_test_questions.json ground truth
Latency benchmarking: measure time per query
Key Decision Points: Why schema-in-prompt vs. RAG-based schema retrieval, handling ambiguous column names

`03_doc_agent_dev.ipynb` — Build the Knowledge Brain

Purpose: Develop the document retrieval agent.

Document chunking strategy (by section headers, 500 token chunks with 50 token overlap)
ChromaDB collection creation with sentence-transformers/all-MiniLM-L6-v2 embeddings
Retrieval testing: top-k tuning (k=3 vs k=5 vs k=10)
Cross-encoder re-ranking using cross-encoder/ms-marco-MiniLM-L-6-v2
Citation generation: map retrieved chunks back to source documents
Test against retrieval_test.json
Key Decision Points: Why ChromaDB over FAISS here, chunk size trade-offs, when re-ranking helps vs. hurts latency

`04_langgraph_orchestration.ipynb` — Wire It All Together

Purpose: Build the LangGraph state machine that orchestrates all agents.

Define the GraphState TypedDict (query, intent, sql_result, doc_result, final_answer, sources, error)
Build router node with intent classification
Wire SQL agent, doc agent, and synthesizer as graph nodes
Add conditional edges based on router output
Visualize the graph using LangGraph's built-in .get_graph().draw_mermaid()
Test with example queries from each category (SQL-only, doc-only, hybrid)
Demonstrate conversation memory across follow-up questions
Key Decision Points: Why LangGraph over AgentExecutor, state design choices, parallel vs. sequential execution

`05_evaluation_analysis.ipynb` — Prove It Works

Purpose: Run comprehensive evaluation and generate metrics.

SQL accuracy: exact match, execution match, result match
Retrieval quality: precision@k, recall@k, MRR
End-to-end: answer correctness scored by LLM-as-judge
Latency analysis: p50, p95, p99 response times per query type
Error analysis: categorize failure modes (wrong table, wrong join, empty retrieval, hallucination)
Generate charts: accuracy by query complexity, latency distribution, error breakdown
Output: Evaluation report with charts suitable for README and presentations

`06_demo_walkthrough.ipynb` — The Interview Demo

Purpose: A clean, presentation-ready notebook for live demos.

5 carefully chosen queries that showcase each routing path
Query 1 (SQL-only): "What is the total revenue by product category for Q3 2024?"
Query 2 (Doc-only): "What is our return policy for electronics?"
Query 3 (Hybrid): "What are the top 3 products by revenue, and what do customer reviews say about them?"
Query 4 (Follow-up): "How does that compare to Q2?" (tests memory)
Query 5 (Edge case): "Show me the sales for AirPods" (product doesn't exist — tests graceful failure)
Each query shows: routing decision → agent traces → final answer with sources
Clear markdown cells explaining what's happening at each step

🔧 Setup & Installation

Prerequisites

Python 3.10+
(Optional) Ollama installed locally for local LLM fallback
(Optional) Docker for containerized deployment

Quick Start

# Clone the repository
git clone https://github.com/ajaymaurya/mada.git
cd mada

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # Linux/Mac
# .venv\Scripts\activate   # Windows

# Install dependencies
pip install -e ".[dev]"

# Set up environment variables
cp .env.example .env
# Edit .env with your API keys

# Seed the database with sample data
python data/seed_database.py

# Run the Streamlit app
streamlit run streamlit_app/app.py

# OR run the FastAPI server
uvicorn api.main:app --reload

Environment Variables

# .env.example
OPENAI_API_KEY=sk-...                    # Required for GPT-4o-mini
LANGCHAIN_API_KEY=ls-...                 # Optional: LangSmith tracing
LANGCHAIN_TRACING_V2=true               # Optional: Enable LangSmith
LANGCHAIN_PROJECT=mada                   # Optional: LangSmith project name
OLLAMA_BASE_URL=http://localhost:11434   # Optional: Local LLM fallback
DEFAULT_LLM_PROVIDER=openai             # 'openai' or 'ollama'
DEFAULT_MODEL=gpt-4o-mini               # Model name
CHROMA_PERSIST_DIR=./data/chroma_db     # ChromaDB storage path
DATABASE_URL=sqlite:///./data/ecommerce.db  # Database connection
LOG_LEVEL=INFO                           # Logging level

🚀 Usage

Python SDK

from mada.graph.builder import build_graph

# Build and compile the graph
graph = build_graph()

# Ask a question
result = graph.invoke({
    "query": "What are the top 5 products by revenue and what do reviews say about them?",
    "chat_history": []
})

print(result["final_answer"])
print(result["sources"])       # List of sources (SQL tables, document names)
print(result["intent"])        # 'sql_only', 'doc_only', or 'hybrid'
print(result["agent_trace"])   # Decision log for observability

FastAPI Endpoints

# Ask a question
POST /api/query
{
    "query": "Total revenue by category this quarter",
    "session_id": "user-123"   # For conversation memory
}

# Get agent trace for last query
GET /api/trace/{session_id}

# Health check
GET /api/health

Streamlit Chat Interface

Interactive chat UI with:

Real-time agent routing visualization (shows which agent is active)
SQL query display with syntax highlighting
Document source citations with expandable chunks
Conversation history with follow-up support
Toggle between OpenAI and Ollama

📈 Evaluation Results

To be populated after running 05_evaluation_analysis.ipynb

Metric	SQL Agent	Doc Agent	Hybrid (E2E)
Accuracy	-%	-%	-%
Avg Latency	-s	-s	-s
P95 Latency	-s	-s	-s

Query Routing Accuracy

Route	Precision	Recall
SQL-only	-%	-%
Doc-only	-%	-%
Hybrid	-%	-%

🧠 Key Technical Decisions

Decision	Choice	Why
Orchestration	LangGraph over AgentExecutor	Need conditional routing + state persistence; AgentExecutor is too linear
Vector DB	ChromaDB	Lightweight, embedded, sufficient for document corpus of this size. FAISS lacks metadata filtering.
Embeddings	all-MiniLM-L6-v2	Best speed/quality trade-off for retrieval. 384-dim keeps ChromaDB fast.
Re-ranker	ms-marco-MiniLM-L-6-v2	Cross-encoder precision boost on top-k results without major latency hit
SQL Validation	sqlparse + dry-run	Catch syntax errors before execution; dry-run catches schema mismatches
LLM	GPT-4o-mini (primary)	Best cost/quality for structured output tasks. Ollama fallback for offline.
API	FastAPI	Async support, auto-docs, Pydantic integration — fills a resume gap
State Design	TypedDict over Pydantic	LangGraph native, simpler serialization, sufficient for this use case

🧪 Testing

# Run all tests
pytest tests/ -v

# Run specific test suite
pytest tests/test_sql_agent.py -v

# Run with coverage
pytest tests/ --cov=src/mada --cov-report=html

# Lint
ruff check src/ tests/
mypy src/mada/

🐳 Docker

# Build and run all services
docker-compose -f docker/docker-compose.yml up --build

# Services:
# - mada-api:     FastAPI on port 8000
# - mada-ui:      Streamlit on port 8501
# - ollama:       Local LLM on port 11434 (optional)

🗺️ Roadmap

Add PostgreSQL support alongside SQLite
Integrate sql-eval framework for automated SQL benchmarking
Add streaming responses via FastAPI WebSocket
Implement caching layer (Redis) for repeated queries
Add LangSmith evaluation dataset integration
Support multi-database routing (query across multiple DBs)
Add authentication + rate limiting for API
Deploy to GCP Cloud Run

📚 Related Projects

sql-eval — Open-source Text-to-SQL evaluation framework (by the same author)

📄 License

MIT License — see LICENSE for details.

👤 Author

Ajay Maurya — AI Engineer

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
.claude		.claude
.github/workflows		.github/workflows
api		api
data		data
docker		docker
evaluation		evaluation
src		src
streamlit_app		streamlit_app
tests		tests
.gitignore		.gitignore
Makefile		Makefile
PROJECT_SPEC.md		PROJECT_SPEC.md
README.md		README.md
pyproject.toml		pyproject.toml

Folders and files

Latest commit

History

Repository files navigation

🤖 Multi-Agent Data Analyst (MADA)

🎯 Problem Statement

🏗️ Architecture Overview

🔑 Key Features

📁 Project Structure

📊 Bundled E-Commerce Dataset

Database Schema

Data Volume

📓 Notebook Descriptions

01_data_exploration.ipynb — Know Your Data

02_sql_agent_dev.ipynb — Build the SQL Brain

03_doc_agent_dev.ipynb — Build the Knowledge Brain

04_langgraph_orchestration.ipynb — Wire It All Together

05_evaluation_analysis.ipynb — Prove It Works

06_demo_walkthrough.ipynb — The Interview Demo

🔧 Setup & Installation

Prerequisites

Quick Start

Environment Variables

🚀 Usage

Python SDK

FastAPI Endpoints

Streamlit Chat Interface

📈 Evaluation Results

Query Routing Accuracy

🧠 Key Technical Decisions

🧪 Testing

🐳 Docker

🗺️ Roadmap

📚 Related Projects

📄 License

👤 Author

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`01_data_exploration.ipynb` — Know Your Data

`02_sql_agent_dev.ipynb` — Build the SQL Brain

`03_doc_agent_dev.ipynb` — Build the Knowledge Brain

`04_langgraph_orchestration.ipynb` — Wire It All Together

`05_evaluation_analysis.ipynb` — Prove It Works

`06_demo_walkthrough.ipynb` — The Interview Demo

Packages