Skip to content

ajaymauryabbn/multi-agent-data-analyst

Repository files navigation

🤖 Multi-Agent Data Analyst (MADA)

A LangGraph-powered multi-agent system that answers complex business questions by intelligently orchestrating SQL generation, document retrieval, and answer synthesis — all through natural language.

Python 3.10+ LangChain LangGraph License: MIT


🎯 Problem Statement

Business users frequently ask questions that span both structured databases AND unstructured documents. For example:

"What were our top 5 products by revenue last quarter, and what did customer reviews say about each?"

Traditional chatbots handle either SQL OR document search — never both intelligently. MADA solves this by deploying specialized agents that collaborate to deliver unified, sourced answers.


🏗️ Architecture Overview

                        ┌─────────────────┐
                        │   User Query     │
                        └────────┬────────┘
                                 │
                        ┌────────▼────────┐
                        │  ROUTER AGENT    │
                        │ (Intent Classifier│
                        │  + Query Planner) │
                        └──┬─────┬─────┬──┘
                           │     │     │
              ┌────────────┘     │     └────────────┐
              │                  │                   │
     ┌────────▼────────┐ ┌──────▼──────┐  ┌────────▼────────┐
     │   SQL AGENT      │ │  DOC AGENT  │  │ HYBRID PATH     │
     │                  │ │             │  │ (Both Agents     │
     │ • Schema Lookup  │ │ • ChromaDB  │  │  in parallel)    │
     │ • SQL Generation │ │ • Retrieval │  │                  │
     │ • SQL Validation │ │ • Re-ranking│  │                  │
     │ • Execution      │ │ • Citation  │  │                  │
     └────────┬────────┘ └──────┬──────┘  └────────┬────────┘
              │                  │                   │
              └──────────┬───────┘───────────────────┘
                         │
                ┌────────▼────────┐
                │ SYNTHESIZER      │
                │                  │
                │ • Merge Results  │
                │ • Resolve Conflicts│
                │ • Format Response│
                │ • Add Citations  │
                └────────┬────────┘
                         │
                ┌────────▼────────┐
                │  Final Answer    │
                │  (with sources)  │
                └─────────────────┘

🔑 Key Features

  • Intelligent Query Routing: Automatically classifies queries as SQL-only, document-only, or hybrid using LLM-based intent detection.
  • SQL Agent with Guardrails: Generates, validates (via sqlparse), and executes SQL with automatic retry on failure. Supports schema-aware generation.
  • Document Agent with Re-ranking: Retrieves from ChromaDB vector store with cross-encoder re-ranking for precision.
  • Conflict Resolution: When SQL data and documents disagree, the synthesizer flags discrepancies with source attribution.
  • Conversation Memory: Maintains context across follow-up questions using LangGraph's built-in state management.
  • Observability: Full agent decision traces via LangSmith integration + structured JSON logging.
  • Evaluation Suite: Built-in accuracy measurement for SQL generation and retrieval quality.
  • Dual LLM Support: OpenAI GPT-4o-mini as primary, Ollama (local) as fallback for cost-conscious or offline usage.

📁 Project Structure

mada/
├── README.md
├── PROJECT_SPEC.md                  # Detailed implementation spec
├── pyproject.toml                   # Package config + dependencies
├── .env.example                     # Environment variable template
├── Makefile                         # Common commands (setup, test, run)
│
├── src/
│   └── mada/
│       ├── __init__.py
│       ├── config.py                # Settings, env vars, model configs
│       │
│       ├── agents/
│       │   ├── __init__.py
│       │   ├── router.py            # Intent classification + query planning
│       │   ├── sql_agent.py         # Text-to-SQL generation + execution
│       │   ├── doc_agent.py         # Document retrieval + re-ranking
│       │   └── synthesizer.py       # Result merging + conflict resolution
│       │
│       ├── graph/
│       │   ├── __init__.py
│       │   ├── state.py             # LangGraph state schema definition
│       │   ├── nodes.py             # Graph node functions
│       │   └── builder.py           # Graph construction + compilation
│       │
│       ├── tools/
│       │   ├── __init__.py
│       │   ├── sql_tool.py          # SQL execution tool (LangChain Tool)
│       │   ├── retriever_tool.py    # ChromaDB retriever tool
│       │   └── validators.py        # SQL validation, output parsing
│       │
│       ├── memory/
│       │   ├── __init__.py
│       │   └── conversation.py      # Conversation state + memory management
│       │
│       ├── evaluation/
│       │   ├── __init__.py
│       │   ├── sql_eval.py          # SQL accuracy evaluation
│       │   ├── retrieval_eval.py    # Retrieval precision/recall metrics
│       │   └── e2e_eval.py          # End-to-end system evaluation
│       │
│       └── utils/
│           ├── __init__.py
│           ├── logging_config.py    # Structured logging setup
│           ├── callbacks.py         # LangChain callbacks for tracing
│           └── prompts.py           # All prompt templates (centralized)
│
├── data/
│   ├── ecommerce.db                 # SQLite e-commerce database (bundled)
│   ├── seed_database.py             # Script to create + populate the DB
│   ├── documents/                   # Sample unstructured documents
│   │   ├── product_reviews.md
│   │   ├── quarterly_report_q3.md
│   │   ├── return_policy.md
│   │   ├── shipping_guidelines.md
│   │   └── customer_complaints_summary.md
│   └── eval/
│       ├── sql_test_questions.json  # Ground truth SQL Q&A pairs
│       ├── retrieval_test.json      # Ground truth retrieval Q&A pairs
│       └── hybrid_test.json         # Hybrid query test cases
│
├── notebooks/
│   ├── 01_data_exploration.ipynb    # Explore DB schema + documents
│   ├── 02_sql_agent_dev.ipynb       # Develop & test SQL agent standalone
│   ├── 03_doc_agent_dev.ipynb       # Develop & test doc agent standalone
│   ├── 04_langgraph_orchestration.ipynb  # Build + visualize the graph
│   ├── 05_evaluation_analysis.ipynb # Run evals, analyze results, charts
│   └── 06_demo_walkthrough.ipynb    # End-to-end demo for presentations
│
├── api/
│   ├── __init__.py
│   ├── main.py                      # FastAPI application
│   ├── schemas.py                   # Pydantic request/response models
│   └── routes.py                    # API endpoints
│
├── streamlit_app/
│   └── app.py                       # Streamlit chat interface
│
├── tests/
│   ├── __init__.py
│   ├── conftest.py                  # Shared fixtures (DB, vector store)
│   ├── test_router.py
│   ├── test_sql_agent.py
│   ├── test_doc_agent.py
│   ├── test_synthesizer.py
│   ├── test_graph.py
│   └── test_api.py
│
├── docker/
│   ├── Dockerfile
│   └── docker-compose.yml           # App + Ollama + ChromaDB
│
└── .github/
    └── workflows/
        ├── test.yml                 # Run tests on PR
        └── lint.yml                 # Ruff + mypy checks

📊 Bundled E-Commerce Dataset

The project ships with a realistic e-commerce SQLite database for immediate testing:

Database Schema

-- Core tables
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    email TEXT UNIQUE,
    city TEXT,
    state TEXT,
    signup_date DATE,
    customer_segment TEXT  -- 'premium', 'regular', 'new'
);

CREATE TABLE products (
    product_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    category TEXT,          -- 'Electronics', 'Clothing', 'Home', 'Sports', 'Books'
    subcategory TEXT,
    price DECIMAL(10,2),
    cost DECIMAL(10,2),     -- For margin calculations
    brand TEXT,
    stock_quantity INTEGER,
    rating DECIMAL(2,1),
    created_at DATE
);

CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),
    order_date DATE,
    status TEXT,            -- 'completed', 'returned', 'cancelled', 'processing'
    total_amount DECIMAL(10,2),
    discount_amount DECIMAL(10,2),
    shipping_city TEXT,
    shipping_state TEXT,
    payment_method TEXT     -- 'credit_card', 'upi', 'cod', 'wallet'
);

CREATE TABLE order_items (
    item_id INTEGER PRIMARY KEY,
    order_id INTEGER REFERENCES orders(order_id),
    product_id INTEGER REFERENCES products(product_id),
    quantity INTEGER,
    unit_price DECIMAL(10,2),
    discount_percent DECIMAL(5,2)
);

CREATE TABLE customer_reviews (
    review_id INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES products(product_id),
    customer_id INTEGER REFERENCES customers(customer_id),
    rating INTEGER CHECK(rating BETWEEN 1 AND 5),
    review_text TEXT,
    review_date DATE,
    sentiment TEXT          -- 'positive', 'negative', 'neutral'
);

Data Volume

  • 500 customers across 15 Indian cities
  • 200 products across 5 categories
  • 5,000 orders spanning 12 months (2024)
  • 12,000 order items
  • 2,000 customer reviews with sentiment labels

📓 Notebook Descriptions

01_data_exploration.ipynb — Know Your Data

Purpose: Explore the database schema and document corpus before building agents.

  • Connect to SQLite DB, list all tables, inspect schemas
  • Run sample queries (revenue by category, top customers, monthly trends)
  • Load and preview all markdown documents
  • Identify query patterns that need SQL, documents, or both
  • Output: Schema diagram, sample query results, document summaries

02_sql_agent_dev.ipynb — Build the SQL Brain

Purpose: Develop and iterate on the SQL agent in isolation.

  • Schema serialization strategy (table descriptions, column metadata, sample rows)
  • Prompt engineering for SQL generation (few-shot examples, chain-of-thought)
  • SQL validation pipeline using sqlparse
  • Error handling: retry with error context on SQL execution failure
  • Test against sql_test_questions.json ground truth
  • Latency benchmarking: measure time per query
  • Key Decision Points: Why schema-in-prompt vs. RAG-based schema retrieval, handling ambiguous column names

03_doc_agent_dev.ipynb — Build the Knowledge Brain

Purpose: Develop the document retrieval agent.

  • Document chunking strategy (by section headers, 500 token chunks with 50 token overlap)
  • ChromaDB collection creation with sentence-transformers/all-MiniLM-L6-v2 embeddings
  • Retrieval testing: top-k tuning (k=3 vs k=5 vs k=10)
  • Cross-encoder re-ranking using cross-encoder/ms-marco-MiniLM-L-6-v2
  • Citation generation: map retrieved chunks back to source documents
  • Test against retrieval_test.json
  • Key Decision Points: Why ChromaDB over FAISS here, chunk size trade-offs, when re-ranking helps vs. hurts latency

04_langgraph_orchestration.ipynb — Wire It All Together

Purpose: Build the LangGraph state machine that orchestrates all agents.

  • Define the GraphState TypedDict (query, intent, sql_result, doc_result, final_answer, sources, error)
  • Build router node with intent classification
  • Wire SQL agent, doc agent, and synthesizer as graph nodes
  • Add conditional edges based on router output
  • Visualize the graph using LangGraph's built-in .get_graph().draw_mermaid()
  • Test with example queries from each category (SQL-only, doc-only, hybrid)
  • Demonstrate conversation memory across follow-up questions
  • Key Decision Points: Why LangGraph over AgentExecutor, state design choices, parallel vs. sequential execution

05_evaluation_analysis.ipynb — Prove It Works

Purpose: Run comprehensive evaluation and generate metrics.

  • SQL accuracy: exact match, execution match, result match
  • Retrieval quality: precision@k, recall@k, MRR
  • End-to-end: answer correctness scored by LLM-as-judge
  • Latency analysis: p50, p95, p99 response times per query type
  • Error analysis: categorize failure modes (wrong table, wrong join, empty retrieval, hallucination)
  • Generate charts: accuracy by query complexity, latency distribution, error breakdown
  • Output: Evaluation report with charts suitable for README and presentations

06_demo_walkthrough.ipynb — The Interview Demo

Purpose: A clean, presentation-ready notebook for live demos.

  • 5 carefully chosen queries that showcase each routing path
  • Query 1 (SQL-only): "What is the total revenue by product category for Q3 2024?"
  • Query 2 (Doc-only): "What is our return policy for electronics?"
  • Query 3 (Hybrid): "What are the top 3 products by revenue, and what do customer reviews say about them?"
  • Query 4 (Follow-up): "How does that compare to Q2?" (tests memory)
  • Query 5 (Edge case): "Show me the sales for AirPods" (product doesn't exist — tests graceful failure)
  • Each query shows: routing decision → agent traces → final answer with sources
  • Clear markdown cells explaining what's happening at each step

🔧 Setup & Installation

Prerequisites

  • Python 3.10+
  • (Optional) Ollama installed locally for local LLM fallback
  • (Optional) Docker for containerized deployment

Quick Start

# Clone the repository
git clone https://github.com/ajaymaurya/mada.git
cd mada

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # Linux/Mac
# .venv\Scripts\activate   # Windows

# Install dependencies
pip install -e ".[dev]"

# Set up environment variables
cp .env.example .env
# Edit .env with your API keys

# Seed the database with sample data
python data/seed_database.py

# Run the Streamlit app
streamlit run streamlit_app/app.py

# OR run the FastAPI server
uvicorn api.main:app --reload

Environment Variables

# .env.example
OPENAI_API_KEY=sk-...                    # Required for GPT-4o-mini
LANGCHAIN_API_KEY=ls-...                 # Optional: LangSmith tracing
LANGCHAIN_TRACING_V2=true               # Optional: Enable LangSmith
LANGCHAIN_PROJECT=mada                   # Optional: LangSmith project name
OLLAMA_BASE_URL=http://localhost:11434   # Optional: Local LLM fallback
DEFAULT_LLM_PROVIDER=openai             # 'openai' or 'ollama'
DEFAULT_MODEL=gpt-4o-mini               # Model name
CHROMA_PERSIST_DIR=./data/chroma_db     # ChromaDB storage path
DATABASE_URL=sqlite:///./data/ecommerce.db  # Database connection
LOG_LEVEL=INFO                           # Logging level

🚀 Usage

Python SDK

from mada.graph.builder import build_graph

# Build and compile the graph
graph = build_graph()

# Ask a question
result = graph.invoke({
    "query": "What are the top 5 products by revenue and what do reviews say about them?",
    "chat_history": []
})

print(result["final_answer"])
print(result["sources"])       # List of sources (SQL tables, document names)
print(result["intent"])        # 'sql_only', 'doc_only', or 'hybrid'
print(result["agent_trace"])   # Decision log for observability

FastAPI Endpoints

# Ask a question
POST /api/query
{
    "query": "Total revenue by category this quarter",
    "session_id": "user-123"   # For conversation memory
}

# Get agent trace for last query
GET /api/trace/{session_id}

# Health check
GET /api/health

Streamlit Chat Interface

Interactive chat UI with:

  • Real-time agent routing visualization (shows which agent is active)
  • SQL query display with syntax highlighting
  • Document source citations with expandable chunks
  • Conversation history with follow-up support
  • Toggle between OpenAI and Ollama

📈 Evaluation Results

To be populated after running 05_evaluation_analysis.ipynb

Metric SQL Agent Doc Agent Hybrid (E2E)
Accuracy -% -% -%
Avg Latency -s -s -s
P95 Latency -s -s -s

Query Routing Accuracy

Route Precision Recall
SQL-only -% -%
Doc-only -% -%
Hybrid -% -%

🧠 Key Technical Decisions

Decision Choice Why
Orchestration LangGraph over AgentExecutor Need conditional routing + state persistence; AgentExecutor is too linear
Vector DB ChromaDB Lightweight, embedded, sufficient for document corpus of this size. FAISS lacks metadata filtering.
Embeddings all-MiniLM-L6-v2 Best speed/quality trade-off for retrieval. 384-dim keeps ChromaDB fast.
Re-ranker ms-marco-MiniLM-L-6-v2 Cross-encoder precision boost on top-k results without major latency hit
SQL Validation sqlparse + dry-run Catch syntax errors before execution; dry-run catches schema mismatches
LLM GPT-4o-mini (primary) Best cost/quality for structured output tasks. Ollama fallback for offline.
API FastAPI Async support, auto-docs, Pydantic integration — fills a resume gap
State Design TypedDict over Pydantic LangGraph native, simpler serialization, sufficient for this use case

🧪 Testing

# Run all tests
pytest tests/ -v

# Run specific test suite
pytest tests/test_sql_agent.py -v

# Run with coverage
pytest tests/ --cov=src/mada --cov-report=html

# Lint
ruff check src/ tests/
mypy src/mada/

🐳 Docker

# Build and run all services
docker-compose -f docker/docker-compose.yml up --build

# Services:
# - mada-api:     FastAPI on port 8000
# - mada-ui:      Streamlit on port 8501
# - ollama:       Local LLM on port 11434 (optional)

🗺️ Roadmap

  • Add PostgreSQL support alongside SQLite
  • Integrate sql-eval framework for automated SQL benchmarking
  • Add streaming responses via FastAPI WebSocket
  • Implement caching layer (Redis) for repeated queries
  • Add LangSmith evaluation dataset integration
  • Support multi-database routing (query across multiple DBs)
  • Add authentication + rate limiting for API
  • Deploy to GCP Cloud Run

📚 Related Projects

  • sql-eval — Open-source Text-to-SQL evaluation framework (by the same author)

📄 License

MIT License — see LICENSE for details.


👤 Author

Ajay Maurya — AI Engineer

About

No description or website provided.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages