A LangGraph-based conversational agent for automating insurance claim processing.
- Policy Search: Vector + BM25 hybrid search over policy documents
- Damage Assessment: CLIP embeddings + BLIP-2 captions for damage photo analysis
- Claim Processing: Structured claim drafting, with a memory node that performs entity extraction and slot-filling so claim details persist across conversation turns. Single-turn benchmark performance is unaffected; multi-turn evaluation is planned for the next eval cycle.
- Multi-Modal: Handle text queries, images, and claim status lookups
- Evaluation Framework: MLflow + pandas for variant testing and metrics
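The feature list names a vector + BM25 hybrid search but does not say how the two rankings are combined. As a rough illustration only, the sketch below uses reciprocal rank fusion (RRF), one common way to merge ranked lists; it is not necessarily what `src/tools.py` does. The function name, the document IDs, and `k=60` are all assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default for RRF.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, highest first; break ties by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for one query; the IDs are illustrative only.
vector_hits = ["policy_12", "policy_07", "policy_03"]
bm25_hits = ["policy_07", "policy_21", "policy_12"]

fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, while a document found by only one retriever still stays in the fused result.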
- Python 3.11+
- Ollama (for local LLM inference)
- Java (for Spark-based evaluation)
# Clone repo
git clone https://github.com/tyreecepaul/insurance-agent.git
cd insurance-agent
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # on Windows: .venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
pip install -r requirements-dev.txt # for testing
# Start Ollama first
ollama serve
# In another terminal, run the agent
python src/agent.py
# Or use the CLI entry point:
python main.py
# Unit tests only (fast, <10s)
pytest tests/unit -m unit -v
# Integration tests (comprehensive, ~30s)
pytest tests/integration -m integration -v
# All tests with coverage
pytest tests/ --cov=src --cov-report=html
open htmlcov/index.html
- Testing Guide: Test structure, fixtures, best practices
- Agent Architecture: Graph design and node logic (if available)
- Configuration: Model names, API endpoints, paths
Tests run automatically on:
- Every push to `main` or `develop`
- Every PR to these branches
- View results: Actions tab
├── src/
│ ├── agent.py # LangGraph agent with memory/router/retrieval/generator nodes
│ ├── ingest.py # ChromaDB indexing for policies/damage/claims
│ ├── eval.py # MLflow evaluation harness with variant configs
│ ├── tools.py # Hybrid search, retrieval functions
│ └── test_tools.py # Utilities for testing and debugging
├── tests/
│ ├── unit/ # Fast unit tests (all mocked)
│ ├── integration/ # E2E tests (mocked external services)
│ ├── fixtures/ # Test data
│ └── conftest.py # Pytest fixtures and configuration
├── data/
│ ├── policy_docs/ # PDF files to index
│ ├── damage_photos/ # Images for CLIP embedding
│ └── claims_data/ # JSON claims records
├── chroma_db/ # Persistent vector database
├── config.json # Model configs and API endpoints
├── requirements.txt # Production dependencies
├── requirements-dev.txt # Development and test dependencies
└── main.py # CLI entry point
- Create test in `tests/unit/` or `tests/integration/`
- Use fixtures from `conftest.py`
- Mark with `@pytest.mark.unit` or `@pytest.mark.integration`
- Run: `pytest tests/path/to/test.py -v`
See Testing Guide for examples.
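The steps above could look roughly like the following unit test. This is a minimal sketch, not code from the repository: `normalize_claim_id` and the `sample_claim` fixture are hypothetical and are defined inline only so the example is self-contained. In the real project, shared fixtures would come from `conftest.py`.

```python
import pytest


def normalize_claim_id(raw: str) -> str:
    """Hypothetical helper under test: trim whitespace and upper-case the ID."""
    return raw.strip().upper()


@pytest.fixture
def sample_claim():
    """Hypothetical fixture; shared fixtures would normally live in conftest.py."""
    return {"claim_id": " clm-001 ", "status": "open"}


@pytest.mark.unit
def test_normalize_claim_id(sample_claim):
    # The fixture value is injected by pytest through the argument name.
    assert normalize_claim_id(sample_claim["claim_id"]) == "CLM-001"
```

Saved under `tests/unit/`, a test like this would be picked up by `pytest tests/unit -m unit -v`.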
Format code before commit:
# Optional linting
flake8 src/ tests/
All tests must pass before merging (enforced by CI/CD).
Insurance customers filing claims must navigate dense policy documents, describe damage accurately, and track claim status — all while under stress. Mistakes at any step produce rejected or delayed claims, repeat contacts with the insurer, and increased handling cost. This agent addresses all three friction points in a single conversational interface.
The evaluation harness tests four system variants, from a plain LLM baseline (A1)
to the full agent (A4), across 11 benchmark queries in four families. Run
`python src/eval.py && mlflow ui` to reproduce.
| Variant | System | Avg judge score (1–5) | Avg recall@5 |
|---|---|---|---|
| A1 | LLM only, no retrieval | run eval to populate | run eval |
| A2 | Text retrieval, no image index | run eval to populate | run eval |
| A3 | All indices, no query router | run eval to populate | run eval |
| A4 | Full system (router + all indices) | run eval to populate | run eval |
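One way to read the table is as four combinations of feature flags. The sketch below encodes that reading; the class name, field names, and helper are assumptions for illustration, and the actual variant configs in `src/eval.py` may be structured differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantConfig:
    """Feature flags for one evaluation variant (field names are assumptions)."""
    name: str
    use_text_retrieval: bool
    use_image_index: bool
    use_router: bool


# The four variants from the table, expressed as flag combinations.
VARIANTS = [
    VariantConfig("A1", use_text_retrieval=False, use_image_index=False, use_router=False),
    VariantConfig("A2", use_text_retrieval=True, use_image_index=False, use_router=False),
    VariantConfig("A3", use_text_retrieval=True, use_image_index=True, use_router=False),
    VariantConfig("A4", use_text_retrieval=True, use_image_index=True, use_router=True),
]


def enabled_features(variant):
    """Return the names of the flags that are switched on for a variant."""
    flags = ("use_text_retrieval", "use_image_index", "use_router")
    return [flag for flag in flags if getattr(variant, flag)]


for variant in VARIANTS:
    print(variant.name, enabled_features(variant))
```

Each variant adds exactly one capability to the previous one, so a score difference between neighbouring rows can be attributed to a single component.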
Translating metrics to outcomes:
- Judge score Δ (A4 vs A1): a 1-point improvement on a 5-point scale represents the difference between a mostly-wrong coverage answer and a mostly-correct one. In production, each mishandled coverage decision results in either an incorrect rejection (customer harm) or an incorrect approval (claim leakage). Reducing this rate directly lowers rework and escalation volume.
- Recall@5 on cross-modal queries: the gap between A2 (text-only) and A4 (with CLIP image index) quantifies how much grounding quality degrades when the damage photo pipeline is absent. Damage queries account for a large share of real claim interactions; without this modality the agent answers from general knowledge rather than indexed evidence.
- Latency (A3 vs A4): A3 queries all three indices on every request. The router in A4 targets only the relevant index per query type, reducing average token consumption and response latency. Lower latency directly maps to higher conversation completion rates.
- Token cost (A4 full run): `avg_total_tokens`, logged by MLflow, is the per-query LLM cost proxy. At scale, routing queries to targeted indices rather than passing all retrieved context to the generator yields a measurable reduction in cost per handled claim.
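For reference, recall@5 is usually computed as the share of a query's relevant documents that appear among the top five retrieved results. The sketch below shows that standard definition. How `src/eval.py` computes it is not shown in this README, so the function names and the sample data are assumptions.

```python
def recall_at_k(retrieved_ids, relevant_ids, k=5):
    """Fraction of relevant documents that appear in the top-k retrieved IDs."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("recall is undefined when there are no relevant documents")
    top_k = set(retrieved_ids[:k])
    return len(top_k & relevant) / len(relevant)


def mean_recall_at_k(results, k=5):
    """Average recall@k over (retrieved_ids, relevant_ids) pairs, one per query."""
    scores = [recall_at_k(retrieved, relevant, k) for retrieved, relevant in results]
    return sum(scores) / len(scores)


# Hypothetical retrieval results for two benchmark queries.
results = [
    (["d1", "d4", "d9", "d2", "d7", "d3"], ["d1", "d3"]),  # d3 is ranked 6th, so it is missed
    (["d5", "d6"], ["d5"]),
]
print(mean_recall_at_k(results))
```

Averaging per-query recall, rather than pooling all documents, keeps queries with many relevant documents from dominating the metric.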
src/analytics.py exposes four aggregate queries over the claims dataset,
demonstrating the structured-data layer that sits alongside the vector pipeline:
python src/analytics.py # all four queries
python src/analytics.py --query approval # approval rate by insurance type
python src/analytics.py --query pipeline # pipeline exposure summary
# 1. Start Ollama on the host
ollama serve
# 2. Index data once (run outside Docker — chroma_db is volume-mounted)
python src/ingest.py
# 3. Build and start the API container
docker compose up --build
# 4. Verify
curl http://localhost:8000/health
The `/health` endpoint returns uptime, active session count, and the configured
Ollama URL. It is wired into both the Dockerfile HEALTHCHECK and the
docker-compose.yml healthcheck so container orchestrators get a live
readiness signal.
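A health response with those three fields might be assembled along the following lines. This is a framework-agnostic sketch: the function name, the JSON key names, and the session store are assumptions. The URL shown is Ollama's usual local default and stands in for whatever the project configures.

```python
import time


def build_health_payload(started_at, active_sessions, ollama_url, now=None):
    """Assemble the three fields the /health endpoint is described as returning.

    Key names are assumptions; the real response schema may differ.
    """
    now = time.time() if now is None else now
    return {
        "uptime_seconds": round(now - started_at, 1),
        "active_sessions": len(active_sessions),
        "ollama_url": ollama_url,
    }


# Illustrative values only: two in-memory sessions, service started 90 seconds ago.
sessions = {"session-a": {}, "session-b": {}}
payload = build_health_payload(
    started_at=1000.0,
    active_sessions=sessions,
    ollama_url="http://localhost:11434",
    now=1090.0,
)
print(payload)
```

Passing `now` explicitly keeps the function deterministic in tests, while production callers can omit it and fall back to the current time.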
Volume mounts (configured in docker-compose.yml):
| Host path | Container path | Purpose |
|---|---|---|
| `./chroma_db` | `/app/chroma_db` | Pre-built vector indices |
| `./data` | `/app/data` (read-only) | Policy PDFs, damage photos, claims JSON |
| `./uploads` | `/app/uploads` | Runtime image uploads |
MIT