exai-insurance-intel

Exa-powered insurance intelligence toolkit for CAT-loss, claims, expert, contractor, and market research workflows.

What This Repo Is Today

This repo is an installable Python package and CLI workflow engine with a thin FastAPI backend and a pilot Next.js web UI for insurance intelligence.

What exists:

  • Python package + CLI workflows for search, answer, research, structured extraction, find-similar, evaluation, comparison, and budget inspection
  • FastAPI backend wrapping the shipped workflows as JSON endpoints
  • Next.js frontend with Search, Answer, Research, and My Work surfaces
  • Pilot auth and request-boundary controls for bearer auth, per-user scoping, and bounded usage
  • SQLite caching with budget enforcement and cost ledger
  • Additive persistence adapters for local SQLite plus pilot S3/Postgres backends
  • Evaluation taxonomy with benchmark suites (55+ insurance queries)
  • Artifact export system (JSON/JSONL/CSV/Markdown)
  • Smoke + live execution modes with CI on every push
  • Comprehensive test suite (~3,300 lines)

What does not exist yet:

  • Container or cloud deployment (no Docker, no Terraform)
  • Deployed cloud persistence/infrastructure baseline for S3/Postgres-backed pilot environments
  • Production-hardened external access model

Near-term direction: Build a controlled pilot web product layer on top of the existing workflow engine. See docs/roadmap.md for phased tracks and docs/pilot-architecture-decision.md for architectural defaults.

This repo stays intentionally lean while supporting repeatable notebook and CLI workflows for insurance research, cited answers, structured extraction, and report generation.

Current Local Status (Validated 2026-04-12)

This session revalidated the local smoke/mock path end to end.

Validated now:

  • Isolated virtual environment rebuild
  • python -m pip install --no-user -e '.[dev,api]'
  • python -m pytest -q
  • python -m ruff check .
  • python scripts/run_live_validation.py --mode smoke
  • Local FastAPI startup with uvicorn exa_demo.api:app --reload
  • Local backend checks at http://127.0.0.1:8000/health and http://127.0.0.1:8000/docs
  • Local frontend startup from frontend/ with npm install and npm run dev
  • Browser validation for Search, Answer, Research, and My Work against the local backend

Still not validated in this session:

  • --mode live or real Exa API traffic
  • S3 artifact storage or Postgres-backed usage/run persistence
  • Production readiness or deployed environments

Use docs/local-validation.md for the exact reproduction steps.

Core local resources:

  • one core notebook: exa_people_search_eval.ipynb
  • one local env file: .env (not committed)
  • one local sqlite cache: exa_cache.sqlite (not committed)

Feature Matrix

| Capability | Status | Primary interface | Primary artifact/output | Notes |
| --- | --- | --- | --- | --- |
| Ranked search | Done | python -m exa_demo search | results.jsonl, summary.json | Supports additive deep-search controls |
| Benchmark evaluation | Done | python -m exa_demo eval | queries.jsonl, results.jsonl, summary.json | Includes taxonomy scoring and grouped comparison context |
| Search-type comparison | Done | python -m exa_demo compare-search-types | comparison.md plus paired run artifacts | Compares deep vs deep-reasoning end to end |
| Cited answers | Done | python -m exa_demo answer | answer.json | Separate workflow from ranked-search evaluation |
| Research reports | Done | python -m exa_demo research | research.json | Separate report-style workflow with citations |
| Structured extraction | Done | python -m exa_demo structured-search | structured_output.json | Uses outputSchema for schema-driven extraction |
| Seed-URL discovery | Done | python -m exa_demo find-similar | find_similar.json | Separate /findSimilar workflow and normalization path |
| Cache and budget ledger | Done | Notebook + CLI | exa_cache.sqlite, summary.json | Prevents re-billing on cache hits |
| API server | Done | uvicorn exa_demo.api:app | JSON responses | Thin FastAPI wrapper over CLI workflows; smoke mode first-class |
| Frontend UI | Done | npm run dev in frontend/ | Browser UI | Next.js + TypeScript + Tailwind + shadcn/ui; search, answer, research, My Work |
| Smoke CI and local tests | Done | GitHub Actions + pytest -q | CI runs and local test suite | Default workflow runs pytest and notebook smoke |

Architecture

```mermaid
flowchart TD
    U["Notebook or CLI user flow"] --> CLI["CLI / notebook orchestration"]
    CLI --> CFG["Config + runtime state"]
    CLI --> CACHE["SQLite cache + budget ledger"]
    CLI --> CLIENT["Exa client adapters"]
    CLIENT --> EXA["Exa endpoints\n/search, /answer, /research, /findSimilar"]
    CLIENT --> MOCK["Smoke-mode mocked responses"]
    CLIENT --> MODELS["Typed models + normalization"]
    MODELS --> EVAL["Evaluation + taxonomy + grouped comparison"]
    MODELS --> ART["Artifact writer"]
    EVAL --> ART
    ART --> EXP["experiments/<run-id>/ artifacts"]
    EXP --> DOCS["README, roadmap, issue tracker, session notes"]
```

Repo Map

  • src/exa_demo/cli.py: thin CLI entrypoint; parser/runtime/compare helpers live in cli_parser.py, cli_runtime.py, and cli_eval.py
  • src/exa_demo/client.py: stable transport facade; payload builders and smoke responders live in client_payloads.py and client_smoke.py
  • src/exa_demo/models.py: stable model facade; shared API/result records live in api_models.py
  • src/exa_demo/reporting.py: summary/research facade; comparison analysis/rendering lives in comparison_reporting.py and comparison_analysis.py
  • src/exa_demo/artifacts.py: experiment writer; manifest/serialization helpers live in artifact_manifest.py
  • src/exa_demo/: remaining reusable package modules for config, evaluation, workflows, cache, and safety
  • tests/: CLI, client, model, artifact, script, and evaluation coverage
  • benchmarks/insurance_cat_queries.json: named query suites used by notebook and CLI evaluation
  • experiments/: runtime artifact root for workflow commands; local runs stay untracked by default
  • docs/roadmap.md: canonical backlog and delivery phases
  • docs/issue-tracker.md: GitHub issue-to-roadmap mapping
  • docs/sessions/: durable session history

What This Repo Evaluates

  • Query relevance for CAT-loss / insurance professional discovery
  • Cost per uncached query and projected spend at scale
  • Repeatability via sqlite caching (reruns should not re-bill)
  • Safe usage posture (public/professional info only)

Recommended Cheap Defaults

In Cell 2 (CONFIG), start with:

  • use_highlights=True
  • use_text=False
  • use_summary=False
  • num_results=5

Optional for before/after reporting in notebook Cell 9:

  • CONFIG["compare_to_run_id"] = "<baseline-run-id>"
  • CONFIG["compare_base_dir"] = "experiments"

Local Setup (PowerShell or Git Bash, Python 3.10+)

  1. Create the virtual environment:

     python -m venv .venv

  2. Activate it.

     PowerShell:

     .\.venv\Scripts\Activate.ps1

     Git Bash:

     source .venv/Scripts/activate

  3. Install the package, CLI, API extra, and dev tooling:

     python -m pip install --upgrade pip
     python -m pip install --no-user -e '.[dev,api]'

  4. Optional local hooks:

     pre-commit install

If package installation or editable installs land outside .venv, reactivate the environment and prefer python -m pip ... over bare pip .... The safest quick check is python -c "import sys; print(sys.executable)", which should point at .venv.

Configure Environment

Create .env from template:

PowerShell:

Copy-Item .env.example .env

Git Bash:

cp .env.example .env

Edit .env:

EXA_API_KEY=your_real_exa_api_key
EXA_SMOKE_NO_NETWORK=0
# Optional: set a run id label for grouped budget metrics
# EXA_RUN_ID=demo-2026-03-05

Notes:

  • .env is ignored by git.
  • The validated local path in this README stays on --mode smoke, so a real API key is not required unless you intentionally switch to live mode.
  • EXA_SMOKE_NO_NETWORK=1 is the safest env-level default if you want no-network behavior even when a command omits --mode smoke.
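A minimal sketch of how an env-level no-network guard like this can be honored. The variable name comes from the template above; the resolution logic here is an illustrative assumption, not the package's actual implementation:

```python
import os


def no_network_forced(env=None):
    """Return True when the environment forces smoke (no-network) behavior.

    Illustrative sketch: the shipped package may resolve this flag differently.
    """
    env = os.environ if env is None else env
    return env.get("EXA_SMOKE_NO_NETWORK", "0") == "1"


print(no_network_forced({"EXA_SMOKE_NO_NETWORK": "1"}))  # True
print(no_network_forced({}))  # False
```

A guard like this makes the env file, rather than each individual command, the place where no-network behavior is pinned.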

Run Notebook

Open exa_people_search_eval.ipynb in Jupyter Lab or VS Code and run top-to-bottom.

Notebook flow is intentionally fixed to 9 cells:

  1. Install/import + env load
  2. Config
  3. Exa call wrapper
  4. Cache wrapper (sqlite) + budget enforcement
  5. Single query demo
  6. Batch query suite
  7. Summary + qualitative notes
  8. Cost projections
  9. Decision rubric + integration recommendations

CLI Commands

python -m exa_demo search "forensic engineer insurance expert witness" --mode smoke --json
python -m exa_demo answer "What is the Florida appraisal clause dispute process?" --mode smoke --json
python -m exa_demo research "Summarize the Florida CAT market outlook." --mode smoke --json
python -m exa_demo structured-search "independent adjuster florida catastrophe claims" --schema-file path/to/structured-schema.json --mode smoke --json
python -m exa_demo find-similar "https://example.com/florida-appraisal-decision" --mode smoke --json
python -m exa_demo search "Florida property insurance appraisal clause" --type deep --additional-query "Florida appraisal dispute statute" --start-published-date 2025-01-01 --livecrawl --json
python -m exa_demo eval --mode smoke --suite forensic_and_damage_engineering --limit 5 --json
python -m exa_demo compare-search-types --mode smoke --suite forensic_and_damage_engineering --baseline-type deep --candidate-type deep-reasoning --limit 5 --json
python -m exa_demo eval --mode smoke --limit 5 --compare-to-run-id 20260310T033256Z --json
python -m exa_demo budget --run-id demo-2026-03 --json

The search and eval commands write the same experiments/<RUN_ID>/ artifact bundle as the notebook flow. Per-workflow additions:

  • answer writes the same run directory and adds an answer.json artifact containing the cited-answer payload.
  • research writes the same run directory and adds research.json plus research.md artifacts for the research-report payload.
  • structured-search runs a schema-driven deep search and writes a structured_output.json artifact containing the extracted structured payload.
  • find-similar runs a seed-URL discovery workflow and writes a find_similar.json artifact containing the similar-result payload.
  • The endpoint-style workflows (answer, research, structured-search, and find-similar) also emit a reusable report.md companion artifact for human review.
  • Eval workflows also emit additive export companions such as results.csv, comparison.json, grouped_query_outcomes.csv, and manifest.json without changing the existing JSON/JSONL contracts.

Deep-search-oriented request shaping is exposed directly in the CLI with additive flags such as --additional-query, --start-published-date, --end-published-date, and --livecrawl. Search cost estimation can also be overridden from the CLI for search-type experiments with flags such as --deep-search-cost-1-25 and --deep-reasoning-search-cost-1-25.

Eval output now includes a taxonomy scorecard (relevance, credibility, actionability, confidence) and per-query failure reasons (no_results, off_domain, low_confidence). Workflow notes:

  • Use --compare-to-run-id for before/after deltas across quality and failure rates when both runs share query text. When comparison is enabled, the run also writes a human-readable experiments/<RUN_ID>/comparison.md report with grouped query outcomes when suite context is available.
  • compare-search-types is the end-to-end workflow for running the same suite against two search types and emitting the grouped comparison bundle in one command.
  • answer is a separate cited-answer workflow and intentionally does not reuse the ranked-search evaluation taxonomy.
  • structured-search is a separate schema-driven extraction workflow and intentionally stores the raw structured payload outside the ranked-search results.jsonl path.
  • find-similar is a separate discovery workflow and intentionally stores the similar-result payload outside the ranked-search evaluation path.
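The before/after comparison described above can be sketched as a diff over per-query failure reasons. This is a hypothetical helper for illustration, not the package's comparison code; the mappings go from query text to a failure reason, with None meaning the query passed:

```python
def failure_rate_delta(baseline, candidate):
    """Compare per-query failure reasons across two runs sharing query text.

    baseline/candidate: dict mapping query text -> failure reason or None.
    Only queries present in both runs are compared, mirroring the rule that
    deltas require shared query text.
    """
    shared = set(baseline) & set(candidate)

    def failure_rate(run):
        return sum(1 for q in shared if run[q]) / len(shared) if shared else 0.0

    return {
        "shared_queries": len(shared),
        "fixed": sorted(q for q in shared if baseline[q] and not candidate[q]),
        "regressed": sorted(q for q in shared if not baseline[q] and candidate[q]),
        "baseline_failure_rate": failure_rate(baseline),
        "candidate_failure_rate": failure_rate(candidate),
    }
```

A report like comparison.md can then group the "fixed" and "regressed" buckets by suite for human review.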

API Server

The API is a thin FastAPI wrapper over the same workflow functions the CLI uses. If you followed the local setup above, the api extra is already installed.

Start the server:

uvicorn exa_demo.api:app --reload

The server runs on http://127.0.0.1:8000 by default.

Useful local URLs:

  • http://127.0.0.1:8000/health
  • http://127.0.0.1:8000/docs

Endpoints

| Method | Path | Request body | Description |
| --- | --- | --- | --- |
| GET | /health | (none) | Health check with configured run/artifact backend labels |
| POST | /api/search | {"query": "...", "mode": "smoke"} | Ranked search with evaluation |
| POST | /api/answer | {"query": "...", "mode": "smoke"} | Cited-answer workflow |
| POST | /api/research | {"query": "...", "mode": "smoke"} | Research report workflow |
| POST | /api/find-similar | {"url": "...", "mode": "smoke"} | Seed-URL discovery |
| POST | /api/structured-search | {"query": "...", "output_schema": {...}, "mode": "smoke"} | Schema-driven extraction |

All endpoints default to smoke mode. Set "mode": "live" for real Exa API calls (requires EXA_API_KEY).

Quick smoke test

curl -X POST http://127.0.0.1:8000/api/search -H "Content-Type: application/json" -d "{\"query\": \"forensic engineer insurance\", \"mode\": \"smoke\"}"
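The same smoke request can be issued from Python with only the standard library. The endpoint and payload shape match the curl example above; the helper function itself is just illustrative scaffolding:

```python
import json
import urllib.request


def build_search_request(query, mode="smoke"):
    """Build the POST /api/search request shown in the curl example."""
    payload = json.dumps({"query": query, "mode": mode}).encode("utf-8")
    return urllib.request.Request(
        "http://127.0.0.1:8000/api/search",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_search_request("forensic engineer insurance")
# Uncomment with the server running locally:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```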

Frontend

The frontend is a Next.js app in frontend/ that calls the FastAPI backend through a server-side proxy (no CORS issues).

Setup

cd frontend
npm install

Copy the frontend env file:

PowerShell:

Copy-Item .env.local.example .env.local

Git Bash:

cp .env.local.example .env.local

Local Development (backend + frontend)

Terminal 1 — start the API server:

uvicorn exa_demo.api:app --reload

Terminal 2 — start the frontend:

cd frontend
npm run dev

Open http://localhost:3000. The frontend proxies API calls to the backend at http://127.0.0.1:8000 (configurable via BACKEND_URL in frontend/.env.local).

What the frontend provides

  • Health indicator showing backend connection status
  • Tab-based workflow selector (Search, Answer, Research)
  • My Work view showing newly created runs
  • Search: query input, result count, taxonomy scores, result cards with highlights
  • Answer: question input, cited answer display with source links
  • Research: topic input, report display with sources
  • Loading spinners and error handling on all workflows

The local UI validation in this session stayed on the smoke/mock path. Live Exa mode, S3 artifact storage, and Postgres-backed persistence were not exercised through the frontend.

Local Validation Runbook

Follow docs/local-validation.md for the exact smoke-path reproduction flow used in this session.

Benchmark Fixture

The packaged regression-style query fixture lives at benchmarks/insurance_cat_queries.json. The fixture now supports named suites while preserving the aggregate default set, and suites can be authored as plain query arrays or richer objects with suite metadata plus mixed string/object query entries. Use --suite all for the full benchmark or pick a named segment such as forensic_and_damage_engineering, restoration_and_mitigation, carrier_tpa_and_vendor_ecosystem, or regulatory_legislative_and_market_news. The notebook still owns execution and presentation, but Cell 6 now loads this fixture so the query set is reusable in tests and future CLI flows.
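Because suites can be authored either as plain query arrays or as richer objects with mixed string/object entries, consumers need a small normalization step. This sketch assumes illustrative field names ("queries", "query"); the fixture's actual keys may differ:

```python
def normalize_suite(suite):
    """Flatten one fixture suite into a plain list of query strings.

    Accepts either a bare list of queries or an object carrying suite
    metadata with mixed string/object query entries. The "queries" and
    "query" key names are assumptions for illustration.
    """
    entries = suite["queries"] if isinstance(suite, dict) else suite
    return [e["query"] if isinstance(e, dict) else e for e in entries]


# Both authoring styles normalize to the same shape:
print(normalize_suite(["adjuster florida", "roof damage expert"]))
print(normalize_suite({"description": "demo", "queries": [{"query": "adjuster florida"}]}))
```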

Experiment Artifacts

Each notebook run now writes a versioned artifact bundle under experiments/<RUN_ID>/:

  • config.json
  • queries.jsonl
  • results.jsonl
  • summary.json

Workflow-specific commands may also add:

  • answer.json
  • report.md
  • research.json
  • research.md
  • find_similar.json
  • structured_output.json
  • results.csv
  • comparison.json
  • grouped_query_outcomes.csv
  • manifest.json

Smoke-mode runs keep the same artifact shape, but with mocked results and zero spend. Every run also records runtime execution metadata in config.json, summary.json, and manifest.json so downstream review can distinguish smoke artifacts from live runs. Local runtime outputs under experiments/ are intentionally gitignored by default; the repo may still keep a small curated sample artifact set when needed for documentation or regression context.
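Since smoke and live runs share the same artifact shape, downstream review has to rely on the recorded execution metadata to tell them apart. A minimal sketch, assuming a "mode" key in summary.json (the actual key names in the recorded metadata may differ):

```python
def run_is_smoke(summary):
    """True when a run's recorded metadata marks it as a smoke run.

    The "mode" key is an assumption about the metadata layout; defaulting
    to smoke keeps the check conservative for unlabeled runs.
    """
    return summary.get("mode", "smoke") == "smoke"


# Typical use against a run directory:
# import json, pathlib
# summary = json.loads(pathlib.Path("experiments/<RUN_ID>/summary.json").read_text())
# print(run_is_smoke(summary))
```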

Demo Gallery

Use docs/demo-gallery.md as the top-level walkthrough for the shipped workflows:

  • search for ranked discovery and evaluation
  • answer for cited lookup questions
  • research for report-style market or regulatory summaries
  • structured-search for schema-driven extraction
  • find-similar for seed-URL expansion
  • compare-search-types for quality/cost tradeoff analysis

The gallery is deliberately command-first. It points at the exact entrypoint and artifact for each workflow without changing the project core.

Integration Boundaries

Use docs/integration-boundaries.md for the current smoke-vs-live execution contract, artifact expectations, and delivery rules.

Manual Live Validation

Use the manual GitHub Actions workflow in .github/workflows/live-validation.yml when you want a deliberate real-API validation pass against the shipped CLI workflows.

Local smoke dry run:

python scripts/run_live_validation.py --mode smoke

Guidance:

  • default to --mode smoke locally unless you are intentionally validating live API behavior
  • the current local validation captured in this README did not exercise live mode
  • the manual workflow is bounded by design and uploads runtime artifacts for review
  • --include-comparison is optional because it is materially more expensive than the default endpoint checks

Cache + Budget Behavior

  • Requests are cached in exa_cache.sqlite by payload hash.
  • Repeated requests return cache hits and should not re-bill.
  • Budget hard stop applies to uncached calls in current RUN_ID.
  • Metrics still include all-time totals for visibility.
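The payload-hash caching pattern above can be sketched in a few lines. The table layout and key derivation here are illustrative assumptions, not the schema of exa_cache.sqlite; the point is that a canonical-JSON hash makes reruns of the same request hit the cache instead of billing again:

```python
import hashlib
import json
import sqlite3


def payload_hash(payload):
    """Stable cache key: hash the canonical JSON form of the request payload."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Minimal in-memory illustration of the hit/miss pattern:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cache (key TEXT PRIMARY KEY, response TEXT)")
key = payload_hash({"query": "forensic engineer", "num_results": 5})
hit = conn.execute("SELECT response FROM cache WHERE key = ?", (key,)).fetchone()
if hit is None:  # cache miss: only this path would spend budget
    conn.execute("INSERT INTO cache VALUES (?, ?)", (key, '{"results": []}'))
    conn.commit()
```

Sorting keys before hashing means logically identical payloads with different key order still map to the same cache entry.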

Reset cache safely:

python scripts/reset_cache.py

Skip prompt:

python scripts/reset_cache.py --yes

Smoke Runner (nbclient)

python scripts/run_notebook_smoke.py --mode smoke

This notebook smoke runner remains available, but it was not part of the local frontend/backend validation path rechecked on 2026-04-12.

Local Quality Gate

Validated in this session:

  • python -m ruff check .
  • python -m pytest -q
  • python scripts/run_live_validation.py --mode smoke
  • uvicorn exa_demo.api:app --reload
  • npm install and npm run dev in frontend/

Optional extra check:

  • pre-commit run --all-files

Modes:

  • --mode smoke: forced no-network run (default)
  • --mode live: real API calls (requires EXA_API_KEY)
  • --mode auto: live if key exists, otherwise smoke
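The three modes above resolve as follows; this is a sketch of the stated rules, not the package's actual resolution code:

```python
import os


def resolve_mode(requested, env=None):
    """Resolve the effective execution mode per the rules above.

    auto picks live when EXA_API_KEY is present, otherwise smoke;
    smoke and live pass through unchanged.
    """
    env = os.environ if env is None else env
    if requested == "auto":
        return "live" if env.get("EXA_API_KEY") else "smoke"
    return requested


print(resolve_mode("auto", {}))                     # smoke
print(resolve_mode("auto", {"EXA_API_KEY": "x"}))   # live
```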

Recommended boundary:

  • default to --mode smoke for development, CI, and docs validation
  • use --mode live only for deliberate manual validation with a real key and human review

Optional timeout override:

python scripts/run_notebook_smoke.py --mode smoke --timeout 180

Safe Cost Tuning

In Cell 2 (CONFIG), adjust only these first:

  • num_results
  • use_highlights
  • use_text
  • use_summary

Guidance:

  • Keep num_results low for first pass
  • Keep text/summary off unless needed for second-pass review
  • Estimator intentionally rejects unsupported high num_results ranges until pricing tiers are updated
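The estimator guardrail in the last bullet can be sketched as a simple range check. The 25-result ceiling is an assumption inferred from the 1-25 cost-override flag names elsewhere in this README, not a documented limit:

```python
def check_estimator_range(num_results, max_supported=25):
    """Reject num_results values outside the priced tier.

    Mirrors the guardrail described above; max_supported=25 is a guess
    based on the --deep-search-cost-1-25 flag naming.
    """
    if num_results > max_supported:
        raise ValueError(
            f"num_results={num_results} exceeds the supported pricing tier "
            f"(max {max_supported}); update pricing tiers before raising it"
        )
```

Failing loudly here is the design choice: an unsupported range should stop the run rather than silently extrapolate cost.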

GitHub Sync

If needed:

git init
git branch -M main
git add .
git commit -m "feat: minimal exa people search eval harness"
git remote add origin https://github.com/itprodirect/exai-insurance-intel.git
git push -u origin main

If remote already exists:

git remote set-url origin https://github.com/itprodirect/exai-insurance-intel.git
git push -u origin main

Deep-Dive Rebuild Review

For a from-scratch architecture critique and refactor roadmap, see docs/rebuild_review.md.

Roadmap and Delivery History

See docs/roadmap.md for the canonical backlog and delivery phases, docs/issue-tracker.md for the GitHub issue-to-roadmap mapping, and docs/sessions/ for durable session history.

Guardrails

  • Public/professional info only
  • No address hunting / doxxing
  • No contact harvesting
  • Redaction stays enabled in notebook output
  • Human review required before operational use

