Exa-powered insurance intelligence toolkit for CAT-loss, claims, expert, contractor, and market research workflows.
This repo is an installable Python package and CLI workflow engine with a thin FastAPI backend and a pilot Next.js web UI for insurance intelligence.
What exists:
- Python package + CLI workflows for search, answer, research, structured extraction, find-similar, evaluation, comparison, and budget inspection
- FastAPI backend wrapping the shipped workflows as JSON endpoints
- Next.js frontend with Search, Answer, Research, and My Work surfaces
- Pilot auth and request-boundary controls for bearer auth, per-user scoping, and bounded usage
- SQLite caching with budget enforcement and cost ledger
- Additive persistence adapters for local SQLite plus pilot S3/Postgres backends
- Evaluation taxonomy with benchmark suites (55+ insurance queries)
- Artifact export system (JSON/JSONL/CSV/Markdown)
- Smoke + live execution modes with CI on every push
- Comprehensive test suite (~3,300 lines)
What does not exist yet:
- Container or cloud deployment (no Docker, no Terraform)
- Deployed cloud persistence/infrastructure baseline for S3/Postgres-backed pilot environments
- Production-hardened external access model
Near-term direction: Build a controlled pilot web product layer on top of the existing workflow engine. See docs/roadmap.md for phased tracks and docs/pilot-architecture-decision.md for architectural defaults.
This repo stays intentionally lean while supporting repeatable notebook and CLI workflows for insurance research, cited answers, structured extraction, and report generation.
This session revalidated the local smoke/mock path end to end.
Validated now:
- Isolated virtual environment rebuild:
  - `python -m pip install --no-user -e '.[dev,api]'`
  - `python -m pytest -q`
  - `python -m ruff check .`
  - `python scripts/run_live_validation.py --mode smoke`
- Local FastAPI startup with `uvicorn exa_demo.api:app --reload`
- Local backend checks at `http://127.0.0.1:8000/health` and `http://127.0.0.1:8000/docs`
- Local frontend startup from `frontend/` with `npm install` and `npm run dev`
- Browser validation for Search, Answer, Research, and My Work against the local backend
Still not validated in this session:
- `--mode live` or real Exa API traffic
- S3 artifact storage or Postgres-backed usage/run persistence
- Production readiness or deployed environments
Use docs/local-validation.md for the exact reproduction steps.
Core local resources:
- one core notebook: `exa_people_search_eval.ipynb`
- one local env file: `.env` (not committed)
- one local SQLite cache: `exa_cache.sqlite` (not committed)
- What this repo evaluates
- Feature matrix
- Architecture
- Current local status
- Local setup
- CLI commands
- API server
- Frontend
- Local validation runbook
- Experiment artifacts
- Demo gallery
- Integration boundaries
- Manual Live Validation
- Roadmap and delivery history
- Guardrails
| Capability | Status | Primary interface | Primary artifact/output | Notes |
|---|---|---|---|---|
| Ranked search | Done | `python -m exa_demo search` | `results.jsonl`, `summary.json` | Supports additive deep-search controls |
| Benchmark evaluation | Done | `python -m exa_demo eval` | `queries.jsonl`, `results.jsonl`, `summary.json` | Includes taxonomy scoring and grouped comparison context |
| Search-type comparison | Done | `python -m exa_demo compare-search-types` | `comparison.md` plus paired run artifacts | Compares deep vs deep-reasoning end to end |
| Cited answers | Done | `python -m exa_demo answer` | `answer.json` | Separate workflow from ranked-search evaluation |
| Research reports | Done | `python -m exa_demo research` | `research.json` | Separate report-style workflow with citations |
| Structured extraction | Done | `python -m exa_demo structured-search` | `structured_output.json` | Uses `outputSchema` for schema-driven extraction |
| Seed-URL discovery | Done | `python -m exa_demo find-similar` | `find_similar.json` | Separate `/findSimilar` workflow and normalization path |
| Cache and budget ledger | Done | Notebook + CLI | `exa_cache.sqlite`, `summary.json` | Prevents re-billing on cache hits |
| API server | Done | `uvicorn exa_demo.api:app` | JSON responses | Thin FastAPI wrapper over CLI workflows; smoke mode first-class |
| Frontend UI | Done | `npm run dev` in `frontend/` | Browser UI | Next.js + TypeScript + Tailwind + shadcn/ui; search, answer, research, My Work |
| Smoke CI and local tests | Done | GitHub Actions + `pytest -q` | CI runs and local test suite | Default workflow runs pytest and notebook smoke |
```mermaid
flowchart TD
    U["Notebook or CLI user flow"] --> CLI["CLI / notebook orchestration"]
    CLI --> CFG["Config + runtime state"]
    CLI --> CACHE["SQLite cache + budget ledger"]
    CLI --> CLIENT["Exa client adapters"]
    CLIENT --> EXA["Exa endpoints\n/search, /answer, /research, /findSimilar"]
    CLIENT --> MOCK["Smoke-mode mocked responses"]
    CLIENT --> MODELS["Typed models + normalization"]
    MODELS --> EVAL["Evaluation + taxonomy + grouped comparison"]
    MODELS --> ART["Artifact writer"]
    EVAL --> ART
    ART --> EXP["experiments/<run-id>/ artifacts"]
    EXP --> DOCS["README, roadmap, issue tracker, session notes"]
```
- `src/exa_demo/cli.py`: thin CLI entrypoint; parser/runtime/compare helpers live in `cli_parser.py`, `cli_runtime.py`, and `cli_eval.py`
- `src/exa_demo/client.py`: stable transport facade; payload builders and smoke responders live in `client_payloads.py` and `client_smoke.py`
- `src/exa_demo/models.py`: stable model facade; shared API/result records live in `api_models.py`
- `src/exa_demo/reporting.py`: summary/research facade; comparison analysis/rendering lives in `comparison_reporting.py` and `comparison_analysis.py`
- `src/exa_demo/artifacts.py`: experiment writer; manifest/serialization helpers live in `artifact_manifest.py`
- `src/exa_demo/`: remaining reusable package modules for config, evaluation, workflows, cache, and safety
- `tests/`: CLI, client, model, artifact, script, and evaluation coverage
- `benchmarks/insurance_cat_queries.json`: named query suites used by notebook and CLI evaluation
- `experiments/`: runtime artifact root for workflow commands; local runs stay untracked by default
- `docs/roadmap.md`: canonical backlog and delivery phases
- `docs/issue-tracker.md`: GitHub issue-to-roadmap mapping
- `docs/sessions/`: durable session history
- Query relevance for CAT-loss / insurance professional discovery
- Cost per uncached query and projected spend at scale
- Repeatability via SQLite caching (reruns should not re-bill)
- Safe usage posture (public/professional info only)
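The cost and repeatability goals above reduce to simple arithmetic. A minimal sketch, where the per-query price and cache hit rate are illustrative assumptions rather than the repo's actual pricing model:

```python
def projected_spend(total_queries: int, cost_per_uncached: float, cache_hit_rate: float) -> float:
    """Estimate spend when cached reruns do not re-bill (illustrative model)."""
    # Only the cache misses incur billable Exa calls.
    uncached = total_queries * (1.0 - cache_hit_rate)
    return uncached * cost_per_uncached

# Example: 1,000 queries at an assumed $0.01 per uncached call, 60% cache hit rate.
print(round(projected_spend(1_000, 0.01, 0.60), 2))  # prints 4.0
```

This is the shape of the projection the notebook's cost cell reports; the real numbers come from the run's budget ledger.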
In Cell 2 (CONFIG), start with:
```
use_highlights=True
use_text=False
use_summary=False
num_results=5
```
Optional for before/after reporting in notebook Cell 9:
```
CONFIG["compare_to_run_id"] = "<baseline-run-id>"
CONFIG["compare_base_dir"] = "experiments"
```
- Create the virtual environment: `python -m venv .venv`
- Activate it.
  - PowerShell: `.\.venv\Scripts\Activate.ps1`
  - Git Bash: `source .venv/Scripts/activate`
- Install the package, CLI, API extra, and dev tooling:

  ```
  python -m pip install --upgrade pip
  python -m pip install --no-user -e '.[dev,api]'
  ```

- Optional local hooks: `pre-commit install`

If package installation or editable installs land outside `.venv`, reactivate the environment and prefer `python -m pip ...` over bare `pip ...`. The safest quick check is `python -c "import sys; print(sys.executable)"`, which should point at `.venv`.
Create `.env` from the template.

PowerShell:

```
Copy-Item .env.example .env
```

Git Bash:

```
cp .env.example .env
```

Edit `.env`:

```
EXA_API_KEY=your_real_exa_api_key
EXA_SMOKE_NO_NETWORK=0
# Optional: set a run id label for grouped budget metrics
# EXA_RUN_ID=demo-2026-03-05
```

Notes:

- `.env` is ignored by git.
- The validated local path in this README stays on `--mode smoke`, so a real API key is not required unless you intentionally switch to live mode.
- `EXA_SMOKE_NO_NETWORK=1` is the safest env-level default if you want no-network behavior even when a command omits `--mode smoke`.
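The env contract above can be sketched in a few lines of stdlib Python. This is a minimal illustration of the intent, not the package's actual loader; the real code may use a dotenv library or different precedence rules:

```python
import os


def load_dotenv_minimal(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines, skipping comments and blanks (minimal sketch)."""
    env: dict = {}
    try:
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, _, value = line.partition("=")
                env[key.strip()] = value.strip()
    except FileNotFoundError:
        pass  # smoke mode needs no .env at all
    return env


def live_mode_possible(env: dict) -> bool:
    """Live calls need a real key and must not be pinned to no-network behavior."""
    return bool(env.get("EXA_API_KEY")) and env.get("EXA_SMOKE_NO_NETWORK", "0") != "1"
```

Under this sketch, a missing `.env` simply means live mode is unavailable, which matches the smoke-first default.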
Open exa_people_search_eval.ipynb in Jupyter Lab or VS Code and run top-to-bottom.
Notebook flow is intentionally fixed to 9 cells:
1. Install/import + env load
2. Config
3. Exa call wrapper
4. Cache wrapper (SQLite) + budget enforcement
5. Single query demo
6. Batch query suite
7. Summary + qualitative notes
8. Cost projections
9. Decision rubric + integration recommendations
```
python -m exa_demo search "forensic engineer insurance expert witness" --mode smoke --json
python -m exa_demo answer "What is the Florida appraisal clause dispute process?" --mode smoke --json
python -m exa_demo research "Summarize the Florida CAT market outlook." --mode smoke --json
python -m exa_demo structured-search "independent adjuster florida catastrophe claims" --schema-file path/to/structured-schema.json --mode smoke --json
python -m exa_demo find-similar "https://example.com/florida-appraisal-decision" --mode smoke --json
python -m exa_demo search "Florida property insurance appraisal clause" --type deep --additional-query "Florida appraisal dispute statute" --start-published-date 2025-01-01 --livecrawl --json
python -m exa_demo eval --mode smoke --suite forensic_and_damage_engineering --limit 5 --json
python -m exa_demo compare-search-types --mode smoke --suite forensic_and_damage_engineering --baseline-type deep --candidate-type deep-reasoning --limit 5 --json
python -m exa_demo eval --mode smoke --limit 5 --compare-to-run-id 20260310T033256Z --json
```
```
python -m exa_demo budget --run-id demo-2026-03 --json
```

The search and eval commands write the same `experiments/<RUN_ID>/` artifact bundle as the notebook flow.
The answer command writes the same run directory and adds an answer.json artifact containing the cited-answer payload.
The research command writes the same run directory and adds research.json plus research.md artifacts for the research-report payload.
The structured-search command runs a schema-driven deep search and writes a structured_output.json artifact containing the extracted structured payload.
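Since `--schema-file` expects a JSON Schema document on disk, a schema could be authored like this. The field names below are illustrative assumptions for an insurance-adjuster extraction, not a schema shipped with the repo:

```python
import json

# Hypothetical JSON Schema for --schema-file; the exact fields Exa's
# outputSchema supports may differ from this sketch.
schema = {
    "type": "object",
    "properties": {
        "adjuster_name": {"type": "string"},
        "license_state": {"type": "string"},
        "cat_experience_years": {"type": "integer"},
    },
    "required": ["adjuster_name"],
}

with open("structured-schema.json", "w", encoding="utf-8") as fh:
    json.dump(schema, fh, indent=2)

# Then: python -m exa_demo structured-search "<query>" --schema-file structured-schema.json --mode smoke --json
```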
The find-similar command runs a seed-URL discovery workflow and writes a find_similar.json artifact containing the similar-result payload.
The endpoint-style workflows (answer, research, structured-search, and find-similar) also emit a reusable report.md companion artifact for human review.
Eval workflows also emit additive export companions such as results.csv, comparison.json, grouped_query_outcomes.csv, and manifest.json without changing the existing JSON/JSONL contracts.
Deep-search-oriented request shaping is now exposed directly in the CLI with additive flags such as --additional-query, --start-published-date, --end-published-date, and --livecrawl.
Search cost estimation can also be overridden from the CLI for search-type experiments with flags such as --deep-search-cost-1-25 and --deep-reasoning-search-cost-1-25.
Eval output now includes a taxonomy scorecard (relevance, credibility, actionability, confidence) and per-query failure reasons (no_results, off_domain, low_confidence).
Use --compare-to-run-id for before/after deltas across quality and failure rates when both runs share query text.
When comparison is enabled, the run also writes a human-readable experiments/<RUN_ID>/comparison.md report with grouped query outcomes when suite context is available.
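The before/after comparison can be pictured as a join on query text. A minimal sketch with illustrative field names (`query`, `score`), not the repo's actual result schema:

```python
def score_deltas(baseline: list, candidate: list) -> dict:
    """Join two runs on shared query text and compute per-query score deltas.

    The "query"/"score" keys are assumptions for illustration.
    """
    base = {row["query"]: row["score"] for row in baseline}
    return {
        row["query"]: row["score"] - base[row["query"]]
        for row in candidate
        if row["query"] in base  # only queries present in both runs get a delta
    }


deltas = score_deltas(
    [{"query": "a", "score": 0.5}, {"query": "b", "score": 0.7}],
    [{"query": "a", "score": 0.8}, {"query": "c", "score": 0.9}],
)
# Only the shared query "a" gets a delta; "b" and "c" are dropped.
```

This is why `--compare-to-run-id` requires both runs to share query text: unmatched queries have no baseline to diff against.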
compare-search-types is the end-to-end workflow for running the same suite against two search types and emitting the grouped comparison bundle in one command.
answer is a separate cited-answer workflow and intentionally does not reuse the ranked-search evaluation taxonomy.
structured-search is a separate schema-driven extraction workflow and intentionally stores the raw structured payload outside the ranked-search results.jsonl path.
find-similar is a separate discovery workflow and intentionally stores the similar-result payload outside the ranked-search evaluation path.
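The workflows above can also be scripted from Python by shelling out to the CLI. A minimal subprocess sketch, assuming the package is installed in the active environment; the flags are the ones documented above, but the JSON output shape is left to the caller:

```python
import subprocess
import sys


def build_cli_command(workflow: str, query: str, mode: str = "smoke") -> list:
    """Assemble a `python -m exa_demo <workflow>` invocation using documented flags."""
    return [sys.executable, "-m", "exa_demo", workflow, query, "--mode", mode, "--json"]


def run_workflow(workflow: str, query: str) -> str:
    """Run a workflow in smoke mode and return raw stdout for the caller to parse."""
    cmd = build_cli_command(workflow, query)
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


# Example (requires the package installed):
# run_workflow("answer", "What is the Florida appraisal clause dispute process?")
```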
The API is a thin FastAPI wrapper over the same workflow functions the CLI uses. If you followed the local setup above, the api extra is already installed.
Start the server:
```
uvicorn exa_demo.api:app --reload
```

The server runs on http://127.0.0.1:8000 by default.
Useful local URLs:
- http://127.0.0.1:8000/health
- http://127.0.0.1:8000/docs
| Method | Path | Request body | Description |
|---|---|---|---|
| GET | `/health` | — | Health check with configured run/artifact backend labels |
| POST | `/api/search` | `{"query": "...", "mode": "smoke"}` | Ranked search with evaluation |
| POST | `/api/answer` | `{"query": "...", "mode": "smoke"}` | Cited-answer workflow |
| POST | `/api/research` | `{"query": "...", "mode": "smoke"}` | Research report workflow |
| POST | `/api/find-similar` | `{"url": "...", "mode": "smoke"}` | Seed-URL discovery |
| POST | `/api/structured-search` | `{"query": "...", "output_schema": {...}, "mode": "smoke"}` | Schema-driven extraction |
All endpoints default to smoke mode. Set "mode": "live" for real Exa API calls (requires EXA_API_KEY).
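The same request can be made from Python with only the stdlib. A minimal sketch against the local server; `build_search_payload` is a helper introduced here for illustration, not part of the package:

```python
import json
import urllib.request


def build_search_payload(query: str, mode: str = "smoke") -> bytes:
    """Encode the documented /api/search request body."""
    return json.dumps({"query": query, "mode": mode}).encode("utf-8")


def search_smoke(query: str, base_url: str = "http://127.0.0.1:8000") -> dict:
    """POST a smoke-mode search to the local FastAPI server and decode the JSON response."""
    req = urllib.request.Request(
        f"{base_url}/api/search",
        data=build_search_payload(query),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Requires the server running locally:
# search_smoke("forensic engineer insurance")
```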
```
curl -X POST http://127.0.0.1:8000/api/search -H "Content-Type: application/json" -d "{\"query\": \"forensic engineer insurance\", \"mode\": \"smoke\"}"
```

The frontend is a Next.js app in `frontend/` that calls the FastAPI backend through a server-side proxy (no CORS issues).
```
cd frontend
npm install
```

Copy the frontend env file.

PowerShell:

```
Copy-Item .env.local.example .env.local
```

Git Bash:

```
cp .env.local.example .env.local
```

In terminal 1, start the API server:

```
uvicorn exa_demo.api:app --reload
```

In terminal 2, start the frontend:

```
cd frontend
npm run dev
```

Open http://localhost:3000. The frontend proxies API calls to the backend at http://127.0.0.1:8000 (configurable via `BACKEND_URL` in `frontend/.env.local`).
- Health indicator showing backend connection status
- Tab-based workflow selector (Search, Answer, Research)
- My Work view showing newly created runs
- Search: query input, result count, taxonomy scores, result cards with highlights
- Answer: question input, cited answer display with source links
- Research: topic input, report display with sources
- Loading spinners and error handling on all workflows
The local UI validation in this session stayed on the smoke/mock path. Live Exa mode, S3 artifact storage, and Postgres-backed persistence were not exercised through the frontend.
Follow docs/local-validation.md for the exact smoke-path reproduction flow used in this session.
The packaged regression-style query fixture lives at benchmarks/insurance_cat_queries.json.
The fixture now supports named suites while preserving the aggregate default set, and suites can be authored as plain query arrays or richer objects with suite metadata plus mixed string/object query entries.
Use --suite all for the full benchmark or pick a named segment such as forensic_and_damage_engineering, restoration_and_mitigation, carrier_tpa_and_vendor_ecosystem, or regulatory_legislative_and_market_news.
The notebook still owns execution and presentation, but Cell 6 now loads this fixture so the query set is reusable in tests and future CLI flows.
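Loading a fixture that supports both authoring styles described above can be sketched in a few lines. The `queries`/`query` key names are assumptions about the fixture shape, not a documented contract:

```python
def normalize_suite(raw) -> list:
    """Flatten a suite into plain query strings.

    Accepts either a plain list of queries or an object with suite
    metadata plus mixed string/object query entries (assumed shape).
    """
    entries = raw["queries"] if isinstance(raw, dict) else raw
    return [e if isinstance(e, str) else e["query"] for e in entries]


print(normalize_suite(["q1", "q2"]))
print(normalize_suite({"description": "demo", "queries": ["q1", {"query": "q2", "tag": "cat"}]}))
# Both forms yield ['q1', 'q2']
```

Normalizing at load time keeps downstream evaluation code working on plain strings regardless of how a suite was authored.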
Each notebook run now writes a versioned artifact bundle under experiments/<RUN_ID>/:
- `config.json`
- `queries.jsonl`
- `results.jsonl`
- `summary.json`
Workflow-specific commands may also add:
- `answer.json`
- `report.md`
- `research.json`
- `research.md`
- `find_similar.json`
- `structured_output.json`
- `results.csv`
- `comparison.json`
- `grouped_query_outcomes.csv`
- `manifest.json`
Smoke-mode runs keep the same artifact shape, but with mocked results and zero spend.
Every run also records runtime execution metadata in config.json, summary.json, and manifest.json so downstream review can distinguish smoke artifacts from live runs.
Local runtime outputs under experiments/ are intentionally gitignored by default; the repo may still keep a small curated sample artifact set when needed for documentation or regression context.
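A downstream review script might distinguish smoke artifacts from live runs by reading that metadata. A minimal sketch; the `"mode"` key is an assumption about the recorded fields, not the repo's exact schema:

```python
import json
from pathlib import Path


def is_smoke_run(run_dir: str) -> bool:
    """Read a run's config.json and report whether it was a smoke execution.

    Assumes the runtime metadata exposes a "mode" field; the real key may differ.
    """
    config = json.loads(Path(run_dir, "config.json").read_text(encoding="utf-8"))
    return config.get("mode", "smoke") != "live"
```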
Use docs/demo-gallery.md as the top-level walkthrough for the shipped workflows:
- `search` for ranked discovery and evaluation
- `answer` for cited lookup questions
- `research` for report-style market or regulatory summaries
- `structured-search` for schema-driven extraction
- `find-similar` for seed-URL expansion
- `compare-search-types` for quality/cost tradeoff analysis
The gallery is deliberately command-first. It points at the exact entrypoint and artifact for each workflow without changing the project core.
Use docs/integration-boundaries.md for the current smoke-vs-live execution contract, artifact expectations, and delivery rules.
Use the manual GitHub Actions workflow in .github/workflows/live-validation.yml when you want a deliberate real-API validation pass against the shipped CLI workflows.
Local smoke dry run:
```
python scripts/run_live_validation.py --mode smoke
```

Guidance:

- default to `--mode smoke` locally unless you are intentionally validating live API behavior
- the current local validation captured in this README did not exercise live mode
- the manual workflow is bounded by design and uploads runtime artifacts for review
- `--include-comparison` is optional because it is materially more expensive than the default endpoint checks
- Requests are cached in `exa_cache.sqlite` by payload hash.
- Repeated requests return cache hits and should not re-bill.
- The budget hard stop applies to uncached calls in the current `RUN_ID`.
- Metrics still include all-time totals for visibility.
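Caching by payload hash can be sketched as hashing a canonical JSON serialization of the request. This illustrates the idea; the package's actual key derivation may differ:

```python
import hashlib
import json


def cache_key(payload: dict) -> str:
    """Derive a stable cache key from a request payload (sketch).

    sort_keys makes logically identical payloads hash identically
    regardless of the order keys were assembled in.
    """
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = cache_key({"query": "forensic engineer", "num_results": 5})
b = cache_key({"num_results": 5, "query": "forensic engineer"})
print(a == b)  # True: key order does not change the cache key
```

The canonical form is what makes "repeated requests return cache hits" reliable: any cosmetic variation in how a payload is built still maps to the same key.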
Reset cache safely:

```
python scripts/reset_cache.py
```

Skip the prompt:

```
python scripts/reset_cache.py --yes
```

Notebook smoke runner:

```
python scripts/run_notebook_smoke.py --mode smoke
```

This notebook smoke runner remains available, but it was not part of the local frontend/backend validation path rechecked on 2026-04-12.
Validated in this session:
- `python -m ruff check .`
- `python -m pytest -q`
- `python scripts/run_live_validation.py --mode smoke`
- `uvicorn exa_demo.api:app --reload`
- `npm install` and `npm run dev` in `frontend/`
Optional extra check:
```
pre-commit run --all-files
```
Modes:
- `--mode smoke`: forced no-network run (default)
- `--mode live`: real API calls (requires `EXA_API_KEY`)
- `--mode auto`: live if key exists, otherwise smoke
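The `--mode auto` behavior can be sketched as a small resolution function. This mirrors the semantics described above, not the package's actual implementation:

```python
import os


def resolve_mode(requested: str) -> str:
    """Resolve the effective execution mode from the requested --mode value."""
    if requested == "auto":
        # auto goes live only when a key is available; otherwise fall back to smoke
        return "live" if os.environ.get("EXA_API_KEY") else "smoke"
    return requested


# With no EXA_API_KEY set, "auto" resolves to "smoke".
```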
Recommended boundary:
- default to `--mode smoke` for development, CI, and docs validation
- use `--mode live` only for deliberate manual validation with a real key and human review
Optional timeout override:
```
python scripts/run_notebook_smoke.py --mode smoke --timeout 180
```

In Cell 2 (CONFIG), adjust only these first:

- `num_results`
- `use_highlights`
- `use_text`
- `use_summary`
Guidance:
- Keep `num_results` low for the first pass
- Keep text/summary off unless needed for second-pass review
- The estimator intentionally rejects unsupported high `num_results` ranges until pricing tiers are updated
If needed:

```
git init
git branch -M main
git add .
git commit -m "feat: minimal exa people search eval harness"
git remote add origin https://github.com/itprodirect/exai-insurance-intel.git
git push -u origin main
```

If the remote already exists:

```
git remote set-url origin https://github.com/itprodirect/exai-insurance-intel.git
git push -u origin main
```

For a from-scratch architecture critique and refactor roadmap, see docs/rebuild_review.md.
- Canonical roadmap: docs/roadmap.md
- GitHub issue tracker mapping: docs/issue-tracker.md
- ADR index: docs/adr/README.md
- Session note template: docs/sessions/README.md
- Latest implementation session: docs/sessions/2026-04-12-local-smoke-validation-doc-sync.md
- Public/professional info only
- No address hunting / doxxing
- No contact harvesting
- Redaction stays enabled in notebook output
- Human review required before operational use