Exa-powered insurance intelligence toolkit for CAT-loss, claims, expert, contractor, and market research workflows.
This repo is an installable Python package and CLI workflow engine with a thin FastAPI backend and a pilot Next.js web UI for insurance intelligence.
What exists:
- Python package + CLI workflows for search, answer, research, structured extraction, find-similar, evaluation, comparison, and budget inspection
- FastAPI backend wrapping the shipped workflows as JSON endpoints
- Next.js frontend with Search, Answer, Research, and My Work surfaces
- Pilot auth and request-boundary controls for bearer auth, per-user scoping, and bounded usage
- SQLite caching with budget enforcement and cost ledger
- Additive persistence adapters for local SQLite plus pilot S3/Postgres backends
- Evaluation taxonomy with benchmark suites (55+ insurance queries)
- Artifact export system (JSON/JSONL/CSV/Markdown)
- Smoke + live execution modes with CI on every push
- Comprehensive test suite (~3,300 lines)
What does not exist yet:
- Container or cloud deployment (no Docker, no Terraform)
- Deployed cloud persistence/infrastructure baseline for S3/Postgres-backed pilot environments
- Production-hardened external access model
Near-term direction: Build a controlled pilot web product layer on top of the existing workflow engine. See docs/roadmap.md for phased tracks and docs/pilot-architecture-decision.md for architectural defaults.
This repo stays intentionally lean while supporting repeatable notebook and CLI workflows for insurance research, cited answers, structured extraction, and report generation.
This session revalidated the local smoke/mock path end to end.
Validated now:
- Isolated virtual environment rebuild:
  - `python -m pip install --no-user -e '.[dev,api]'`
  - `python -m pytest -q`
  - `python -m ruff check .`
  - `python scripts/run_live_validation.py --mode smoke`
- Local FastAPI startup with `uvicorn exa_demo.api:app --reload`
- Local backend checks at `http://127.0.0.1:8000/health` and `http://127.0.0.1:8000/docs`
- Local frontend startup from `frontend/` with `npm install` and `npm run dev`
- Browser validation for Search, Answer, Research, and My Work against the local backend
Still not validated in this session:
- `--mode live` or real Exa API traffic
- S3 artifact storage or Postgres-backed usage/run persistence
- Production readiness or deployed environments
Use docs/local-validation.md for the exact reproduction steps.
Core local resources:
- one core notebook: `exa_people_search_eval.ipynb`
- one local env file: `.env` (not committed)
- one local SQLite cache: `exa_cache.sqlite` (not committed)
- What this repo evaluates
- Feature matrix
- Architecture
- Current local status
- Local setup
- CLI commands
- API server
- Frontend
- Local validation runbook
- Experiment artifacts
- Demo gallery
- Integration boundaries
- Manual Live Validation
- Roadmap and delivery history
- Guardrails
| Capability | Status | Primary interface | Primary artifact/output | Notes |
|---|---|---|---|---|
| Ranked search | Done | `python -m exa_demo search` | `results.jsonl`, `summary.json` | Supports additive deep-search controls |
| Benchmark evaluation | Done | `python -m exa_demo eval` | `queries.jsonl`, `results.jsonl`, `summary.json` | Includes taxonomy scoring and grouped comparison context |
| Search-type comparison | Done | `python -m exa_demo compare-search-types` | `comparison.md` plus paired run artifacts | Compares deep vs deep-reasoning end to end |
| Cited answers | Done | `python -m exa_demo answer` | `answer.json` | Separate workflow from ranked-search evaluation |
| Research reports | Done | `python -m exa_demo research` | `research.json` | Separate report-style workflow with citations |
| Structured extraction | Done | `python -m exa_demo structured-search` | `structured_output.json` | Uses `outputSchema` for schema-driven extraction |
| Seed-URL discovery | Done | `python -m exa_demo find-similar` | `find_similar.json` | Separate `/findSimilar` workflow and normalization path |
| Cache and budget ledger | Done | Notebook + CLI | `exa_cache.sqlite`, `summary.json` | Prevents re-billing on cache hits |
| API server | Done | `uvicorn exa_demo.api:app` | JSON responses | Thin FastAPI wrapper over CLI workflows; smoke mode first-class |
| Frontend UI | Done | `npm run dev` in `frontend/` | Browser UI | Next.js + TypeScript + Tailwind + shadcn/ui; search, answer, research, My Work |
| Smoke CI and local tests | Done | GitHub Actions + `pytest -q` | CI runs and local test suite | Default workflow runs pytest and notebook smoke |
```mermaid
flowchart TD
    U["Notebook or CLI user flow"] --> CLI["CLI / notebook orchestration"]
    CLI --> CFG["Config + runtime state"]
    CLI --> CACHE["SQLite cache + budget ledger"]
    CLI --> CLIENT["Exa client adapters"]
    CLIENT --> EXA["Exa endpoints\n/search, /answer, /research, /findSimilar"]
    CLIENT --> MOCK["Smoke-mode mocked responses"]
    CLIENT --> MODELS["Typed models + normalization"]
    MODELS --> EVAL["Evaluation + taxonomy + grouped comparison"]
    MODELS --> ART["Artifact writer"]
    EVAL --> ART
    ART --> EXP["experiments/<run-id>/ artifacts"]
    EXP --> DOCS["README, roadmap, issue tracker, session notes"]
```
- `src/exa_demo/cli.py`: thin CLI entrypoint; parser/runtime/compare helpers live in `cli_parser.py`, `cli_runtime.py`, and `cli_eval.py`
- `src/exa_demo/client.py`: stable transport facade; payload builders and smoke responders live in `client_payloads.py` and `client_smoke.py`
- `src/exa_demo/models.py`: stable model facade; shared API/result records live in `api_models.py`
- `src/exa_demo/reporting.py`: summary/research facade; comparison analysis/rendering lives in `comparison_reporting.py` and `comparison_analysis.py`
- `src/exa_demo/artifacts.py`: experiment writer; manifest/serialization helpers live in `artifact_manifest.py`
- `src/exa_demo/`: remaining reusable package modules for config, evaluation, workflows, cache, and safety
- `tests/`: CLI, client, model, artifact, script, and evaluation coverage
- `benchmarks/insurance_cat_queries.json`: named query suites used by notebook and CLI evaluation
- `experiments/`: runtime artifact root for workflow commands; local runs stay untracked by default
- `docs/roadmap.md`: canonical backlog and delivery phases
- `docs/issue-tracker.md`: GitHub issue-to-roadmap mapping
- `docs/sessions/`: durable session history
- Query relevance for CAT-loss / insurance professional discovery
- Cost per uncached query and projected spend at scale
- Repeatability via SQLite caching (reruns should not re-bill)
- Safe usage posture (public/professional info only)
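The cost and repeatability goals above reduce to simple arithmetic. A minimal sketch, where the per-query price and cache hit rate are illustrative assumptions rather than the repo's actual pricing model:

```python
def projected_spend(total_queries: int, cost_per_uncached: float, cache_hit_rate: float) -> float:
    """Estimate spend when cached reruns do not re-bill (illustrative model)."""
    # Only the cache misses incur billable Exa calls.
    uncached = total_queries * (1.0 - cache_hit_rate)
    return uncached * cost_per_uncached

# Example: 1,000 queries at an assumed $0.01 per uncached call, 60% cache hit rate.
print(round(projected_spend(1_000, 0.01, 0.60), 2))  # prints 4.0
```

This is the shape of the projection the notebook's cost cell reports; the real numbers come from the run's budget ledger.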
In Cell 2 (CONFIG), start with:
```
use_highlights=True
use_text=False
use_summary=False
num_results=5
```
Optional for before/after reporting in notebook Cell 9:
```
CONFIG["compare_to_run_id"] = "<baseline-run-id>"
CONFIG["compare_base_dir"] = "experiments"
```
- Create the virtual environment: `python -m venv .venv`
- Activate it.
  - PowerShell: `.\.venv\Scripts\Activate.ps1`
  - Git Bash: `source .venv/Scripts/activate`
- Install the package, CLI, API extra, and dev tooling:

  ```
  python -m pip install --upgrade pip
  python -m pip install --no-user -e '.[dev,api]'
  ```

- Optional local hooks: `pre-commit install`

If package installation or editable installs land outside `.venv`, reactivate the environment and prefer `python -m pip ...` over bare `pip ...`. The safest quick check is `python -c "import sys; print(sys.executable)"`, which should point at `.venv`.
Create `.env` from the template.

PowerShell:

```
Copy-Item .env.example .env
```

Git Bash:

```
cp .env.example .env
```

Edit `.env`:

```
EXA_API_KEY=your_real_exa_api_key
EXA_SMOKE_NO_NETWORK=0
# Optional: set a run id label for grouped budget metrics
# EXA_RUN_ID=demo-2026-03-05
```

Notes:

- `.env` is ignored by git.
- The validated local path in this README stays on `--mode smoke`, so a real API key is not required unless you intentionally switch to live mode.
- `EXA_SMOKE_NO_NETWORK=1` is the safest env-level default if you want no-network behavior even when a command omits `--mode smoke`.
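The env contract above can be sketched in a few lines of stdlib Python. This is a minimal illustration of the intent, not the package's actual loader; the real code may use a dotenv library or different precedence rules:

```python
import os


def load_dotenv_minimal(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines, skipping comments and blanks (minimal sketch)."""
    env: dict = {}
    try:
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, _, value = line.partition("=")
                env[key.strip()] = value.strip()
    except FileNotFoundError:
        pass  # smoke mode needs no .env at all
    return env


def live_mode_possible(env: dict) -> bool:
    """Live calls need a real key and must not be pinned to no-network behavior."""
    return bool(env.get("EXA_API_KEY")) and env.get("EXA_SMOKE_NO_NETWORK", "0") != "1"
```

Under this sketch, a missing `.env` simply means live mode is unavailable, which matches the smoke-first default.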
Open exa_people_search_eval.ipynb in Jupyter Lab or VS Code and run top-to-bottom.
Notebook flow is intentionally fixed to 9 cells:
1. Install/import + env load
2. Config
3. Exa call wrapper
4. Cache wrapper (SQLite) + budget enforcement
5. Single query demo
6. Batch query suite
7. Summary + qualitative notes
8. Cost projections
9. Decision rubric + integration recommendations
```
python -m exa_demo search "forensic engineer insurance expert witness" --mode smoke --json
python -m exa_demo answer "What is the Florida appraisal clause dispute process?" --mode smoke --json
python -m exa_demo research "Summarize the Florida CAT market outlook." --mode smoke --json
python -m exa_demo structured-search "independent adjuster florida catastrophe claims" --schema-file path/to/structured-schema.json --mode smoke --json
python -m exa_demo find-similar "https://example.com/florida-appraisal-decision" --mode smoke --json
python -m exa_demo search "Florida property insurance appraisal clause" --type deep --additional-query "Florida appraisal dispute statute" --start-published-date 2025-01-01 --livecrawl --json
python -m exa_demo eval --mode smoke --suite forensic_and_damage_engineering --limit 5 --json
python -m exa_demo compare-search-types --mode smoke --suite forensic_and_damage_engineering --baseline-type deep --candidate-type deep-reasoning --limit 5 --json
python -m exa_demo eval --mode smoke --limit 5 --compare-to-run-id 20260310T033256Z --json
```
```
python -m exa_demo budget --run-id demo-2026-03 --json
```

The search and eval commands write the same `experiments/<RUN_ID>/` artifact bundle as the notebook flow.
The answer command writes the same run directory and adds an answer.json artifact containing the cited-answer payload.
The research command writes the same run directory and adds research.json plus research.md artifacts for the research-report payload.
The structured-search command runs a schema-driven deep search and writes a structured_output.json artifact containing the extracted structured payload.
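Since `--schema-file` expects a JSON Schema document on disk, a schema could be authored like this. The field names below are illustrative assumptions for an insurance-adjuster extraction, not a schema shipped with the repo:

```python
import json

# Hypothetical JSON Schema for --schema-file; the exact fields Exa's
# outputSchema supports may differ from this sketch.
schema = {
    "type": "object",
    "properties": {
        "adjuster_name": {"type": "string"},
        "license_state": {"type": "string"},
        "cat_experience_years": {"type": "integer"},
    },
    "required": ["adjuster_name"],
}

with open("structured-schema.json", "w", encoding="utf-8") as fh:
    json.dump(schema, fh, indent=2)

# Then: python -m exa_demo structured-search "<query>" --schema-file structured-schema.json --mode smoke --json
```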
The find-similar command runs a seed-URL discovery workflow and writes a find_similar.json artifact containing the similar-result payload.
The endpoint-style workflows (answer, research, structured-search, and find-similar) also emit a reusable report.md companion artifact for human review.
Eval workflows also emit additive export companions such as results.csv, comparison.json, grouped_query_outcomes.csv, and manifest.json without changing the existing JSON/JSONL contracts.
Deep-search-oriented request shaping is now exposed directly in the CLI with additive flags such as --additional-query, --start-published-date, --end-published-date, and --livecrawl.
Search cost estimation can also be overridden from the CLI for search-type experiments with flags such as --deep-search-cost-1-25 and --deep-reasoning-search-cost-1-25.
Eval output now includes a taxonomy scorecard (relevance, credibility, actionability, confidence) and per-query failure reasons (no_results, off_domain, low_confidence).
Use --compare-to-run-id for before/after deltas across quality and failure rates when both runs share query text.
When comparison is enabled, the run also writes a human-readable experiments/<RUN_ID>/comparison.md report with grouped query outcomes when suite context is available.
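The before/after comparison can be pictured as a join on query text. A minimal sketch with illustrative field names (`query`, `score`), not the repo's actual result schema:

```python
def score_deltas(baseline: list, candidate: list) -> dict:
    """Join two runs on shared query text and compute per-query score deltas.

    The "query"/"score" keys are assumptions for illustration.
    """
    base = {row["query"]: row["score"] for row in baseline}
    return {
        row["query"]: row["score"] - base[row["query"]]
        for row in candidate
        if row["query"] in base  # only queries present in both runs get a delta
    }


deltas = score_deltas(
    [{"query": "a", "score": 0.5}, {"query": "b", "score": 0.7}],
    [{"query": "a", "score": 0.8}, {"query": "c", "score": 0.9}],
)
# Only the shared query "a" gets a delta; "b" and "c" are dropped.
```

This is why `--compare-to-run-id` requires both runs to share query text: unmatched queries have no baseline to diff against.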
compare-search-types is the end-to-end workflow for running the same suite against two search types and emitting the grouped comparison bundle in one command.
answer is a separate cited-answer workflow and intentionally does not reuse the ranked-search evaluation taxonomy.
structured-search is a separate schema-driven extraction workflow and intentionally stores the raw structured payload outside the ranked-search results.jsonl path.
find-similar is a separate discovery workflow and intentionally stores the similar-result payload outside the ranked-search evaluation path.
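The workflows above can also be scripted from Python by shelling out to the CLI. A minimal subprocess sketch, assuming the package is installed in the active environment; the flags are the ones documented above, but the JSON output shape is left to the caller:

```python
import subprocess
import sys


def build_cli_command(workflow: str, query: str, mode: str = "smoke") -> list:
    """Assemble a `python -m exa_demo <workflow>` invocation using documented flags."""
    return [sys.executable, "-m", "exa_demo", workflow, query, "--mode", mode, "--json"]


def run_workflow(workflow: str, query: str) -> str:
    """Run a workflow in smoke mode and return raw stdout for the caller to parse."""
    cmd = build_cli_command(workflow, query)
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


# Example (requires the package installed):
# run_workflow("answer", "What is the Florida appraisal clause dispute process?")
```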
The API is a thin FastAPI wrapper over the same workflow functions the CLI uses. If you followed the local setup above, the api extra is already installed.
Start the server:
```
uvicorn exa_demo.api:app --reload
```

The server runs on http://127.0.0.1:8000 by default.
Useful local URLs:
- http://127.0.0.1:8000/health
- http://127.0.0.1:8000/docs
| Method | Path | Request body | Description |
|---|---|---|---|
| GET | `/health` | — | Health check with configured run/artifact backend labels |
| POST | `/api/search` | `{"query": "...", "mode": "smoke"}` | Ranked search with evaluation |
| POST | `/api/answer` | `{"query": "...", "mode": "smoke"}` | Cited-answer workflow |
| POST | `/api/research` | `{"query": "...", "mode": "smoke"}` | Research report workflow |
| POST | `/api/find-similar` | `{"url": "...", "mode": "smoke"}` | Seed-URL discovery |
| POST | `/api/structured-search` | `{"query": "...", "output_schema": {...}, "mode": "smoke"}` | Schema-driven extraction |
All endpoints default to smoke mode. Set "mode": "live" for real Exa API calls (requires EXA_API_KEY).
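The same request can be made from Python with only the stdlib. A minimal sketch against the local server; `build_search_payload` is a helper introduced here for illustration, not part of the package:

```python
import json
import urllib.request


def build_search_payload(query: str, mode: str = "smoke") -> bytes:
    """Encode the documented /api/search request body."""
    return json.dumps({"query": query, "mode": mode}).encode("utf-8")


def search_smoke(query: str, base_url: str = "http://127.0.0.1:8000") -> dict:
    """POST a smoke-mode search to the local FastAPI server and decode the JSON response."""
    req = urllib.request.Request(
        f"{base_url}/api/search",
        data=build_search_payload(query),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Requires the server running locally:
# search_smoke("forensic engineer insurance")
```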
```
curl -X POST http://127.0.0.1:8000/api/search -H "Content-Type: application/json" -d "{\"query\": \"forensic engineer insurance\", \"mode\": \"smoke\"}"
```

The frontend is a Next.js app in `frontend/` that calls the FastAPI backend through a server-side proxy (no CORS issues).
```
cd frontend
npm install
```

Copy the frontend env file.

PowerShell:

```
Copy-Item .env.local.example .env.local
```

Git Bash:

```
cp .env.local.example .env.local
```

In terminal 1, start the API server:

```
uvicorn exa_demo.api:app --reload
```

In terminal 2, start the frontend:

```
cd frontend
npm run dev
```

Open http://localhost:3000. The frontend proxies API calls to the backend at http://127.0.0.1:8000 (configurable via `BACKEND_URL` in `frontend/.env.local`).
- Health indicator showing backend connection status
- Tab-based workflow selector (Search, Answer, Research)
- My Work view showing newly created runs
- Search: query input, result count, taxonomy scores, result cards with highlights
- Answer: question input, cited answer display with source links
- Research: topic input, report display with sources
- Loading spinners and error handling on all workflows
The local UI validation in this session stayed on the smoke/mock path. Live Exa mode, S3 artifact storage, and Postgres-backed persistence were not exercised through the frontend.
Follow docs/local-validation.md for the exact smoke-path reproduction flow used in this session.
The packaged regression-style query fixture lives at benchmarks/insurance_cat_queries.json.
The fixture now supports named suites while preserving the aggregate default set, and suites can be authored as plain query arrays or richer objects with suite metadata plus mixed string/object query entries.
Use --suite all for the full benchmark or pick a named segment such as forensic_and_damage_engineering, restoration_and_mitigation, carrier_tpa_and_vendor_ecosystem, or regulatory_legislative_and_market_news.
The notebook still owns execution and presentation, but Cell 6 now loads this fixture so the query set is reusable in tests and future CLI flows.
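Loading a fixture that supports both authoring styles described above can be sketched in a few lines. The `queries`/`query` key names are assumptions about the fixture shape, not a documented contract:

```python
def normalize_suite(raw) -> list:
    """Flatten a suite into plain query strings.

    Accepts either a plain list of queries or an object with suite
    metadata plus mixed string/object query entries (assumed shape).
    """
    entries = raw["queries"] if isinstance(raw, dict) else raw
    return [e if isinstance(e, str) else e["query"] for e in entries]


print(normalize_suite(["q1", "q2"]))
print(normalize_suite({"description": "demo", "queries": ["q1", {"query": "q2", "tag": "cat"}]}))
# Both forms yield ['q1', 'q2']
```

Normalizing at load time keeps downstream evaluation code working on plain strings regardless of how a suite was authored.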
Each notebook run now writes a versioned artifact bundle under experiments/<RUN_ID>/:
- `config.json`
- `queries.jsonl`
- `results.jsonl`
- `summary.json`
Workflow-specific commands may also add:
- `answer.json`
- `report.md`
- `research.json`
- `research.md`
- `find_similar.json`
- `structured_output.json`
- `results.csv`
- `comparison.json`
- `grouped_query_outcomes.csv`
- `manifest.json`
Smoke-mode runs keep the same artifact shape, but with mocked results and zero spend.
Every run also records runtime execution metadata in config.json, summary.json, and manifest.json so downstream review can distinguish smoke artifacts from live runs.
Local runtime outputs under experiments/ are intentionally gitignored by default; the repo may still keep a small curated sample artifact set when needed for documentation or regression context.
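A downstream review script might distinguish smoke artifacts from live runs by reading that metadata. A minimal sketch; the `"mode"` key is an assumption about the recorded fields, not the repo's exact schema:

```python
import json
from pathlib import Path


def is_smoke_run(run_dir: str) -> bool:
    """Read a run's config.json and report whether it was a smoke execution.

    Assumes the runtime metadata exposes a "mode" field; the real key may differ.
    """
    config = json.loads(Path(run_dir, "config.json").read_text(encoding="utf-8"))
    return config.get("mode", "smoke") != "live"
```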
Use docs/demo-gallery.md as the top-level walkthrough for the shipped workflows:
- `search` for ranked discovery and evaluation
- `answer` for cited lookup questions
- `research` for report-style market or regulatory summaries
- `structured-search` for schema-driven extraction
- `find-similar` for seed-URL expansion
- `compare-search-types` for quality/cost tradeoff analysis
The gallery is deliberately command-first. It points at the exact entrypoint and artifact for each workflow without changing the project core.
Use docs/integration-boundaries.md for the current smoke-vs-live execution contract, artifact expectations, and delivery rules.
Use the manual GitHub Actions workflow in .github/workflows/live-validation.yml when you want a deliberate real-API validation pass against the shipped CLI workflows.
Local smoke dry run:
```
python scripts/run_live_validation.py --mode smoke
```

Guidance:

- default to `--mode smoke` locally unless you are intentionally validating live API behavior
- the current local validation captured in this README did not exercise live mode
- the manual workflow is bounded by design and uploads runtime artifacts for review
- `--include-comparison` is optional because it is materially more expensive than the default endpoint checks
- Requests are cached in `exa_cache.sqlite` by payload hash.
- Repeated requests return cache hits and should not re-bill.
- The budget hard stop applies to uncached calls in the current `RUN_ID`.
- Metrics still include all-time totals for visibility.
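Caching by payload hash can be sketched as hashing a canonical JSON serialization of the request. This illustrates the idea; the package's actual key derivation may differ:

```python
import hashlib
import json


def cache_key(payload: dict) -> str:
    """Derive a stable cache key from a request payload (sketch).

    sort_keys makes logically identical payloads hash identically
    regardless of the order keys were assembled in.
    """
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = cache_key({"query": "forensic engineer", "num_results": 5})
b = cache_key({"num_results": 5, "query": "forensic engineer"})
print(a == b)  # True: key order does not change the cache key
```

The canonical form is what makes "repeated requests return cache hits" reliable: any cosmetic variation in how a payload is built still maps to the same key.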
Reset cache safely:

```
python scripts/reset_cache.py
```

Skip the prompt:

```
python scripts/reset_cache.py --yes
```

Notebook smoke runner:

```
python scripts/run_notebook_smoke.py --mode smoke
```

This notebook smoke runner remains available, but it was not part of the local frontend/backend validation path rechecked on 2026-04-12.
Validated in this session:
- `python -m ruff check .`
- `python -m pytest -q`
- `python scripts/run_live_validation.py --mode smoke`
- `uvicorn exa_demo.api:app --reload`
- `npm install` and `npm run dev` in `frontend/`
Optional extra check:
```
pre-commit run --all-files
```
Modes:
- `--mode smoke`: forced no-network run (default)
- `--mode live`: real API calls (requires `EXA_API_KEY`)
- `--mode auto`: live if key exists, otherwise smoke
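The `--mode auto` behavior can be sketched as a small resolution function. This mirrors the semantics described above, not the package's actual implementation:

```python
import os


def resolve_mode(requested: str) -> str:
    """Resolve the effective execution mode from the requested --mode value."""
    if requested == "auto":
        # auto goes live only when a key is available; otherwise fall back to smoke
        return "live" if os.environ.get("EXA_API_KEY") else "smoke"
    return requested


# With no EXA_API_KEY set, "auto" resolves to "smoke".
```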
Recommended boundary:
- default to `--mode smoke` for development, CI, and docs validation
- use `--mode live` only for deliberate manual validation with a real key and human review
Optional timeout override:
```
python scripts/run_notebook_smoke.py --mode smoke --timeout 180
```

In Cell 2 (CONFIG), adjust only these first:

- `num_results`
- `use_highlights`
- `use_text`
- `use_summary`
Guidance:
- Keep `num_results` low for the first pass
- Keep text/summary off unless needed for second-pass review
- The estimator intentionally rejects unsupported high `num_results` ranges until pricing tiers are updated
If needed:

```
git init
git branch -M main
git add .
git commit -m "feat: minimal exa people search eval harness"
git remote add origin https://github.com/itprodirect/exai-insurance-intel.git
git push -u origin main
```

If the remote already exists:

```
git remote set-url origin https://github.com/itprodirect/exai-insurance-intel.git
git push -u origin main
```

For a from-scratch architecture critique and refactor roadmap, see docs/rebuild_review.md.
- Canonical roadmap: docs/roadmap.md
- GitHub issue tracker mapping: docs/issue-tracker.md
- ADR index: docs/adr/README.md
- Session note template: docs/sessions/README.md
- Latest implementation session: docs/sessions/2026-04-12-local-smoke-validation-doc-sync.md
- Public/professional info only
- No address hunting / doxxing
- No contact harvesting
- Redaction stays enabled in notebook output
- Human review required before operational use