feat: test harness with LLM judge, replacing smartest framework by yanekyuk · Pull Request #562 · indexnetwork/index

yanekyuk · 2026-03-24T00:26:11Z

Summary

New protocol/src/lib/test-harness/ library that replaces the smartest framework with a leaner approach that feels like standard bun tests
createTestHarness() factory wires real adapters (Drizzle DB, EmbedderAdapter, RedisCacheAdapter) against a test database with setup/reset/teardown lifecycle
assertMatchesSchema(value, zodSchema) for deterministic Zod validation
assertLLMEvaluate(value, config) for scored semantic criteria via LLM judge — supports per-criterion required flag, per-criterion min threshold, and overall minScore
LLM judge uses google/gemini-2.5-flash via OpenRouter with structured output (Zod response format)
Migrated 2 existing smartest test files (opportunity evaluator stress test, direct-connection graph test)
Graceful no-op when OPENROUTER_API_KEY is missing (warns, doesn't fail)
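The pass/fail decision behind assertLLMEvaluate could be sketched roughly as below. The PR only names the options (required, per-criterion min, overall minScore); the types, the 0..1 score scale, the unweighted mean, and the function name evaluateScores are all assumptions made for illustration, not the harness's actual code.

```typescript
// Hypothetical shapes: the PR names the options (required, min, minScore)
// but does not show the real types or the scoring scale.
interface Criterion {
  name: string;
  required?: boolean; // assumed: an unmet required criterion fails the assertion
  min?: number; // assumed: per-criterion threshold on a 0..1 scale
}

interface Verdict {
  passed: boolean;
  overall: number;
  failures: string[];
}

function evaluateScores(
  criteria: Criterion[],
  scores: Record<string, number | undefined>,
  minScore: number,
): Verdict {
  const failures: string[] = [];
  let total = 0;

  for (const criterion of criteria) {
    const score = scores[criterion.name] ?? 0; // a missing score counts as 0 here
    total += score;
    // Assumed: a criterion without its own min is held to the overall minScore,
    // and only required criteria or criteria with an explicit min are enforced.
    const threshold = criterion.min ?? minScore;
    const enforced = criterion.required === true || criterion.min !== undefined;
    if (enforced && score < threshold) {
      failures.push(`${criterion.name}: ${score} < ${threshold}`);
    }
  }

  // Assumed: the overall score is the unweighted mean of all criteria.
  const overall = criteria.length > 0 ? total / criteria.length : 0;
  if (overall < minScore) {
    failures.push(`overall: ${overall} < ${minScore}`);
  }
  return { passed: failures.length === 0, overall, failures };
}
```

Under these assumptions, a high overall score cannot mask a weak required criterion, which is presumably the point of having both a per-criterion and an overall threshold.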

New Features

createTestHarness() — real adapter injection for integration tests
assertMatchesSchema() — Zod schema validation assertion
assertLLMEvaluate() — scored semantic criteria with LLM judge
callJudge() — core LLM judge function with structured output
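A deterministic schema assertion like assertMatchesSchema() can be sketched without depending on Zod by typing the schema structurally. The SchemaLike interface and the hand-written toy schema below are assumptions for illustration; a Zod schema should fit the same shape through its safeParse method, but the PR does not show the real signature.

```typescript
// Minimal structural view of a schema. Zod schemas expose safeParse with a
// compatible result shape; SchemaLike itself is a hypothetical name.
interface SchemaLike<T> {
  safeParse(
    value: unknown,
  ): { success: true; data: T } | { success: false; error: { message: string } };
}

// Throws with the validation message when the value does not match the schema.
function assertMatchesSchema<T>(
  value: unknown,
  schema: SchemaLike<T>,
): asserts value is T {
  const result = schema.safeParse(value);
  if (!result.success) {
    throw new Error(`Schema validation failed: ${result.error.message}`);
  }
}

// Toy schema standing in for a Zod object schema such as z.object({ score: z.number() }).
const scoreSchema: SchemaLike<{ score: number }> = {
  safeParse(value: unknown) {
    const candidate = value as { score?: unknown } | null;
    if (typeof candidate === "object" && candidate !== null && typeof candidate.score === "number") {
      return { success: true as const, data: { score: candidate.score } };
    }
    return { success: false as const, error: { message: "expected { score: number }" } };
  },
};

const output: unknown = { score: 0.8 };
assertMatchesSchema(output, scoreSchema);
// After the assertion, TypeScript narrows `output` to { score: number }.
const narrowedScore: number = output.score;
```

The asserts return type is a design choice in this sketch: it lets the rest of the test use the validated value without casts.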

Refactors

Migrated opportunity.evaluator.smartest.spec.ts → stress test block in opportunity.evaluator.spec.ts
Migrated opportunity.graph.direct-connection.smartest.spec.ts → opportunity.graph.direct-connection.spec.ts

Documentation

Updated .env.example with DATABASE_TEST_URL and TEST_JUDGE_MODEL (replacing SMARTEST_VERIFIER_MODEL/SMARTEST_GENERATOR_MODEL)
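How the two new variables might be read is sketched below. The helper name resolveTestEnv, the fallback to google/gemini-2.5-flash when TEST_JUDGE_MODEL is unset, and the decision to throw when DATABASE_TEST_URL is missing are assumptions; the PR only states that the variables exist and which model the judge uses.

```typescript
// Hypothetical helper; the env is passed in so the function stays pure and testable.
interface TestEnvConfig {
  databaseUrl: string;
  judgeModel: string;
}

// Model named in the PR summary; using it as the default is an assumption.
const DEFAULT_JUDGE_MODEL = "google/gemini-2.5-flash";

function resolveTestEnv(env: Record<string, string | undefined>): TestEnvConfig {
  const databaseUrl = env.DATABASE_TEST_URL?.trim();
  if (!databaseUrl) {
    // Assumed behavior: fail fast rather than risk touching a non-test database.
    throw new Error("DATABASE_TEST_URL must be set to run harness tests");
  }
  const judgeModel = env.TEST_JUDGE_MODEL?.trim() || DEFAULT_JUDGE_MODEL;
  return { databaseUrl, judgeModel };
}
```

In a real test file the argument would typically be process.env; the connection string used in any example is a placeholder.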

Deferred

The smartest framework (lib/smartest/) stays until the remaining 7 test files are migrated in a follow-up
queue and graphs properties on TestHarness deferred until graph-level integration tests need them
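With queue and graphs deferred, the harness surface described in this PR might be shaped roughly as follows. This is a sketch under stated assumptions: the adapter interfaces are invented, setup/reset/teardown are assumed to be async methods on the harness, and in-memory fakes (createFakeHarness is a hypothetical name) stand in for the real Drizzle, embedder, and Redis adapters.

```typescript
// Hypothetical adapter interfaces; the real harness wires Drizzle,
// EmbedderAdapter and RedisCacheAdapter instead of these fakes.
interface Cache {
  get(key: string): Promise<string | undefined>;
  set(key: string, value: string): Promise<void>;
}
interface Embedder {
  embed(text: string): Promise<number[]>;
}
interface TestDb {
  insert(table: string, row: Record<string, unknown>): Promise<void>;
  count(table: string): Promise<number>;
}

interface TestHarness {
  db: TestDb;
  embedder: Embedder;
  cache: Cache; // typed as the interface, not the concrete adapter
  setup(): Promise<void>;
  reset(): Promise<void>;
  teardown(): Promise<void>;
}

function createFakeHarness(): TestHarness {
  const tables = new Map<string, Record<string, unknown>[]>();
  const cacheStore = new Map<string, string>();
  let ready = false;

  const requireReady = (): void => {
    if (!ready) throw new Error("harness not set up");
  };

  return {
    db: {
      async insert(table, row) {
        requireReady();
        tables.set(table, [...(tables.get(table) ?? []), row]);
      },
      async count(table) {
        requireReady();
        return tables.get(table)?.length ?? 0;
      },
    },
    embedder: {
      // Toy embedding: a one-element vector holding the text length.
      async embed(text) {
        return [text.length];
      },
    },
    cache: {
      async get(key) {
        return cacheStore.get(key);
      },
      async set(key, value) {
        cacheStore.set(key, value);
      },
    },
    async setup() {
      ready = true;
    },
    // reset clears state between tests but keeps the harness usable.
    async reset() {
      tables.clear();
      cacheStore.clear();
    },
    async teardown() {
      ready = false;
      tables.clear();
      cacheStore.clear();
    },
  };
}
```

In a test file, setup would usually run in beforeAll, reset in beforeEach, and teardown in afterAll, so each test starts from an empty database and cache.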

Test plan

Run bun test src/lib/test-harness/tests/judge.spec.ts — 2 tests (LLM scoring, missing API key)
Run bun test src/lib/test-harness/tests/assertions.spec.ts — 4 tests (schema valid/invalid, LLM pass/fail)
Run bun test src/lib/test-harness/tests/harness.spec.ts — 3 tests (db connection, embedder, reset)
Run bun test src/lib/protocol/agents/tests/opportunity.evaluator.spec.ts — verify stress test block
Run bun test src/lib/protocol/graphs/tests/opportunity.graph.direct-connection.spec.ts — verify migrated test
Verify tsc --noEmit passes

…p docs
- Add console.warn when criterion matching falls back to positional index, making unreliable LLM judge responses visible in test output
- Remove stale path comment from judge.prompt.ts
- Expand TSDoc on assertLLMEvaluate documenting the no-op skip behavior and recommending describe.skipIf for LLM-primary tests
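The positional fallback with a warning might work along these lines. This is an illustrative sketch only: the function name matchCriteria, the case-insensitive name comparison, and the warning text are assumptions, and the real matching logic in the judge may differ.

```typescript
// Hypothetical shape of one criterion as returned by the LLM judge.
interface JudgedCriterion {
  name: string;
  score: number;
}

// Matches each expected criterion to a judged one, first by name and then,
// if the judge renamed or garbled it, by position. The fallback is logged so
// unreliable judge responses show up in test output.
function matchCriteria(
  expected: string[],
  judged: JudgedCriterion[],
  warn: (message: string) => void = console.warn,
): (JudgedCriterion | undefined)[] {
  const normalize = (s: string): string => s.trim().toLowerCase();

  return expected.map((name, index) => {
    const byName = judged.find((j) => normalize(j.name) === normalize(name));
    if (byName) return byName;

    const byPosition: JudgedCriterion | undefined = judged[index];
    if (byPosition) {
      warn(
        `criterion "${name}" not found by name; using position ${index} ("${byPosition.name}")`,
      );
    }
    return byPosition;
  });
}
```

Passing the warn function as a parameter is a choice made for this sketch so the behavior can be observed in a unit test without patching console.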

coderabbitai · 2026-03-24T00:26:22Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

yanekyuk added 10 commits March 24, 2026 02:20

feat(test-harness): add LLM judge prompt and response schema

89621ef

feat(test-harness): implement LLM judge function

62b9093

feat(test-harness): implement assertMatchesSchema and assertLLMEvaluate

03a4cba

feat(test-harness): implement createTestHarness factory with real ada…

98bc6dc

…pter wiring

feat(test-harness): add barrel export

07c7ea7

refactor(test): migrate opportunity evaluator stress test from smarte…

b2cbc8a

…st to test-harness

refactor(test): migrate direct-connection test from smartest to test-…

aeff332

…harness

docs: update env example with test harness variables

8667227

Replace SMARTEST_VERIFIER_MODEL/SMARTEST_GENERATOR_MODEL with DATABASE_TEST_URL and TEST_JUDGE_MODEL.

fix(test-harness): graceful skip when API key missing, use Cache inte…

88c1640

…rface
- assertLLMEvaluate returns passing no-op result instead of throwing when OPENROUTER_API_KEY is not set (prevents CI failures)
- TestHarness.cache typed as Cache interface, not RedisCacheAdapter
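The no-op behavior described in this commit can be sketched as below. The result shape, the option names, and the injected judge function are assumptions for illustration (the real assertion calls the LLM judge via OpenRouter); only the behavior of warning and returning a passing result when the key is missing comes from the PR.

```typescript
// Hypothetical result and option shapes; not the harness's real types.
interface EvaluateResult {
  passed: boolean;
  skipped: boolean;
  overall: number;
}

interface EvaluateOptions {
  apiKey: string | undefined; // stands in for OPENROUTER_API_KEY
  minScore: number;
  judge: (value: unknown) => Promise<number>; // injected stand-in for the LLM call
  warn?: (message: string) => void;
}

async function assertLLMEvaluateSketch(
  value: unknown,
  options: EvaluateOptions,
): Promise<EvaluateResult> {
  const warn = options.warn ?? console.warn;

  if (!options.apiKey) {
    // No key: warn and report a passing, skipped result instead of throwing,
    // so CI runs without the secret do not fail.
    warn("OPENROUTER_API_KEY is not set; skipping LLM evaluation");
    return { passed: true, skipped: true, overall: 0 };
  }

  const overall = await options.judge(value);
  if (overall < options.minScore) {
    throw new Error(`LLM evaluation failed: ${overall} < ${options.minScore}`);
  }
  return { passed: true, skipped: false, overall };
}
```

A silent pass can hide the fact that nothing was evaluated, which is presumably why the skipped case is surfaced with a warning rather than left invisible.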

yanekyuk wants to merge 10 commits into indexnetwork:dev from yanekyuk:feat/test-harness
