
Overview

gaafa edited this page Apr 9, 2026 · 1 revision


EvalCI is a GitHub Action that runs your @eval_case suites on every pull request and blocks the merge if quality drops below your threshold.

No infrastructure. No backend. 2-minute setup.

→ Full docs: https://synapsekit.github.io/synapsekit-docs/docs/evalci/overview
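To make the "@eval_case suite" idea concrete, here is a hypothetical sketch of the shape of a case EvalCI would discover and run. synapsekit's real decorator API is not shown on this page, so a stand-in `eval_case` decorator is defined inline purely to keep the sketch self-contained — it is an illustration, not synapsekit's implementation.

```python
# Stand-in decorator, NOT synapsekit's real API: it only tags the function
# so a runner could find it. The real decorator likely does more.
def eval_case(fn):
    fn.is_eval_case = True  # hypothetical discovery marker
    return fn

@eval_case
def eval_rag_relevancy():
    # A real case would call your model and score the answer against a
    # reference; here a fixed score stands in to show the shape.
    return 0.85
```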


Why EvalCI

LLM applications degrade silently. A prompt change, a model update, a retrieval tweak — any of these can drop quality by 10–20% without a single test failure. EvalCI gives you a quality gate that catches this before it ships.

| Without EvalCI | With EvalCI |
|---|---|
| Quality regressions ship to production | Blocked at PR review |
| Manual eval runs, inconsistent | Automatic on every PR |
| No visibility into cost/latency trends | Score, cost, latency per case on every PR |
| Requires external tooling | Works in your existing GitHub Actions |

How it works

  1. pip install synapsekit[{extras}] on the runner
  2. synapsekit test {path} --format json --threshold {threshold} — discovers all @eval_case functions, runs them, outputs JSON
  3. Parse JSON results
  4. Post results table as a PR comment
  5. Set Action outputs: passed, failed, total, mean-score
  6. Exit 0 (all pass) or 1 (any failure)
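Steps 3, 5, and 6 can be sketched as a small parsing routine. This is a hypothetical sketch, assuming a JSON shape with a top-level `cases` list where each case carries a `score` — the field names are assumptions for illustration, not EvalCI's documented schema.

```python
import json

def summarize(results_json: str, threshold: float) -> dict:
    """Derive the Action outputs and exit code from assumed JSON results."""
    cases = json.loads(results_json)["cases"]  # "cases"/"score" keys assumed
    failed = [c for c in cases if c["score"] < threshold]
    scores = [c["score"] for c in cases]
    return {
        "passed": len(cases) - len(failed),
        "failed": len(failed),
        "total": len(cases),
        "mean-score": sum(scores) / len(scores) if scores else 0.0,
        "exit-code": 1 if failed else 0,  # any failure -> non-zero exit
    }
```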

PR comment

## EvalCI Results

|   | Test                   | Score | Cost    | Latency |
|---|------------------------|-------|---------|---------|
| ✅ | eval_rag_relevancy     | 0.850 | $0.0050 | 1200ms  |
| ❌ | eval_rag_faithfulness  | 0.650 | $0.0120 | 2500ms  |

**1/2 passed** · Threshold: `0.80`
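One way to produce a table row like the ones above is a per-case formatter. A minimal sketch, assuming hypothetical field names (`test`, `score`, `cost`, `latency_ms`) for the parsed JSON — these are illustrative, not EvalCI's actual schema:

```python
def render_row(case: dict, threshold: float) -> str:
    """Format one result as a markdown row matching the comment above."""
    status = "✅" if case["score"] >= threshold else "❌"
    return (f"| {status} | {case['test']} | {case['score']:.3f} "
            f"| ${case['cost']:.4f} | {case['latency_ms']}ms |")
```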

File discovery

EvalCI discovers files matching `eval_*.py` or `*_eval.py` recursively under `path`.

tests/
└── evals/
    ├── eval_rag.py        ✅ discovered
    ├── eval_agents.py     ✅ discovered
    ├── rag_eval.py        ✅ discovered
    └── test_rag.py        ❌ not discovered
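The discovery rule above amounts to a filename check at any depth. A minimal sketch of that check (the predicate name `is_eval_file` is mine, not EvalCI's):

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

def is_eval_file(path: str) -> bool:
    """True if the file's name matches eval_*.py or *_eval.py."""
    name = PurePosixPath(path).name  # only the filename matters, not the dir
    return fnmatch(name, "eval_*.py") or fnmatch(name, "*_eval.py")
```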

Next steps
