# Overview

*gaafa edited this page Apr 9, 2026 · 1 revision*
EvalCI is a GitHub Action that runs your @eval_case suites on every pull request and blocks merge if quality drops below threshold.
No infrastructure. No backend. 2-minute setup.
→ Full docs: https://synapsekit.github.io/synapsekit-docs/docs/evalci/overview
LLM applications degrade silently. A prompt change, a model update, a retrieval tweak — any of these can drop quality by 10–20% without a single test failure. EvalCI gives you a quality gate that catches this before it ships.
| Without EvalCI | With EvalCI |
|---|---|
| Quality regressions ship to production | Blocked at PR review |
| Manual eval runs, inconsistent | Automatic on every PR |
| No visibility into cost/latency trends | Score, cost, latency per case on every PR |
| Requires external tooling | Works in your existing GitHub Actions |
On every PR, the Action:

- `pip install synapsekit[{extras}]` on the runner
- `synapsekit test {path} --format json --threshold {threshold}` — discovers all `@eval_case` functions, runs them, outputs JSON
- Parses the JSON results
- Posts a results table as a PR comment
- Sets Action outputs: `passed`, `failed`, `total`, `mean-score`
- Exits `0` (all pass) or `1` (any failure)
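The parse-and-gate steps can be sketched in plain Python. Note the JSON schema used here (a list of cases with `name` and `score` fields) is an illustrative assumption, not synapsekit's documented output format:

```python
import json

def gate(results_json: str, threshold: float) -> int:
    """Parse eval results, report the Action outputs, and return an exit code.

    Assumes a hypothetical schema: a JSON list of {"name", "score"} objects.
    """
    cases = json.loads(results_json)
    failed = sum(1 for c in cases if c["score"] < threshold)
    passed = len(cases) - failed
    mean_score = sum(c["score"] for c in cases) / len(cases)
    print(f"passed={passed} failed={failed} total={len(cases)} "
          f"mean-score={mean_score:.3f}")
    # Exit 0 only when every case clears the threshold; 1 otherwise.
    return 0 if failed == 0 else 1

# Example run with the two cases shown in the sample PR comment below.
results = json.dumps([
    {"name": "eval_rag_relevancy", "score": 0.85},
    {"name": "eval_rag_faithfulness", "score": 0.65},
])
exit_code = gate(results, threshold=0.80)  # one case scored below 0.80
```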
## EvalCI Results
| | Test | Score | Cost | Latency |
|---|------------------------|-------|---------|---------|
| ✅ | eval_rag_relevancy | 0.850 | $0.0050 | 1200ms |
| ❌ | eval_rag_faithfulness | 0.650 | $0.0120 | 2500ms |
**1/2 passed** · Threshold: `0.80`
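A comment table like the one above could be rendered from the parsed results roughly as follows. The field names (`name`, `score`, `cost`, `latency_ms`) are assumptions for illustration, not EvalCI's actual internals:

```python
def render_comment(cases: list[dict], threshold: float) -> str:
    """Render an EvalCI-style PR comment as a markdown table.

    Assumes hypothetical case dicts with "name", "score", "cost",
    and "latency_ms" keys.
    """
    lines = [
        "## EvalCI Results",
        "| | Test | Score | Cost | Latency |",
        "|---|---|---|---|---|",
    ]
    for c in cases:
        mark = "✅" if c["score"] >= threshold else "❌"
        lines.append(
            f"| {mark} | {c['name']} | {c['score']:.3f} "
            f"| ${c['cost']:.4f} | {c['latency_ms']}ms |"
        )
    passed = sum(c["score"] >= threshold for c in cases)
    lines.append(f"**{passed}/{len(cases)} passed** · Threshold: `{threshold:.2f}`")
    return "\n".join(lines)

comment = render_comment([
    {"name": "eval_rag_relevancy", "score": 0.85, "cost": 0.005, "latency_ms": 1200},
    {"name": "eval_rag_faithfulness", "score": 0.65, "cost": 0.012, "latency_ms": 2500},
], threshold=0.80)
```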
EvalCI discovers files matching `eval_*.py` or `*_eval.py` recursively under `path`.
tests/
└── evals/
├── eval_rag.py ✅ discovered
├── eval_agents.py ✅ discovered
├── rag_eval.py ✅ discovered
└── test_rag.py ❌ not discovered
- Quickstart — set up in 5 minutes
- Writing-Eval-Cases — write `@eval_case` functions
- Action-Reference — all inputs and outputs
- Examples — real-world workflows