LLM quality gates for every PR. Run your `@eval_case` suites automatically and block merge if quality drops below a threshold.
- Zero infrastructure — runs entirely in GitHub Actions
- 2-minute setup
- Works with any LLM provider (OpenAI, Anthropic, Gemini, and 30+ more)
- Posts a formatted results table as a PR comment
- Sets Action outputs for downstream steps
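The gate itself amounts to comparing each case's score against the threshold; a toy sketch of the idea (not EvalCI's actual code):

```python
def quality_gate(scores, threshold=0.80):
    """Toy merge gate: pass only if every eval case meets the threshold.

    Returns (passed, failures) where failures maps case name -> score.
    """
    failures = {name: s for name, s in scores.items() if s < threshold}
    return len(failures) == 0, failures

# quality_gate({"relevancy": 0.85, "faithfulness": 0.65})
# -> (False, {"faithfulness": 0.65})
```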
Add `.github/workflows/eval.yml` to your repo:
```yaml
name: EvalCI
on:
  pull_request:
jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: SynapseKit/evalci@v1
        with:
          path: tests/evals
          threshold: "0.80"
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```

That's it. EvalCI will:
- Install `synapsekit` into the runner
- Discover and run all `@eval_case`-decorated functions under `tests/evals/`
- Post a results table as a PR comment
- Fail the check if any case scores below the threshold
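Decorator-based discovery like this usually works via a module-level registry that the decorator appends to; a minimal sketch of the pattern (not SynapseKit's actual implementation):

```python
_REGISTRY = []

def eval_case(min_score=0.7, **limits):
    """Toy stand-in for @eval_case: records each function and its thresholds."""
    def decorate(fn):
        _REGISTRY.append({"fn": fn, "min_score": min_score, "limits": limits})
        return fn
    return decorate

@eval_case(min_score=0.80)
def test_always_passes():
    return 0.9  # a pretend score

def run_all():
    """Run every registered case and report (name, score, passed)."""
    return [
        (case["fn"].__name__, score, score >= case["min_score"])
        for case in _REGISTRY
        if (score := case["fn"]()) is not None
    ]
```

A runner can then import the files under `tests/evals/`, which populates the registry as a side effect of the decorators executing.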
```python
# tests/evals/test_rag.py
from synapsekit.testing import eval_case

@eval_case(min_score=0.80, max_cost_usd=0.01, max_latency_ms=3000)
def test_rag_relevancy(eval_context):
    result = my_rag_pipeline("What is SynapseKit?")
    return eval_context.score_relevancy(result, reference="SynapseKit is a Python library...")

@eval_case(min_score=0.75)
def test_rag_faithfulness(eval_context):
    result = my_rag_pipeline("How do I install SynapseKit?")
    return eval_context.score_faithfulness(result, context=retrieved_docs)
```

EvalCI posts a comment like this on every PR:
| Test | Score | Cost | Latency |
|---|---|---|---|
| ✅ test_rag_relevancy | 0.850 | $0.0050 | 1200ms |
| ❌ test_rag_faithfulness | 0.650 | $0.0120 | 2500ms |

1/2 passed · Threshold: 0.80 · SynapseKit EvalCI
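The footer of the comment ("1/2 passed", mean score) is derived directly from the per-case rows; a sketch using the numbers from the table above:

```python
def summarize(rows, threshold=0.80):
    """Compute the pass count and mean score shown in the comment footer."""
    passed = sum(1 for _, score in rows if score >= threshold)
    mean = sum(score for _, score in rows) / len(rows)
    return f"{passed}/{len(rows)} passed", round(mean, 3)

rows = [("test_rag_relevancy", 0.850), ("test_rag_faithfulness", 0.650)]
# summarize(rows) -> ("1/2 passed", 0.75)
```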
| Input | Description | Default |
|---|---|---|
| `path` | Path to eval files or directory | `.` |
| `threshold` | Global minimum score (0.0–1.0) | `0.7` |
| `extras` | pip extras for synapsekit (e.g. `openai,anthropic`) | `openai` |
| `synapsekit-version` | synapsekit version to install, or `latest` | `latest` |
| `github-token` | Token for posting PR comments | `${{ github.token }}` |
| `fail-on-regression` | Fail if score regresses vs. baseline | `false` |
| `token` | EvalCI backend API token (future) | — |
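How the global `threshold` input interacts with a per-case `min_score` (as in the example above, where both appear) is not spelled out here; one plausible resolution rule, assumed for illustration only, is that a case-level `min_score` overrides the global value when set:

```python
def effective_threshold(global_threshold=0.7, min_score=None):
    """Assumed precedence: a case-level min_score wins over the global threshold."""
    return min_score if min_score is not None else global_threshold

# effective_threshold(0.80)                  -> 0.80 (global applies)
# effective_threshold(0.80, min_score=0.75)  -> 0.75 (per-case wins)
```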
| Output | Description |
|---|---|
| `passed` | Number of eval cases that passed |
| `failed` | Number of eval cases that failed |
| `total` | Total number of eval cases run |
| `mean-score` | Mean score across all eval cases |
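Action outputs like these are published by appending `name=value` lines to the file named in the `GITHUB_OUTPUT` environment variable (the standard GitHub Actions mechanism); a sketch of how a step could emit them, not EvalCI's actual code:

```python
import os

def set_outputs(results, output_file=None):
    """Append GitHub Actions output lines (name=value) for downstream steps.

    `results` is a list of dicts with "ok" (bool) and "score" (float) keys.
    """
    output_file = output_file or os.environ.get("GITHUB_OUTPUT", "github_output.txt")
    passed = sum(1 for r in results if r["ok"])
    lines = [
        f"passed={passed}",
        f"failed={len(results) - passed}",
        f"total={len(results)}",
        f"mean-score={sum(r['score'] for r in results) / len(results):.3f}",
    ]
    with open(output_file, "a") as f:
        f.write("\n".join(lines) + "\n")
    return lines
```

Downstream steps then read them as `${{ steps.<id>.outputs.passed }}` and so on, as shown in the usage example below.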
```yaml
- uses: SynapseKit/evalci@v1
  id: eval
  with:
    path: tests/evals
- run: |
    echo "Passed: ${{ steps.eval.outputs.passed }}/${{ steps.eval.outputs.total }}"
    echo "Mean score: ${{ steps.eval.outputs.mean-score }}"
```

```yaml
- uses: SynapseKit/evalci@v1
  with:
    extras: "openai,anthropic"
    threshold: "0.75"
  env:
    OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
```

Full documentation is available at synapsekit.github.io/synapsekit-docs/docs/evalci/overview
| Page | Description |
|---|---|
| Overview | What EvalCI is and how it works |
| Quickstart | Set up in 5 minutes |
| Writing eval cases | How to write `@eval_case` functions |
| Action reference | All inputs, outputs, and configuration |
| Examples | RAG, agents, multi-provider workflows |
EvalCI is built on SynapseKit — a Python library for building LLM applications with 30+ provider integrations and a built-in evaluation framework.