Examples

→ Full docs: https://synapsekit.github.io/synapsekit-docs/docs/evalci/examples

RAG quality gate

- uses: SynapseKit/evalci@v1
  with:
    path: tests/evals
    threshold: "0.80"
  env:
    OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}

# tests/evals/eval_rag.py
from synapsekit import eval_case, RelevancyMetric
from myapp.rag import pipeline, llm

@eval_case(min_score=0.80, max_cost_usd=0.02)
async def eval_rag_relevancy():
    result = await pipeline.ask("What is the return policy?")
    metric = RelevancyMetric(llm=llm)
    score = await metric.score(question="What is the return policy?", answer=result.answer)
    return {"score": score, "cost_usd": result.cost_usd, "latency_ms": result.latency_ms}

Multi-provider

- uses: SynapseKit/evalci@v1
  with:
    extras: "openai,anthropic"
    threshold: "0.75"
  env:
    OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}

Strict production gate

- uses: SynapseKit/evalci@v1
  with:
    threshold: "0.90"
    synapsekit-version: "1.5.2"
    path: tests/evals
  env:
    OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}

Using outputs

- uses: SynapseKit/evalci@v1
  id: eval
  with:
    path: tests/evals

- name: Notify Slack on pass
  if: success()
  run: |
    curl -s -X POST "${{ secrets.SLACK_WEBHOOK }}" \
      -H "Content-Type: application/json" \
      -d "{\"text\":\"✅ EvalCI: ${{ steps.eval.outputs.passed }}/${{ steps.eval.outputs.total }} passed\"}"

Agent evaluation

# tests/evals/eval_agents.py
from synapsekit import eval_case
from myapp.agents import support_agent

@eval_case(min_score=0.85, max_latency_ms=8000)
async def eval_password_reset():
    response = await support_agent.run("How do I reset my password?")
    keywords = ["reset", "password", "email", "link"]
    score = sum(1 for k in keywords if k in response.content.lower()) / len(keywords)
    return {"score": score, "latency_ms": response.latency_ms}

Local run

pip install synapsekit[openai]
synapsekit test tests/evals/ --threshold 0.80
synapsekit test tests/evals/ --format json
synapsekit test tests/evals/ --tag rag

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Examples

Examples

RAG quality gate

Multi-provider

Strict production gate

Using outputs

Agent evaluation

Local run

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Clone this wiki locally