Common issues and solutions for AgentEval.
The --agent flag expects the module:function format:
# Wrong
agenteval run --suite suite.yaml --agent my_agent
# Correct
agenteval run --suite suite.yaml --agent my_module:run_agent
Ensure the module is importable from your current directory:
# Your agent file must be in the current directory or on PYTHONPATH
ls my_module.py # Should exist
# Or install your package
pip install -e .
Check that the function name after : matches an exported function in the module.
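To check a module:function value before handing it to the CLI, the lookup can be reproduced in a few lines of Python. This is a minimal sketch that assumes the value is resolved with a plain import followed by an attribute lookup; `resolve_agent` is a hypothetical helper, not part of AgentEval.

```python
import importlib


def resolve_agent(spec: str):
    """Resolve a 'module:function' string to a callable (hypothetical helper)."""
    module_name, sep, func_name = spec.partition(":")
    if not sep or not module_name or not func_name:
        raise ValueError(f"expected module:function, got {spec!r}")
    # Raises ModuleNotFoundError if the module is not importable
    # (not in the current directory, not on PYTHONPATH, not installed).
    module = importlib.import_module(module_name)
    try:
        func = getattr(module, func_name)
    except AttributeError:
        raise AttributeError(
            f"module {module_name!r} has no attribute {func_name!r}"
        ) from None
    if not callable(func):
        raise TypeError(f"{spec!r} does not refer to a callable")
    return func


if __name__ == "__main__":
    # A standard-library target is used so the sketch runs anywhere;
    # substitute your own value, e.g. "my_module:run_agent".
    print(resolve_agent("json:dumps"))
```

Each failure mode maps to one of the checks above: a ValueError means the format is wrong, a ModuleNotFoundError means the module is not importable, and an AttributeError means the function name does not match.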
Install the distributed extra:
pip install agentevalkit[distributed]
Install the stats extra for Welch's t-test:
pip install agentevalkit[stats]
# or: pip install scipy
AgentEval falls back to a pure-Python implementation if scipy is unavailable.
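For reference, the core of Welch's t-test needs only the standard library. The sketch below is not AgentEval's fallback code; it is an illustration that computes the t statistic and the Welch-Satterthwaite degrees of freedom for two groups of scores. It stops short of the p-value, which requires the t-distribution CDF that scipy provides. The function name `welch_t` is an assumption made for this example.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two scores")
    # Squared standard error of each group mean (sample variance / n).
    va, vb = variance(a) / na, variance(b) / nb
    if va + vb == 0:
        raise ValueError("both groups have zero variance")
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


if __name__ == "__main__":
    t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
    print(f"t = {t:.4f}, df = {df:.4f}")
```

Unlike Student's t-test, this form does not assume the two groups share a variance, which suits runs whose scores vary by different amounts.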
Install the appropriate extra:
pip install agentevalkit[langchain] # LangChain adapter
pip install agentevalkit[crewai] # CrewAI adapter
pip install agentevalkit[autogen] # AutoGen adapter
The llm-judge grader requires an OpenAI API key (or compatible API):
export OPENAI_API_KEY=sk-...
You can also configure a custom API base in the grader config:
grader: llm-judge
grader_config:
  model: gpt-4o-mini
  api_base: https://your-api.com/v1
The compare command accepts two formats:
# Two single runs
agenteval compare RUN_ID_A RUN_ID_B
# Two groups (comma-separated, with 'vs')
agenteval compare RUN_A1,RUN_A2 vs RUN_B1,RUN_B2
Run IDs are the short hex IDs shown by agenteval list.
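The two accepted shapes can be told apart mechanically, which helps when building the argument list from a script. The sketch below is a hypothetical parser written for illustration, not AgentEval's own code; it assumes a single run is simply a group of one.

```python
def parse_compare_args(args):
    """Split compare arguments into two groups of run IDs (hypothetical)."""
    if len(args) == 2:
        # Two single runs: RUN_ID_A RUN_ID_B
        left, right = args
    elif len(args) == 3 and args[1] == "vs":
        # Two groups: RUN_A1,RUN_A2 vs RUN_B1,RUN_B2
        left, _, right = args
    else:
        raise ValueError("expected 'A B' or 'A1,A2 vs B1,B2'")
    group_a = [run_id for run_id in left.split(",") if run_id]
    group_b = [run_id for run_id in right.split(",") if run_id]
    if not group_a or not group_b:
        raise ValueError("each side needs at least one run ID")
    return group_a, group_b


if __name__ == "__main__":
    print(parse_compare_args(["RUN_A1,RUN_A2", "vs", "RUN_B1,RUN_B2"]))
```

Note that the comma-separated IDs contain no spaces; a space after a comma would make the shell split one group into two arguments.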
Check available runs with:
agenteval list --limit 20
Ensure the path is correct:
agenteval run --suite ./suites/my_suite.yaml
Check your YAML syntax. Common issues:
- Missing name field
- Missing cases list
- Incorrect indentation
- Using tabs instead of spaces
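Most of the issues in this list can be caught before running the CLI. The following is a rough sketch using only the standard library: `check_suite_text` scans the raw file for tab indentation, and `check_suite_data` inspects a suite that has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`, which is assumed here and not imported). Both function names are hypothetical.

```python
def check_suite_text(text):
    """Return a list of problems found in raw suite YAML text."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Leading whitespace only; tabs inside values are not flagged.
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            problems.append(f"line {lineno}: tab used for indentation")
    return problems


def check_suite_data(data):
    """Return a list of problems found in a parsed suite (a dict)."""
    if not isinstance(data, dict):
        return ["top level must be a mapping"]
    problems = []
    if not data.get("name"):
        problems.append("missing 'name' field")
    cases = data.get("cases")
    if not isinstance(cases, list) or not cases:
        problems.append("missing 'cases' list")
    return problems


if __name__ == "__main__":
    print(check_suite_text("name: my-tests\ncases:\n\t- name: test-1"))
    print(check_suite_data({"name": "my-tests"}))
```

Incorrect indentation usually shows up indirectly: the parser either raises an error or produces a structure in which cases is no longer a list, which the second check reports.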
Minimal valid suite:
name: my-tests
agent: my_module:my_fn
cases:
  - name: test-1
    input: "Hello"
    expected:
      output_contains: ["hello"]
    grader: contains
Check that the directory exists and is writable:
# Default location
ls -la agenteval.db
# Custom location
agenteval run --suite suite.yaml --db /path/to/results.db
Delete and re-run evaluations:
rm agenteval.db
agenteval run --suite suite.yaml
Ensure workers are running and connected to the same Redis instance:
# Check Redis connectivity
redis-cli -u redis://localhost:6379 ping
# Should return: PONG
# Start a worker
agenteval worker --broker redis://localhost:6379 --agent my_module:my_fn
Use an authenticated URL:
agenteval run --suite suite.yaml --workers redis://:password@host:6379
For production, use TLS-encrypted connections:
# Use rediss:// scheme for TLS
agenteval worker --broker rediss://:password@host:6380
# With custom CA certificate
export REDIS_CA_CERT=/path/to/ca.pem
- 0: All cases passed
- 1: One or more cases failed (or regressions detected with --fail-on-regression)
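A wrapper script can branch on these two codes. The sketch below shows one way to do that with `subprocess`; `run_and_report` is a hypothetical helper, and the demo uses a stand-in Python command so that it runs without AgentEval installed.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run a command and translate its exit code into a short message."""
    code = subprocess.run(cmd).returncode
    if code == 0:
        message = "all cases passed"
    elif code == 1:
        message = "one or more cases failed, or a regression was detected"
    else:
        # Codes other than 0 and 1 are not described in this document.
        message = f"unexpected exit code {code}"
    return code, message


if __name__ == "__main__":
    # Stand-in command; replace with e.g.
    # ["agenteval", "run", "--suite", "suite.yaml"].
    code, message = run_and_report([sys.executable, "-c", "raise SystemExit(1)"])
    print(message)
```

In CI, returning the same code from the wrapper keeps the job's pass/fail status aligned with the evaluation result.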
Check your token permissions:
export GITHUB_TOKEN=ghp_... # Needs 'pull_requests: write' permission
agenteval github-comment --run-id RUN_ID --repo owner/repo --pr 123
See docs/github-actions.md for full CI setup.