Run these examples from the repository root after installing TraceLens:

    uv pip install -e ".[dev]"

| Step | File | What It Shows |
|---|---|---|
| 1 | `hello_world.py` | The smallest possible local eval: task, adapter, grader, runner. |
| 2 | `contract_eval.py` | Generate graders from a behavior contract. |
| 3 | `http_agent_eval.py` | Evaluate an agent exposed as an HTTP JSON endpoint. |
| 4 | `noise_aware_regression.py` | Compare runs with different infrastructure fingerprints. |


    python examples/hello_world.py
    tracelens report --results examples/reports/hello_world_report.json --format markdown

Expected output:

    tracelens hello-world
    --------------------
    trials run : 9
    pass rate : 100%
    report json: examples/reports/hello_world_report.json
    sample md : examples/reports/hello_world_report.md

Use this file as the template when you want to evaluate a normal Python
function or local agent loop. The generated sample report at
examples/reports/hello_world_report.md shows tasks, trials, pass@k,
pass^k, graders, baseline comparison, regression result, and CI summary.
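
The report lists both pass@k and pass^k. As a rough illustration of how the two differ, the sketch below uses the commonly cited combinatorial definitions: pass@k is the chance that at least one of k sampled trials passes, and pass^k is the chance that all k pass. The function names are hypothetical, and this section does not state how TraceLens itself computes these metrics, so treat it as an assumption rather than the library's implementation.

```python
from math import comb


def _check(n: int, c: int, k: int) -> None:
    # n = total trials, c = passing trials, k = sample size.
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")


def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k trials drawn without replacement passes."""
    _check(n, c, k)
    # comb(n - c, k) is 0 when fewer than k failing trials exist.
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_hat_k(n: int, c: int, k: int) -> float:
    """Probability that all k trials drawn without replacement pass."""
    _check(n, c, k)
    return comb(c, k) / comb(n, k)
```

With 3 trials and 2 passes, `pass_at_k(3, 2, 2)` is 1.0 while `pass_hat_k(3, 2, 2)` is 1/3: the first rewards any success, the second rewards consistency.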

    python examples/http_agent_eval.py

This starts a local stdlib HTTP server, evaluates it with `HTTPAPIAdapter`, and
grades the JSON response shape.
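
The same idea can be sketched with the standard library alone. The snippet below is not the contents of `http_agent_eval.py` and does not use `HTTPAPIAdapter`, whose signature is not shown here. `EchoAgentHandler`, `call_agent`, and `has_expected_shape` are hypothetical names for a toy agent, a client call, and a shape check.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, Request, build_opener


class EchoAgentHandler(BaseHTTPRequestHandler):
    """Toy agent: answers a JSON POST with the prompt upper-cased."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(
            {"answer": str(payload.get("prompt", "")).upper(), "steps": 1}
        ).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def has_expected_shape(response: dict, schema: dict) -> bool:
    """True when every key in schema is present with the expected type."""
    return all(
        key in response and isinstance(response[key], expected)
        for key, expected in schema.items()
    )


def call_agent(prompt: str) -> dict:
    """Start the toy server on a free port, send one request, shut it down."""
    server = HTTPServer(("127.0.0.1", 0), EchoAgentHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        request = Request(
            f"http://127.0.0.1:{server.server_port}/",
            data=json.dumps({"prompt": prompt}).encode(),
            headers={"Content-Type": "application/json"},
        )
        opener = build_opener(ProxyHandler({}))  # ignore any proxy settings
        with opener.open(request, timeout=5) as reply:
            return json.loads(reply.read())
    finally:
        server.shutdown()
        server.server_close()
```

Binding to port 0 lets the operating system pick a free port, which keeps a local run from colliding with other services.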

    python examples/contract_eval.py

This is the fastest way to encode strict output rules without writing every grader by hand.
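
To illustrate the general pattern of deriving graders from a declarative contract, here is a minimal sketch in plain Python. The contract keys (`must_be_json`, `max_chars`, `forbidden_phrases`) and the `graders_from_contract` helper are invented for this illustration. They are not TraceLens's contract format, which is defined in `contract_eval.py` and the project documentation.

```python
import json
from typing import Callable, Dict

Grader = Callable[[str], bool]


def _is_json(output: str) -> bool:
    try:
        json.loads(output)
    except ValueError:
        return False
    return True


def graders_from_contract(contract: dict) -> Dict[str, Grader]:
    """Build one named grader per rule in a (hypothetical) contract dict."""
    graders: Dict[str, Grader] = {}
    if contract.get("must_be_json"):
        graders["must_be_json"] = _is_json
    if "max_chars" in contract:
        limit = contract["max_chars"]
        # Bind limit as a default argument so each lambda keeps its own value.
        graders["max_chars"] = lambda out, limit=limit: len(out) <= limit
    for phrase in contract.get("forbidden_phrases", []):
        graders[f"forbids:{phrase}"] = (
            lambda out, phrase=phrase: phrase.lower() not in out.lower()
        )
    return graders


def grade(output: str, contract: dict) -> Dict[str, bool]:
    """Run every generated grader against one output."""
    return {name: g(output) for name, g in graders_from_contract(contract).items()}
```

Each rule becomes its own named grader, so a report can show which specific rule an output violated.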

    python examples/noise_aware_regression.py

This demonstrates how TraceLens separates agent regressions from small infrastructure-driven differences.
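
One simple way to picture this separation is a tolerance that widens when two runs come from different infrastructure. The sketch below is an assumption for illustration only: `RunSummary`, `classify_change`, and both tolerance values are hypothetical, and the actual TraceLens comparison logic may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunSummary:
    passed: int
    trials: int
    fingerprint: str  # hypothetical label for hardware, OS, model endpoint, etc.

    @property
    def pass_rate(self) -> float:
        return self.passed / self.trials


def classify_change(
    baseline: RunSummary,
    candidate: RunSummary,
    same_infra_tolerance: float = 0.02,   # assumed value, tune per project
    cross_infra_tolerance: float = 0.10,  # assumed value, tune per project
) -> str:
    """Label a pass-rate drop as a regression only if it exceeds the tolerance."""
    drop = baseline.pass_rate - candidate.pass_rate
    same_infra = baseline.fingerprint == candidate.fingerprint
    tolerance = same_infra_tolerance if same_infra else cross_infra_tolerance
    if drop > tolerance:
        return "regression"
    if drop > 0:
        return "within-noise"
    return "no-regression"
```

Under these assumed tolerances, a drop from 90% to 85% is flagged as a regression when both runs share a fingerprint, but counts as within noise when the fingerprints differ.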
These four examples are intentionally small and dependency-light. They are enough to teach the core framework and support the first public release.
Future examples should focus on scenarios that are documented but not yet represented as runnable scripts:
- LLM-as-judge using a fake or recorded provider.
- Multi-step tool-use transcript review.
- Human calibration against grader output.
- Downstream project CI that installs TraceLens from PyPI.