M1: Runner API, canonical artifacts, CLI, and notebook#5
guru-code-expert wants to merge 8 commits into AgentOpt:main from
Conversation
Implements the M1 milestone for Trace-Bench:

CLI surface:
- trace-bench list-tasks, list-trainers, validate --config --strict, run, ui
- Strict validation: trainer kwarg checking, optimizer/guide/logger resolution, trainable parameter detection, matrix expansion with manifest output

Runner & training:
- BenchRunner with deterministic SHA256-based job IDs
- Algorithm-aware kwarg mapping (PrioritySearch vs GEPA-Base/UCB/Beam)
- DummyLLM stub mode for offline testing
- Training error capture in feedback field

Canonical artifact layout:
- meta/config.snapshot.yaml, manifest.json, env.json (redacted), git.json
- Per-job: job_meta.json, results.json, events.jsonl, artifacts/, tb/
- Run-level: results.csv (16 columns) + summary.json

Task coverage:
- 4 internal types (code_param, numeric_param, multi_param, non_trainable)
- trace_examples:greeting_stub
- llm4ad:circle_packing (bounded timeout)
- veribench:smoke_placeholder (NotImplementedError stub)

Trainer coverage:
- PrioritySearch + GEPA-Base exercised in real mode
- GEPA-UCB + GEPA-Beam configured (M4 scope)

Tests: 30 pass, 2 skipped (m0 smoke, m1 artifacts, matrix e2e, internal tasks, opentrace examples, trainer config, veribench CLI)

Notebook: 01_m1_minimal_api.ipynb with Colab badge, auto-detect API key (real/stub mode), 2x2 matrix smoke (4/4 ok), executed outputs committed.
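The deterministic SHA256-based job IDs mentioned above could work roughly like this; a minimal sketch only, assuming the ID is a hash of the canonicalized job config (the helper name and the 12-character prefix are assumptions, not the actual Trace-Bench implementation):

```python
import hashlib
import json

def job_id(config: dict) -> str:
    """Derive a deterministic job ID by hashing the canonicalized config."""
    # sort_keys guarantees the same config always serializes identically,
    # so reruns of the same (task, trainer, seed) map to the same ID.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]

cfg = {"task": "trace_examples:greeting_stub", "trainer": "PrioritySearch", "seed": 0}
# Key order does not matter: the canonical serialization is identical.
assert job_id(cfg) == job_id(dict(reversed(list(cfg.items()))))
```

The point of hashing a canonical serialization (rather than, say, a timestamp) is that a matrix run can be resumed or deduplicated: identical configs always land in the same per-job directory.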
This reverts commit 51622f2.
    return names

def discover_trainers() -> List[TrainerSpec]:
We only have a finite number of trainers. This is a cool way to automatically grab and sync with the opto-trace package, but it grabbed:
=== List trainers ===
AggregatedUpdate available
BasicSearchAlgorithm available
BeamSearch available
BeamsearchAlgorithm available
BeamsearchHistoryAlgorithm available
GEPA-Base available
GEPA-Beam available
GEPA-UCB available
Minibatch available
MinibatchAlgorithm available
PrioritySearch available
PrioritySearch_with_Regressor available
SearchTemplate available
SequentialSearch available
SequentialUpdate available
StreamingPrioritySearch available
UCBSearchAlgorithm available
SearchTemplate is not a full trainer.
@doxav can we maybe add a list of actual trainers to an __init__.py in the OpenTrace package (under the experimental branch)?
Maybe we could instead exclude certain trainer classes, since the others are real trainers? We could simply start with an argument to discover_trainers(), e.g. excluded = ["SearchTemplate"]. Hard-coding trainers in an __init__.py would force us to maintain a list and would prevent automatically discovering new trainers.
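A minimal sketch of the proposed excluded argument. The discovery mechanics here are hypothetical: the real discover_trainers() inspects the opto-trace package, which this sketch stands in for with a plain dict of classes:

```python
import inspect
from typing import Iterable, List

# Default exclusions: classes that appear in the package but are not full
# trainers (per the review comment above). This set would grow as needed.
DEFAULT_EXCLUDED = ("SearchTemplate",)

def discover_trainers(namespace: dict,
                      excluded: Iterable[str] = DEFAULT_EXCLUDED) -> List[str]:
    """Return class names in `namespace` that look like trainers, minus exclusions."""
    skip = set(excluded)
    return sorted(
        name
        for name, obj in namespace.items()
        if inspect.isclass(obj) and name not in skip
    )

# Stand-in classes for illustration only.
class SearchTemplate: ...
class PrioritySearch(SearchTemplate): ...
class BeamSearch(SearchTemplate): ...

ns = {"SearchTemplate": SearchTemplate,
      "PrioritySearch": PrioritySearch,
      "BeamSearch": BeamSearch}
print(discover_trainers(ns))  # ['BeamSearch', 'PrioritySearch']
```

An exclusion list keeps the automatic discovery (new trainers appear without code changes) while filtering the handful of known non-trainer bases, which is the trade-off argued for in the comment above.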
| "status", | ||
| "score_initial", | ||
| "score_final", | ||
| "score_best", |
We actually need the full list of scores for each iteration. Do we track those somewhere?
time_seconds will be tracked for each step as well.
Does tb_logdir work? Can the Colab notebook show how I load and display this?
    return getattr(model, "data", model)

def _evaluate_bundle(bundle: Dict[str, Any]) -> Dict[str, Any]:
Can we rename these? bundle is a reserved keyword in Trace, meaning an abstraction for a custom operation. Can we name this something else?