- Featured / Progressive disclosure: dropped 'benchflow's Harbor #1316 parity answer'. Replaced with the actual selling point: 'No second LLM, no sidecar containers — just an in-process Python callback'.
- Audience list: removed 'Existing Harbor users migrating' row. Added 'Multi-turn / multi-agent eval authors' with the same two doc links (use-cases + progressive-disclosure).
Mirror back the cleaned versions from www.benchflow.ai/src/content/docs/benchflow/ (same fixes already shipped to the public docs site):
- docs/concepts.md
  - Environment row: 'Backed by Harbor — Docker locally...' -> just 'Docker locally, Daytona for cloud.'
  - Trial lifecycle ASCII art: 'create Harbor env handle' -> 'create sandbox env handle'.
- docs/task-authoring.md
  - 'Harbor copies tests/...' -> 'The BenchFlow runtime copies tests/...'.
- docs/progressive-disclosure.md
  - Drop 'parity answer to Harbor simulated-user proposal #1316' from the intro. Replaced with 'other agent-eval frameworks use a sidecar; benchflow is in-process Python'.
  - '## Comparison with multi-agent simulated user (Harbor #1316 parity)' -> '## Comparison with multi-agent simulated user'.
  - Internal link target updated (was /docs/use-cases#1-harbor-1316-... in the mdx version, now relative).
- docs/use-cases.md
  - Intro reframed (not 'researchers migrating from Harbor').
  - Section heading '## 1. Interactive User Simulation (Harbor #1316 equivalent)' -> '## 1. Interactive User Simulation'.
  - 'Why this is better than Harbor #1316' -> 'Why this design'.
  - 'Why this beats Harbor' -> 'Why this design'.
  - 'How it works vs Harbor' (services section) -> 'How services run in BenchFlow'.
  - Removed entire 'Migration from Harbor' section + 'Porting a Harbor task' subsection.
- docs/reference/python-api.md
  - Drop dangling 'Harbor PR #1462 mapping' link in the 0.3 limitations block.

$ grep -rci harbor docs/*.md docs/reference/*.md | grep -v ':0$'
(empty)
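For reviewers who want to repeat the check without a shell, the grep pipeline above can be approximated in Python. This is a minimal sketch, not part of the PR: the helper names `docs_files` and `count_matching_lines` are hypothetical, and it assumes the same two globs (`docs/*.md` and `docs/reference/*.md`) relative to a repository root.

```python
from pathlib import Path


def docs_files(root):
    """Collect the same files the grep globs cover: docs/*.md and docs/reference/*.md."""
    root = Path(root)
    return sorted(root.glob("docs/*.md")) + sorted(root.glob("docs/reference/*.md"))


def count_matching_lines(paths, needle="harbor"):
    """Map each file to its count of lines containing `needle`, ignoring case.

    Files with zero matches are omitted, which mirrors `grep -c ... | grep -v ':0$'`.
    """
    needle = needle.lower()
    hits = {}
    for path in paths:
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        count = sum(1 for line in text.splitlines() if needle in line.lower())
        if count:
            hits[str(path)] = count
    return hits


if __name__ == "__main__":
    # An empty dict corresponds to the "(empty)" grep output.
    print(count_matching_lines(docs_files(".")))
```

An empty result means no line in the covered files mentions the word, which is the state the commit message reports.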
…ow task directory' Single-line update in the Requirements docstring.
Two doc-comment swaps. The version label 'harbor-orig' stays because the lab is an explicit A/B between benchflow and original Harbor; that's the comparison's meaning.
- _runner.py: 'task directory (Harbor format)' -> 'task directory (BenchFlow task format)'.
- run_matrix.py module docstring: 'three Harbor-format benchmarks' -> 'three BenchFlow-task-format benchmarks'.
- run_matrix.py inline comment: 'not a Harbor mount' -> 'not a host bind-mount' (since 'Harbor' here was just shorthand for the filesystem mount style, not the framework).
Six files where 'Harbor format' was used to describe the on-disk task
layout. Renamed to 'BenchFlow task format' / 'BenchFlow-format' since
the layout is benchflow's own task structure now, not a borrowed one.
- .claude/skills/benchflow/references/create-task.md
- .claude/skills/benchflow/tasks/create-simple-task/environment/benchflow/references/create-task.md
- .claude/skills/benchflow/tasks/benchflow-knowledge/environment/benchflow/references/create-task.md
- .claude/dev-docs/architecture.md
- src/benchflow/skill_eval.py docstring
- src/benchflow/sdk.py docstring (load_task arg)
No code-behavior change — pure docstring / reference-doc swap.
  ],
  )],
- environment="daytona",
+ backend="daytona",
🟡 TrialConfig example uses nonexistent backend parameter instead of environment
The PR changes environment="daytona" to backend="daytona" in a TrialConfig(...) code example. However, TrialConfig (src/benchflow/trial.py:172) defines the field as environment: str = "docker", not backend. The backend parameter belongs to Environment.from_task() (src/benchflow/runtime.py:61), not TrialConfig. Users copying this example would get an unexpected-keyword-argument error at runtime.
Suggested change:
- backend="daytona",
+ environment="daytona",
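The failure mode described in this comment can be reproduced without installing benchflow. The sketch below uses a hypothetical stand-in class, `TrialConfigStandIn`, and assumes the real `TrialConfig` rejects unknown keyword arguments the way a dataclass does. Only the `environment: str = "docker"` field is taken from the review comment; everything else is illustrative.

```python
from dataclasses import dataclass


@dataclass
class TrialConfigStandIn:
    # Hypothetical stand-in for benchflow's TrialConfig. Only this field and
    # its default come from the review comment; the real class has more.
    environment: str = "docker"


def try_build(**kwargs):
    """Build the stand-in config, returning the TypeError instead of raising it."""
    try:
        return TrialConfigStandIn(**kwargs)
    except TypeError as exc:
        return exc


ok = try_build(environment="daytona")   # accepted: the field exists
bad = try_build(backend="daytona")      # rejected: no such field

print(ok)
print(type(bad).__name__, bad)
```

With the stand-in, the `backend=` call fails at construction time, which is the copy-paste hazard the suggested change removes.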
| ```python | ||
| from pathlib import Path | ||
|
|
||
| from benchflow.trial import TrialConfig, Scene, Role, Turn |
🟡 Removed from pathlib import Path but Path(...) still used in code example
The PR removes from pathlib import Path from the TrialConfig code example block, but Path("tasks/my-task") is still used on lines 38 and 46 within the same code block. Users copying this example would get a NameError: name 'Path' is not defined.
Suggested change:
- from benchflow.trial import TrialConfig, Scene, Role, Turn
+ from pathlib import Path
+ from benchflow.trial import TrialConfig, Scene, Role, Turn
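One way to catch this class of docs regression is to execute each example in a fresh namespace. The following is a rough sketch under stated assumptions: `run_snippet` is a hypothetical helper, and the two snippets are reduced stand-ins for the docs example that keep only the `Path("tasks/my-task")` usage, with the benchflow imports omitted so the sketch runs anywhere.

```python
# Reduced stand-ins for the docs example: only the Path usage is kept.
SNIPPET_WITHOUT_IMPORT = 'task = Path("tasks/my-task")\n'
SNIPPET_WITH_IMPORT = "from pathlib import Path\n" + SNIPPET_WITHOUT_IMPORT


def run_snippet(source):
    """Execute `source` in an empty namespace.

    Returns the resulting namespace on success, or the NameError if the
    snippet uses a name it never imported or defined.
    """
    namespace = {}
    try:
        exec(compile(source, "<doc-example>", "exec"), namespace)
    except NameError as exc:
        return exc
    return namespace


broken = run_snippet(SNIPPET_WITHOUT_IMPORT)
fixed = run_snippet(SNIPPET_WITH_IMPORT)

print(type(broken).__name__, broken)
print(fixed["task"].name)
```

A check like this only exercises names resolved at execution time, so it complements rather than replaces reading the example.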
Mirror of the cleanup we shipped on www.benchflow.ai's mirrored docs.
What changed
harbor-orig retained: that's the lab's intentional A/B comparison.
Out of scope (intentionally)
harbor==0.3.0 dependency.
ref/harbor/examples/tasks).