migrate mirrored docs by xdotli · Pull Request #228 · benchflow-ai/benchflow

xdotli · 2026-05-01T21:29:04Z

Mirror of the cleanup we shipped on www.benchflow.ai's mirrored docs.

What changed

| File | Change |
| --- | --- |
| README.md | Featured / Audience: drop 'Harbor #1316 parity answer' framing; Audience row 'Existing Harbor users migrating' replaced with 'Multi-turn / multi-agent eval authors'. |
| docs/concepts.md | Environment row drops 'Backed by Harbor — '; lifecycle ASCII 'create Harbor env handle' -> 'create sandbox env handle'. |
| docs/task-authoring.md | 'Harbor copies tests/' -> 'The BenchFlow runtime copies tests/'. |
| docs/progressive-disclosure.md | Drop 'parity answer to Harbor #1316' from intro + comparison table. |
| docs/use-cases.md | Drop migrating-from-Harbor framing. Section heading 'Interactive User Simulation (Harbor #1316 equivalent)' -> 'Interactive User Simulation'. 'Why this is better than Harbor #1316' and 'Why this beats Harbor' -> 'Why this design'; 'How it works vs Harbor' -> 'How services run in BenchFlow'. Removed entire 'Migration from Harbor' section. |
| docs/reference/python-api.md | Drop dangling 'Harbor PR #1462 mapping' link. |
| docs/examples/coder-reviewer-demo.py | 'Harbor-format task directory' -> 'BenchFlow task directory'. |
| labs/reward-hack-matrix/_runner.py | 'task directory (Harbor format)' -> '(BenchFlow task format)'. Version label `harbor-orig` retained; that's the lab's intentional A/B comparison. |
| labs/reward-hack-matrix/run_matrix.py | Module docstring + inline comment Harbor framing dropped. |

Out of scope (intentionally)

| What | Why |
| --- | --- |
| pyproject.toml `harbor==0.3.0` dependency | Real code dependency; removing it breaks imports. Separate engineering decision. |
| CHANGELOG.md historical entries | Don't rewrite history. |
| tests/test_task_download.py / tests/conftest.py | Real test paths/URLs (`.ref/harbor/examples/tasks`). |
| labs/reward-hack-matrix/_worker.py SDK comment | Talks about subprocess re-import overhead; Harbor is one of the SDKs, so the comment is accurate. |
| docs/examples/scene-patterns.ipynb / swebench_pro_progressive_disclosure.ipynb | Notebooks; separate review. |

$ grep -rci 'Harbor framework\|Harbor format\|Harbor-format' README.md docs labs --include='*.md' --include='*.py' | grep -v ':0$'
(empty)
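The grep pipeline above counts case-insensitive matching lines per file (`-i`, `-c`) and then filters out files whose count is zero. Where GNU grep isn't handy, roughly the same check can be sketched in Python. `find_harbor_refs` and `iter_files` are hypothetical helper names (not part of benchflow), and the sketch assumes the same roots and the same `*.md` / `*.py` filter as the command.

```python
import re
from pathlib import Path

# Same alternation as the grep pattern, matched case-insensitively like `grep -i`.
PATTERN = re.compile(r"Harbor framework|Harbor format|Harbor-format", re.IGNORECASE)
SUFFIXES = {".md", ".py"}  # mirrors --include='*.md' --include='*.py'


def iter_files(roots):
    """Yield files under each root (recursing into directories), filtered by suffix."""
    for root in map(Path, roots):
        if root.is_file():
            candidates = [root]
        elif root.is_dir():
            candidates = sorted(p for p in root.rglob("*") if p.is_file())
        else:
            continue  # missing path: nothing to scan
        for path in candidates:
            if path.suffix in SUFFIXES:
                yield path


def find_harbor_refs(roots):
    """Return {path: matching_line_count}, omitting files with zero matches.

    Like `grep -c`, this counts matching *lines*, not total occurrences.
    """
    counts = {}
    for path in iter_files(roots):
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file: skip it (grep would warn on stderr and move on)
        n = sum(1 for line in text.splitlines() if PATTERN.search(line))
        if n:
            counts[str(path)] = n
    return counts


if __name__ == "__main__":
    for path, n in find_harbor_refs(["README.md", "docs", "labs"]).items():
        print(f"{path}:{n}")
    # No output corresponds to the "(empty)" result of the grep pipeline.
```

An empty dict means the cleanup is complete for the scanned roots; any entry points at a file that still uses the old framing.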

- Featured / Progressive disclosure: dropped 'benchflow's Harbor #1316 parity answer'. Replaced with the actual selling point: 'No second LLM, no sidecar containers — just an in-process Python callback'.
- Audience list: removed 'Existing Harbor users migrating' row. Added 'Multi-turn / multi-agent eval authors' with the same two doc links (use-cases + progressive-disclosure).

Mirror back the cleaned versions from www.benchflow.ai/src/content/docs/benchflow/; the same fixes have already shipped to the public docs site:

- docs/concepts.md
  - Environment row: 'Backed by Harbor — Docker locally...' -> just 'Docker locally, Daytona for cloud.'
  - Trial lifecycle ASCII art: 'create Harbor env handle' -> 'create sandbox env handle'.
- docs/task-authoring.md
  - 'Harbor copies tests/...' -> 'The BenchFlow runtime copies tests/...'.
- docs/progressive-disclosure.md
  - Drop 'parity answer to Harbor simulated-user proposal #1316' from the intro. Replaced with 'other agent-eval frameworks use a sidecar; benchflow is in-process Python'.
  - '## Comparison with multi-agent simulated user (Harbor #1316 parity)' -> '## Comparison with multi-agent simulated user'.
  - Internal link target updated (was /docs/use-cases#1-harbor-1316-... in the mdx version, now relative).
- docs/use-cases.md
  - Intro reframed (no longer aimed at 'researchers migrating from Harbor').
  - Section heading '## 1. Interactive User Simulation (Harbor #1316 equivalent)' -> '## 1. Interactive User Simulation'.
  - 'Why this is better than Harbor #1316' -> 'Why this design'.
  - 'Why this beats Harbor' -> 'Why this design'.
  - 'How it works vs Harbor' (services section) -> 'How services run in BenchFlow'.
  - Removed entire 'Migration from Harbor' section + 'Porting a Harbor task' subsection.
- docs/reference/python-api.md
  - Drop dangling 'Harbor PR #1462 mapping' link in the 0.3 limitations block.

$ grep -rci harbor docs/*.md docs/reference/*.md | grep -v ':0$'
(empty)


Two doc-comment swaps. The version label 'harbor-orig' stays because the lab is an explicit A/B between benchflow and original Harbor; that is what the comparison means.

- _runner.py: 'task directory (Harbor format)' -> 'task directory (BenchFlow task format)'.
- run_matrix.py module docstring: 'three Harbor-format benchmarks' -> 'three BenchFlow-task-format benchmarks'.
- run_matrix.py inline comment: 'not a Harbor mount' -> 'not a host bind-mount' ('Harbor' here was just shorthand for the filesystem mount style, not the framework).


Six files used 'Harbor format' to describe the on-disk task layout. Renamed to 'BenchFlow task format' / 'BenchFlow-format', since the layout is now benchflow's own task structure rather than a borrowed one.

- .claude/skills/benchflow/references/create-task.md
- .claude/skills/benchflow/tasks/create-simple-task/environment/benchflow/references/create-task.md
- .claude/skills/benchflow/tasks/benchflow-knowledge/environment/benchflow/references/create-task.md
- .claude/dev-docs/architecture.md
- src/benchflow/skill_eval.py docstring
- src/benchflow/sdk.py docstring (load_task arg)

No code-behavior change; pure docstring / reference-doc swap.

devin-ai-integration

Devin Review found 2 potential issues.

View 3 additional findings in Devin Review.

devin-ai-integration · 2026-05-01T21:32:35Z

        ],
    )],
-    environment="daytona",
+    backend="daytona",


🟡 TrialConfig example uses nonexistent backend parameter instead of environment

The PR changes environment="daytona" to backend="daytona" in a TrialConfig(...) code example. However, TrialConfig (src/benchflow/trial.py:172) defines the field as environment: str = "docker", not backend. The backend parameter belongs to Environment.from_task() (src/benchflow/runtime.py:61), not TrialConfig. Users copying this example would get an unexpected-keyword-argument error at runtime.

Suggested change: revert `backend="daytona",` to `environment="daytona",`.
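Why `backend=` fails can be shown with a stand-in. The class below is hypothetical, not benchflow's real `TrialConfig`: it mirrors only the one field the review cites (`environment: str = "docker"`) to show that a dataclass-style config rejects any keyword it does not declare.

```python
from dataclasses import dataclass


# Hypothetical stand-in: mirrors only the field cited in the review
# (`environment: str = "docker"`). The real TrialConfig has more fields;
# this is not its actual definition.
@dataclass
class TrialConfigStandIn:
    environment: str = "docker"


def build(**kwargs):
    """Return the config, or the TypeError message if a keyword is rejected."""
    try:
        return TrialConfigStandIn(**kwargs)
    except TypeError as exc:
        return f"TypeError: {exc}"


print(build(environment="daytona"))  # accepted: the declared field
print(build(backend="daytona"))      # rejected: no `backend` field on the config
```

Per the review, `backend` is a parameter of `Environment.from_task()` rather than of `TrialConfig`, so the docs example should keep `environment="daytona"`.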


devin-ai-integration · 2026-05-01T21:32:36Z

 ```python
-from pathlib import Path
-
 from benchflow.trial import TrialConfig, Scene, Role, Turn


🟡 Removed from pathlib import Path but Path(...) still used in code example

The PR removes from pathlib import Path from the TrialConfig code example block, but Path("tasks/my-task") is still used on lines 38 and 46 within the same code block. Users copying this example would get a NameError: name 'Path' is not defined.

Suggested change: restore `from pathlib import Path` (followed by a blank line) above `from benchflow.trial import TrialConfig, Scene, Role, Turn`.
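A dropped import like this is easy to catch mechanically. The sketch below is a hypothetical doc-lint helper (not part of benchflow) that parses a Python snippet with the standard `ast` module and reports names that are read but never imported, assigned, or built in. It is deliberately simple: it ignores scoping, so a name bound anywhere in the snippet counts as defined.

```python
import ast
import builtins


def undefined_names(source: str) -> set[str]:
    """Return names that are read in `source` but never bound anywhere in it.

    Simplified: scopes are ignored, so a binding anywhere in the snippet
    (import, assignment, def/class, argument, loop target, `except ... as`)
    counts as defined everywhere.
    """
    tree = ast.parse(source)  # parsing only; nothing is imported or executed
    bound = set(dir(builtins))
    loaded = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # `import a.b` binds `a`; `from x import y as z` binds `z`.
                bound.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            bound.add(node.name)
        elif isinstance(node, ast.arg):
            bound.add(node.arg)
        elif isinstance(node, ast.ExceptHandler) and node.name:
            bound.add(node.name)
        elif isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Load):
                loaded.add(node.id)
            else:  # Store/Del contexts: treat the name as bound (a simplification)
                bound.add(node.id)
    return loaded - bound


# The shape of the broken docs example: `Path` is used but its import was removed.
broken = '''
from benchflow.trial import TrialConfig, Scene, Role, Turn

task_dir = Path("tasks/my-task")
'''
print(undefined_names(broken))  # {'Path'}
print(undefined_names("from pathlib import Path\n" + broken))  # set()
```

Running a check like this over every fenced Python block in the docs would flag the missing `Path` import before users hit the `NameError`.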


xdotli added 6 commits May 1, 2026 17:26

docs(coder-reviewer-demo): 'Harbor-format task directory' -> 'BenchFlow task directory' (d9dbb15)

Single-line update in the Requirements docstring.

labs(run_matrix): drop Harbor refs from module docstring + inline comment (bb6aa43)

devin-ai-integration Bot reviewed May 1, 2026


xdotli changed the title ~~Drop Harbor framework framing~~ migrate mirrored docs May 5, 2026

xdotli wants to merge 6 commits into main from cleanup/remove-harbor
