feat: add webshop benchmark and update description #1337

Open
Jensen246 wants to merge 504 commits into main from webshop
Jensen246 (Collaborator) commented Mar 4, 2026

Description

Motivation and Context

How Has This Been Tested?

  • If you are adding a new feature, test on your own test scripts.

Screenshots of Test Results (if appropriate):

  1. Your own tests:

Types of changes

  • Fix bugs
  • Add new feature
  • Update documentation

📚 Documentation preview 📚: https://RDAgent--1337.org.readthedocs.build/en/1337/

Jensen246 and others added 30 commits December 21, 2025 09:33
- Add chemcot dataset to DATASETS registry using new DatasetConfig structure
- Keep CoT quality guidelines from chemcot branch in prompts.yaml
- Migrate chemcot from old dict-based interface to DatasetConfig
- Remove legacy consolidation logic (datasets lib handles this)
couragec and others added 27 commits February 28, 2026 10:59
Fallback to common miniconda paths when conda is not in PATH.
Fixes B200 pod startup failure (conda: command not found).

Made-with: Cursor
No more conda detection logic. Just set TRAINING_PYTHON in .env.
Fallback to conda only if not set.

Made-with: Cursor
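
The interpreter lookup described in the commit above (use TRAINING_PYTHON when set, fall back to conda otherwise) might look roughly like the sketch below. The function name `resolve_training_python` and the conda environment name `training` are hypothetical and do not come from the PR; `conda run -n <env> python` is used here only as one plausible form of the fallback.

```python
import shutil
from typing import List, Mapping


def resolve_training_python(env: Mapping[str, str], conda_env: str = "training") -> List[str]:
    """Return the command prefix used to launch the training interpreter.

    Hypothetical helper: prefers an explicit TRAINING_PYTHON and only
    falls back to conda when that variable is unset or empty.
    """
    explicit = env.get("TRAINING_PYTHON", "").strip()
    if explicit:
        return [explicit]

    # Fallback: look for conda on the PATH given in `env`
    # (shutil.which uses the current process PATH if `env` has no PATH key).
    conda = shutil.which("conda", path=env.get("PATH"))
    if conda is None:
        raise RuntimeError("TRAINING_PYTHON is not set and conda was not found in PATH")
    # `conda_env` is an assumed environment name, not taken from the PR.
    return [conda, "run", "-n", conda_env, "python"]
```

A caller would typically pass the parsed `.env` contents merged over `os.environ`, then append the script path to the returned list before handing it to `subprocess`.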
start.sh now uses OPENHANDS_PYTHON for main.py execution,
since the parent process may be in a different conda env.

Made-with: Cursor
- Add agents/opencode/ with config.yaml, start.sh, README.md
- Include opencode-rl pipeline code (pipeline/, runner_fsm/, benchmarks/)
- Merge opencode-rl dependencies into autorl_bench requirements.txt
- Remove separate venv requirement, share main environment

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Sync opencode-rl runner_fsm with latest simplifications
- Add smith benchmarks integration
- Update opencompass configs and server with GPU support + error handling
- Document external repo architecture (opencode-rl as independent plugin)
- Add setup instructions for cloning and configuring opencode-rl
- Add architecture diagram showing RD-Agent ↔ opencode-rl interaction
- Document OPENCODE_RL_ROOT for custom paths
- Add smith/ module for dynamic benchmark discovery from rl-smith
- Add PerSampleEvaluator for per-sample scoring via vLLM
- Update utils.py to support script-based data download for smith benchmarks
- Update opencode agent config
- instructions.md: prohibit SFT, require RL (GRPO/PPO) for all benchmarks
- remove agents/opencode/opencode-rl/ (runtime uses external OPENCODE_RL_ROOT)

Made-with: Cursor
openai, httpx, python-dotenv, tenacity are for OpenCode agent's
separate environment. Keep peft and pydantic as shared deps.

Made-with: Cursor
- run.py: replace 2x nested 3-level try/except with shared
  _kill_process_group() using loop + specific exceptions
- server.py: except Exception → except (RuntimeError, ValueError, OSError)
- utils.py: except Exception → except requests.ConnectionError

Made-with: Cursor
Extract from run.py into core/utils.py so other runners
can also use it. Exported via core/__init__.py.

Made-with: Cursor
Use relative paths, forbid cd outside workspace, ignore symlink targets.

Made-with: Cursor
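
The rule in the commit above reads like guidance given to the agent. If one wanted to check the same constraint programmatically, a purely lexical containment test is one option. The helper `stays_in_workspace` is hypothetical, and treating "ignore symlink targets" as "do not resolve symlinks" is an assumption about the intent, not something the PR states.

```python
import os


def stays_in_workspace(workspace: str, target: str) -> bool:
    """Return True if `target`, taken relative to `workspace`, stays inside it.

    The check is lexical: `..` segments are normalised, but symlinks are
    deliberately not resolved (assumed reading of "ignore symlink targets").
    """
    root = os.path.normpath(os.path.abspath(workspace))
    # An absolute `target` replaces `root` in os.path.join, so it is
    # rejected below unless it already points inside the workspace.
    candidate = os.path.normpath(os.path.join(root, target))
    return os.path.commonpath([root, candidate]) == root
```

Using `os.path.commonpath` rather than a string prefix test avoids accepting sibling directories such as `/tmp/ws-other` for a workspace at `/tmp/ws`.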
…CLI, remove unsupported args

Made-with: Cursor
Ensures OpenCode-FSM-Runner writes outputs into the workspace prepared
by AutoRL-Bench instead of creating its own runs/ directory.

Made-with: Cursor
Ensures LLM agent bash calls (e.g. python3 -c "from trl import ...")
resolve to the correct training environment, instead of relying on
parent shell conda activation.

Made-with: Cursor
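
One common way to make bare `python3` calls in a child shell resolve to a specific environment is to prepend that interpreter's directory to PATH in the environment passed to the subprocess. The PR does not show how this commit achieves it, so the sketch below is an assumption, and `env_with_training_python` is a hypothetical name.

```python
import os
from typing import Dict, Mapping


def env_with_training_python(base_env: Mapping[str, str], training_python: str) -> Dict[str, str]:
    """Return a copy of `base_env` whose PATH starts with the interpreter's directory."""
    bin_dir = os.path.dirname(os.path.abspath(training_python))
    env = dict(base_env)
    # Put the training environment first so `python3` and `pip` resolve there.
    env["PATH"] = os.pathsep.join(p for p in (bin_dir, env.get("PATH", "")) if p)
    return env
```

The returned mapping can be passed as `env=` to `subprocess.run`, which keeps the choice of interpreter independent of whatever conda environment the parent shell has activated.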
@Jensen246 Jensen246 requested a review from couragec March 4, 2026 07:42
Base automatically changed from rl-posttraining to main March 17, 2026 07:49
9 participants