Feature/daily 2026 02 18 rl trainer by Capri2014 · Pull Request #138 · Capri2014/AIResearch

Capri2014 · 2026-02-22T05:51:45Z

Pull Request Template

Summary

Brief description of what changed (1-2 sentences).

Changes

Code changes
Docs changes
New files added

Testing

Tests pass (if applicable)
Manual verification steps
Verified no merge conflicts with main

Checklist

Based on latest main branch
No merge conflicts
Commit messages follow convention
Documentation updated (if applicable)
Related issue linked (if applicable)

Related PRs/Issues

Link to related PRs or issues.

Note: This repository uses squash merging. All commits will be collapsed into one.

- Add train_ppo_delta_waypoint.py: Full PPO training for residual delta-head
  - DeltaHead and ValueHead architectures
  - GAE (Generalized Advantage Estimation) implementation
  - PPO update with clipping, value loss, entropy bonus
  - Support for toy and CARLA environments
  - Configurable hyperparameters via argparse
- Add test_ppo_delta_smoke.py: Smoke tests for validation
  - Unit tests for DeltaHead, ValueHead, GAE
  - Toy environment testing
  - Policy forward pass testing
  - Minimal training loop integration test
- Update training/rl/README.md: Documentation
  - Architecture overview
  - Usage examples
  - Key arguments reference
  - Output structure
  - Comparison workflow for SFT vs RL

Architecture: final_waypoints = sft_waypoints + delta_head(z)
- Frozen SFT encoder (safer, stable)
- Trainable delta head (sample-efficient)
- Residual correction for online improvement
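
For reviewers unfamiliar with the two PPO ingredients named above, here is a minimal standard-library sketch of GAE and the clipped surrogate loss. It is not taken from train_ppo_delta_waypoint.py: the function names `compute_gae` and `ppo_clip_loss`, their signatures, and the default values (gamma=0.99, lam=0.95, clip_eps=0.2) are assumptions chosen for illustration, and the real trainer presumably operates on tensors rather than Python lists.

```python
from typing import List, Sequence, Tuple


def compute_gae(
    rewards: Sequence[float],
    values: Sequence[float],
    dones: Sequence[bool],
    last_value: float,
    gamma: float = 0.99,  # assumed default, not taken from the PR
    lam: float = 0.95,    # assumed default, not taken from the PR
) -> Tuple[List[float], List[float]]:
    """Return (advantages, returns) for a single rollout of length T.

    values[t] is the critic estimate at step t; last_value bootstraps
    the state that follows the final step of the rollout.
    """
    T = len(rewards)
    advantages = [0.0] * T
    gae = 0.0
    # Walk backwards so each step can reuse the accumulated future advantage.
    for t in reversed(range(T)):
        next_value = last_value if t == T - 1 else values[t + 1]
        not_done = 0.0 if dones[t] else 1.0  # cut bootstrapping at episode ends
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
    # Value-function regression targets: advantage plus baseline.
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


def ppo_clip_loss(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """Clipped surrogate loss for one sample (to be minimized).

    ratio is pi_new(a|s) / pi_old(a|s).
    """
    clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    return -min(ratio * advantage, clipped * advantage)


if __name__ == "__main__":
    adv, ret = compute_gae(
        rewards=[1.0, 1.0, 1.0],
        values=[0.0, 0.0, 0.0],
        dones=[False, False, True],
        last_value=0.0,
        gamma=1.0,
        lam=1.0,
    )
    print(adv, ret)
    print(ppo_clip_loss(ratio=1.5, advantage=1.0))
```

With gamma = lam = 1 and a zero critic, the advantages reduce to the reward-to-go, which makes a convenient smoke test of the kind test_ppo_delta_smoke.py describes. The full PPO objective listed in the commit would add a value loss and an entropy bonus on top of the clipped term.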

- Add Pipeline PR #3 summary
- Update pipeline status table
- Mark all stages as implemented

- Add eval_toy_waypoint_env.py for policy evaluation
- Compute ADE/FDE with confidence intervals (95% CI)
- Two-sample t-test for statistical significance (p-values)
- Side-by-side SFT vs RL comparison report
- Configurable episode count (default: 100 for statistical power)

Usage:
  python -m training.rl.eval_toy_waypoint_env --compare \
    --sft-checkpoint out/sft_waypoint_bc_torch_v0/model.pt \
    --rl-checkpoint out/rl_delta_ppo_v0/final.pt --episodes 100

Output:
  ADE: 5.27m ± 0.12m (SFT) → 5.19m (RL) [-2%]*
  FDE: 5.83m (SFT) → 5.66m (RL) [-3%]*
  * p < 0.05 (statistically significant)
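
To make the reported statistics concrete, the sketch below shows one way ADE (average displacement error), FDE (final displacement error), a 95% confidence interval, and a two-sample significance test could be computed with only the Python standard library. It is not the code in eval_toy_waypoint_env.py. The helper names `ade_fde`, `mean_ci95`, and `two_sample_p_value` are hypothetical, waypoints are assumed to be 2D (x, y) tuples, and both the interval and the p-value use a normal approximation in place of the Student t distribution, which is a reasonable stand-in only at sample sizes near the default of 100 episodes.

```python
import math
from statistics import NormalDist, mean, stdev
from typing import Sequence, Tuple

Point = Tuple[float, float]  # assumed 2D waypoint (x, y) in meters


def ade_fde(pred: Sequence[Point], target: Sequence[Point]) -> Tuple[float, float]:
    """Return (ADE, FDE) for one trajectory of matched waypoints."""
    dists = [math.dist(p, t) for p, t in zip(pred, target)]
    return mean(dists), dists[-1]


def mean_ci95(samples: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, half-width) of a normal-approximation 95% CI."""
    half_width = 1.96 * stdev(samples) / math.sqrt(len(samples))
    return mean(samples), half_width


def two_sample_p_value(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sided p-value for a difference in means with unequal variances.

    Uses the Welch statistic, but compares it against the standard normal
    rather than the t distribution (an approximation for large samples).
    Raises ZeroDivisionError if both samples have zero variance.
    """
    var_a = stdev(a) ** 2 / len(a)
    var_b = stdev(b) ** 2 / len(b)
    z = (mean(a) - mean(b)) / math.sqrt(var_a + var_b)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


if __name__ == "__main__":
    ade, fde = ade_fde([(0.0, 0.0), (3.0, 4.0)], [(0.0, 0.0), (0.0, 0.0)])
    print(f"ADE={ade:.2f}m FDE={fde:.2f}m")
    # Made-up per-episode errors, for illustration only.
    sft = [10.0, 11.0, 12.0, 13.0]
    rl = [0.0, 1.0, 2.0, 3.0]
    m, hw = mean_ci95(sft)
    print(f"SFT ADE: {m:.2f}m ± {hw:.2f}m, p={two_sample_p_value(sft, rl):.4f}")
```

In a comparison run, each episode would contribute one ADE and one FDE value per policy, and the per-episode lists for SFT and RL would be passed to `mean_ci95` and `two_sample_p_value`.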

Capri2014 added 4 commits February 18, 2026 13:34

docs(clawbot): Update status for 2026-02-18

f772a9c

docs: Add PR body for RL evaluation with statistical significance

42091ee

Capri2014 wants to merge 4 commits into main from feature/daily-2026-02-18-rl-trainer
