Feature/daily 2026 02 18 rl trainer #138
Open
Capri2014 wants to merge 4 commits into
- Add train_ppo_delta_waypoint.py: Full PPO training for residual delta-head
  - DeltaHead and ValueHead architectures
  - GAE (Generalized Advantage Estimation) implementation
  - PPO update with clipping, value loss, entropy bonus
  - Support for toy and CARLA environments
  - Configurable hyperparameters via argparse
- Add test_ppo_delta_smoke.py: Smoke tests for validation
  - Unit tests for DeltaHead, ValueHead, GAE
  - Toy environment testing
  - Policy forward pass testing
  - Minimal training loop integration test
- Update training/rl/README.md: Documentation
  - Architecture overview
  - Usage examples
  - Key arguments reference
  - Output structure
  - Comparison workflow for SFT vs RL

Architecture: final_waypoints = sft_waypoints + delta_head(z)
- Frozen SFT encoder (safer, stable)
- Trainable delta head (sample-efficient)
- Residual correction for online improvement
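For reviewers who want a reference point for the GAE and clipped-update bullets above, here is a minimal stdlib-only sketch of both. It is not the code in train_ppo_delta_waypoint.py: the function names `compute_gae` and `ppo_clip_term`, the argument layout, and the defaults (gamma=0.99, lam=0.95, clip_eps=0.2) are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple


def compute_gae(
    rewards: Sequence[float],
    values: Sequence[float],
    dones: Sequence[bool],
    last_value: float,
    gamma: float = 0.99,  # assumed default, not taken from the PR
    lam: float = 0.95,    # assumed default, not taken from the PR
) -> Tuple[List[float], List[float]]:
    """Return (advantages, returns) for one rollout using GAE(lambda)."""
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value  # bootstrap value for the state after the rollout
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error; the bootstrap is cut at episode boundaries.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


def ppo_clip_term(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """Per-sample clipped surrogate objective (to be maximized)."""
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    adv, ret = compute_gae([1.0, 1.0, 1.0], [0.0, 0.0, 0.0],
                           [False, False, True], last_value=0.0)
    print(adv, ret)
    print(ppo_clip_term(1.5, 1.0))
```

In a full PPO update the clipped term would be averaged over a minibatch and negated to form the policy loss, then combined with the value loss and entropy bonus listed in the commit message.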
- Add Pipeline PR #3 summary
- Update pipeline status table
- Mark all stages as implemented
- Add eval_toy_waypoint_env.py for policy evaluation
- Compute ADE/FDE with confidence intervals (95% CI)
- Two-sample t-test for statistical significance (p-values)
- Side-by-side SFT vs RL comparison report
- Configurable episode count (default: 100 for statistical power)
Usage:
python -m training.rl.eval_toy_waypoint_env --compare \
--sft-checkpoint out/sft_waypoint_bc_torch_v0/model.pt \
--rl-checkpoint out/rl_delta_ppo_v0/final.pt --episodes 100
Output:
ADE: 5.27m ± 0.12m (SFT) → 5.19m (RL) [-2%]*
FDE: 5.83m (SFT) → 5.66m (RL) [-3%]*
* p < 0.05 (statistically significant)
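To make the reported metrics easier to sanity-check, here is a rough stdlib-only sketch of how ADE/FDE, a 95% confidence interval, and a two-sample test could be computed. It is not the code in eval_toy_waypoint_env.py. The helper names (`ade_fde`, `mean_ci95`, `welch_test`) are hypothetical, the interval uses the normal critical value 1.96, and the p-value uses Welch's unequal-variance statistic with a normal approximation, which is only reasonable for large samples such as the default 100 episodes.

```python
import math
from statistics import NormalDist, mean, stdev
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def ade_fde(pred: Sequence[Point], target: Sequence[Point]) -> Tuple[float, float]:
    """Average and final displacement error for one trajectory."""
    dists: List[float] = [math.dist(p, t) for p, t in zip(pred, target)]
    return mean(dists), dists[-1]


def mean_ci95(samples: Sequence[float]) -> Tuple[float, float]:
    """Sample mean and half-width of a normal-approximation 95% CI."""
    half_width = 1.96 * stdev(samples) / math.sqrt(len(samples))
    return mean(samples), half_width


def welch_test(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch t statistic with a two-sided p-value from the normal approximation."""
    var_a = stdev(a) ** 2 / len(a)
    var_b = stdev(b) ** 2 / len(b)
    t_stat = (mean(a) - mean(b)) / math.sqrt(var_a + var_b)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))
    return t_stat, p_value


if __name__ == "__main__":
    ade, fde = ade_fde([(0.0, 0.0), (3.0, 4.0)], [(0.0, 0.0), (0.0, 0.0)])
    print(f"ADE={ade:.2f}m FDE={fde:.2f}m")
```

For small episode counts, an exact Student's t distribution (for example via SciPy) would be the safer choice for both the interval and the p-value.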
Pull Request Template
Summary
Brief description of what changed (1-2 sentences).
Changes
Testing
Checklist
main branch
Related PRs/Issues
Link to related PRs or issues.
Note: This repository uses squash merging. All commits will be collapsed into one.