feat(eval): Integrate ResAD with toy waypoint environment for ADE/FDE metrics #134

Open
Capri2014 wants to merge 2 commits into main from feature/daily-2026-02-18-b

@Capri2014
Owner

Pull Request Template

Summary

Brief description of what changed (1-2 sentences).

Changes

  • Code changes
  • Docs changes
  • New files added

Testing

  • Tests pass (if applicable)
  • Manual verification steps
  • Verified no merge conflicts with main

Checklist

  • Based on latest main branch
  • No merge conflicts
  • Commit messages follow convention
  • Documentation updated (if applicable)
  • Related issue linked (if applicable)

Related PRs/Issues

Link to related PRs or issues.


Note: This repository uses squash merging. All commits will be collapsed into one.

- Track policy_entropy in eval metrics for exploration quality
- Save best_entropy.pt when entropy reaches new best
- Record entropy_history.json with episode-wise records
- Enhance train_metrics.json with best_checkpoint section
- Rationale: higher policy entropy indicates more exploration, which is expected to help RL generalization

Daily pipeline PR #1 (2026-02-18)
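
The entropy bookkeeping described in this commit could look roughly like the sketch below. It is an illustration only: `categorical_entropy` and `EntropyTracker` are hypothetical names, the policy is assumed to expose a discrete action distribution, and "best" is taken to mean the highest entropy seen so far, following the rationale in the commit message. The actual record layout of entropy_history.json is not shown in this PR, so the fields here are assumptions.

```python
import json
import math
from pathlib import Path


def categorical_entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


class EntropyTracker:
    """Hypothetical helper: records per-episode entropy and flags new bests."""

    def __init__(self):
        self.history = []         # one record per evaluated episode
        self.best_entropy = None  # highest entropy seen so far

    def update(self, episode, entropy):
        """Append a record; return True when `entropy` is a new best."""
        is_best = self.best_entropy is None or entropy > self.best_entropy
        if is_best:
            self.best_entropy = entropy
        self.history.append(
            {"episode": episode, "policy_entropy": entropy, "is_best": is_best}
        )
        return is_best

    def save(self, path):
        """Write the episode-wise records as JSON (assumed layout)."""
        Path(path).write_text(json.dumps(self.history, indent=2))
```

In a training loop, a `True` return from `update()` would be the point at which a checkpoint such as best_entropy.pt gets written.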
… metrics

- Add ResAD policy wrapper (policy_resad) for toy environment
- Add create_resad_policy() factory function for checkpoint loading
- Update eval_toy_waypoint_env.py with ADE/FDE metrics computation
- Add comparison mode (--policy compare) for SFT vs RL vs ResAD
- Fix ResAD tensor dimension handling for 2D features
- Compute summary statistics with mean/std for all metrics

Usage:
  python -m training.rl.eval_toy_waypoint_env --policy sft --episodes 20
  python -m training.rl.eval_toy_waypoint_env --policy resad --checkpoint resad.pt
  python -m training.rl.eval_toy_waypoint_env --policy compare --episodes 50

Output:
  out/eval/<run_id>/metrics.json with ADE/FDE summary metrics
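
For readers unfamiliar with the metrics, here is a minimal sketch of how ADE (average displacement error) and FDE (final displacement error) are commonly computed, plus the mean/std summary mentioned above. The function names `ade_fde` and `summarize` are hypothetical, not taken from eval_toy_waypoint_env.py, and the use of population standard deviation is an assumption.

```python
import math
import statistics


def ade_fde(pred, target):
    """Return (ADE, FDE) for one trajectory of (x, y) waypoints.

    ADE is the mean Euclidean error over all waypoints;
    FDE is the Euclidean error at the last waypoint.
    """
    if not pred or len(pred) != len(target):
        raise ValueError("pred and target must be non-empty and equal length")
    dists = [math.dist(p, t) for p, t in zip(pred, target)]
    return sum(dists) / len(dists), dists[-1]


def summarize(values):
    """Mean and population standard deviation across episodes."""
    return {"mean": statistics.fmean(values), "std": statistics.pstdev(values)}


# Example: the prediction drifts 0, 1, then 2 units from the target.
pred = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
target = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
print(ade_fde(pred, target))  # (1.0, 2.0)
```

Per-episode ADE and FDE values would then be passed to `summarize()` to produce the summary entries for each policy.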