
Add compute_episodic_return_on_last_step#1830

Open
QuantuMope wants to merge 1 commit into pytorch from PR/andrew/compute_episodic_return_on_last_step

Conversation

@QuantuMope
Contributor

Currently, MC returns are only computed when a discount of 0 is encountered. This means MC returns cannot be computed if we do not terminate on success. This PR adds a flag, compute_episodic_return_on_last_step, to compute returns upon step_type.LAST instead.
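To make the behavioral difference concrete, here is a minimal, illustrative sketch of backward Monte Carlo return accumulation with the two episode-boundary conventions discussed in this PR. This is not ALF's actual implementation; the `step_type` constants and the function signature are hypothetical stand-ins.

```python
import numpy as np

# Hypothetical step_type encoding, mirroring FIRST/MID/LAST conventions.
FIRST, MID, LAST = 0, 1, 2

def episodic_returns(rewards, discounts, step_types, gamma=0.99,
                     compute_episodic_return_on_last_step=False):
    """Accumulate Monte Carlo returns backward over a flat trajectory.

    An episode boundary is detected either via discount == 0 (the
    default behavior described in the PR) or via step_type == LAST
    when the flag is set, so returns can still be computed when
    episodes end by timeout with discount kept at 1.
    """
    T = len(rewards)
    returns = np.zeros(T)
    acc = 0.0
    for t in reversed(range(T)):
        if compute_episodic_return_on_last_step:
            is_boundary = step_types[t] == LAST
        else:
            is_boundary = discounts[t] == 0.0
        if is_boundary:
            acc = 0.0  # restart accumulation at the episode boundary
        acc = rewards[t] + gamma * discounts[t] * acc
        returns[t] = acc
    return returns
```

With the flag set, a timeout trajectory whose discounts stay at 1 throughout still gets well-defined per-step returns, because the boundary comes from step_type.LAST rather than the discount.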

@emailweixu
Contributor

> Currently, MC returns only get computed when encountering a discount of 0. This makes it so that MC returns cannot be computed if we do not terminate on success. This PR adds a flag compute_episodic_return_on_last_step to instead compute returns upon step_type.LAST instead.

Why don't we set discount to 0 for LAST step?

@Haichao-Zhang
Contributor

> This makes it so that MC returns cannot be computed if we do not terminate on success

  1. Keeping adding more flags might not be ideal and could be a bit confusing to users?
  2. "This makes it so that MC returns cannot be computed if we do not terminate on success" --> not sure what kind of use case you are considering, but typically, upon task success, we set the step type to LAST and the discount to 0?

@QuantuMope
Contributor Author

> > Currently, MC returns only get computed when encountering a discount of 0. This makes it so that MC returns cannot be computed if we do not terminate on success. This PR adds a flag compute_episodic_return_on_last_step to instead compute returns upon step_type.LAST instead.
>
> Why don't we set discount to 0 for LAST step?

This is for RL setups where we don't terminate on success and only terminate upon timeout.
For EmbodiedGen resets, batch-resetting all environments at the same step is often more efficient and sometimes the only possible strategy (e.g., when lighting randomization is turned on, individual envs cannot be reset on the GPU).

@QuantuMope
Contributor Author

QuantuMope commented Mar 27, 2026

> > This makes it so that MC returns cannot be computed if we do not terminate on success
>
> 1. keep adding more flags might not be ideal and might be a bit confusing to users?
> 2. This makes it so that MC returns cannot be computed if we do not terminate on success --> not sure what kind of use case you are considering, but typically, upon task success, we set step type as LAST + discount as 0?

  1. Would making this the default behavior be better?
  2. Most of our OpenVLAPPO, PPOFlow, and SACFlow RL training uses timeout termination only. This models an infinite-horizon RL formulation.

@emailweixu
Contributor

emailweixu commented Mar 28, 2026

> Why can't we set discount to 0 at the time of setting step type to LAST?

I think in practice we could, but it would be less correct to do so. We're essentially modeling an infinite-horizon problem. Applying a discount of 0 at timeout, when there may have been numerous successful idle steps prior, could confuse the model?
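The infinite-horizon concern above can be illustrated with a simple value-target calculation. This is a hedged sketch with made-up numbers, not code from this PR: it shows how forcing discount to 0 at a timeout step drops the critic's estimate of future value, whereas keeping discount at 1 bootstraps through the artificial episode cut.

```python
gamma = 0.99
reward_T = 1.0     # reward received at the timeout step (illustrative)
value_next = 50.0  # critic's estimate of value after the cut (illustrative)

# Terminal-style timeout (discount forced to 0): future value is discarded,
# as if the task truly ended, even though the agent was still succeeding.
target_terminal = reward_T + gamma * 0.0 * value_next

# Infinite-horizon timeout (discount kept at 1): bootstrap through the cut,
# so the target still reflects the value of continuing.
target_bootstrap = reward_T + gamma * 1.0 * value_next
```

The two targets differ substantially (1.0 vs 50.5 here), which is why zeroing the discount at an arbitrary timeout can mislead value learning under an infinite-horizon formulation.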
