Amazing repo! Question about the final learning rate

Hi, thanks a lot for the repo. It is of very high quality! I like the paper as well.

I have two things to share:

1. In the arXiv paper, the final learning rate is given as $3 \times 10^{-4}$ on page 20, but the code uses $1 \times 10^{-5}$:
https://github.com/DAVIAN-Robotics/SimbaV2/blob/86899c277cdc697b2b02d827243de1ea93f20a1d/configs/agent/simbaV2.yaml#L18

Which one did you use in the experiments?

While replicating the work, we found that $1 \times 10^{-5}$ worked well.

2. While replicating the code with @timfaust, we noticed that normalizing the observations and rewards in float64 instead of float32 was essential to reach the results you report. We discovered this because we were normalizing those quantities with JAX, which defaults to float32, whereas this code uses NumPy, which defaults to float64.
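
To illustrate the precision point, here is a minimal NumPy sketch that keeps running mean/variance statistics in a chosen dtype. The helper `running_stats`, the synthetic data, and the batch sizes are all assumptions made for illustration; this is not the repository's normalizer, and it does not reproduce the effect on training, only the dtype difference in the statistics themselves.

```python
import numpy as np

def running_stats(batches, dtype):
    """Merge per-batch mean/variance into running statistics held in `dtype`.

    Hypothetical helper for illustration only; not the repository's code.
    """
    mean = np.zeros((), dtype=dtype)
    var = np.ones((), dtype=dtype)
    count = dtype(0)
    for batch in batches:
        batch = np.asarray(batch, dtype=dtype)
        b_mean, b_var = batch.mean(), batch.var()
        b_count = dtype(batch.size)
        total = count + b_count
        delta = b_mean - mean
        # Parallel update of the second moment, then of the mean.
        m2 = var * count + b_var * b_count + delta * delta * count * b_count / total
        mean = mean + delta * b_count / total
        var = m2 / total
        count = total
    return mean, var

# Synthetic "observations" with a large offset (assumed values).
rng = np.random.default_rng(0)
data = rng.normal(loc=1000.0, scale=1.0, size=(200, 256))

mean64, var64 = running_stats(data, np.float64)
mean32, var32 = running_stats(data, np.float32)

# NumPy creates float64 arrays from Python floats by default.
print(np.asarray(1.0).dtype)

# Deviation of each running variance from a direct float64 computation.
print("float64 error:", abs(float(var64) - data.var()))
print("float32 error:", abs(float(var32) - data.var()))
```

In JAX, one way to get float64 statistics is to enable 64-bit mode with `jax.config.update("jax_enable_x64", True)` at startup; another is to keep the normalization in NumPy, outside the JAX code.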

