Hi, thanks a lot for the repo. It is of very high quality! I like the paper as well.
I have two things to share:
- In the arXiv paper, the final learning rate is given as $3 \times 10^{-4}$ on page 20, whereas the code uses $1 \times 10^{-5}$ (`SimbaV2/configs/agent/simbaV2.yaml`, line 18 at commit 86899c2).
Which one did you use in the experiments?
While replicating the work, we found that $1 \times 10^{-5}$ worked well.
- While replicating the code with @timfaust, we noticed that normalizing the observations and rewards in float64 instead of float32 was essential to reach the results you report. We discovered this because we were normalizing those quantities with JAX, which uses float32 by default, whereas this code uses NumPy, which uses float64 by default.
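
To illustrate the second point, here is a minimal NumPy sketch of a running mean/variance normalizer whose precision is set explicitly. It is not the repo's implementation: the class name `RunningNormalizer`, its arguments, and the batch-merge update rule are assumptions made for this example.

```python
import numpy as np


class RunningNormalizer:
    """Hypothetical running normalizer with an explicit dtype (illustration only)."""

    def __init__(self, shape, dtype=np.float64, eps=1e-8):
        self.dtype = dtype
        self.mean = np.zeros(shape, dtype=dtype)
        self.var = np.ones(shape, dtype=dtype)
        self.count = 0
        self.eps = eps

    def update(self, batch):
        # Cast once so every statistic is accumulated in the chosen precision.
        batch = np.asarray(batch, dtype=self.dtype)
        n = batch.shape[0]
        batch_mean = batch.mean(axis=0)
        batch_var = batch.var(axis=0)

        # Merge the batch statistics into the running ones
        # (parallel mean/variance combination).
        total = self.count + n
        delta = batch_mean - self.mean
        m2 = (self.var * self.count
              + batch_var * n
              + delta ** 2 * self.count * n / total)
        self.mean = self.mean + delta * n / total
        self.var = m2 / total
        self.count = total

    def normalize(self, x):
        x = np.asarray(x, dtype=self.dtype)
        return (x - self.mean) / np.sqrt(self.var + self.eps)


# NumPy creates float64 arrays by default, so this normalizer runs in float64.
norm64 = RunningNormalizer(shape=(3,))
# Passing float32 mimics the default precision of a JAX port.
norm32 = RunningNormalizer(shape=(3,), dtype=np.float32)
```

In JAX, 64-bit types are disabled by default; they can be enabled with `jax.config.update("jax_enable_x64", True)` before any arrays are created.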