This is my attempt at using RL to improve LLM reasoning. The GRPO training code is a modified version of this nice code, and the evaluation follows this.
mingyin0312/RL4LLM