mingyin0312/RL4LLM

This is my attempt at using RL to improve LLM reasoning. The GRPO training is a modified version of this nice code, and the evaluation follows this.
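For readers new to GRPO, here is a minimal sketch of its central step: the group-relative advantage, where each sampled completion's reward is normalized against the other completions drawn for the same prompt. This is not taken from this repository's training code. The function name `group_relative_advantages`, the `eps` parameter, and the choice of population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of completions for a single prompt.

    Hypothetical helper for illustration only. Uses the population standard
    deviation; a real implementation may use the sample standard deviation.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division finite when every completion gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four completions for one prompt, scored 1.0 if correct and 0.0 otherwise.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Completions that score above the group mean get a positive advantage and those below get a negative one, so no separate value model is needed to provide a baseline.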

About

RL significantly improves the reasoning capability of Qwen2.5-1.5B-Instruct
