Official GitHub code for "SmartPhotoCrafter: Unified Reasoning, Generation and Optimization for Automatic Photographic Image Editing"
Updated May 26, 2026 - Python
Group-relative Trajectory-based Policy Optimization: Increasing Quality and Training Stability
RL training environments with verifiable rewards for coding agents. Works with TRL, Unsloth, verl, OpenRLHF.
An OpenEnv RL environment where an LLM agent plays the buyer and negotiates against an LLM-powered seller over real marketplace listings.
OpenEnv-based RL environment for training LLM agents in medical triage decision-making (Emergency Severity Index, ESI) under partial observability. Uses GRPO (TRL) + Unsloth to optimize policies with multi-objective reward shaping (safety, accuracy, efficiency) and time-aware reasoning.
This repository contains my personal notes and hands-on implementations for fine-tuning and post-training Large Language Models (LLMs).
A reinforcement learning fine-tuned model that generates Linux terminal commands from natural language descriptions. Trained using GRPO (Group Relative Policy Optimization) on a custom terminal task environment inspired by CAMEL-AI's SETA framework.
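For readers new to the topic, the "group relative" part of GRPO can be sketched in a few lines: several completions are sampled for the same prompt, and each completion's reward is normalized against the mean and spread of its own group. The snippet below is a minimal illustration and is not taken from any repository listed here; the function name `group_relative_advantages`, the `eps` constant, and the use of the population standard deviation are assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    Hypothetical helper: `rewards` holds one scalar reward per completion
    sampled for the same prompt. `eps` (an assumed constant) avoids
    division by zero when every completion receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one prompt, scored by a verifiable 0/1 reward.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Completions that score above their group's mean receive a positive advantage and those below receive a negative one, so no separately learned value model is needed for the baseline.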