An OpenEnv benchmark testing the ability of AI agents to act as Site Reliability Engineers (SREs) by diagnosing and filtering raw production failure logs.
Updated Apr 8, 2026 - Python
A production-grade OpenEnv environment for benchmarking RL agents on real-world data cleaning and schema engineering tasks.
An RL environment where an LLM agent learns to curate talking-head video clips for AV LoRA training. No labels exposed, rewards only.
A reinforcement learning agent built with OpenEnv and Stable-Baselines3 that learns to intelligently manage email workflows. The agent handles tasks ranging from spam filtering to drafting meeting invitations and resolving ambiguous client requests.
OpenEnv code review environment for AI agents.
ShopOps Env is a realistic OpenEnv environment that simulates daily operations of an e-commerce support and operations team. In this environment, an AI agent acts as an operations associate responsible for handling a stream of customer cases such as refund requests, delivery issues, wrong item complaints, and fraud signals. Each episode represent
Trading OpenEnv, a comprehensive reinforcement learning environment and training pipeline designed to teach Large Language Models (LLMs) how to trade stocks autonomously.
NeoVentEnv: An OpenEnv neonatal ventilator management simulator for training and evaluating RL/LLM agents on realistic NICU tasks.
Deterministic reinforcement learning environment for simulating open-source issue triage workflows
A real-world OpenEnv environment for testing AI decision-making under ambiguity using multi-step reasoning.
VeritasOps is a real-world OpenEnv benchmark for training and evaluating AI agents on misinformation moderation, claim verification, spread control, and content safety decision-making.
A production-oriented OpenEnv-style environment for evaluating tool-using agents on customer support ticket triage.
personal_finance_env is an OpenEnv simulation for training AI agents in dynamic, multi-step financial decision-making.
A cooperative multi-agent RL environment for adaptive traffic signal control across a 3x3 grid of intersections. Perfect for testing multi-agent coordination policies and demonstrating environment usage patterns.
A complete, runnable ITSM benchmark environment with 181 deterministic tasks, graded scoring, dense rewards, and a standard API plus baseline runner for structured execution.
An OpenEnv RL environment where LLM agents clean malformed JSON to match a target schema, with four difficulty levels and deterministic scoring.
An OpenEnv-compliant RL environment that simulates candidate pipeline triage, where an agent reviews synthetic developer profiles across GitHub, LeetCode, Kaggle, and resume signals to make shortlisting decisions under a step budget. Built for the OpenEnv Hackathon by Meta x Hugging Face.
Data cleaning agent for unorganised datasets.
OpenEnv-compliant RL environment for SQL query debugging. Built for META x PyTorch x SST OpenEnv Hackathon.
Real-world OpenEnv environment for ad-campaign budget pacing and customer-value-aware bidding. An agent decides how aggressively to bid across 48 half-hour auction windows in a simulated campaign day while balancing conversions, ROAS, pacing, and whale-customer capture.