Yaser-123/ambiguity-resolution-openenv

---
title: Ambiguity & Constraint-Aware Decision Environment
emoji: 🧠
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: mit
---

🧠 Ambiguity & Constraint-Aware Decision Environment

License: MIT · Docker

Evaluating the robustness of AI decision-making under uncertainty and logical constraints.


🔗 Project Links


📌 Introduction

This environment evaluates whether AI agents can effectively handle ambiguity and adhere to complex logical constraints. In a realistic scheduling scenario, agents must:

  • Detect missing information in user instructions.
  • Navigate constraints such as time conflicts (unavailability) and hard deadlines.
  • Exercise multi-step reasoning to clarify parameters before taking action.

It focuses on decision-making under uncertainty, where guessing leads to penalties and clarification is the optimal path.
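The clarify-before-guessing policy can be sketched as follows (the field names and action format here are hypothetical, for illustration only; the environment's real observation schema may differ):

```python
# Minimal sketch of the clarify-vs-guess decision rule (hypothetical
# observation/action format; the environment's real schema may differ).

REQUIRED_FIELDS = ("time", "participants")

def choose_action(observation: dict) -> dict:
    """Clarify missing parameters before committing to a schedule."""
    missing = [f for f in REQUIRED_FIELDS if observation.get(f) is None]
    if missing:
        # Guessing is penalized, so ask a targeted question instead.
        return {"type": "clarify", "field": missing[0]}
    return {"type": "schedule",
            "time": observation["time"],
            "participants": observation["participants"]}

# Ambiguous instruction: the time is unknown, so the agent asks for it.
print(choose_action({"time": None, "participants": ["Alice"]}))
# → {'type': 'clarify', 'field': 'time'}

# Fully specified instruction: the agent schedules directly.
print(choose_action({"time": "2 PM", "participants": ["Alice"]}))
# → {'type': 'schedule', 'time': '2 PM', 'participants': ['Alice']}
```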


🚀 Why This Matters

Most real-world AI failures occur because agents:

  1. Act on incomplete information (e.g., scheduling a meeting without knowing the time).
  2. Ignore logical constraints (e.g., scheduling during a forbidden time slot).
  3. Hallucinate solutions that satisfy part of the prompt while violating hidden boundaries.

This project provides a realistic benchmark to quantify these failure modes.


✨ Key Features

  • Dynamic Ambiguity: Environment secrets (times/teams) are randomized per session.
  • Constraint-Aware Reasoning: Handles hard-coded unavailability and temporal deadlines (e.g., "before 3 PM").
  • Multi-Step Interaction: A detailed observation → reasoning → action feedback loop.
  • Partial Reward Scoring: Non-binary grading rewards partial correctness while penalizing violations and inefficiency.
  • Deterministic Evaluation: A pure observation-based baseline ensures reproducibility.
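The partial-reward idea can be illustrated with a small scorer. Note that the weights and step budget below are assumptions for the sketch, not values taken from the environment's actual grader:

```python
# Illustrative partial-reward scorer (penalty weights and step budget
# are assumptions, not the environment's actual grading constants).

def score_episode(correct_fields: int, total_fields: int,
                  violations: int, steps: int, step_budget: int = 6) -> float:
    """Non-binary grade: credit per correct field, minus penalties for
    constraint violations and for exceeding the step budget."""
    base = correct_fields / total_fields            # partial correctness
    penalty = 0.25 * violations                     # e.g., a forbidden time slot
    inefficiency = 0.05 * max(0, steps - step_budget)
    return max(0.0, min(1.0, base - penalty - inefficiency))

# Fully correct, no violations, within budget.
print(score_episode(correct_fields=2, total_fields=2, violations=0, steps=4))
# → 1.0

# Half correct, one violation, two steps over budget.
print(round(score_episode(correct_fields=1, total_fields=2, violations=1, steps=8), 2))
# → 0.15
```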

🎯 Task Design

The evaluation suite contains tasks of increasing cognitive load:

| Difficulty | Description |
| --- | --- |
| Easy | No ambiguity. All parameters are explicit. Tests basic execution logic. |
| Medium | One missing field (Time or Participants). Requires a single clarification step. |
| Hard | Multiple missing fields plus constraints. Requires context retention and logical constraint satisfaction. |
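One way to picture how such tasks could be generated per session (the field names, value pools, and constraint strings below are hypothetical; the real environment randomizes its own secrets):

```python
# Sketch of per-session task generation (value pools and constraint
# strings are hypothetical; the real environment defines its own).
import random

TIMES = ["10 AM", "11 AM", "1 PM", "2 PM"]
TEAMS = ["infra", "ml", "design"]
HIDDEN = {"easy": [], "medium": ["time"], "hard": ["time", "participants"]}

def make_task(difficulty, seed=None):
    """Draw random secrets, then hide fields according to difficulty."""
    rng = random.Random(seed)
    secret = {"time": rng.choice(TIMES), "participants": rng.choice(TEAMS)}
    visible = {k: (None if k in HIDDEN[difficulty] else v)
               for k, v in secret.items()}
    task = {"difficulty": difficulty, "observation": visible, "secret": secret}
    if difficulty == "hard":
        # Hard tasks additionally carry logical constraints.
        task["constraints"] = ["before 3 PM", "avoid unavailable slots"]
    return task

print(make_task("medium", seed=0))
```

Seeding the generator keeps individual sessions reproducible while still randomizing secrets across sessions, which matches the "Dynamic Ambiguity" and "Deterministic Evaluation" goals above.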

📈 Baseline Performance

Across the benchmarked tasks, the current baseline achieves an Average Score of ~0.70.

  • Easy: 0.99 (near-perfect execution).
  • Medium: ~0.65 (successful targeted clarification).
  • Hard: ~0.53 (reflects the complexity of multi-step retrieval plus constraint satisfaction).

This trend confirms that the environment's difficulty scales as intended: scores fall as ambiguity and constraints increase.


⚙️ How to Run

1. Installation

```bash
git clone <repo_url>
cd ambiguity-resolution-openenv
pip install -r requirements.txt
```

2. Run API Server (Docker)

```bash
docker build -t ambiguity-env .
docker run -p 7860:7860 ambiguity-env
```

3. Run Evaluation / Inference

```bash
python inference.py
```

👤 Author

T Mohamed Yaser


"A realistic benchmark for evaluating decision-making under ambiguity and constraints."
Built for OpenEnv Hackathon 🚀
