| title | Ambiguity & Constraint-Aware Decision Environment |
|---|---|
| emoji | 🧠 |
| colorFrom | blue |
| colorTo | purple |
| sdk | docker |
| pinned | false |
| license | mit |
Evaluating the robustness of AI decision-making under uncertainty and logical constraints.
- 🎯 Live Benchmark API: Hugging Face Space
- 🖥️ Interactive Demo: Gradio Demo
- 🐙 Source Code: GitHub Repository
This environment evaluates whether AI agents can effectively handle ambiguity and adhere to complex logical constraints. In a realistic scheduling scenario, agents must:
- Detect missing information in user instructions.
- Navigate constraints such as time conflicts (unavailability) and hard deadlines.
- Exercise multi-step reasoning to clarify parameters before taking action.
It focuses on decision-making under uncertainty, where guessing leads to penalties and clarification is the optimal path.
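For illustration, a hypothetical Medium-difficulty instance is sketched below; the field names and action format are assumptions made for this example, not the environment's actual schema.

```python
# Hypothetical task instance (illustrative schema, not the environment's actual one).
instruction = "Schedule a sync with the design team before the end of the day."

observation = {
    "known": {"participants": "design team", "deadline": "17:00"},
    "missing": ["time"],                        # guessing a time here incurs a penalty
    "constraints": ["no meetings 12:00-13:00"],
}

# The rewarded behaviour is a targeted clarification rather than a guess:
action = {"type": "clarify", "question": "What time should the meeting start?"}
```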
Most real-world AI failures occur because agents:
- Act on incomplete information (e.g., scheduling a meeting without knowing the time).
- Ignore logical constraints (e.g., scheduling during a forbidden time slot).
- Hallucinate solutions that satisfy part of the prompt while violating hidden boundaries.
This project provides a realistic benchmark to quantify these failure modes.
- Dynamic Ambiguity: Env secrets (times/teams) are randomized per session.
- Constraint-Aware Reasoning: Handles hard-coded unavailability and temporal deadlines (e.g., "before 3 PM").
- Multi-Step Interaction: Detailed observation -> reasoning -> action feedback loop (see the sketch after this list).
- Partial Reward Scoring: Non-binary grading rewards partial correctness while penalizing violations and inefficiency.
- Deterministic Evaluation: A pure observation-based baseline ensures reproducibility.
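As a rough sketch of how the feedback loop and the deterministic baseline fit together (the method names, observation keys, and step signature are assumptions, not the project's actual interface):

```python
# Minimal observation -> reasoning -> action loop with a deterministic,
# purely observation-based policy: clarify while information is missing,
# then schedule. Interface and key names are illustrative assumptions.
def run_episode(env):
    obs = env.reset()
    total_reward, done = 0.0, False
    while not done:
        if obs.get("missing"):                       # reasoning: an information gap remains
            action = {"type": "clarify", "field": obs["missing"][0]}
        else:                                        # all parameters known: act
            action = {"type": "schedule", "params": obs["known"]}
        obs, reward, done, info = env.step(action)   # partial rewards accumulate per step
        total_reward += reward
    return total_reward
```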
The evaluation suite contains tasks of increasing cognitive load:
| Difficulty | Description |
|---|---|
| Easy | No ambiguity. All parameters are explicit. Tests basic execution logic. |
| Medium | One missing field (Time or Participants). Requires a single clarification step. |
| Hard | Multiple missing fields + constraints. Requires context retention and logical satisfaction. |
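To make the tiers concrete, the snippet below sketches one hypothetical prompt per difficulty level; the wording and field names are illustrative, not taken from the benchmark tasks.

```python
# Illustrative (not actual) task instances for each difficulty tier.
EXAMPLE_TASKS = {
    "easy": {
        "prompt": "Schedule a meeting with Alice at 10:00 tomorrow.",
        "missing_fields": [],                  # everything is explicit
    },
    "medium": {
        "prompt": "Schedule a meeting with Alice tomorrow.",
        "missing_fields": ["time"],            # one clarification step needed
    },
    "hard": {
        "prompt": "Set up a review before 3 PM, avoiding the team's blocked slots.",
        "missing_fields": ["time", "participants"],
        "constraints": ["before 15:00", "respect unavailability windows"],
    },
}
```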
Across the benchmarked tasks, the current baseline achieves an Average Score of ~0.70.
- Easy: 0.99 (near-perfect execution).
- Medium: ~0.65 (successful targeted clarification).
- Hard: ~0.53 (reflects the cost of multi-step retrieval plus constraint satisfaction).
This downward trend confirms that the environment captures increasing difficulty as ambiguity and constraints rise.
```bash
git clone <repo_url>
pip install -r requirements.txt
```

```bash
docker build -t ambiguity-env .
```
```bash
docker run -p 7860:7860 ambiguity-env
```

```bash
python inference.py
```
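With the container running, the app is served on http://localhost:7860. One possible way to query it programmatically is via gradio_client; the endpoint name and input payload below are assumptions and should be checked against the demo's API docs.

```python
# Hypothetical client call against the locally running demo.
# The api_name ("/predict") and the input format are assumptions.
from gradio_client import Client

client = Client("http://localhost:7860")
result = client.predict(
    "Schedule a meeting with the design team tomorrow.",  # example instruction
    api_name="/predict",
)
print(result)
```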
Mohamed Yaser
- Solo Participant
- LinkedIn: mohamedyaser08
- Email: 1ammar.yaser@gmail.com
Built for OpenEnv Hackathon 🚀