Summary
We need a repeatable way to group failed conversations into prompt, workflow, and infrastructure buckets before proposing fixes. OpenHands/agent-analysis is the right place for that clustering and reporting logic.
Target repo
OpenHands/agent-analysis
Dependencies
- Merge after critic-scored failure datasets are available.
Scope
Add clustering and reporting code that turns failed conversations into named failure classes and emits actionable recommendations. Do not edit OpenHands prompt files in this issue.
Files to update
- README.md
- analysis/__main__.py
- analysis/performance_gap.py
- analysis/usage.py
- add a new module under analysis/ for failure clustering
- add tests under tests/
Acceptance criteria
- Failed conversations are grouped into named failure classes.
- The output separates prompt, workflow, and infrastructure failures.
- Each cluster includes representative conversation IDs and suggested target files.
- The report is good enough to drive follow-up issues in OpenHands/OpenHands or other public repos.
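The report shape these criteria imply could look roughly like the sketch below. It is illustrative only: `FailureCluster`, `build_report`, and the input keys (`conversation_id`, `failure_class`, `category`, `target_file`) are hypothetical names, not existing agent-analysis APIs, and it assumes each failed conversation has already been assigned a failure class and a bucket by an upstream step.

```python
from dataclasses import dataclass, field

# The three buckets named in the acceptance criteria.
CATEGORIES = ("prompt", "workflow", "infrastructure")


@dataclass
class FailureCluster:
    """One named failure class (hypothetical structure)."""
    name: str
    category: str
    conversation_ids: list[str] = field(default_factory=list)
    suggested_target_files: list[str] = field(default_factory=list)


def build_report(failures, max_representatives=3):
    """Group labeled failures into clusters, keyed by bucket.

    `failures` is an iterable of dicts with the assumed keys
    conversation_id, failure_class, category, and optional target_file.
    """
    clusters = {}
    for failure in failures:
        category = failure["category"]
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category!r}")
        key = (category, failure["failure_class"])
        cluster = clusters.setdefault(
            key, FailureCluster(name=failure["failure_class"], category=category)
        )
        # Keep only a few representative conversation IDs per cluster.
        if len(cluster.conversation_ids) < max_representatives:
            cluster.conversation_ids.append(failure["conversation_id"])
        target = failure.get("target_file")
        if target and target not in cluster.suggested_target_files:
            cluster.suggested_target_files.append(target)

    # Every bucket appears in the report, even when it has no clusters.
    report = {category: [] for category in CATEGORIES}
    for (category, _name), cluster in sorted(clusters.items()):
        report[category].append(cluster)
    return report
```

Keying the report by bucket keeps prompt, workflow, and infrastructure failures separate, and each cluster carries the IDs and candidate files a follow-up issue would need.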
References
OpenHands/OpenHands/openhands/agenthub/codeact_agent/prompts/system_prompt.j2
OpenHands/OpenHands/openhands/utils/prompt.py
This issue was drafted by an AI assistant (OpenHands) on behalf of the user.