A curated list of GUI (Graphical User Interface) compute agents - AI systems that can see, understand, and interact with graphical interfaces like humans do.
This project is maintained by Parni and Ian. Follow Supernal Intelligence for more updates.
Website: supernalintelligence.com
Join our Discord: Supernal Intelligence Discord
For more complete data and the latest information, please visit our website: supernalintelligence.com
GUI compute agents are AI systems designed to interact with graphical user interfaces just like humans do. They can:
- See and understand screen elements
- Click buttons, type text, and drag elements
- Navigate through applications and websites
- Complete complex visual workflows
- Automate GUI-based tasks through natural language instructions
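The capabilities above amount to an observe, decide, act loop. The sketch below is a minimal illustration and does not reflect how any listed agent is implemented. `FakeScreen`, `Element`, `Action`, `choose_action`, and `run_agent` are hypothetical names, the GUI is simulated in memory, and a rule-based policy stands in for the vision-language model that a real agent would call. The goal is passed as a dictionary rather than parsed from natural language.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Element:
    """A hypothetical on-screen element the agent can perceive."""
    name: str
    kind: str  # "textbox" or "button"
    text: str = ""


@dataclass
class Action:
    """One GUI action: click an element or type text into it."""
    kind: str  # "type" or "click"
    target: str
    text: str = ""


class FakeScreen:
    """Simulated GUI (assumption): a login form with two fields and a button."""

    def __init__(self) -> None:
        self.elements: Dict[str, Element] = {
            "username": Element("username", "textbox"),
            "password": Element("password", "textbox"),
            "submit": Element("submit", "button"),
        }
        self.submitted = False

    def observe(self) -> List[Element]:
        # "See": return a snapshot copy of the visible elements.
        return [Element(e.name, e.kind, e.text) for e in self.elements.values()]

    def apply(self, action: Action) -> None:
        # "Act": change GUI state the way a keystroke or click would.
        element = self.elements[action.target]
        if action.kind == "type" and element.kind == "textbox":
            element.text = action.text
        elif action.kind == "click" and element.kind == "button":
            # The form only submits once every textbox has content.
            self.submitted = all(
                e.text for e in self.elements.values() if e.kind == "textbox"
            )
        else:
            raise ValueError(f"cannot {action.kind} on {element.kind}")


def choose_action(observation: List[Element], goal: Dict[str, str]) -> Action:
    """Toy policy: fill the first field that differs from the goal, else submit."""
    for element in observation:
        wanted = goal.get(element.name)
        if element.kind == "textbox" and wanted is not None and element.text != wanted:
            return Action("type", element.name, wanted)
    return Action("click", "submit")


def run_agent(screen: FakeScreen, goal: Dict[str, str], max_steps: int = 10) -> List[Action]:
    """Repeat observe -> decide -> act until the form is submitted or steps run out."""
    history: List[Action] = []
    for _ in range(max_steps):
        if screen.submitted:
            break
        action = choose_action(screen.observe(), goal)
        screen.apply(action)
        history.append(action)
    return history


if __name__ == "__main__":
    demo = FakeScreen()
    for step in run_agent(demo, {"username": "ada", "password": "s3cret"}):
        print(step)
```

In a real agent, `observe` would typically return a screenshot or accessibility tree and `choose_action` would query a model. The `max_steps` cap is a common safeguard against an agent that never reaches its goal.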
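The following agents are listed with their developer, release status, key features, and supported environments:
| Name | Developer | Status | Key Features | Environment |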
|---|---|---|---|---|
| Ace | General Agents | Upcoming (2025) | Achieved 20× human speed on UI tasks; controls full computer via screen pixels | Desktop, Browser |
| ACT-1 | Adept AI | Released (2022) | Pioneer in digital actions; self-correcting behavior | Desktop, Browser |
| CloudCruise | CloudCruise | Released | Cloud-based GUI automation; enterprise-grade | Cloud, Browser |
| Felluo AI | Felluo | Released | Vision-based GUI automation; supports both browser and desktop interactions | Desktop, Browser |
| Adaptive.AI | Adaptive AI Inc | Released | AI risk management framework; technology strategy consulting | Browser |
| AgentGPT | Reworkd | Released (2023) | User-friendly interface for creating goal-oriented agents | Browser |
| AI Agent Studio | Automation Anywhere | Released (2025) | Handles structured and unstructured data; creates AI agents for enterprise automation | Browser |
| Apple Intelligence Agents | Apple | Upcoming (2025) | Deep OS integration; privacy-focused | Phone, Desktop |
| AskUI Vision Agent | AskUI | Released | Cross-platform functionality without virtual machines | Desktop, Browser, Phone |
| Beam AI | Beam | Released | Agentic Process Automation platform for customer support, onboarding, sales proposal generation | Browser |
| Claude Agent Kit | Anthropic | Upcoming (2024) | Official toolkit for building Claude-powered agents | Browser |
| Claude Computer Use | Anthropic | Released (2024) | Works on desktop apps and browsers; AI model-based approach | Browser, Desktop, Multi-device |
| Devin | Cognition Labs | Upcoming (2025) | Full-stack programming capabilities with browser access | Desktop, Browser |
| Fuyu-Heavy | Adept AI | Released (2024) | Reportedly ranked third among vision-action models, behind GPT-4V and Gemini Ultra | Desktop, Browser |
| Gemini 1.5 Pro (Tool Use) | | Released | Long context, tool orchestration in Workspace | Browser |
| Google Mariner | Google DeepMind | Unreleased | High WebVoyager benchmark performance | Browser |
| Gumloop | Gumloop | Released (2023) | Visual workflow canvas; 90+ pre-built templates; Chrome extension for web automation | Browser |
| Highlight AI | Embedded Intelligence | Released (2024) | Instant Q&A and automation on desktop; strong privacy focus | Desktop, Browser |
| Hyperbrowser | Hyperbrowser.ai (YC Backed) | Released (2024) | Sub-second browser launch, 10,000+ concurrent browsers, CAPTCHA solving | Browser |
| Lindy | Lindy.ai | Released | Virtual AI assistant for daily business tasks | Browser |
| Manus | Monica AI (China) | Released (2025) | Marketed as a general-purpose AI agent; reported SOTA on GAIA benchmark | Desktop, Browser, Phone |
| MultiOn (now Please AI) | Please AI | Released (2023) | Multi-step web tasks end-to-end; preference learning | Browser |
| OpenAI CUA (Operator) | OpenAI | Released (2025) | High benchmark performance; built on reasoning-model technology | Browser, Desktop |
| Perplexity Comet | Perplexity AI | Upcoming (2025) | Autonomous multi-step search with citations | Browser |
| Project Jarvis | | Rumored | Computer-using agent system; few details available | Desktop, Browser |
| Proxy | Convergence AI | Released (2025) | Handles concurrent sub-tasks; cheaper alternative to Operator | Browser |
| Relay | Relay.app | Released (2021) | Clean, simple interface; extensive app integrations | Browser |
| Relevance AI | Relevance AI | Released | Drag-and-drop skill building, templates, integrations | Browser |
| ServiceNow AI Agents | ServiceNow | Released | Built-in governance, analytics, text-to-action capabilities | Browser |
| Vy | Vercept | Released (2025) | Advanced human-computer interaction; works with existing applications | Desktop |
The following agents and frameworks are listed with their license:
| Name | Developer | License | Key Features | Environment |
|---|---|---|---|---|
| Agent S | Simular AI | Research License | Web research, content summarization, data extraction | Browser, Desktop |
| Agent S2 | Simular AI | Research License | OSWorld: 34.5%; AndroidWorld: 50%; outperforms OpenAI CUA/Operator | Browser, Desktop, Phone |
| AutoGen | Microsoft | MIT | Agents can converse with each other to solve tasks | Browser |
| AutoGPT | Significant Gravitas | MIT | Pioneer in autonomous GPT agents; self-prompting with memory | Browser |
| BabyAGI | Yohei Nakajima | MIT | Autonomous task creation and prioritization | Browser |
| Browser Use | Y Combinator/ETH Zurich | MIT | Makes websites more digestible for AI agents | Browser |
| c/ua (Computer-Use Agent) | TryCua | Open Source | High-performance virtualization; fully isolated virtual environments | Desktop, Virtual Machine |
| CogAgent | Tsinghua Univ. & Zhipu | Research License (CC BY-NC) | High-performance open model rivaling closed models | Desktop, Browser |
| CrewAI | CrewAI | MIT | Enables orchestration of specialized agents in teams | Browser |
| HyperAgent | FSoft-AI4Code | Apache 2.0 | Handles GitHub issue resolution, repository-level code generation | Browser, Desktop |
| LangGraph | LangChain | MIT | Framework for building stateful, multi-agent systems | Browser |
| LLM Agents | NVIDIA/Meta | Research License | Standardized evaluation for LLM agents | Browser |
| Octo | Google DeepMind | Apache 2.0 | Zero-shot generalization to new objects and tasks | Physical World |
| OpenInterpreter | Open Interpreter | AGPL-3.0 | Code interpreter for local execution | Desktop, Browser |
| OWL | Camel-AI | Apache 2.0 | Distributed task automation | Browser |
| RooCode | Roo Code | Apache 2.0 | Autonomous coding in VS Code | Browser, Desktop |
| Simular AI | Simular | Research License | SOTA on OSWorld and AndroidWorld benchmarks | Desktop, Browser, Phone |
| Suna | Kortix | Apache 2.0 | Highly versatile generalist agent; handles complex tasks | Browser |
| UI-TARS | ByteDance/TikTok | Research License | Autonomous GUI execution on PC/Mac/Android | Browser, Desktop, Phone |
| Vercel AI SDK Computer Use | Vercel | Open Source | Standardized API for different AI models; streaming capabilities | Browser, Web |
| WebVoyager | Hongliang He et al. | Research License | 59.1% success on 15-website benchmark | Browser |
| Felluo AI | Felluo | Proprietary | Vision-based GUI automation | Browser, Desktop |
The following projects are listed by institution, focus area, and release date:
| Name | Institution | Focus Area | Release Date |
|---|---|---|---|
| Deep Research Agent | OpenAI | Web browsing, research | 2025 |
| Gato | Google DeepMind | Multi-modal, multi-task, multi-embodiment | 2022 |
| HuggingGPT (Jarvis) | Microsoft | Orchestrates specialists for multi-modal tasks | 2023 |
| I-AFM | Microsoft Research | Multi-modal, multi-task system | 2024 |
| Magma | Microsoft Research | Vision-language-action model | 2025 |
| mlejva's Computer Agent | Vasek Mlejnsky | GUI interaction | 2024 |
| PaLM-E | Google DeepMind & Robotics at Google | Embodied multimodal language model | 2023 |
| RT-2 | Google DeepMind | Vision-language-action model | 2023 |
| SayCan | | Grounded language model for robotics | 2022 |
| SIMA | Google DeepMind | 3D virtual environments | 2024 |
| WebAgent | Google DeepMind | Autonomous web browsing and form-filling | 2024 |
Browser-based agents specialize in navigating and interacting with web interfaces:
| Name | Developer | Status | Development Type |
|---|---|---|---|
| Hyperbrowser | Hyperbrowser.ai (YC Backed) | Released | Commercial |
| Perplexity Comet | Perplexity AI | Upcoming | Commercial |
| Browser Use | Y Combinator/ETH Zurich | Released | Commercial, Open-source |
| CloudCruise | CloudCruise | Released | Commercial |
| Deep Research Agent | OpenAI | Released | Commercial, Research |
| Felluo AI | Felluo | Released | Commercial |
| Google Mariner | Google DeepMind | Unreleased | Commercial, Research |
| Gumloop | Gumloop | Released | Commercial |
| MultiOn (now Please AI) | Please AI | Released | Commercial |
| Proxy | Convergence AI | Released | Commercial |
| Suna | Kortix | Released | Open-source |
| WebVoyager | Hongliang He et al. | Released | Research, Open-source |
Desktop agents interact with operating system GUIs and desktop applications:
| Name | Developer | Status | Development Type |
|---|---|---|---|
| Ace | General Agents | Upcoming | Commercial, Research |
| Claude Computer Use | Anthropic | Released | Commercial |
| Felluo AI | Felluo | Released | Commercial |
| Fuyu-Heavy | Adept AI | Released | Commercial, Research |
| Highlight AI | Embedded Intelligence | Released | Commercial |
| OpenAI CUA (Operator) | OpenAI | Released | Commercial |
| Project Jarvis | | Rumored | Commercial, Research |
| CogAgent | Tsinghua Univ. & Zhipu | Released | Research, Open-source |
| Vy | Vercept | Released | Commercial |
| c/ua (Computer-Use Agent) | TryCua | Released | Open Source |
These agents operate in 3D environments, games, and physical systems:
| Name | Developer | Status | Development Type |
|---|---|---|---|
| Gato | Google DeepMind | Released | Research |
| I-AFM | Microsoft Research | Released | Research |
| Magma | Microsoft Research | Released | Research, Open Source |
| Octo | Google DeepMind | Released | Open Source, Research |
| PaLM-E | Google DeepMind & Robotics at Google | Released | Research |
| RT-2 | Google DeepMind | Released | Research |
| SayCan | | Released | Research |
| SIMA | Google DeepMind | Released | Research |
Cloud-based agents running in remote environments:
| Name | Developer | Status | Development Type |
|---|---|---|---|
| CloudCruise | CloudCruise | Released | Commercial |
Agents that can operate across multiple device types:
| Name | Developer | Status | Supported Devices |
|---|---|---|---|
| Agent S2 | Simular AI | Released | Windows, macOS, Linux, Android, iOS |
| AskUI Vision Agent | AskUI | Released | Windows, macOS, Linux, Android, iOS |
| Claude Computer Use | Anthropic | Released | Windows, macOS, Linux, Multi-device |
| Manus | Monica AI (China) | Released | Windows, macOS, Linux, Android, iOS |
| Simular AI | Simular | Released | Windows, macOS, Linux, Android, iOS |
| UI-TARS | ByteDance/TikTok | Released | Windows, macOS, Linux, Android, iOS |
For a full breakdown of agents by task complexity, including Single Workflow, Multiple Workflow, and Complex Workflow Agents, please visit our website: supernalintelligence.com
- Supernal Intelligence Discord - Join our community to discuss GUI agents, share resources, and connect with others
- X/Twitter: @supernalasi - Follow for updates and news about GUI agents and AI advancements
- Website: supernalintelligence.com - Official website with more resources and information
- Awesome AI Agent Leaderboards - Comprehensive list of leaderboards for AI agents
- Awesome AI Agent Benchmarks - Comprehensive list of benchmarks for AI agents
Contributions welcome! Please read the contribution guidelines first or email i@supernal.ai if you see an error or want to contribute.
This awesome list is maintained by Parni and Ian and is released under the MIT License.
