ProfRandom92/Awesome-Gui-Agents

Awesome GUI Compute Agents

A curated list of GUI (Graphical User Interface) compute agents: AI systems that can see, understand, and interact with graphical interfaces as humans do.

This project is maintained by Parni and Ian. Follow Supernal Intelligence for more updates.

Website: supernalintelligence.com
Join our Discord: Supernal Intelligence Discord

For more complete data and the latest information, please visit our website: supernalintelligence.com

What are GUI Compute Agents?

GUI compute agents are AI systems designed to interact with graphical user interfaces just like humans do. They can:

  • See and understand screen elements
  • Click buttons, type text, and drag elements
  • Navigate through applications and websites
  • Complete complex visual workflows
  • Automate GUI-based tasks through natural language instructions
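
The capabilities above amount to an observe, decide, act loop. The following is a minimal sketch of that loop, not the implementation of any agent listed here. The `Element`, `Screen`, `choose_action`, and `run_agent` names are hypothetical, the "screen" is a plain in-memory structure rather than real pixels, and the policy is a hard-coded rule where a real agent would typically call a vision-language model.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """A hypothetical on-screen widget the agent can perceive."""
    name: str
    kind: str          # "textbox" or "button"
    text: str = ""


@dataclass
class Screen:
    """Toy stand-in for a GUI: a list of elements plus a log of submissions."""
    elements: list = field(default_factory=list)
    submitted: list = field(default_factory=list)

    def observe(self):
        # A real agent would capture a screenshot; here we return element state.
        return [(e.name, e.kind, e.text) for e in self.elements]

    def find(self, name):
        return next(e for e in self.elements if e.name == name)

    def type_text(self, name, text):
        self.find(name).text = text

    def click(self, name):
        # Clicking "submit" records whatever is currently in the "query" box.
        if name == "submit":
            self.submitted.append(self.find("query").text)


def choose_action(observation, goal):
    """Hard-coded policy (assumption): fill the textbox, then click the button."""
    state = {name: text for name, _kind, text in observation}
    if state["query"] != goal:
        return ("type", "query", goal)
    return ("click", "submit", None)


def run_agent(screen, goal, max_steps=5):
    """Observe, decide, act; stop once the goal text has been submitted."""
    trace = []
    for _ in range(max_steps):
        action = choose_action(screen.observe(), goal)
        trace.append(action)
        kind, target, arg = action
        if kind == "type":
            screen.type_text(target, arg)
        else:
            screen.click(target)
        if goal in screen.submitted:
            break
    return trace


screen = Screen([Element("query", "textbox"), Element("submit", "button")])
trace = run_agent(screen, "weather in Paris")
print(trace)
```

The `max_steps` cap is a deliberate design choice in this sketch: it keeps the loop from running indefinitely if the policy never reaches the goal.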

Commercial Agents

| Name | Developer | Status | Key Features | Environment |
|---|---|---|---|---|
| Ace | General Agents | Upcoming (2025) | Achieved 20× human speed on UI tasks; controls full computer via screen pixels | Desktop, Browser |
| ACT-1 | Adept AI | Released (2022) | Pioneer in digital actions; self-correcting behavior | Desktop, Browser |
| CloudCruise | CloudCruise | Released | Cloud-based GUI automation; enterprise-grade | Cloud, Browser |
| Felluo AI | Felluo | Released | Vision-based GUI automation; supports both browser and desktop interactions | Desktop, Browser |
| Adaptive.AI | Adaptive AI Inc | Released | AI risk management framework; technology strategy consulting | Browser |
| AgentGPT | Reworkd | Released (2023) | User-friendly interface for creating goal-oriented agents | Browser |
| AI Agent Studio | Automation Anywhere | Released (2025) | Handles structured and unstructured data; creates AI agents for enterprise automation | Browser |
| Apple Intelligence Agents | Apple | Upcoming (2025) | Deep OS integration; privacy-focused | Phone, Desktop |
| AskUI Vision Agent | AskUI | Released | Cross-platform functionality without virtual machines | Desktop, Browser, Phone |
| Beam AI | Beam | Released | Agentic Process Automation platform for customer support, onboarding, sales proposal generation | Browser |
| Claude Agent Kit | Anthropic | Upcoming (2024) | Official toolkit for building Claude-powered agents | Browser |
| Claude Computer Use | Anthropic | Released (2024) | Works on desktop apps and browsers; AI model-based approach | Browser, Desktop, Multi-device |
| Devin | Cognition Labs | Upcoming (2025) | Full-stack programming capabilities with browser access | Desktop, Browser |
| Fuyu-Heavy | Adept AI | Released (2024) | Ranked 3rd best vision-action model behind GPT-4V and Gemini Ultra | Desktop, Browser |
| Gemini 1.5 Pro (Tool Use) | Google | Released | Long context, tool orchestration in Workspace | Browser |
| Google Mariner | Google DeepMind | Unreleased | High WebVoyager benchmark performance | Browser |
| Gumloop | Gumloop | Released (2023) | Visual workflow canvas; 90+ pre-built templates; Chrome extension for web automation | Browser |
| Highlight AI | Embedded Intelligence | Released (2024) | Instant Q&A and automation on desktop; strong privacy focus | Desktop, Browser |
| Hyperbrowser | Hyperbrowser.ai (YC Backed) | Released (2024) | Sub-second browser launch, 10,000+ concurrent browsers, CAPTCHA solving | Browser |
| Lindy | Lindy.ai | Released | Virtual AI assistant for daily business tasks | Browser |
| Manus | Monica AI (China) | Released (2024) | World's first general AI agent; SOTA on GAIA benchmark | Desktop, Browser, Phone |
| MultiOn (now Please AI) | Please AI | Released (2023) | Multi-step web tasks end-to-end; preference learning | Browser |
| OpenAI CUA (Operator) | OpenAI | Released (2025) | High benchmark performance; uses reasoning models tech | Browser, Desktop |
| Perplexity Comet | Perplexity AI | Upcoming (2025) | Autonomous multi-step search with citations | Browser |
| Project Jarvis | Google | Rumored | Computer-using agent system; few details available | Desktop, Browser |
| Proxy | Convergence AI | Released (2025) | Handles concurrent sub-tasks; cheaper alternative to Operator | Browser |
| Relay | Relay.app | Released (2021) | Clean, simple interface; extensive app integrations | Browser |
| Relevance AI | Relevance AI | Released | Drag-and-drop skill building, templates, integrations | Browser |
| ServiceNow AI Agents | ServiceNow | Released | Built-in governance, analytics, text-to-action capabilities | Browser |
| Vy | Vercept | Released (2025) | Advanced human-computer interaction; works with existing applications | Desktop |

Open Source Agents

| Name | Developer | License | Key Features | Environment |
|---|---|---|---|---|
| Agent S | Simular AI | Research License | Web research, content summarization, data extraction | Browser, Desktop |
| Agent S2 | Simular AI | Research License | OSWorld: 34.5%; AndroidWorld: 50%; outperforms OpenAI CUA/Operator | Browser, Desktop, Phone |
| AutoGen | Microsoft | MIT | Agents can converse with each other to solve tasks | Browser |
| AutoGPT | Significant Gravitas | MIT | Pioneer in autonomous GPT agents; self-prompting with memory | Browser |
| BabyAGI | Yohei Nakajima | MIT | Autonomous task creation and prioritization | Browser |
| Browser Use | Y Combinator/ETH Zurich | Proprietary | Makes websites more digestible for AI agents | Browser |
| c/ua (Computer-Use Agent) | TryCua | Open Source | High-performance virtualization; fully isolated virtual environments | Desktop, Virtual Machine |
| CogAgent | Tsinghua Univ. & Zhipu | Research License (CC BY-NC) | High-performance open model rivaling closed models | Desktop, Browser |
| CrewAI | CrewAI | Proprietary | Enables orchestration of specialized agents in teams | Browser |
| HyperAgent | FSoft-AI4Code | Apache 2.0 | Handles GitHub issue resolution, repository-level code generation | Browser, Desktop |
| LangGraph | LangChain | MIT | Framework for building stateful, multi-agent systems | Browser |
| LLM Agents | NVIDIA/Meta | Research License | Standardized evaluation for LLM agents | Browser |
| Octo | Google DeepMind | Apache 2.0 | Zero-shot generalization to new objects and tasks | Physical World |
| OpenInterpreter | Open Interpreter | Proprietary | Code interpreter for local execution | Desktop, Browser |
| OWL | Camel-AI | Proprietary | Distributed task automation | Browser |
| RooCode | Open-source | Proprietary | Autonomous coding in VS Code | Browser, Desktop |
| Simular AI | Simular | Research License | SOTA on OSWorld and AndroidWorld benchmarks | Desktop, Browser, Phone |
| Suna | Kortix | Proprietary | Highly versatile generalist agent; handles complex tasks | Browser |
| UI-TARS | ByteDance/TikTok | Research License | Autonomous GUI execution on PC/Mac/Android | Browser, Desktop, Phone |
| Vercel AI SDK Computer Use | Vercel | Open Source | Standardized API for different AI models; streaming capabilities | Browser, Web |
| WebVoyager | Hongliang He et al. | Research License | 59.1% success on 15-website benchmark | Browser |
| Felluo AI | Felluo | Proprietary | Vision-based GUI automation | Browser, Desktop |

Research Projects

| Name | Institution | Focus Area | Release Date |
|---|---|---|---|
| Deep Research Agent | OpenAI | Web browsing, research | 2024 |
| Gato | Google DeepMind | Multi-modal, multi-task, multi-embodiment | 2022 |
| HuggingGPT (Jarvis) | Microsoft | Orchestrates specialists for multi-modal tasks | 2023 |
| I-AFM | Microsoft Research | Multi-modal, multi-task system | 2024 |
| Magma | Microsoft Research | Vision-language-action model | 2025 |
| mlejva's Computer Agent | Vasek Mlejnsky | GUI interaction | 2024 |
| PaLM-E | Google DeepMind & Robotics at Google | Embodied multimodal language model | 2023 |
| RT-2 | Google DeepMind | Vision-language-action model | 2023 |
| SayCan | Google | Grounded language model for robotics | 2022 |
| SIMA | Google DeepMind | 3D virtual environments | 2024 |
| WebAgent | Google DeepMind | Autonomous web browsing and form-filling | 2024 |

By Environment

Browser-Based Agents

Browser-based agents specialize in navigating and interacting with web interfaces:

| Name | Developer | Status | Development Type |
|---|---|---|---|
| Hyperbrowser | Hyperbrowser.ai (YC Backed) | Released | Commercial |
| Perplexity Comet | Perplexity AI | Upcoming | Commercial |
| Browser Use | Y Combinator/ETH Zurich | Released | Commercial, Open-source |
| CloudCruise | CloudCruise | Released | Commercial |
| Deep Research Agent | OpenAI | Unreleased | Commercial, Research |
| Felluo AI | Felluo | Released | Commercial |
| Google Mariner | Google DeepMind | Unreleased | Commercial, Research |
| Gumloop | Gumloop | Released | Commercial |
| MultiOn (now Please AI) | Please AI | Released | Commercial |
| Proxy | Convergence AI | Released | Commercial |
| Suna | Kortix | Released | Open-source |
| WebVoyager | Hongliang He et al. | Released | Research, Open-source |

Desktop Agents

Desktop agents interact with operating system GUIs and desktop applications:

| Name | Developer | Status | Development Type |
|---|---|---|---|
| Ace | General Agents | Upcoming | Commercial, Research |
| Claude Computer Use | Anthropic | Released | Commercial |
| Felluo AI | Felluo | Released | Commercial |
| Fuyu-Heavy | Adept AI | Released | Commercial, Research |
| Highlight AI | Embedded Intelligence | Released | Commercial |
| OpenAI CUA (Operator) | OpenAI | Released | Commercial |
| Project Jarvis | Google | Rumored | Commercial, Research |
| CogAgent | Tsinghua Univ. & Zhipu | Released | Research, Open-source |
| Vy | Vercept | Released | Commercial |
| c/ua (Computer-Use Agent) | TryCua | Released | Open Source |

Physical World Agents

These agents operate in 3D environments, games, and physical systems:

| Name | Developer | Status | Development Type |
|---|---|---|---|
| Gato | Google DeepMind | Released | Research |
| I-AFM | Microsoft Research | Released | Research |
| Magma | Microsoft Research | Released | Research, Open Source |
| Octo | Google DeepMind | Released | Open Source, Research |
| PaLM-E | Google DeepMind & Robotics at Google | Released | Research |
| RT-2 | Google DeepMind | Released | Research |
| SayCan | Google | Released | Research |
| SIMA | Google DeepMind | Released | Research |

Cloud Agents

Cloud-based agents run in remote environments:

| Name | Developer | Status | Development Type |
|---|---|---|---|
| CloudCruise | CloudCruise | Released | Commercial |

Multi-Device Agents

Agents that can operate across multiple device types:

| Name | Developer | Status | Supported Devices |
|---|---|---|---|
| Agent S2 | Simular AI | Released | Windows, macOS, Linux, Android, iOS |
| AskUI Vision Agent | AskUI | Released | Windows, macOS, Linux, Android, iOS |
| Claude Computer Use | Anthropic | Released | Windows, macOS, Linux, Multi-device |
| Manus | Monica AI (China) | Released | Windows, macOS, Linux, Android, iOS |
| Simular AI | Simular | Released | Windows, macOS, Linux, Android, iOS |
| UI-TARS | ByteDance/TikTok | Released | Windows, macOS, Linux, Android, iOS |

By Task Complexity

For a full breakdown of agents by task complexity, including Single Workflow, Multiple Workflow, and Complex Workflow Agents, please visit our website: supernalintelligence.com

Resources

Communities

Related Awesome Lists

Contribution

Contributions welcome! Please read the contribution guidelines first or email i@supernal.ai if you see an error or want to contribute.

License

This awesome list is maintained by Parni and Ian, and is released under the MIT Open Source License.
