Picrew/awesome-agent-harness

Awesome Agent Harness

A curated, implementation-first list of agent harness engineering resources, with GitHub projects as the primary focus.

  • Total entries: 159
  • GitHub entries: 135 (84.9%)
  • GitHub entries in project categories (excluding readings): 131/131 (100.0%)
  • Categories: 9
  • Last verified: 2026-04-22
  • Language: English | 中文 (Chinese)

Contents

Category Overview

| Category | Entries |
| --- | --- |
| Harness Architecture & Orchestration | 20 |
| Context & Working-State Engineering | 8 |
| Execution Substrates & Sandboxing | 16 |
| Protocols, Tool Interfaces & Agent Contracts | 11 |
| Evaluation Harnesses & Benchmarks | 20 |
| Observability & Reliability Operations | 13 |
| Guardrails, Security & Governance | 11 |
| Reference Harness Implementations | 32 |
| Essential Readings & Ecosystem Maps | 28 |
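
The header figures can be re-derived from these counts. A minimal Python sketch of the arithmetic, assuming (as the header's 131/131 figure states) that every entry in the eight project categories links to GitHub, and counting the four GitHub-hosted awesome-* lists in the readings category as the remaining GitHub entries:

```python
# Entry counts per category, copied from the Category Overview table.
CATEGORY_COUNTS = {
    "Harness Architecture & Orchestration": 20,
    "Context & Working-State Engineering": 8,
    "Execution Substrates & Sandboxing": 16,
    "Protocols, Tool Interfaces & Agent Contracts": 11,
    "Evaluation Harnesses & Benchmarks": 20,
    "Observability & Reliability Operations": 13,
    "Guardrails, Security & Governance": 11,
    "Reference Harness Implementations": 32,
    "Essential Readings & Ecosystem Maps": 28,
}
READINGS = "Essential Readings & Ecosystem Maps"
READINGS_GITHUB = 4  # the four awesome-* lists in the readings table


def summary_stats(counts: dict[str, int], readings_github: int) -> dict:
    total = sum(counts.values())
    project_entries = total - counts[READINGS]
    # Assumption from the header: all project-category entries are GitHub repos.
    github = project_entries + readings_github
    return {
        "total": total,
        "categories": len(counts),
        "project_entries": project_entries,
        "github": github,
        "github_pct": round(100 * github / total, 1),
    }


stats = summary_stats(CATEGORY_COUNTS, READINGS_GITHUB)
print(stats)
```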

Catalog

Notes:

  • Stars are rendered as badges from snapshot values.
  • Repository update dates are tracked in data/projects.yaml and validation reports.
  • Entries are sorted by stars (descending) within each category.
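
The ordering rule above can be expressed as a single sort keyed on category position and descending star count. The record fields and star counts below are hypothetical placeholders, not values from data/projects.yaml, and the case-insensitive name tie-break is an added assumption that keeps the rendered output deterministic:

```python
from itertools import groupby

# Hypothetical records; field names and star counts are illustrative only.
entries = [
    {"name": "project-a", "category": "Evaluation", "stars": 1200},
    {"name": "project-b", "category": "Architecture", "stars": 800},
    {"name": "project-c", "category": "Evaluation", "stars": 5400},
    {"name": "project-d", "category": "Architecture", "stars": 800},
    {"name": "project-e", "category": "Architecture", "stars": 9100},
]


def sort_within_categories(entries: list[dict], category_order: list[str]) -> dict[str, list[str]]:
    """Order categories as given, then sort each category by stars, descending."""
    rank = {category: i for i, category in enumerate(category_order)}
    ordered = sorted(
        entries,
        # Negated stars gives descending order; the name breaks ties (assumption).
        key=lambda e: (rank[e["category"]], -e["stars"], e["name"].lower()),
    )
    # groupby needs consecutive keys, which the sort above guarantees.
    return {
        category: [e["name"] for e in group]
        for category, group in groupby(ordered, key=lambda e: e["category"])
    }


print(sort_within_categories(entries, ["Architecture", "Evaluation"]))
# {'Architecture': ['project-e', 'project-b', 'project-d'], 'Evaluation': ['project-c', 'project-a']}
```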

Harness Architecture & Orchestration

| Project | Link | Stars | Tags | Summary |
| --- | --- | --- | --- | --- |
| DeerFlow | GitHub | (badge) | long-horizon, memory, subagents | Long-horizon super-agent harness integrating memory, tools, subagents, and sandboxes. |
| AutoGen | GitHub | (badge) | multi-agent, orchestration, framework | Programming framework for agentic AI with multi-agent interaction and orchestration. |
| Agno | GitHub | (badge) | scale, runtime, management | Agent software runtime focused on running and managing agentic systems at scale. |
| LangGraph | GitHub | (badge) | graph, workflow, runtime | Graph-based runtime for resilient stateful agents and deterministic workflow control. |
| Semantic Kernel | GitHub | (badge) | enterprise, orchestration, plugins | Enterprise-grade agentic application framework with orchestration and plugin patterns. |
| OpenAI Agents SDK (Python) | GitHub | (badge) | sdk, handoff, workflows | Lightweight framework for multi-agent workflows, handoffs, and production patterns. |
| deepagents | GitHub | (badge) | runtime, orchestration, long-running | Open-source harness for long-running, tool-using agents with planning and subagent patterns. |
| Google ADK (Python) | GitHub | (badge) | toolkit, deployment, evaluation | Code-first toolkit to build, evaluate, and deploy advanced AI agents. |
| PydanticAI | GitHub | (badge) | python, typing, schema | Type-safe Python framework for agents with strong schema contracts and tooling. |
| Hive | GitHub | (badge) | harness, orchestration, runtime | Outcome-driven agent runtime harness with explicit control loops and orchestration blocks. |
| Microsoft Agent Framework | GitHub | (badge) | multi-agent, workflows, observability | Multi-language framework for building, orchestrating, and deploying AI agents with graph workflows and observability. |
| VoltAgent | GitHub | (badge) | typescript, platform, runtime | TypeScript agent engineering platform built around open runtime abstractions. |
| mcp-agent | GitHub | (badge) | mcp, runtime, workflow | Practical agent framework centered on MCP tool ecosystems and workflow composition. |
| Yao | GitHub | (badge) | single-binary, runtime, autonomous | Single-binary runtime for defining and running autonomous agents. |
| Cloudflare Agents | GitHub | (badge) | platform, deployment, runtime | Platform runtime for building and deploying agents with production infrastructure primitives. |
| Docker Agent | GitHub | (badge) | docker, runtime, container | Agent builder and runtime stack emphasizing container-native execution. |
| NeMo Agent Toolkit | GitHub | (badge) | multi-agent, optimization, toolkit | Open toolkit for connecting and optimizing teams of AI agents. |
| Scion | GitHub | (badge) | multi-agent, containers, orchestration | Experimental multi-agent orchestration testbed that runs isolated agent harnesses in containers, worktrees, and remote runtimes. |
| deepagentsjs | GitHub | (badge) | typescript, langgraph, subagents | TypeScript agent harness with built-in planning, filesystem tools, subagents, and LangGraph-native runtime hooks. |
| hankweave | GitHub | (badge) | long-horizon, runtime, checkpoints | Headless-first long-horizon runtime that orchestrates existing agent harnesses with sentinels, loops, checkpoints, and event journals. |
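
Many of the harnesses above are built around some form of the same control loop: ask the model for the next action, execute the requested tool, append the observation to the working state, and stop on a final answer or an exhausted step budget. A minimal Python sketch under stated assumptions; `Action`, `Harness`, and the scripted stand-in model are hypothetical names, not the API of any project listed above:

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Action:
    tool: str | None  # None signals a final answer carried in `arg`
    arg: str = ""


@dataclass
class Harness:
    model: Callable[[list[str]], Action]  # stand-in for an LLM call over the transcript
    tools: dict[str, Callable[[str], str]]
    max_steps: int = 8  # hard budget so a looping model cannot run forever
    transcript: list[str] = field(default_factory=list)

    def run(self, task: str) -> str:
        self.transcript = [f"task: {task}"]  # fresh working state per run
        for _ in range(self.max_steps):
            action = self.model(self.transcript)
            if action.tool is None:
                return action.arg
            tool = self.tools.get(action.tool)
            # An unknown tool becomes an observation rather than a crash, so the model can recover.
            result = tool(action.arg) if tool else f"error: unknown tool {action.tool!r}"
            self.transcript.append(f"{action.tool}({action.arg}) -> {result}")
        return "stopped: step budget exhausted"


def scripted_model(transcript: list[str]) -> Action:
    # Hypothetical fake policy: call the adder once, then answer with the last observation.
    if len(transcript) == 1:
        return Action("add", "2,3")
    return Action(None, transcript[-1].split("-> ")[-1])


h = Harness(model=scripted_model, tools={"add": lambda s: str(sum(int(x) for x in s.split(",")))})
print(h.run("add 2 and 3"))  # 5
```

Real harnesses replace the scripted policy with a model call and add the pieces the table mentions (subagents, checkpoints, memory), but the budgeted loop is the part they all have to get right.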

Context & Working-State Engineering

| Project | Link | Stars | Tags | Summary |
| --- | --- | --- | --- | --- |
| everything-claude-code | GitHub | (badge) | context, skills, harness-practices | Large open repository of harness practices around memory, skills, and context control for coding agents. |
| claude-mem | GitHub | (badge) | memory, context, session | Plugin-style memory layer that captures session history and reinjects relevant context into future coding runs. |
| planning-with-files | GitHub | (badge) | planning, skills, persistence | Skill package for persistent file-based planning in coding-agent workflows. |
| Agent Skills for Context Engineering | GitHub | (badge) | skills, context, production | Large skill library oriented around context engineering and production agents. |
| Context-Engineering Handbook | GitHub | (badge) | context-engineering, handbook, practices | First-principles handbook focused on practical context engineering for agent systems. |
| Trellis | GitHub | (badge) | specs, memory, workflow | Multi-platform coding-agent workflow framework with task context, project memory, and spec injection. |
| Awesome Context Engineering | GitHub | (badge) | awesome-list, context, survey | Survey-style list for context engineering resources and frameworks. |
| context-space | GitHub | (badge) | context, infrastructure, mcp | Infrastructure project focused on context engineering building blocks and MCP-centric integrations. |

Execution Substrates & Sandboxing

| Project | Link | Stars | Tags | Summary |
| --- | --- | --- | --- | --- |
| Daytona | GitHub | (badge) | sandbox, execution, infra | Secure and elastic sandbox infrastructure for running AI-generated code with file, Git, LSP, and execution APIs. |
| CUA | GitHub | (badge) | computer-use, sandbox, infra | Infrastructure stack for computer-use agents with sandbox, SDK, and benchmark support. |
| E2B | GitHub | (badge) | cloud-sandbox, execution, enterprise | Secure cloud environments with real tools for production-grade agent execution. |
| OpenSandbox | GitHub | (badge) | sandbox, security, runtime | Secure and extensible sandbox runtime built for agent workloads. |
| agent-infra sandbox | GitHub | (badge) | all-in-one, browser, shell | All-in-one sandbox combining browser, shell, files, MCP, and IDE server. |
| Judge0 | GitHub | (badge) | code-execution, sandbox, backend | Scalable sandboxed code execution system usable as an agent execution backend. |
| Agent Sandbox | GitHub | (badge) | kubernetes, sandbox, stateful | Kubernetes-native sandbox control plane for isolated, stateful agent runtimes with stable identity, persistence, and warm-pool support. |
| stakpak/agent | GitHub | (badge) | always-on, autonomous, ops | Always-on open agent that runs on your machines with autonomous operational loops. |
| OSS-Fuzz Gen | GitHub | (badge) | fuzzing, security, execution | LLM-powered fuzzing workflows integrated with controlled execution contexts. |
| Tensorlake | GitHub | (badge) | microvm, sandbox, orchestration | Serverless runtime for agent sandboxes with MicroVM isolation, snapshots, suspend-resume, and background orchestration. |
| Arrakis | GitHub | (badge) | sandbox, microvm, snapshots | Self-hosted sandbox substrate with MicroVM isolation, snapshot restore, and REST, SDK, and MCP interfaces for agent code execution and computer use. |
| AgentScope Runtime | GitHub | (badge) | runtime, sandbox, deployment | Production runtime for agent apps with secure tool sandboxes, deployment APIs, observability, and state services. |
| SWE-ReX | GitHub | (badge) | sandbox, execution, coding-agent | Sandboxed execution infrastructure for AI coding agents at local and cloud scale. |
| sandboxed.sh | GitHub | (badge) | self-hosted, isolation, orchestrator | Self-hosted orchestrator running coding agents inside isolated Linux workspaces. |
| Capsule | GitHub | (badge) | wasm, sandbox, task-runtime | Durable runtime that coordinates agent tasks inside isolated WebAssembly sandboxes with retries and lifecycle tracking. |
| terminal-bench-env | GitHub | (badge) | terminal, benchmark-env, sandbox | Environment layer for terminal-agent benchmark execution. |
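
The substrates above provide isolation boundaries of varying strength (containers, MicroVMs, WebAssembly sandboxes). For contrast, a minimal Python sketch of the weakest useful baseline: running generated code in a child process with a wall-clock timeout and a scratch working directory. `run_untrusted` and `RunResult` are hypothetical names, and this is not how any listed project works:

```python
from __future__ import annotations

import subprocess
import sys
import tempfile
from dataclasses import dataclass


@dataclass
class RunResult:
    returncode: int | None  # None when the process was killed on timeout
    stdout: str
    stderr: str
    timed_out: bool


def run_untrusted(code: str, timeout_s: float = 5.0) -> RunResult:
    """Run Python source in a child process with a wall-clock timeout.

    Process separation plus a throwaway working directory only: this is NOT a
    security boundary (no filesystem, network, or syscall isolation).
    """
    with tempfile.TemporaryDirectory() as workdir:
        try:
            proc = subprocess.run(
                # -I: isolated mode (ignores PYTHON* env vars and the user site dir)
                [sys.executable, "-I", "-c", code],
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before re-raising TimeoutExpired.
            return RunResult(None, "", "", True)
    return RunResult(proc.returncode, proc.stdout, proc.stderr, False)


print(run_untrusted("print(1 + 1)").stdout.strip())  # 2
```

Everything this sketch lacks (resource limits, network policy, filesystem isolation, snapshots) is what the dedicated substrates in the table exist to supply.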

Protocols, Tool Interfaces & Agent Contracts

| Project | Link | Stars | Tags | Summary |
| --- | --- | --- | --- | --- |
| GitHub Spec Kit | GitHub | (badge) | spec-driven, workflows, tooling | Toolkit for spec-driven development to guide deterministic agent execution. |
| MCP Servers | GitHub | (badge) | mcp, servers, implementations | Official collection of MCP server implementations across tools and domains. |
| AGENTS.md | GitHub | (badge) | spec, agent-file, instructions | Open format for repository-local instructions that coding agents can follow. |
| Model Context Protocol | GitHub | (badge) | mcp, protocol, interoperability | Core specification and docs for MCP-based tool and context interoperability. |
| directories (rules and MCP indexes) | GitHub | (badge) | directories, mcp, rules | Curated directories of agent rules and MCP servers for tool discovery. |
| LangChain MCP Adapters | GitHub | (badge) | mcp, adapters, integration | Adapters connecting LangChain components with MCP servers. |
| Microsoft MCP Servers | GitHub | (badge) | mcp, enterprise, servers | Microsoft's official MCP server catalog for enterprise data and tools. |
| ACPX | GitHub | (badge) | acp, client, sessions | Headless CLI client for stateful Agent Client Protocol sessions. |
| Microsoft Learn MCP | GitHub | (badge) | mcp, docs, grounding | MCP server and CLI for grounding agents with Microsoft documentation sources. |
| IBM MCP | GitHub | (badge) | mcp, clients, tooling | IBM collection of MCP servers, clients, and developer tooling. |
| AGENT.md | GitHub | (badge) | standard, agent-file, interoperability | Standardized machine-readable file format for agentic coding tools. |
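
MCP messages are JSON-RPC 2.0 objects: a client discovers a server's tools with `tools/list` and invokes one with `tools/call`. A minimal Python sketch of those two request payloads, assuming a hypothetical `get_weather` tool; the transport (stdio or HTTP) and the `initialize` handshake that must precede these calls are omitted:

```python
import json
from itertools import count

# Request ids must not be reused by the requester within a session; a counter suffices.
_request_ids = count(1)


def jsonrpc_request(method: str, params: dict) -> dict:
    """Build a JSON-RPC 2.0 request object, the message format MCP is layered on."""
    return {"jsonrpc": "2.0", "id": next(_request_ids), "method": method, "params": params}


# Discover tools, then call one. "get_weather" and its arguments are hypothetical.
list_tools = jsonrpc_request("tools/list", {})
call_tool = jsonrpc_request("tools/call", {"name": "get_weather", "arguments": {"city": "Paris"}})
print(json.dumps(call_tool))
```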

Evaluation Harnesses & Benchmarks

| Project | Link | Stars | Tags | Summary |
| --- | --- | --- | --- | --- |
| Promptfoo | GitHub | (badge) | eval, red-team, ci | Config-driven prompt/agent/RAG testing, comparison, and red-team evaluation tool. |
| DeepEval | GitHub | (badge) | evaluation, framework, testing | LLM evaluation framework supporting agent and workflow quality testing. |
| RAGAS | GitHub | (badge) | rag, metrics, evaluation | Open evaluation toolkit for LLM and RAG quality metrics. |
| lm-evaluation-harness | GitHub | (badge) | benchmark, harness, llm | Popular benchmark harness for consistent LLM evaluation across tasks. |
| SWE-bench | GitHub | (badge) | benchmark, swe, evaluation | Standard benchmark for evaluating issue-fixing software engineering agents. |
| verifiers | GitHub | (badge) | verifier, rl, evaluation | Library for RL environments and verifier-based evaluation loops. |
| AgentBench | GitHub | (badge) | benchmark, cross-domain, agent | Cross-environment benchmark for evaluating LLM agents as tool-using systems. |
| LangWatch | GitHub | (badge) | simulation, evaluation, testing | End-to-end platform for agent simulations, evaluation loops, and production testing. |
| EvalScope | GitHub | (badge) | benchmark, framework, llm | Customizable framework for large-model benchmarking and performance evaluation. |
| Terminal-Bench | GitHub | (badge) | terminal, benchmark, long-horizon | Terminal-native benchmark suite for long-horizon, verification-heavy agent tasks. |
| Harbor | GitHub | (badge) | evaluation, harness, rl-env | Framework for running agent evaluations and constructing RL-style environments. |
| tau2-bench | GitHub | (badge) | tool-use, interaction, benchmark | Tool-agent-user interaction benchmark emphasizing multi-step execution quality. |
| NeMo Gym | GitHub | (badge) | rl-env, training, evaluation | Toolkit for building RL environments suitable for LLM/agent training and eval. |
| TheAgentCompany | GitHub | (badge) | benchmark, workplace, multi-step | Agent benchmark with simulated software-company tasks for evaluating multi-step workplace autonomy. |
| Inspect Evals | GitHub | (badge) | inspect, eval-suite, reproducibility | Evaluation suite collection for Inspect AI workflows. |
| auto-harness | GitHub | (badge) | optimization, regression, evals | Benchmark-gated optimization loop that mines failures, edits agent code, and guards against regressions overnight. |
| Agent Evaluation | GitHub | (badge) | evaluation, testing, ci | AWS framework for testing virtual agents with evaluator-driven multi-turn conversations, hooks, and CI-friendly workflows. |
| WorkArena | GitHub | (badge) | browser, benchmark, enterprise | Browser benchmark for practical enterprise-like knowledge work tasks. |
| OpenHands Benchmarks | GitHub | (badge) | openhands, eval, harness | Evaluation harness and benchmark definitions for OpenHands systems. |
| WebArena-Verified | GitHub | (badge) | web-agent, benchmark, deterministic | Verified web-agent benchmark with deterministic evaluators. |
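
Many of the evaluation harnesses above reduce to one core loop: run the agent on each task, score the final output with a verifier, and aggregate. A minimal Python sketch, assuming deterministic programmatic verifiers and a toy string-transforming "agent"; `Task` and `evaluate` are hypothetical names, not the API of any listed framework:

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean
from typing import Callable


@dataclass(frozen=True)
class Task:
    task_id: str
    prompt: str
    verify: Callable[[str], bool]  # deterministic checker applied to the final output


def evaluate(agent: Callable[[str], str], tasks: list[Task], trials: int = 3):
    """Run every task `trials` times; return per-task pass rates and their mean.

    Repeated trials matter because agent trajectories are non-deterministic.
    """
    per_task: dict[str, float] = {}
    for task in tasks:
        passed = 0
        for _ in range(trials):
            try:
                passed += bool(task.verify(agent(task.prompt)))
            except Exception:
                pass  # a crash in the agent or verifier counts as a failed trial
        per_task[task.task_id] = passed / trials
    return per_task, mean(per_task.values())


tasks = [
    Task("upper", "hello", lambda out: out == "HELLO"),
    Task("reverse", "hello", lambda out: out == "olleh"),
]
scores, overall = evaluate(str.upper, tasks)
print(scores, overall)  # {'upper': 1.0, 'reverse': 0.0} 0.5
```

Counting crashes as failures instead of aborting keeps one broken trajectory from hiding the results of the rest of the suite.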

Observability & Reliability Operations

| Project | Link | Stars | Tags | Summary |
| --- | --- | --- | --- | --- |
| MLflow | GitHub | (badge) | platform, monitoring, evaluation | Broad AI engineering platform with monitoring and evaluation support for agents. |
| Langfuse | GitHub | (badge) | llmops, tracing, metrics | Open-source LLM engineering platform for traces, metrics, prompts, and evals. |
| Opik | GitHub | (badge) | monitoring, eval, tracing | End-to-end debug/eval/monitoring stack for LLM apps and agent workflows. |
| RagaAI Catalyst | GitHub | (badge) | agentops, analytics, monitoring | Agent observability and monitoring framework with timeline and graph analytics. |
| TensorZero | GitHub | (badge) | llmops, gateway, optimization | Open LLMOps stack unifying gateway, observability, evaluation, and optimization. |
| Arize Phoenix | GitHub | (badge) | observability, tracing, evaluation | Open platform for AI observability, tracing, and evaluation analytics. |
| OpenLLMetry | GitHub | (badge) | opentelemetry, instrumentation, tracing | OpenTelemetry-based instrumentation for GenAI and LLM applications. |
| Helicone | GitHub | (badge) | monitoring, traffic, production | Lightweight platform for monitoring and evaluating LLM traffic in production. |
| AgentOps SDK | GitHub | (badge) | agentops, monitoring, cost | Monitoring and benchmarking SDK for agent workflows with cost and trace tracking. |
| Latitude | GitHub | (badge) | platform, eval, observability | Open-source agent engineering platform with eval and observability capabilities. |
| Laminar | GitHub | (badge) | observability, tracing, evals | Agent-focused observability stack with tracing, evaluation runs, monitoring, and dashboards. |
| claude-code-reverse | GitHub | (badge) | trace, visualization, debugging | Tooling to visualize and inspect Claude Code LLM interaction traces. |
| OpenInference | GitHub | (badge) | spec, instrumentation, observability | Open instrumentation specification and tooling for AI observability. |

Guardrails, Security & Governance

| Project | Link | Stars | Tags | Summary |
| --- | --- | --- | --- | --- |
| LiteLLM | GitHub | (badge) | gateway, proxy, guardrails | Unified LLM gateway/proxy with cost tracking, load balancing, and guardrails. |
| Kong | GitHub | (badge) | gateway, policy, infra | API and AI gateway infrastructure useful for policy enforcement in agent systems. |
| Portkey Gateway | GitHub | (badge) | gateway, guardrails, routing | AI gateway with routing and guardrails for multi-model production traffic. |
| CAI (Cybersecurity AI) | GitHub | (badge) | security, governance, framework | Security-focused agent framework for offensive/defensive AI workflows. |
| OpenAI Realtime Agents | GitHub | (badge) | realtime, orchestration, control | Advanced agentic realtime patterns with structured control and interaction loops. |
| Plano | GitHub | (badge) | proxy, safety, data-plane | AI-native proxy and data plane with orchestration, safety, and observability. |
| OpenAI CS Agents Demo | GitHub | (badge) | demo, handoffs, governance | Customer-service multi-agent demo highlighting handoffs and guardrail-like control points. |
| ContextForge | GitHub | (badge) | gateway, governance, observability | Registry and proxy layer that unifies MCP, A2A, and REST/gRPC endpoints with centralized governance and observability. |
| Archestra | GitHub | (badge) | enterprise, guardrails, governance | Enterprise AI platform with guardrails, MCP registry, and orchestration services. |
| Tracecat | GitHub | (badge) | security, automation, policy | AI automation platform for security teams with policy and workflow controls. |
| AgentGateway | GitHub | (badge) | gateway, mcp, proxy | Agentic proxy gateway for AI agents and MCP server ecosystems. |

Reference Harness Implementations

| Project | Link | Stars | Tags | Summary |
| --- | --- | --- | --- | --- |
| OpenCode | GitHub | (badge) | terminal, coding-agent, subagents | Open-source coding agent with built-in plan/build roles, subagents, LSP support, and a client-server runtime. |
| Claude Code | GitHub | (badge) | terminal, coding-agent, git-workflows | Official terminal coding agent that understands codebases and executes editing, debugging, and Git workflows through natural language. |
| Gemini CLI | GitHub | (badge) | terminal, coding-agent, mcp | Open-source terminal agent with built-in tools, MCP support, checkpointing, and sandboxing controls. |
| Codex CLI | GitHub | (badge) | terminal, coding-agent, local-execution | Terminal-native coding agent that runs locally and exposes practical agent workflows for software tasks. |
| OpenHands | GitHub | (badge) | coding-agent, software-engineering, repo | Open-source AI software engineer focused on repo-level coding task execution. |
| OpenManus | GitHub | (badge) | general-agent, autonomy, workflows | Open foundation for broad autonomous agent workflows with coding-heavy use cases. |
| learn-claude-code | GitHub | (badge) | tutorial, harness, claude-code | Hands-on harness tutorial for building Claude Code-like systems from scratch. |
| aider | GitHub | (badge) | terminal, repo-map, testing | Terminal coding assistant with repo mapping, git-aware edits, and built-in lint/test feedback loops. |
| Claude Code Plugins: Orchestration and Automation | GitHub | (badge) | claude-code, plugins, orchestration | Production-ready Claude Code plugin marketplace bundling agents, skills, tools, and multi-agent workflow orchestrators. |
| CLI-Anything | GitHub | (badge) | cli, tool-use, automation | CLI agent system that unifies command-line tool usage in agent loops. |
| NanoClaw | GitHub | (badge) | containers, claude-sdk, scheduling | Container-isolated Claude agent harness with channel routing, scheduled jobs, per-group memory, and small-codebase customization. |
| Qwen Code | GitHub | (badge) | terminal, coding-agent, cli | Terminal-native open-source coding agent tuned for practical dev loops. |
| SuperClaude Framework | GitHub | (badge) | config, personas, workflow | Configuration framework adding commands, personas, and method templates to coding agents. |
| Devika | GitHub | (badge) | assistant, planning, coding | Open-source coding assistant system for planning and implementing development tasks. |
| SWE-agent | GitHub | (badge) | swe, issue-fixing, tooling | Research-grade coding agent that resolves GitHub issues with explicit tooling loops. |
| Aperant | GitHub | (badge) | coding-agent, parallel, memory | Autonomous multi-agent coding framework with parallel execution, isolated workspaces, QA loops, and persistent memory. |
| Eigent | GitHub | (badge) | desktop, cowork, productivity | Open-source desktop cowork agent for autonomous task execution and productivity. |
| IronClaw | GitHub | (badge) | security, wasm, routines | Security-first personal agent harness with WASM sandboxing, routines, tool plugins, and persistent memory. |
| OpenHarness | GitHub | (badge) | tool-use, memory, multi-agent | Open agent harness implementation covering tool use, skills, memory, permissions, and multi-agent coordination. |
| GitHub Copilot CLI | GitHub | (badge) | terminal, coding-agent, mcp | Official terminal coding agent built on GitHub's Copilot harness with MCP extensibility, approval controls, and GitHub-native context. |
| Superset | GitHub | (badge) | worktrees, desktop, parallel | Worktree-based desktop orchestrator for running and reviewing parallel CLI coding agents from one workspace. |
| Open SWE | GitHub | (badge) | async, coding-agent, swe | Asynchronous open-source coding agent focused on software issue workflows. |
| OSAURUS | GitHub | (badge) | macos, local-first, memory | Native macOS harness for autonomous coding agents with persistent memory. |
| HiClaw | GitHub | (badge) | multi-agent, human-in-the-loop, shared-state | Collaborative multi-agent OS with manager-worker coordination, shared state, and human-in-the-loop oversight via Matrix rooms. |
| mini-swe-agent | GitHub | (badge) | minimal, swe, coding-agent | Minimal coding agent implementation with strong benchmark competitiveness. |
| TinyAGI | GitHub | (badge) | team-orchestration, autonomous, workflows | Team-style agent orchestrator for one-person-company style autonomous workflows. |
| Devon | GitHub | (badge) | pair-programming, coding-agent, autonomous | Open-source pair programmer agent with autonomous coding execution patterns. |
| oh-my-pi | GitHub | (badge) | terminal, lsp, subagents | Terminal AI coding agent with edit safety, LSP integration, and subagent support. |
| Open Claude Cowork | GitHub | (badge) | desktop, ui, orchestration | Desktop coding cowork assistant that turns agent orchestration into GUI workflows. |
| holaOS | GitHub | (badge) | long-horizon, desktop, durable-state | Desktop-first long-horizon agent environment with runtime, memory, tools, apps, and durable state. |
| Amazon Bedrock AgentCore Samples | GitHub | (badge) | aws, runtime, operations | Official sample suite for deploying and operating agents with runtime, gateway, memory, observability, evaluation, and policy layers. |
| mini-coding-agent | GitHub | (badge) | coding-agent, minimal, approvals | Minimal coding agent harness illustrating approvals, memory, bounded delegation, and durable transcripts. |

Essential Readings & Ecosystem Maps

| Project | Link | Stars | Tags | Summary |
| --- | --- | --- | --- | --- |
| awesome-claude-code | GitHub | (badge) | awesome-list, claude-code, skills | Community collection of Claude Code skills, hooks, and orchestrator tooling. |
| awesome-agentic-patterns | GitHub | (badge) | awesome-list, patterns, design | Catalog of reusable agentic design patterns and implementation motifs. |
| awesome-mcp-servers | GitHub | (badge) | awesome-list, mcp, tools | Curated MCP server index for tool interoperability in agent systems. |
| awesome-harness-engineering | GitHub | (badge) | awesome-list, curation, harness | Curated list focused on harness engineering articles, benchmarks, and implementations. |
| 12 Factor Agents | Reference | - | reading, operations, principles | Operations-oriented principles for building maintainable production agents. |
| Agent Frameworks, Runtimes, and Harnesses, oh my! | Reference | - | reading, langchain, architecture | Clear decomposition of framework vs runtime vs harness responsibilities. |
| Building agents with the Claude Agent SDK | Reference | - | reading, claude, sdk | Claude blog on production-oriented SDK usage for sessions, tools, and orchestration. |
| Building Effective AI Agents | Reference | - | reading, anthropic, agents | Anthropic's practical guidance on when to use workflows vs. autonomous agents and how to structure them. |
| Claude Code auto mode | Reference | - | reading, anthropic, permissions | Anthropic's write-up on classifier-backed approval delegation for safer high-autonomy coding-agent runs. |
| Code execution with MCP | Reference | - | reading, anthropic, mcp | Anthropic's design notes on controlled code execution via MCP boundaries. |
| Demystifying Evals for AI Agents | Reference | - | reading, evals, anthropic | Methodology for designing robust agent evals in non-deterministic trajectories. |
| Effective context engineering for AI agents | Reference | - | reading, context, anthropic | Guidance on context-window budgeting and working-state management for agents. |
| Effective harnesses for long-running agents | Reference | - | reading, long-running, anthropic | Practical guide to maintaining state, resumability, and reliability over long agent runs. |
| Evaluating Deep Agents: Our Learnings | Reference | - | reading, langchain, evaluation | LangChain's practical lessons on evaluating stateful and long-horizon agents. |
| Harness design for long-running application development | Reference | - | reading, app-dev, anthropic | Follow-up article on improving long-running app generation through harness structure. |
| Harness Engineering (Martin Fowler) | Reference | - | reading, architecture, fowler | Architectural perspective on harness engineering and entropy control. |
| Harness engineering (OpenAI) | Reference | - | reading, methodology, openai | Field report on building reliable agent-first software via harness constraints and verification. |
| How we built our multi-agent research system | Reference | - | reading, anthropic, multi-agent | Anthropic architecture write-up on role separation and coordination in multi-agent systems. |
| Improving Deep Agents with harness engineering | Reference | - | reading, langchain, harness | Evidence that harness improvements alone can move benchmark performance. |
| Making Claude Code more secure and autonomous with sandboxing | Reference | - | reading, anthropic, sandboxing | How Anthropic uses sandbox boundaries to raise agent autonomy without giving up security controls. |
| Quantifying infrastructure noise in agentic coding evals | Reference | - | reading, anthropic, evaluation | Analysis of how infrastructure choices impact coding-agent benchmark outcomes. |
| Scaling Managed Agents: Decoupling the brain from the hands | Reference | - | reading, anthropic, architecture | Anthropic's meta-harness architecture for decoupling session logs, harness loops, and sandboxes in long-horizon agents. |
| Skill Issue: Harness Engineering for Coding Agents | Reference | - | reading, humanlayer, coding-agents | Practical breakdown of why coding-agent quality depends heavily on harness setup. |
| Testing Agent Skills Systematically with Evals | Reference | - | reading, openai, evals | OpenAI Developers guide for turning agent traces into repeatable skill evaluations. |
| The Anatomy of an Agent Harness | Reference | - | reading, architecture, langchain | Conceptual decomposition of agent harness components and their responsibilities. |
| Unrolling the Codex agent loop | Reference | - | reading, openai, architecture | OpenAI engineering deep dive into the Codex harness loop, prompt growth, tool-call replay, and stateless execution tradeoffs. |
| Writing effective tools for AI agents | Reference | - | reading, anthropic, tools | Best practices for tool interface design so agents call tools safely and reliably. |
| Your Agent Needs a Harness, Not a Framework | Reference | - | reading, inngest, reliability | Argument for reliability-first infrastructure around agents instead of framework-only thinking. |

Maintenance Notes

  • Source of truth: data/projects.yaml
  • Regenerate README files: python3 scripts/render_readme.py
  • Verify catalog and links: python3 scripts/verify_catalog.py
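
The checks performed by scripts/verify_catalog.py are not described here. A minimal Python sketch of offline consistency checks such a script could run, assuming hypothetical field names (`name`, `category`, `url`, `summary`) for entries already parsed from data/projects.yaml; live link checking, which needs network access, is omitted:

```python
from collections import Counter
from urllib.parse import urlparse

REQUIRED_FIELDS = ("name", "category", "url", "summary")  # hypothetical schema


def validate(entries: list[dict], expected_counts: dict[str, int]) -> list[str]:
    """Return human-readable problems; an empty list means the catalog passed."""
    problems = []
    for i, entry in enumerate(entries):
        label = entry.get("name") or f"entry {i}"
        missing = [k for k in REQUIRED_FIELDS if not entry.get(k)]
        if missing:
            problems.append(f"{label}: missing {', '.join(missing)}")
            continue
        url = urlparse(entry["url"])
        if url.scheme != "https":
            problems.append(f"{label}: URL is not https")
        if url.netloc == "github.com" and len([p for p in url.path.split("/") if p]) != 2:
            problems.append(f"{label}: GitHub URL is not an owner/repo path")
    # Case-insensitive duplicate detection across the whole catalog.
    names = Counter(e["name"].lower() for e in entries if e.get("name"))
    problems += [f"duplicate name: {n}" for n, c in names.items() if c > 1]
    # Per-category counts should match the Category Overview table.
    actual = Counter(e.get("category") for e in entries)
    for category, expected in expected_counts.items():
        if actual[category] != expected:
            problems.append(f"{category}: expected {expected} entries, found {actual[category]}")
    return problems


# Hypothetical sample entries with deliberate problems.
entries = [
    {"name": "tool-a", "category": "Evaluation", "url": "https://github.com/example/tool-a", "summary": "ok"},
    {"name": "tool-b", "category": "Evaluation", "url": "http://github.com/example", "summary": "ok"},
    {"name": "Tool-A", "category": "Evaluation", "url": "https://github.com/example/tool-a2", "summary": ""},
]
for problem in validate(entries, {"Evaluation": 2}):
    print(problem)
```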

Citation

@misc{awesome-agent-harness,
  title={Awesome Agent Harness},
  howpublished={\url{https://github.com/Picrew/awesome-agent-harness.git}},
  year={2026}
}

About

An awesome list of Agent Harness engineering resources, including GitHub projects, tools, benchmarks, and practical guides.
