A curated, implementation-first list of agent harness engineering resources, with GitHub projects as the primary focus.
- Total entries: 159
- GitHub entries: 135 (84.9%)
- GitHub in project categories (excluding readings): 131/131 (100.0%)
- Categories: 9
- Last verified: 2026-04-22
- Language: English | 中文
- Scaling Managed Agents: Decoupling the brain from the hands: Anthropic's meta-harness architecture for decoupling session logs, harness loops, and sandboxes in long-horizon agents.
- Claude Code auto mode: Anthropic's write-up on classifier-backed approval delegation for safer high-autonomy coding-agent runs.
- Harness engineering (OpenAI): Field report on building reliable agent-first software via harness constraints and verification.
- Building Effective AI Agents: Anthropic's practical guidance on when to use workflows vs. autonomous agents and how to structure them.
- Writing effective tools for AI agents: Best practices for tool interface design so agents call tools safely and reliably.
- Effective harnesses for long-running agents: Practical guide to maintaining state, resumability, and reliability over long agent runs.
- Harness design for long-running application development: Follow-up article on improving long-running app generation through harness structure.
- Improving Deep Agents with harness engineering: Evidence that harness improvements alone can move benchmark performance.
- Evaluating Deep Agents: Our Learnings: LangChain's practical lessons on evaluating stateful and long-horizon agents.
- Your Agent Needs a Harness, Not a Framework: Argument for reliability-first infrastructure around agents instead of framework-only thinking.
- Category Overview
- Featured Harness Blogs
- Catalog
- Harness Architecture & Orchestration
- Context & Working-State Engineering
- Execution Substrates & Sandboxing
- Protocols, Tool Interfaces & Agent Contracts
- Evaluation Harnesses & Benchmarks
- Observability & Reliability Operations
- Guardrails, Security & Governance
- Reference Harness Implementations
- Essential Readings & Ecosystem Maps
- Maintenance Notes
- Citation
| Category | Entries |
|---|---|
| Harness Architecture & Orchestration | 20 |
| Context & Working-State Engineering | 8 |
| Execution Substrates & Sandboxing | 16 |
| Protocols, Tool Interfaces & Agent Contracts | 11 |
| Evaluation Harnesses & Benchmarks | 20 |
| Observability & Reliability Operations | 13 |
| Guardrails, Security & Governance | 11 |
| Reference Harness Implementations | 32 |
| Essential Readings & Ecosystem Maps | 28 |
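The summary figures at the top of this list can be cross-checked against the table above. The sketch below is only an arithmetic check: the counts are copied from the table, the GitHub total comes from the summary list, and the variable names are illustrative rather than taken from the repository's scripts.

```python
# Per-category entry counts, copied from the Category Overview table.
category_counts = {
    "Harness Architecture & Orchestration": 20,
    "Context & Working-State Engineering": 8,
    "Execution Substrates & Sandboxing": 16,
    "Protocols, Tool Interfaces & Agent Contracts": 11,
    "Evaluation Harnesses & Benchmarks": 20,
    "Observability & Reliability Operations": 13,
    "Guardrails, Security & Governance": 11,
    "Reference Harness Implementations": 32,
    "Essential Readings & Ecosystem Maps": 28,
}
github_entries = 135  # "GitHub entries" figure from the summary list

# Total entries across all categories.
total_entries = sum(category_counts.values())

# Entries in project categories, i.e. everything except the readings category.
project_entries = total_entries - category_counts["Essential Readings & Ecosystem Maps"]

# Share of all entries that are GitHub repositories, as a percentage.
github_share = 100 * github_entries / total_entries

print(f"Total entries: {total_entries}")
print(f"Project-category entries: {project_entries}")
print(f"GitHub share: {github_share:.1f}%")
```

Under these inputs the totals agree with the summary list: 159 entries overall, 131 in project categories, and a GitHub share of 84.9%.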
Notes:
- `Stars` are rendered as badges from snapshot values.
- Repository update dates are tracked in `data/projects.yaml` and validation reports.
- Entries are sorted by stars (descending) within each category.

Harness Architecture & Orchestration

| Project | Link | Stars | Tags | Summary |
|---|---|---|---|---|
| DeerFlow | GitHub | | long-horizon, memory, subagents | Long-horizon super-agent harness integrating memory, tools, subagents, and sandboxes. |
| AutoGen | GitHub | | multi-agent, orchestration, framework | Programming framework for agentic AI with multi-agent interaction and orchestration. |
| Agno | GitHub | | scale, runtime, management | Agent software runtime focused on running and managing agentic systems at scale. |
| LangGraph | GitHub | | graph, workflow, runtime | Graph-based runtime for resilient stateful agents and deterministic workflow control. |
| Semantic Kernel | GitHub | | enterprise, orchestration, plugins | Enterprise-grade agentic application framework with orchestration and plugin patterns. |
| OpenAI Agents SDK (Python) | GitHub | | sdk, handoff, workflows | Lightweight framework for multi-agent workflows, handoffs, and production patterns. |
| deepagents | GitHub | | runtime, orchestration, long-running | Open-source harness for long-running, tool-using agents with planning and subagent patterns. |
| Google ADK (Python) | GitHub | | toolkit, deployment, evaluation | Code-first toolkit to build, evaluate, and deploy advanced AI agents. |
| PydanticAI | GitHub | | python, typing, schema | Type-safe Python framework for agents with strong schema contracts and tooling. |
| Hive | GitHub | | harness, orchestration, runtime | Outcome-driven agent runtime harness with explicit control loops and orchestration blocks. |
| Microsoft Agent Framework | GitHub | | multi-agent, workflows, observability | Multi-language framework for building, orchestrating, and deploying AI agents with graph workflows and observability. |
| VoltAgent | GitHub | | typescript, platform, runtime | TypeScript agent engineering platform built around open runtime abstractions. |
| mcp-agent | GitHub | | mcp, runtime, workflow | Practical agent framework centered on MCP tool ecosystems and workflow composition. |
| Yao | GitHub | | single-binary, runtime, autonomous | Single-binary runtime for defining and running autonomous agents. |
| Cloudflare Agents | GitHub | | platform, deployment, runtime | Platform runtime for building and deploying agents with production infrastructure primitives. |
| Docker Agent | GitHub | | docker, runtime, container | Agent builder and runtime stack emphasizing container-native execution. |
| NeMo Agent Toolkit | GitHub | | multi-agent, optimization, toolkit | Open toolkit for connecting and optimizing teams of AI agents. |
| Scion | GitHub | | multi-agent, containers, orchestration | Experimental multi-agent orchestration testbed that runs isolated agent harnesses in containers, worktrees, and remote runtimes. |
| deepagentsjs | GitHub | | typescript, langgraph, subagents | TypeScript agent harness with built-in planning, filesystem tools, subagents, and LangGraph-native runtime hooks. |
| hankweave | GitHub | | long-horizon, runtime, checkpoints | Headless-first long-horizon runtime that orchestrates existing agent harnesses with sentinels, loops, checkpoints, and event journals. |

Context & Working-State Engineering

| Project | Link | Stars | Tags | Summary |
|---|---|---|---|---|
| everything-claude-code | GitHub | | context, skills, harness-practices | Large open repository of harness practices around memory, skills, and context control for coding agents. |
| claude-mem | GitHub | | memory, context, session | Plugin-style memory layer that captures session history and reinjects relevant context into future coding runs. |
| planning-with-files | GitHub | | planning, skills, persistence | Skill package for persistent file-based planning in coding-agent workflows. |
| Agent Skills for Context Engineering | GitHub | | skills, context, production | Large skill library oriented around context engineering and production agents. |
| Context-Engineering Handbook | GitHub | | context-engineering, handbook, practices | First-principles handbook focused on practical context engineering for agent systems. |
| Trellis | GitHub | | specs, memory, workflow | Multi-platform coding-agent workflow framework with task context, project memory, and spec injection. |
| Awesome Context Engineering | GitHub | | awesome-list, context, survey | Survey-style list for context engineering resources and frameworks. |
| context-space | GitHub | | context, infrastructure, mcp | Infrastructure project focused on context engineering building blocks and MCP-centric integrations. |

Execution Substrates & Sandboxing

| Project | Link | Stars | Tags | Summary |
|---|---|---|---|---|
| Daytona | GitHub | | sandbox, execution, infra | Secure and elastic sandbox infrastructure for running AI-generated code with file, Git, LSP, and execution APIs. |
| CUA | GitHub | | computer-use, sandbox, infra | Infrastructure stack for computer-use agents with sandbox, SDK, and benchmark support. |
| E2B | GitHub | | cloud-sandbox, execution, enterprise | Secure cloud environments with real tools for production-grade agent execution. |
| OpenSandbox | GitHub | | sandbox, security, runtime | Secure and extensible sandbox runtime built for agent workloads. |
| agent-infra sandbox | GitHub | | all-in-one, browser, shell | All-in-one sandbox combining browser, shell, files, MCP, and IDE server. |
| Judge0 | GitHub | | code-execution, sandbox, backend | Scalable sandboxed code execution system usable as an agent execution backend. |
| Agent Sandbox | GitHub | | kubernetes, sandbox, stateful | Kubernetes-native sandbox control plane for isolated, stateful agent runtimes with stable identity, persistence, and warm-pool support. |
| stakpak/agent | GitHub | | always-on, autonomous, ops | Always-on open agent that runs on your machines with autonomous operational loops. |
| OSS-Fuzz Gen | GitHub | | fuzzing, security, execution | LLM-powered fuzzing workflows integrated with controlled execution contexts. |
| Tensorlake | GitHub | | microvm, sandbox, orchestration | Serverless runtime for agent sandboxes with MicroVM isolation, snapshots, suspend-resume, and background orchestration. |
| Arrakis | GitHub | | sandbox, microvm, snapshots | Self-hosted sandbox substrate with MicroVM isolation, snapshot restore, and REST, SDK, and MCP interfaces for agent code execution and computer use. |
| AgentScope Runtime | GitHub | | runtime, sandbox, deployment | Production runtime for agent apps with secure tool sandboxes, deployment APIs, observability, and state services. |
| SWE-ReX | GitHub | | sandbox, execution, coding-agent | Sandboxed execution infrastructure for AI coding agents at local and cloud scale. |
| sandboxed.sh | GitHub | | self-hosted, isolation, orchestrator | Self-hosted orchestrator running coding agents inside isolated Linux workspaces. |
| Capsule | GitHub | | wasm, sandbox, task-runtime | Durable runtime that coordinates agent tasks inside isolated WebAssembly sandboxes with retries and lifecycle tracking. |
| terminal-bench-env | GitHub | | terminal, benchmark-env, sandbox | Environment layer for terminal-agent benchmark execution. |

Protocols, Tool Interfaces & Agent Contracts

| Project | Link | Stars | Tags | Summary |
|---|---|---|---|---|
| GitHub Spec Kit | GitHub | | spec-driven, workflows, tooling | Toolkit for spec-driven development to guide deterministic agent execution. |
| MCP Servers | GitHub | | mcp, servers, implementations | Official collection of MCP server implementations across tools and domains. |
| AGENTS.md | GitHub | | spec, agent-file, instructions | Open format for repository-local instructions that coding agents can follow. |
| Model Context Protocol | GitHub | | mcp, protocol, interoperability | Core specification and docs for MCP-based tool and context interoperability. |
| directories (rules and MCP indexes) | GitHub | | directories, mcp, rules | Curated directories of agent rules and MCP servers for tool discovery. |
| LangChain MCP Adapters | GitHub | | mcp, adapters, integration | Adapters connecting LangChain components with MCP servers. |
| Microsoft MCP Servers | GitHub | | mcp, enterprise, servers | Microsoft's official MCP server catalog for enterprise data and tools. |
| ACPX | GitHub | | acp, client, sessions | Headless CLI client for stateful Agent Client Protocol sessions. |
| Microsoft Learn MCP | GitHub | | mcp, docs, grounding | MCP server and CLI for grounding agents with Microsoft documentation sources. |
| IBM MCP | GitHub | | mcp, clients, tooling | IBM collection of MCP servers, clients, and developer tooling. |
| AGENT.md | GitHub | | standard, agent-file, interoperability | Standardized machine-readable file format for agentic coding tools. |

Evaluation Harnesses & Benchmarks

| Project | Link | Stars | Tags | Summary |
|---|---|---|---|---|
| Promptfoo | GitHub | | eval, red-team, ci | Config-driven prompt/agent/RAG testing, comparison, and red-team evaluation tool. |
| DeepEval | GitHub | | evaluation, framework, testing | LLM evaluation framework supporting agent and workflow quality testing. |
| RAGAS | GitHub | | rag, metrics, evaluation | Open evaluation toolkit for LLM and RAG quality metrics. |
| lm-evaluation-harness | GitHub | | benchmark, harness, llm | Popular benchmark harness for consistent LLM evaluation across tasks. |
| SWE-bench | GitHub | | benchmark, swe, evaluation | Standard benchmark for evaluating issue-fixing software engineering agents. |
| verifiers | GitHub | | verifier, rl, evaluation | Library for RL environments and verifier-based evaluation loops. |
| AgentBench | GitHub | | benchmark, cross-domain, agent | Cross-environment benchmark for evaluating LLM agents as tool-using systems. |
| LangWatch | GitHub | | simulation, evaluation, testing | End-to-end platform for agent simulations, evaluation loops, and production testing. |
| EvalScope | GitHub | | benchmark, framework, llm | Customizable framework for large-model benchmarking and performance evaluation. |
| Terminal-Bench | GitHub | | terminal, benchmark, long-horizon | Terminal-native benchmark suite for long-horizon, verification-heavy agent tasks. |
| Harbor | GitHub | | evaluation, harness, rl-env | Framework for running agent evaluations and constructing RL-style environments. |
| tau2-bench | GitHub | | tool-use, interaction, benchmark | Tool-agent-user interaction benchmark emphasizing multi-step execution quality. |
| NeMo Gym | GitHub | | rl-env, training, evaluation | Toolkit for building RL environments suitable for LLM/agent training and eval. |
| TheAgentCompany | GitHub | | benchmark, workplace, multi-step | Agent benchmark with simulated software-company tasks for evaluating multi-step workplace autonomy. |
| Inspect Evals | GitHub | | inspect, eval-suite, reproducibility | Evaluation suite collection for Inspect AI workflows. |
| auto-harness | GitHub | | optimization, regression, evals | Benchmark-gated optimization loop that mines failures, edits agent code, and guards against regressions overnight. |
| Agent Evaluation | GitHub | | evaluation, testing, ci | AWS framework for testing virtual agents with evaluator-driven multi-turn conversations, hooks, and CI-friendly workflows. |
| WorkArena | GitHub | | browser, benchmark, enterprise | Browser benchmark for practical enterprise-like knowledge work tasks. |
| OpenHands Benchmarks | GitHub | | openhands, eval, harness | Evaluation harness and benchmark definitions for OpenHands systems. |
| WebArena-Verified | GitHub | | web-agent, benchmark, deterministic | Verified web-agent benchmark with deterministic evaluators. |

Observability & Reliability Operations

| Project | Link | Stars | Tags | Summary |
|---|---|---|---|---|
| MLflow | GitHub | | platform, monitoring, evaluation | Broad AI engineering platform with monitoring and evaluation support for agents. |
| Langfuse | GitHub | | llmops, tracing, metrics | Open-source LLM engineering platform for traces, metrics, prompts, and evals. |
| Opik | GitHub | | monitoring, eval, tracing | End-to-end debug/eval/monitoring stack for LLM apps and agent workflows. |
| RagaAI Catalyst | GitHub | | agentops, analytics, monitoring | Agent observability and monitoring framework with timeline and graph analytics. |
| TensorZero | GitHub | | llmops, gateway, optimization | Open LLMOps stack unifying gateway, observability, evaluation, and optimization. |
| Arize Phoenix | GitHub | | observability, tracing, evaluation | Open platform for AI observability, tracing, and evaluation analytics. |
| OpenLLMetry | GitHub | | opentelemetry, instrumentation, tracing | OpenTelemetry-based instrumentation for GenAI and LLM applications. |
| Helicone | GitHub | | monitoring, traffic, production | Lightweight platform for monitoring and evaluating LLM traffic in production. |
| AgentOps SDK | GitHub | | agentops, monitoring, cost | Monitoring and benchmarking SDK for agent workflows with cost and trace tracking. |
| Latitude | GitHub | | platform, eval, observability | Open-source agent engineering platform with eval and observability capabilities. |
| Laminar | GitHub | | observability, tracing, evals | Agent-focused observability stack with tracing, evaluation runs, monitoring, and dashboards. |
| claude-code-reverse | GitHub | | trace, visualization, debugging | Tooling to visualize and inspect Claude Code LLM interaction traces. |
| OpenInference | GitHub | | spec, instrumentation, observability | Open instrumentation specification and tooling for AI observability. |

Guardrails, Security & Governance

| Project | Link | Stars | Tags | Summary |
|---|---|---|---|---|
| LiteLLM | GitHub | | gateway, proxy, guardrails | Unified LLM gateway/proxy with cost tracking, load balancing, and guardrails. |
| Kong | GitHub | | gateway, policy, infra | API and AI gateway infrastructure useful for policy enforcement in agent systems. |
| Portkey Gateway | GitHub | | gateway, guardrails, routing | AI gateway with routing and guardrails for multi-model production traffic. |
| CAI (Cybersecurity AI) | GitHub | | security, governance, framework | Security-focused agent framework for offensive/defensive AI workflows. |
| OpenAI Realtime Agents | GitHub | | realtime, orchestration, control | Advanced agentic realtime patterns with structured control and interaction loops. |
| Plano | GitHub | | proxy, safety, data-plane | AI-native proxy and data plane with orchestration, safety, and observability. |
| OpenAI CS Agents Demo | GitHub | | demo, handoffs, governance | Customer-service multi-agent demo highlighting handoffs and guardrail-like control points. |
| ContextForge | GitHub | | gateway, governance, observability | Registry and proxy layer that unifies MCP, A2A, and REST/gRPC endpoints with centralized governance and observability. |
| Archestra | GitHub | | enterprise, guardrails, governance | Enterprise AI platform with guardrails, MCP registry, and orchestration services. |
| Tracecat | GitHub | | security, automation, policy | AI automation platform for security teams with policy and workflow controls. |
| AgentGateway | GitHub | | gateway, mcp, proxy | Agentic proxy gateway for AI agents and MCP server ecosystems. |

Reference Harness Implementations

| Project | Link | Stars | Tags | Summary |
|---|---|---|---|---|
| OpenCode | GitHub | | terminal, coding-agent, subagents | Open-source coding agent with built-in plan/build roles, subagents, LSP support, and a client-server runtime. |
| Claude Code | GitHub | | terminal, coding-agent, git-workflows | Official terminal coding agent that understands codebases and executes editing, debugging, and Git workflows through natural language. |
| Gemini CLI | GitHub | | terminal, coding-agent, mcp | Open-source terminal agent with built-in tools, MCP support, checkpointing, and sandboxing controls. |
| Codex CLI | GitHub | | terminal, coding-agent, local-execution | Terminal-native coding agent that runs locally and exposes practical agent workflows for software tasks. |
| OpenHands | GitHub | | coding-agent, software-engineering, repo | Open-source AI software engineer focused on repo-level coding task execution. |
| OpenManus | GitHub | | general-agent, autonomy, workflows | Open foundation for broad autonomous agent workflows with coding-heavy use cases. |
| learn-claude-code | GitHub | | tutorial, harness, claude-code | Hands-on harness tutorial for building Claude Code-like systems from scratch. |
| aider | GitHub | | terminal, repo-map, testing | Terminal coding assistant with repo mapping, git-aware edits, and built-in lint/test feedback loops. |
| Claude Code Plugins: Orchestration and Automation | GitHub | | claude-code, plugins, orchestration | Production-ready Claude Code plugin marketplace bundling agents, skills, tools, and multi-agent workflow orchestrators. |
| CLI-Anything | GitHub | | cli, tool-use, automation | CLI agent system that unifies command-line tool usage in agent loops. |
| NanoClaw | GitHub | | containers, claude-sdk, scheduling | Container-isolated Claude agent harness with channel routing, scheduled jobs, per-group memory, and small-codebase customization. |
| Qwen Code | GitHub | | terminal, coding-agent, cli | Terminal-native open-source coding agent tuned for practical dev loops. |
| SuperClaude Framework | GitHub | | config, personas, workflow | Configuration framework adding commands, personas, and method templates to coding agents. |
| Devika | GitHub | | assistant, planning, coding | Open-source coding assistant system for planning and implementing development tasks. |
| SWE-agent | GitHub | | swe, issue-fixing, tooling | Research-grade coding agent that resolves GitHub issues with explicit tooling loops. |
| Aperant | GitHub | | coding-agent, parallel, memory | Autonomous multi-agent coding framework with parallel execution, isolated workspaces, QA loops, and persistent memory. |
| Eigent | GitHub | | desktop, cowork, productivity | Open-source desktop cowork agent for autonomous task execution and productivity. |
| IronClaw | GitHub | | security, wasm, routines | Security-first personal agent harness with WASM sandboxing, routines, tool plugins, and persistent memory. |
| OpenHarness | GitHub | | tool-use, memory, multi-agent | Open agent harness implementation covering tool use, skills, memory, permissions, and multi-agent coordination. |
| GitHub Copilot CLI | GitHub | | terminal, coding-agent, mcp | Official terminal coding agent built on GitHub's Copilot harness with MCP extensibility, approval controls, and GitHub-native context. |
| Superset | GitHub | | worktrees, desktop, parallel | Worktree-based desktop orchestrator for running and reviewing parallel CLI coding agents from one workspace. |
| Open SWE | GitHub | | async, coding-agent, swe | Asynchronous open-source coding agent focused on software issue workflows. |
| OSAURUS | GitHub | | macos, local-first, memory | Native macOS harness for autonomous coding agents with persistent memory. |
| HiClaw | GitHub | | multi-agent, human-in-the-loop, shared-state | Collaborative multi-agent OS with manager-worker coordination, shared state, and human-in-the-loop oversight via Matrix rooms. |
| mini-swe-agent | GitHub | | minimal, swe, coding-agent | Minimal coding agent implementation with strong benchmark competitiveness. |
| TinyAGI | GitHub | | team-orchestration, autonomous, workflows | Team-style agent orchestrator for one-person-company style autonomous workflows. |
| Devon | GitHub | | pair-programming, coding-agent, autonomous | Open-source pair programmer agent with autonomous coding execution patterns. |
| oh-my-pi | GitHub | | terminal, lsp, subagents | Terminal AI coding agent with edit safety, LSP integration, and subagent support. |
| Open Claude Cowork | GitHub | | desktop, ui, orchestration | Desktop coding cowork assistant that turns agent orchestration into GUI workflows. |
| holaOS | GitHub | | long-horizon, desktop, durable-state | Desktop-first long-horizon agent environment with runtime, memory, tools, apps, and durable state. |
| Amazon Bedrock AgentCore Samples | GitHub | | aws, runtime, operations | Official sample suite for deploying and operating agents with runtime, gateway, memory, observability, evaluation, and policy layers. |
| mini-coding-agent | GitHub | | coding-agent, minimal, approvals | Minimal coding agent harness illustrating approvals, memory, bounded delegation, and durable transcripts. |

Essential Readings & Ecosystem Maps

| Project | Link | Stars | Tags | Summary |
|---|---|---|---|---|
| awesome-claude-code | GitHub | | awesome-list, claude-code, skills | Community collection of Claude Code skills, hooks, and orchestrator tooling. |
| awesome-agentic-patterns | GitHub | | awesome-list, patterns, design | Catalog of reusable agentic design patterns and implementation motifs. |
| awesome-mcp-servers | GitHub | | awesome-list, mcp, tools | Curated MCP server index for tool interoperability in agent systems. |
| awesome-harness-engineering | GitHub | | awesome-list, curation, harness | Curated list focused on harness engineering articles, benchmarks, and implementations. |
| 12 Factor Agents | Reference | - | reading, operations, principles | Operations-oriented principles for building maintainable production agents. |
| Agent Frameworks, Runtimes, and Harnesses, oh my! | Reference | - | reading, langchain, architecture | Clear decomposition of framework vs runtime vs harness responsibilities. |
| Building agents with the Claude Agent SDK | Reference | - | reading, claude, sdk | Claude blog on production-oriented SDK usage for sessions, tools, and orchestration. |
| Building Effective AI Agents | Reference | - | reading, anthropic, agents | Anthropic's practical guidance on when to use workflows vs. autonomous agents and how to structure them. |
| Claude Code auto mode | Reference | - | reading, anthropic, permissions | Anthropic's write-up on classifier-backed approval delegation for safer high-autonomy coding-agent runs. |
| Code execution with MCP | Reference | - | reading, anthropic, mcp | Anthropic's design notes on controlled code execution via MCP boundaries. |
| Demystifying Evals for AI Agents | Reference | - | reading, evals, anthropic | Methodology for designing robust agent evals in non-deterministic trajectories. |
| Effective context engineering for AI agents | Reference | - | reading, context, anthropic | Guidance on context-window budgeting and working-state management for agents. |
| Effective harnesses for long-running agents | Reference | - | reading, long-running, anthropic | Practical guide to maintaining state, resumability, and reliability over long agent runs. |
| Evaluating Deep Agents: Our Learnings | Reference | - | reading, langchain, evaluation | LangChain's practical lessons on evaluating stateful and long-horizon agents. |
| Harness design for long-running application development | Reference | - | reading, app-dev, anthropic | Follow-up article on improving long-running app generation through harness structure. |
| Harness Engineering (Martin Fowler) | Reference | - | reading, architecture, fowler | Architectural perspective on harness engineering and entropy control. |
| Harness engineering (OpenAI) | Reference | - | reading, methodology, openai | Field report on building reliable agent-first software via harness constraints and verification. |
| How we built our multi-agent research system | Reference | - | reading, anthropic, multi-agent | Anthropic architecture write-up on role separation and coordination in multi-agent systems. |
| Improving Deep Agents with harness engineering | Reference | - | reading, langchain, harness | Evidence that harness improvements alone can move benchmark performance. |
| Making Claude Code more secure and autonomous with sandboxing | Reference | - | reading, anthropic, sandboxing | How Anthropic uses sandbox boundaries to raise agent autonomy without giving up security controls. |
| Quantifying infrastructure noise in agentic coding evals | Reference | - | reading, anthropic, evaluation | Analysis of how infrastructure choices impact coding-agent benchmark outcomes. |
| Scaling Managed Agents: Decoupling the brain from the hands | Reference | - | reading, anthropic, architecture | Anthropic's meta-harness architecture for decoupling session logs, harness loops, and sandboxes in long-horizon agents. |
| Skill Issue: Harness Engineering for Coding Agents | Reference | - | reading, humanlayer, coding-agents | Practical breakdown of why coding-agent quality depends heavily on harness setup. |
| Testing Agent Skills Systematically with Evals | Reference | - | reading, openai, evals | OpenAI Developers guide for turning agent traces into repeatable skill evaluations. |
| The Anatomy of an Agent Harness | Reference | - | reading, architecture, langchain | Conceptual decomposition of agent harness components and their responsibilities. |
| Unrolling the Codex agent loop | Reference | - | reading, openai, architecture | OpenAI engineering deep dive into the Codex harness loop, prompt growth, tool-call replay, and stateless execution tradeoffs. |
| Writing effective tools for AI agents | Reference | - | reading, anthropic, tools | Best practices for tool interface design so agents call tools safely and reliably. |
| Your Agent Needs a Harness, Not a Framework | Reference | - | reading, inngest, reliability | Argument for reliability-first infrastructure around agents instead of framework-only thinking. |
- Source of truth: `data/projects.yaml`
- Regenerate README files: `python3 scripts/render_readme.py`
- Verify catalog and links: `python3 scripts/verify_catalog.py`
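For readers curious what a catalog check of this kind might involve, here is a minimal sketch. It is not the repository's `scripts/verify_catalog.py`, whose contents are not shown here. The entry fields (`name`, `category`, `url`, `stars`), the `check_catalog` helper, and the sample data are all hypothetical, and the schema of `data/projects.yaml` may differ.

```python
from collections import Counter


def check_catalog(entries, expected_counts):
    """Return a list of problems found; an empty list means every check passed.

    `entries` is a list of dicts in catalog order. The field names used here
    are assumptions for illustration, not the real data/projects.yaml schema.
    """
    problems = []

    # 1. Per-category entry counts should match the overview table.
    counts = Counter(entry["category"] for entry in entries)
    for category, expected in expected_counts.items():
        if counts[category] != expected:
            problems.append(
                f"{category}: expected {expected} entries, found {counts[category]}"
            )

    # 2. Each URL should appear only once in the catalog.
    seen_urls = set()
    for entry in entries:
        if entry["url"] in seen_urls:
            problems.append(f"duplicate url: {entry['url']}")
        seen_urls.add(entry["url"])

    # 3. Within each category, stars should be in descending order.
    previous_stars = {}
    for entry in entries:
        last = previous_stars.get(entry["category"])
        if last is not None and entry["stars"] > last:
            problems.append(f"{entry['name']}: out of star order in {entry['category']}")
        previous_stars[entry["category"]] = entry["stars"]

    return problems


# Made-up sample data; names, URLs, and star counts are placeholders.
sample_entries = [
    {"name": "example-a", "category": "Demo", "url": "https://example.invalid/a", "stars": 300},
    {"name": "example-b", "category": "Demo", "url": "https://example.invalid/b", "stars": 200},
    {"name": "example-c", "category": "Demo", "url": "https://example.invalid/c", "stars": 250},
]

for problem in check_catalog(sample_entries, {"Demo": 3}):
    print(problem)
```

With the sample data, the only reported problem is that `example-c` is out of star order. Link reachability is deliberately left out of the sketch, since it would need network access.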
@misc{awesome-agent-harness,
title={Awesome Agent Harness},
howpublished={\url{https://github.com/Picrew/awesome-agent-harness.git}},
year={2026}
}