Open-source runtime for personal AI agents that can finish work reliably.
Clawy Agent is not a prompt chain and not a chatbot wrapper. It is a durable agent runtime: every task runs inside an observable loop with tool execution, runtime checks, persistent transcripts, memory, deterministic evidence, file delivery, scheduled automation, and user-defined harness rules.
If you are tired of agents that create files but forget to send them, claim work is done without verification, compute dates or totals from model intuition, lose context after a restart, misroute scheduled jobs, or ignore workflow instructions buried in the prompt, Clawy Agent moves those behaviors out of vibes and into runtime state.
Think Claude Code, but open-source, multi-provider, always-on, and programmable.
Clawy Agent includes Clawy Agent App, a self-hostable workbench for running a Codex-like personal agent app with your own provider, workspace, tools, memory, schedules, and harness rules.
The included /app shell connects to the same runtime over HTTP/SSE, streams
turns, shows live sessions, background tasks, scheduled jobs, artifacts, loaded
skills, and runtime events, and keeps provider secrets out of the browser by
using a separate server token.
The goal is to keep the visible app surface open while keeping hosted Clawy Cloud's production control plane separate: billing, fleet provisioning, managed credentials, production auth, hosted data contracts, and operator backoffice stay hosted-only. See the open-source app plan for scope, architecture, milestones, and release gates.
Most agent frameworks give you a model, a tool schema, and a loop. That is not enough for real personal agents.
Real agents need to:
- keep working across long tasks, restarts, and channel reconnects
- remember user context without stuffing the whole chat into the next prompt
- run tools while respecting file boundaries, safety rules, and permissions
- pause for user input without losing the turn
- verify work, exact values, and source usage before committing a final answer
- deliver generated files back to the user instead of only writing them to disk
- run scheduled workflows without letting the model guess delivery channels or execute worker tasks in the wrong role
- expose the control surface so operators can add rules without forking core code
Clawy Agent is built around that premise. The LLM is the reasoning engine; the runtime is the discipline layer that decides what must be evidenced, persisted, retried, blocked, or delivered.
Every user request becomes an atomic Turn. A turn can stream, think, call
tools, receive tool results, retry failed drafts, ask the user a question, spawn
child agents, and only then commit a final answer.
User message
-> beforeTurnStart session resume, onboarding, memory prep
-> beforeLLMCall context, identity, rules, memory, policy
-> LLM stream text, thinking, tool_use
-> beforeToolUse permission gates, resource checks
-> Tool execution files, shell, browser, web, documents, child agents
-> afterToolUse provenance, delivery, harness checks
-> ... repeat until ready
-> beforeCommit verification, output purity, delivery gates
-> turn_committed transcript, memory, artifacts, channel delivery
The important part: checks are not just text in the system prompt. They are
runtime gates at the points where mistakes happen, backed by an
ExecutionContract that carries criteria, resource bindings, verification
evidence, and deterministic evidence through the turn.
| Capability | What it means in practice |
|---|---|
| Atomic agentic loop | The agent can plan, execute tools, evaluate outputs, and continue until the job is actually complete. |
| Lifecycle hooks | Add deterministic or LLM-judged checks at beforeLLMCall, beforeToolUse, afterToolUse, beforeCommit, and more. |
| Execution contracts | Acceptance criteria, resource bindings, used-resource provenance, verification evidence, and deterministic requirements live in runtime state. |
| Deterministic exactness | Dates, time windows, counts, averages, sums, percent changes, and comparisons can be forced through runtime evidence instead of model guesswork. |
| Scheduled-work discipline | Cron turns are treated as orchestration work: delivery channel is persisted, parent turns stay meta-only, worker work is delegated, and delivery safety is enforced. |
| Replayable transcripts | Tool calls, tool results, control events, compaction boundaries, and canonical assistant messages are persisted for restart-safe replay. |
| Hipocampus memory | A layered memory system with root/daily/weekly/monthly compaction and qmd-backed recall. |
| User Harness Rules | Install Markdown rules that become runtime checks, including required tools, required tool input patterns, LLM verifiers, and blockers. |
| Native delivery path | Documents, spreadsheets, and workspace files can be generated, registered, and delivered back through supported channels. |
| Child agents | Spawn background agents with bounded tools, workspace isolation, and result delivery. |
| Multi-channel | Run the same runtime from CLI, HTTP, Telegram, or Discord. |
| Multi-provider | Use Anthropic, OpenAI, or Google models through one runtime interface. |
Clawy Agent ships with 30+ native tools and runtime subsystems:
- Workspace tools:
FileRead,FileWrite,FileEdit,Glob,Grep,Bash - Web and browser:
WebSearch,WebFetch,Browser,SocialBrowser - Deterministic workbench:
Clock,DateRange,Calculation - Knowledge and memory:
KnowledgeSearch, Hipocampus recall, qmd indexing - Generated outputs:
DocumentWrite,SpreadsheetWrite,FileDeliver,FileSend - Artifacts:
ArtifactCreate,ArtifactRead,ArtifactList,ArtifactUpdate,ArtifactDelete - Delegation:
SpawnAgent,TaskList,TaskGet,TaskOutput,TaskStop - Planning and control:
EnterPlanMode,ExitPlanMode,AskUserQuestion,TaskBoard - Automation:
CronCreate,CronList,CronUpdate,CronDelete - Discipline:
CommitCheckpoint, execution contracts, verification evidence gates - Skills: workspace
skills/loading plusPOST /v1/admin/skills/reload
Optional dependencies enable richer formats and rendering paths, including DOCX, PDF, HWPX, XLSX, qmd, and Playwright-backed browser work.
Browser uses the external agent-browser and integration.sh helpers for
centralized browser-worker sessions. SocialBrowser uses the same browser
control path, but only against a user-authorized Instagram/X session claimed
through integration.sh social-browser/*. Standalone deployments that want this
tool must provide those integration endpoints:
social-browser/status?provider=instagram|xsocial-browser/claimwith{provider,maxItems}returning{provider,sessionId,cdpEndpoint,maxItems,expiresAt?}social-browser/close?provider=instagram|x
The runtime keeps navigation provider-scoped, redacts CDP endpoints/tokens from tool output, and caps visible scraped items.
Agent
|-- Session one conversation / channel thread
| |-- Turn atomic agentic loop
| |-- Transcript append-only JSONL replay log
| |-- Context identity + rules + memory + tool state
| |-- ExecutionContract criteria + resources + deterministic evidence
|
|-- ToolRegistry native tools + loaded skills
|-- HookRegistry runtime control plane
|-- PolicyKernel compiled runtime policy + user harness rules
|-- OutputArtifactRegistry generated files and delivery metadata
|-- BackgroundTaskRegistry spawned child-agent work
|-- CronScheduler durable scheduled tasks + channel routing
|-- HipocampusService memory compaction + recall
|-- ChannelAdapters CLI, HTTP, Telegram, Discord
Design principles:
- Runtime over prompt vibes. Important constraints should live in hooks, gates, transcripts, and tool boundaries, not only in instructions.
- Durability by default. A useful agent should survive reconnects, retries, background work, and long conversations.
- Evidence before exact claims. Dates, counts, arithmetic, source usage, and completion claims should be grounded in tool results or explicitly marked as unverifiable.
- Operator control. Users should be able to install rules and skills without patching the core runtime.
- Visible work. Tool calls, progress, generated artifacts, and delivery events are first-class runtime state.
- Fail open where ergonomic, fail closed where safety matters. Memory recall should not kill a turn; unsafe file writes and false completion claims can.
Clawy Agent is designed for the failure modes that show up once agents are used for real work, not only demos.
Each turn can carry an ExecutionContract. The contract records:
- acceptance criteria and their verification state
- resource bindings and used-resource provenance
- generated artifacts and delivery evidence
- deterministic requirements and deterministic evidence
Hooks and tools read and write this contract throughout the turn. That lets the runtime block a weak final answer because a criterion is still pending, because the agent cited a resource it did not use, or because a numeric/date claim was not backed by deterministic evidence.
When a request asks for exact values, the runtime can classify it as requiring deterministic evidence. Typical triggers include date ranges, recency windows, counts, totals, averages, percent changes, financial values, and comparisons.
The model is then expected to use native tools such as Clock, DateRange,
Calculation, FileRead, KnowledgeSearch, WebFetch, or WebSearch
instead of doing mental math. Those tools can record structured evidence on the
execution contract. Before commit, the deterministic evidence verifier compares
the draft answer against the recorded evidence and can force a retry when the
answer invents or contradicts exact values.
The result is not "the model was told to be careful." The runtime has a place to store the requirement, a place to store the evidence, and a gate that can reject the final answer.
Cron jobs are treated as durable workflows, not delayed chat messages. When a cron is created, Clawy Agent captures the source delivery channel instead of asking the model to choose a target later. When the cron fires, the parent turn is constrained to meta-orchestration: inspect the schedule, delegate the actual work to a child agent, and summarize or deliver the result.
Cron safety is enforced through several runtime pieces working together:
CronSchedulerpersists cron records and next-fire timescronMetaOrchestratorkeeps parent cron turns in the orchestration rolebeforeToolUseguards prevent parent cron turns from doing worker I/ObeforeCommitchecks reject cron parent answers that skipped delegationcronDeliverySafetyblocks direct or ambiguous channel delivery patternsTaskBoarditeration state, the sweeper, and stop conditions keep long scheduled loops restart-safe and bounded
That is how Clawy Agent avoids the common failure where a scheduled agent ignores the workflow boundary, opens the wrong resource, or sends the result to the wrong channel.
Operators can extend the runtime without forking it:
- Markdown harness rules compile into executable gates
require_toolensures a tool was successfully used in the current turnrequire_tool_input_matchensures a successful tool used the expected input, such as a specificWebFetch.urlorBash.commandpatternllm_verifieradds scoped judgment checks where deterministic checks are not enough- workspace
skills/can add prompt-only or script-backed tools and can be reloaded throughPOST /v1/admin/skills/reload
Native tools are restored after skill loading, so a workspace skill cannot
accidentally replace core tools such as Browser, SocialBrowser,
WebSearch, WebFetch, Clock, DateRange, or Calculation.
git clone https://github.com/ClawyPro/clawy-agent.git
cd clawy-agent
npm install
npx tsx src/cli/index.ts init
npx tsx src/cli/index.ts startThe init command writes clawy-agent.yaml. The start command runs an
interactive terminal agent against the configured workspace.
git clone https://github.com/ClawyPro/clawy-agent.git
cd clawy-agent
npm installRun commands with:
npx tsx src/cli/index.ts <command>An npm package is planned. Until it is published, use the source install above.
npm install -g clawy-agent
clawy-agent <command>npx tsx src/cli/index.ts startTerminal conversation mode for local work.
export CLAWY_AGENT_SERVER_TOKEN=$(openssl rand -hex 24)
npx tsx src/cli/index.ts serve --port 8080Starts the HTTP API server. If Telegram or Discord tokens are configured, the same process also connects to those channels.
Open the self-hosted app at:
http://localhost:8080/app
Use CLAWY_AGENT_SERVER_TOKEN as the app's server token. Do not paste your LLM
provider API key into the browser.
The app currently uses these local read-only inspection endpoints:
| Endpoint | Purpose |
|---|---|
GET /v1/app/runtime |
Aggregate snapshot for sessions, tasks, crons, artifacts, tools, and skills. |
GET /v1/app/sessions |
Live session metadata, permission posture, and budget counters. |
GET /v1/app/transcript?sessionKey=... |
Bounded committed transcript replay for a session. |
GET /v1/app/tasks |
Background child-agent task list. |
GET /v1/app/crons |
Scheduled workflow list, including internal runtime crons for operators. |
GET /v1/app/artifacts |
Generated artifact index. |
GET /v1/app/skills |
Loaded skills, skill issues, and runtime skill hooks. |
These endpoints require Authorization: Bearer $CLAWY_AGENT_SERVER_TOKEN when
server.gatewayToken is configured.
import { Agent } from "clawy-agent";
const agent = new Agent({
botId: "my-agent",
userId: "local",
workspaceRoot: "./workspace",
model: "claude-sonnet-4-6",
gatewayToken: process.env.ANTHROPIC_API_KEY!,
apiProxyUrl: "https://api.anthropic.com",
});
await agent.start();Run npx tsx src/cli/index.ts init to generate clawy-agent.yaml
interactively, or create it manually:
llm:
provider: anthropic # anthropic, openai, or google
model: claude-sonnet-4-6
apiKey: ${ANTHROPIC_API_KEY}
server:
gatewayToken: ${CLAWY_AGENT_SERVER_TOKEN}
channels:
telegram:
token: ${TELEGRAM_BOT_TOKEN}
discord:
token: ${DISCORD_BOT_TOKEN}
hooks:
builtin:
factGrounding: true
preRefusalVerifier: true
workspaceAwareness: true
sessionResume: true
discipline: false
memory:
enabled: true
compaction: true
workspace: ./workspace
identity:
name: "My Agent"
instructions: "You are a helpful coding assistant."- Create a bot via @BotFather.
- Copy the bot token.
- Add it to
clawy-agent.yaml.
channels:
telegram:
token: ${TELEGRAM_BOT_TOKEN}Then start server mode:
export TELEGRAM_BOT_TOKEN=123456:ABC-DEF...
export ANTHROPIC_API_KEY=sk-ant-...
npx tsx src/cli/index.ts serveThe agent uses Telegram long polling and supports typing indicators, reply
context, and /reset.
- Create an application at the Discord Developer Portal.
- Create a bot and copy the token.
- Invite it with the
botandapplications.commandsscopes. - Add the token to config and start
serve.
The bot responds to mentions in channels where it is present.
Switch providers by changing llm.provider and llm.apiKey:
# Anthropic Claude
llm:
provider: anthropic
model: claude-sonnet-4-6
apiKey: ${ANTHROPIC_API_KEY}
# OpenAI GPT
llm:
provider: openai
model: gpt-5.4
apiKey: ${OPENAI_API_KEY}
# Google Gemini
llm:
provider: google
model: gemini-2.5-flash
apiKey: ${GOOGLE_API_KEY}All providers support streaming, tool use, and the full agentic loop. The provider layer handles message and tool-call format conversion.
Hooks are the core extension point. They can inspect or modify turn state, inject context, approve or block tools, verify final answers, write memory, and emit audit events.
Common built-in gates include:
| Hook | Purpose |
|---|---|
factGrounding |
Reduces unsupported factual claims. |
preRefusalVerifier |
Challenges unnecessary refusals before they reach the user. |
deterministicExactness |
Classifies exact numeric/date/count requests and records deterministic requirements. |
deterministicEvidenceVerifier |
Checks final exact claims against recorded deterministic evidence. |
workspaceAwareness |
Injects relevant filesystem context. |
sessionResume |
Restores continuity when a session resumes. |
discipline |
Enables TDD/git enforcement for coding tasks. |
dangerousPatterns |
Blocks unsafe operations. |
outputPurityGate |
Blocks leaked internal planning in final answers. |
completionEvidenceGate |
Requires evidence before completion claims. |
resourceBoundaryGate |
Prevents use of resources outside the task boundary. |
cronMetaOrchestrator |
Keeps scheduled parent turns in a meta-orchestration role. |
cronDeliverySafety |
Prevents ambiguous or direct channel delivery from cron worker paths. |
userHarnessRules |
Enforces operator-installed Markdown rules. |
User Harness Rules are runtime checks installed as Markdown files in the agent workspace. They let an operator turn "please always do X" into an executable gate.
Example use cases:
- if a document or spreadsheet is created, deliver it before saying it is ready
- before final answer, verify all requested acceptance criteria
- block a response that cites a file the agent did not read this turn
- require a tool call after a specific type of generated output
Quick setup:
mkdir -p ./workspace/harness-rules
cp examples/harness-rules/file-delivery-after-create.md ./workspace/harness-rules/
cp examples/harness-rules/final-answer-verifier.md ./workspace/harness-rules/
cp examples/harness-rules/tool-input-match.md ./workspace/harness-rules/
npx tsx src/cli/index.ts startYou can also put one structured rule in ./workspace/USER-HARNESS-RULES.md, or
write natural-language operational rules in ./workspace/USER-RULES.md:
- 파일을 만들면 반드시 채팅에 첨부해줘.
- 최종 답변 전에는 요구사항을 충족했는지 한 번 더 검사해.Structured Markdown rules use YAML frontmatter:
---
id: user-harness:file-delivery-after-create
trigger: beforeCommit
condition:
anyToolUsed:
- DocumentWrite
- SpreadsheetWrite
action:
type: require_tool
toolName: FileDeliver
enforcement: block_on_fail
timeoutMs: 2000
---
When a document or spreadsheet is created, deliver it to the chat before claiming completion.Supported triggers are beforeCommit and afterToolUse. Supported actions are
require_tool, require_tool_input_match, llm_verifier, and block.
require_tool_input_match checks that a successful same-turn tool call used a
specific tool input field, such as toolName: Bash with inputPath: command or
toolName: WebFetch with inputPath: url. Conditions can also include
userMessageMatches for regex-scoped rules. Unknown
natural-language lines stay advisory; recognized patterns and structured
frontmatter become executable rules. Set CORE_AGENT_USER_HARNESS_RULES=off to
disable these checks.
- Node.js 22+
- An API key for Anthropic, OpenAI, or Google
See CONTRIBUTING.md.
Apache 2.0. See LICENSE.