AgentBoundary v0.1 conformance evaluation of Claude Agent SDK permission_policy — pre-publication review

Hi — I run JamJet Labs and am publishing a write-up that compares a handful of agent-governance products against an open spec I've been authoring called **AgentBoundary** (`jamjet-labs/agentboundary`, v0.1 stable + v0.2-alpha draft). AgentBoundary defines a portable JSON "receipt" for AI tool calls that a third party can verify without trusting the runtime that produced it. The Claude Agent SDK's `permission_policy` is one of four products in the comparison.

Posting here for a 7-day right-to-respond before publication. Corrections received in the window will be folded into the report; corrections received after appear inline with date stamps.

**What I did**

- Read `code.claude.com/docs/en/agent-sdk/permissions`, `code.claude.com/docs/en/agent-sdk/user-input`, and the Managed Agents overview at `platform.claude.com/docs/en/managed-agents/overview`
- Built an adapter at [`adapters/anthropic-permission-policy/`](https://github.com/jamjet-labs/agentboundary/tree/main/adapters/anthropic-permission-policy) that translates a synthetic permission-decision event captured at the SDK boundary (hooks → deny → mode → allow → `canUseTool`) into an AgentBoundary v0.2-alpha receipt
- Ran all 40 conformance scenarios against adapter-translated receipts
- Per-scenario verdicts in [`results.md`](https://github.com/jamjet-labs/agentboundary/blob/main/adapters/anthropic-permission-policy/results.md); SDK→receipt mapping in [`mapping.md`](https://github.com/jamjet-labs/agentboundary/blob/main/adapters/anthropic-permission-policy/mapping.md)

**Headline**

```
PASS         12
PARTIAL       9
DOCS-ONLY     3
NOT COVERED  14
N/A           2
──────────────
TOTAL        40
```

**The framing in the report:** Claude Agent SDK ships the richest *runtime permission primitive* of the four products evaluated — layered evaluation, scoped tool patterns (`Bash(rm *)`), permission modes, programmatic hooks, `canUseTool` callback with `updatedInput`. Level 3 hashing scenarios (33, 34, 39, 40) pass cleanly because `tool_input` is raw JSON the adapter canonicalises. Where the comparison shows a gap: there's no portable emitted artifact a third party can verify outside the Console. The Managed Agents Console maintains an audit log per the April 2026 launch announcement; the schema isn't publicly documented. So the report's framing is *complementary*: the SDK is the strongest runtime primitive; AgentBoundary is the export format for the artifact gap. A team can wrap `canUseTool` (or `query()`) and emit a v0.2-alpha receipt at the action boundary.

**The ask:** if any per-scenario mapping or factual claim is wrong, corrections are welcome here or via PR to `jamjet-labs/agentboundary` within 7 days. After that, the report publishes with the data as currently mapped.

Happy to share §7.1 (the Claude Agent SDK section, ~400 words) if either of you wants a sneak look. Thanks for shipping the SDK — `canUseTool` was the cleanest decision-capture surface of the four I evaluated.

— Sunil


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AgentBoundary v0.1 conformance evaluation of Claude Agent SDK permission_policy — pre-publication review #986

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

AgentBoundary v0.1 conformance evaluation of Claude Agent SDK permission_policy — pre-publication review #986

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions