Hi — I run JamJet Labs and am publishing a write-up that compares a handful of agent-governance products against an open spec I've been authoring called AgentBoundary (jamjet-labs/agentboundary, v0.1 stable + v0.2-alpha draft). AgentBoundary defines a portable JSON "receipt" for AI tool calls that a third party can verify without trusting the runtime that produced it. The Claude Agent SDK's permission_policy is one of four products in the comparison.
Posting here for a 7-day right-to-respond before publication. Corrections received in the window will be folded into the report; corrections received after appear inline with date stamps.
What I did
- Read code.claude.com/docs/en/agent-sdk/permissions, code.claude.com/docs/en/agent-sdk/user-input, and the Managed Agents overview at platform.claude.com/docs/en/managed-agents/overview
- Built an adapter at adapters/anthropic-permission-policy/ that translates a synthetic permission-decision event captured at the SDK boundary (hooks → deny → mode → allow → canUseTool) into an AgentBoundary v0.2-alpha receipt
- Ran all 40 conformance scenarios against adapter-translated receipts
- Per-scenario verdicts in results.md; SDK→receipt mapping in mapping.md
Headline
PASS 12
PARTIAL 9
DOCS-ONLY 3
NOT COVERED 14
N/A 2
──────────────
TOTAL 40
The framing in the report: Claude Agent SDK ships the richest runtime permission primitive of the four products evaluated — layered evaluation, scoped tool patterns (Bash(rm *)), permission modes, programmatic hooks, canUseTool callback with updatedInput. Level 3 hashing scenarios (33, 34, 39, 40) pass cleanly because tool_input is raw JSON the adapter canonicalises. Where the comparison shows a gap: there's no portable emitted artifact a third party can verify outside the Console. The Managed Agents Console maintains an audit log per the April 2026 launch announcement; the schema isn't publicly documented. So the report's framing is complementary: the SDK is the strongest runtime primitive; AgentBoundary is the export format for the artifact gap. A team can wrap canUseTool (or query()) and emit a v0.2-alpha receipt at the action boundary.
The ask: if any per-scenario mapping or factual claim is wrong, corrections are welcome here or via PR to jamjet-labs/agentboundary within 7 days. After that, the report publishes with the data as currently mapped.
Happy to share §7.1 (the Claude Agent SDK section, ~400 words) if either of you wants a sneak look. Thanks for shipping the SDK — canUseTool was the cleanest decision-capture surface of the four I evaluated.
— Sunil