Skip to content

AgentBoundary v0.1 conformance evaluation of Claude Agent SDK permission_policy — pre-publication review #986

@sunilp

Description

@sunilp

Hi — I run JamJet Labs and am publishing a write-up that compares a handful of agent-governance products against an open spec I've been authoring called AgentBoundary (jamjet-labs/agentboundary, v0.1 stable + v0.2-alpha draft). AgentBoundary defines a portable JSON "receipt" for AI tool calls that a third party can verify without trusting the runtime that produced it. The Claude Agent SDK's permission_policy is one of four products in the comparison.

Posting here for a 7-day right-to-respond before publication. Corrections received in the window will be folded into the report; corrections received after appear inline with date stamps.

What I did

  • Read code.claude.com/docs/en/agent-sdk/permissions, code.claude.com/docs/en/agent-sdk/user-input, and the Managed Agents overview at platform.claude.com/docs/en/managed-agents/overview
  • Built an adapter at adapters/anthropic-permission-policy/ that translates a synthetic permission-decision event captured at the SDK boundary (hooks → deny → mode → allow → canUseTool) into an AgentBoundary v0.2-alpha receipt
  • Ran all 40 conformance scenarios against adapter-translated receipts
  • Per-scenario verdicts in results.md; SDK→receipt mapping in mapping.md

Headline

PASS         12
PARTIAL       9
DOCS-ONLY     3
NOT COVERED  14
N/A           2
──────────────
TOTAL        40

The framing in the report: Claude Agent SDK ships the richest runtime permission primitive of the four products evaluated — layered evaluation, scoped tool patterns (Bash(rm *)), permission modes, programmatic hooks, canUseTool callback with updatedInput. Level 3 hashing scenarios (33, 34, 39, 40) pass cleanly because tool_input is raw JSON the adapter canonicalises. Where the comparison shows a gap: there's no portable emitted artifact a third party can verify outside the Console. The Managed Agents Console maintains an audit log per the April 2026 launch announcement; the schema isn't publicly documented. So the report's framing is complementary: the SDK is the strongest runtime primitive; AgentBoundary is the export format for the artifact gap. A team can wrap canUseTool (or query()) and emit a v0.2-alpha receipt at the action boundary.

The ask: if any per-scenario mapping or factual claim is wrong, corrections are welcome here or via PR to jamjet-labs/agentboundary within 7 days. After that, the report publishes with the data as currently mapped.

Happy to share §7.1 (the Claude Agent SDK section, ~400 words) if either of you wants a sneak look. Thanks for shipping the SDK — canUseTool was the cleanest decision-capture surface of the four I evaluated.

— Sunil

Metadata

Metadata

Assignees

No one assigned

    Labels

    questionFurther information is requested

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions