Problem

The SDK provides token usage in two places: `RateLimitEvent.utilization` (session-level, lagging) and `model_usage` in `ResultMessage` (per-agent, but only after completion). During execution, there's no way to observe how many tokens a specific subagent has consumed so far.

This makes it impossible to build real-time resource management on top of the SDK. You can set a static `max_budget_usd` kill switch, but you can't dynamically adjust agent behavior based on actual consumption mid-execution.
Proposal

Emit a `TokenUsageEvent` (or extend the existing `RateLimitEvent`) per subagent during execution:

```json
{
  "type": "token_usage",
  "agent_id": "researcher-agent",
  "input_tokens": 14200,
  "output_tokens": 3800,
  "cache_read_input_tokens": 600,
  "cumulative_cost_usd": 0.042,
  "turn": 3
}
```

This could be emitted after each LLM call within a subagent, similar to how `RateLimitEvent` is already streamed.
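To make the proposed payload concrete, here is a minimal Python sketch of how a consumer might fold these events into per-agent running totals. `TokenUsageEvent` only mirrors the proposed JSON and is not an existing SDK type; `AgentUsage` and `UsageLedger` are hypothetical helpers. The sketch assumes the token fields are per-call deltas and that `cumulative_cost_usd` is a running total, as its name suggests; if the event instead carried cumulative token counts, `record` would assign rather than add.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenUsageEvent:
    """Hypothetical: mirrors the proposed event, not a real SDK type."""
    agent_id: str
    input_tokens: int
    output_tokens: int
    cache_read_input_tokens: int
    cumulative_cost_usd: float
    turn: int

    @classmethod
    def from_dict(cls, payload: dict) -> "TokenUsageEvent":
        if payload.get("type") != "token_usage":
            raise ValueError(f"not a token_usage event: {payload.get('type')!r}")
        return cls(
            agent_id=payload["agent_id"],
            input_tokens=payload["input_tokens"],
            output_tokens=payload["output_tokens"],
            cache_read_input_tokens=payload.get("cache_read_input_tokens", 0),
            cumulative_cost_usd=payload["cumulative_cost_usd"],
            turn=payload["turn"],
        )


@dataclass
class AgentUsage:
    """Running totals for one subagent."""
    input_tokens: int = 0
    output_tokens: int = 0
    cache_read_input_tokens: int = 0
    cost_usd: float = 0.0
    last_turn: int = 0


class UsageLedger:
    """Hypothetical real-time, per-agent accounting fed by streamed events."""

    def __init__(self) -> None:
        self.agents: dict[str, AgentUsage] = {}

    def record(self, event: TokenUsageEvent) -> AgentUsage:
        usage = self.agents.setdefault(event.agent_id, AgentUsage())
        # Assumption: token fields are per-call deltas, so they are summed.
        usage.input_tokens += event.input_tokens
        usage.output_tokens += event.output_tokens
        usage.cache_read_input_tokens += event.cache_read_input_tokens
        # Cost is already a running total, so the latest value replaces the old one.
        usage.cost_usd = event.cumulative_cost_usd
        usage.last_turn = event.turn
        return usage

    def total_cost_usd(self) -> float:
        return sum(u.cost_usd for u in self.agents.values())


if __name__ == "__main__":
    raw = (
        '{"type": "token_usage", "agent_id": "researcher-agent", '
        '"input_tokens": 14200, "output_tokens": 3800, '
        '"cache_read_input_tokens": 600, "cumulative_cost_usd": 0.042, "turn": 3}'
    )
    ledger = UsageLedger()
    usage = ledger.record(TokenUsageEvent.from_dict(json.loads(raw)))
    print(usage.input_tokens, usage.cost_usd)  # 14200 0.042
```

Keeping the ledger keyed by `agent_id` is what the current session-level `RateLimitEvent.utilization` cannot provide: a scheduler can read any single subagent's spend between LLM calls.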
Use case

- I'm building an open-source agent scheduler that sits above agent frameworks and manages resource allocation across concurrent agents.
- The Anthropic SDK's hook architecture (`PreToolUse`/`PostToolUse`, which run at zero token cost) is already the most scheduler-friendly integration point across all the major agent platforms I analyzed.
- Streaming per-agent token usage would complete the picture: hooks already provide admission control, and usage events would add real-time accounting.

Concrete scenarios:

- Throttle a subagent that's consuming a disproportionate share of tokens before it hits `max_budget_usd`.
- Downgrade a subagent to a cheaper model mid-execution if its usage patterns suggest the task is simpler than expected.
- Rebalance `max_turns` across concurrent subagents based on actual vs. projected consumption.
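The first and third scenarios could be built on per-agent usage data roughly as follows. This is a hypothetical policy sketch in Python, not part of any SDK: `AgentState`, `agents_to_throttle`, `rebalance_max_turns`, the default thresholds, and the agent names other than `researcher-agent` are illustrative assumptions, and `projected_cost_usd` stands in for whatever estimate the scheduler maintains. Throttling kicks in only after total spend crosses a warning fraction of the budget; turns are redistributed in proportion to each agent's unspent projection, using largest-remainder rounding so the pool is allocated exactly.

```python
from dataclasses import dataclass


@dataclass
class AgentState:
    """Hypothetical per-subagent snapshot built from streamed usage events."""
    agent_id: str
    cost_usd: float            # observed spend so far
    projected_cost_usd: float  # the scheduler's own estimate for the full task
    turns_used: int


def agents_to_throttle(agents, budget_usd, share_limit=0.5, warn_fraction=0.5):
    """Once total spend reaches warn_fraction * budget_usd, flag any agent whose
    share of spend exceeds share_limit, so it can be throttled before the
    static max_budget_usd kill switch fires."""
    total = sum(a.cost_usd for a in agents)
    if total == 0 or total < warn_fraction * budget_usd:
        return []
    return [a.agent_id for a in agents if a.cost_usd / total > share_limit]


def rebalance_max_turns(agents, turn_pool):
    """Split turn_pool remaining turns across agents in proportion to their
    unspent projection (projected minus actual cost, floored at zero).
    Returns each agent's new max_turns."""
    weights = {a.agent_id: max(a.projected_cost_usd - a.cost_usd, 0.0) for a in agents}
    total_w = sum(weights.values())
    if total_w == 0:
        # No agent has projected budget left: fall back to an even split.
        weights = {k: 1.0 for k in weights}
        total_w = float(len(weights))
    raw = {k: turn_pool * w / total_w for k, w in weights.items()}
    alloc = {k: int(v) for k, v in raw.items()}
    # Largest-remainder rounding so the allocations sum exactly to turn_pool.
    leftover = turn_pool - sum(alloc.values())
    by_remainder = sorted(raw, key=lambda k: raw[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return {a.agent_id: a.turns_used + alloc[a.agent_id] for a in agents}


if __name__ == "__main__":
    fleet = [
        AgentState("researcher-agent", 0.25, 1.0, 2),
        AgentState("writer-agent", 0.75, 1.0, 3),    # hypothetical agent
        AgentState("reviewer-agent", 1.5, 1.0, 4),   # hypothetical agent, over projection
    ]
    print(agents_to_throttle(fleet, budget_usd=4.0))  # ['reviewer-agent']
    print(rebalance_max_turns(fleet, turn_pool=8))
    # {'researcher-agent': 8, 'writer-agent': 5, 'reviewer-agent': 4}
```

Agents that have already overrun their projection receive no extra turns. Feeding the resulting values back into each subagent's `max_turns` is left out, since that depends on how the SDK would expose mid-execution adjustment.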