Releases: itsarbit/tokenwise
v0.5.0
Full Changelog: v0.4.5...v0.5.0
v0.4.5
Full Changelog: v0.4.4...v0.4.5
v0.4.4
Changed
- README polish — removed duplicate install section, modular CLI subsections, one-line feature bullets, compact Pareto plot with model labels, escalation shown in demo output, sharper positioning before comparison table
- Strategy benchmark plot — smaller figure (7×4), tighter Y-axis, model name labels, directional x-axis hints
v0.4.3
What's New in v0.4.3
Total-cost-aware budget enforcement
_compute_max_tokens now subtracts estimated input token cost before computing the output cap, ensuring total cost (input + output) stays within the budget ceiling. Previously only output price was considered, allowing input cost to push total spend over budget. A 1.2x safety margin on input token estimates accounts for tokenizer variance.
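A minimal sketch of the idea, not the library's actual `_compute_max_tokens`: the function name, the per-million-token pricing units, and the parameter names below are assumptions made for illustration. The chars/4 estimate and the 1.2x margin come from the notes in this release.

```python
import math


def compute_max_tokens(
    budget_usd: float,
    prompt_chars: int,
    input_price_per_mtok: float,   # assumed unit: USD per million input tokens
    output_price_per_mtok: float,  # assumed unit: USD per million output tokens
    input_margin: float = 1.2,
) -> int:
    """Largest output-token cap that keeps input + output cost within the budget."""
    # Heuristic input estimate: about 4 characters per token, padded by the margin.
    est_input_tokens = math.ceil(prompt_chars / 4 * input_margin)
    input_cost = est_input_tokens * input_price_per_mtok / 1_000_000

    # Only what is left after paying for the input can be spent on output.
    remaining = budget_usd - input_cost
    if remaining <= 0:
        return 0
    return int(remaining * 1_000_000 / output_price_per_mtok)
```

With an output-only calculation the same budget would allow a larger cap, which is how input cost could previously push total spend over the ceiling.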
Parallel reservation-based execution
Independent plan steps now run concurrently via async DAG scheduling. Each step reserves its estimated cost before launch, preventing parallel steps from collectively overshooting the budget. Deadlock detection catches cyclic dependencies.
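The reservation scheme can be sketched roughly as follows. This is an illustrative scheduler under stated assumptions, not tokenwise's executor: `run_dag`, the `steps` mapping shape, and the `run_step` callback are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical shape: step name -> (estimated cost, names of steps it depends on)
Steps = dict[str, tuple[float, set[str]]]


async def run_dag(
    steps: Steps,
    budget: float,
    run_step: Callable[[str], Awaitable[float]],
) -> dict[str, float]:
    """Run independent steps concurrently; return the actual cost of each step run."""
    pending = dict(steps)
    running: dict[asyncio.Task, str] = {}
    done: set[str] = set()
    costs: dict[str, float] = {}
    reserved = 0.0  # estimated cost of steps currently in flight
    spent = 0.0     # actual cost of finished steps

    while pending or running:
        ready = [name for name, (_, deps) in pending.items() if deps <= done]
        for name in ready:
            estimate = pending[name][0]
            # Reserve before launch so parallel steps cannot jointly overshoot.
            if spent + reserved + estimate <= budget:
                reserved += estimate
                del pending[name]
                running[asyncio.create_task(run_step(name))] = name
        if not running:
            if ready:
                break  # the remaining ready steps do not fit the budget
            # Nothing is running and nothing can become ready: cyclic dependencies.
            raise RuntimeError(f"deadlock among steps: {sorted(pending)}")
        finished, _ = await asyncio.wait(set(running), return_when=asyncio.FIRST_COMPLETED)
        for task in finished:
            name = running.pop(task)
            reserved -= steps[name][0]  # release the reservation
            costs[name] = task.result()
            spent += costs[name]
            done.add(name)
    return costs
```

Steps that cannot be reserved are simply left unrun in this sketch; how the real executor reports skipped steps is not covered here.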
Improved ledger coverage
Every LLM call — including failed attempts and escalation retries — is recorded in the structured CostLedger. The persistent JSONL ledger (tokenwise ledger --summary) now includes planner cost in aggregate spend.
Configurable minimum output tokens
min_output_tokens is now a setting rather than a hardcoded constant. Configure via TOKENWISE_MIN_OUTPUT_TOKENS, config file, or Executor(min_output_tokens=N). Default remains 100. Set lower for workflows that need tiny outputs under tight budgets.
Clarified async usage
executor.execute(plan) auto-detects an existing event loop (Jupyter, FastAPI) and falls back to sequential execution. For concurrent DAG scheduling in async code, use await executor.aexecute(plan) directly. Budget accuracy footnote added to README — input token estimation is heuristic (chars/4 + 1.2x margin), not tokenizer-based.
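The loop detection described above can be approximated with the standard `asyncio.get_running_loop()` check. The helper names `in_running_loop` and `run_plan` are hypothetical, and the two callables stand in for the sequential and concurrent code paths; this is a sketch of the pattern, not the executor's source.

```python
import asyncio
from typing import Any, Awaitable, Callable


def in_running_loop() -> bool:
    """True when called from inside a running event loop (Jupyter, FastAPI, ...)."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return False
    return True


def run_plan(
    run_sequential: Callable[[], Any],
    run_concurrent: Callable[[], Awaitable[Any]],
) -> Any:
    """Fall back to sequential execution inside a loop; otherwise start one."""
    if in_running_loop():
        # asyncio.run() would raise here, so take the synchronous path.
        return run_sequential()
    return asyncio.run(run_concurrent())
```

Code that is already async avoids the fallback entirely by awaiting the concurrent path itself, which is what `await executor.aexecute(plan)` does.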
Also in this release
- mypy strict: 0 errors across all 19 source files
- types-PyYAML added to dev dependencies
- LLMProvider.astream_completion protocol signature corrected for async generators
Benchmark it yourself — single command, produces benchmarks/results.csv and benchmarks/pareto.png:
uv sync --group benchmark && uv run python benchmarks/pareto.py \
--models openai/gpt-4.1-nano openai/gpt-4.1-mini openai/gpt-4.1 \
anthropic/claude-sonnet-4 anthropic/claude-opus-4.6 \
--csv benchmarks/results.csv --output benchmarks/pareto.png
See the Benchmarks section in the README for the full model list and sample results.
Full Changelog: v0.4.2...v0.4.3
v0.4.2 — Production Hardening
Fixed
- Remove "creative" capability — removed unused "creative" capability from decomposition prompt, curated model mapping, and test fixtures; the router had no creative patterns so it was silently restricting fallback to Anthropic-only models
- Budget consistency in async executor — aexecute() now passes step.estimated_cost (not estimated_cost * 2) as the budget for each step, matching the reservation amount
- Multi-capability downgrade — _optimize_for_budget now checks all required capabilities when downgrading models, not just the first; prevents a step needing ["code", "reasoning"] from being downgraded to a code-only model
- max_tokens budget guardrail — _execute_step and _aexecute_step now compute and pass max_tokens based on remaining budget, preventing a single completion from overshooting; steps with budget below 100 output tokens are skipped gracefully
Changed
- README restructured — moved blog post link to after Quick Start; softened absolute claims about cost ledger coverage
- Registry — find_models() and cheapest() accept a capabilities list parameter for multi-capability filtering
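To illustrate multi-capability filtering, here is a small stand-in. The `Model` dataclass and the helpers `models_with_all` and `cheapest_with_all` are hypothetical and do not reflect the registry's real signatures; the point is only that every required capability must be present, not just the first.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass(frozen=True)
class Model:
    # Hypothetical model record, for illustration only.
    name: str
    price: float
    capabilities: frozenset[str] = field(default_factory=frozenset)


def models_with_all(models: Iterable[Model], capabilities: Iterable[str]) -> list[Model]:
    """Keep only models that offer every required capability."""
    required = set(capabilities)
    return [m for m in models if required <= m.capabilities]


def cheapest_with_all(models: Iterable[Model], capabilities: Iterable[str]) -> Optional[Model]:
    """Cheapest model satisfying all capabilities, or None if none qualifies."""
    candidates = models_with_all(models, capabilities)
    return min(candidates, key=lambda m: m.price, default=None)
```

A step needing both "code" and "reasoning" therefore never lands on a cheaper code-only model, which is the downgrade bug fixed in this release.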
Full changelog: https://github.com/itsarbit/tokenwise/blob/master/CHANGELOG.md
v0.4.1
Fixed
- Strict budget enforcement at call time — sequential executor now checks step's estimated cost against remaining budget before making the LLM call; steps that would exceed budget are skipped
- Ledger store summary — removed redundant load(limit=0) call; summary() now uses _load_all() directly
v0.4.0
What's New
- Planner cost budgeted — the LLM call used for task decomposition now has its cost tracked and deducted from the user's budget; Plan.planner_cost exposes the cost; CLI displays it when > 0
- Parallel step execution — executor now runs independent steps concurrently via asyncio.gather(); steps declare depends_on indices in the decomposition prompt; the executor builds a DAG and launches ready steps in parallel; execute() delegates to asyncio.run(aexecute()) transparently
- Persistent spend tracking — new LedgerStore class persists execution history to a JSONL file (~/.config/tokenwise/ledger.jsonl by default); tokenwise plan --execute auto-saves; new tokenwise ledger CLI command shows history and --summary aggregates
- TOKENWISE_LEDGER_PATH — new env var / config field to customize the ledger file path
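For readers unfamiliar with JSONL persistence, the pattern looks roughly like this. The functions `append_entry` and `summarize` and the `cost` field name are assumptions for illustration; the real LedgerStore schema and API may differ.

```python
import json
from pathlib import Path


def append_entry(path: Path, entry: dict) -> None:
    """Append one execution record as a single JSON line."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")


def summarize(path: Path) -> dict:
    """Aggregate entry count and total cost across the whole file."""
    total, count = 0.0, 0
    if path.exists():
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                if line.strip():
                    # "cost" is an assumed field name, not the confirmed schema.
                    total += json.loads(line).get("cost", 0.0)
                    count += 1
    return {"entries": count, "total_cost": total}
```

Appending one line per record keeps writes cheap and means a partially written final line cannot corrupt earlier history.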
Changed
- Decomposition prompt — now asks the LLM to produce depends_on (0-indexed step indices) for each step; planner parses these and falls back to sequential chain if missing
- Executor — execute() now dispatches to async DAG-based scheduling; falls back to sequential when already inside an async event loop
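The depends_on parsing with a sequential fallback might look like the sketch below. The function name and the choice to fall back for the whole plan as soon as any step has missing or invalid indices are assumptions; the planner's actual behavior may be more granular.

```python
def parse_dependencies(raw_steps: list[dict]) -> list[list[int]]:
    """Return 0-indexed dependency lists, one per step.

    Falls back to a sequential chain (each step depends on the previous one)
    when any step lacks a usable depends_on list.
    """
    n = len(raw_steps)
    parsed: list[list[int]] = []
    for i, step in enumerate(raw_steps):
        deps = step.get("depends_on")
        valid = isinstance(deps, list) and all(
            isinstance(d, int) and 0 <= d < n and d != i for d in deps
        )
        if not valid:
            # Sequential chain: step 0 has no dependency, step i depends on i - 1.
            return [[j - 1] if j > 0 else [] for j in range(n)]
        parsed.append(sorted(set(deps)))
    return parsed
```

For example, three steps with no depends_on fields yield `[[], [0], [1]]`, while explicit lists such as `[]`, `[]`, `[0, 1]` are kept so the first two steps can run in parallel.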
Full Changelog: v0.3.0...v0.4.0
v0.3.0
What's New in v0.3.0
Added
- CostLedger — structured cost tracking across attempts and escalations; PlanResult.ledger records every LLM call with reason, model, tokens, cost, and success/failure
- Strict budget ceiling — router.route() now raises ValueError when no model fits the budget (controlled via budget_strict parameter; default True)
- Decomposition visibility — Plan now exposes decomposition_source ("llm" or "fallback") and decomposition_error so callers know when task decomposition fell back
- TTL on failed models — proxy's failed-model set now expires entries after 5 minutes (configurable) and caps at 50 entries, preventing unbounded growth
- Shared HTTP client — providers reuse the proxy's httpx.AsyncClient instead of creating a new client per request, reducing connection overhead
- Ledger table in CLI — tokenwise plan --execute now prints a Rich cost breakdown table with wasted-cost summary
- Step-level capabilities — Step.required_capabilities explicitly tracks what each step needs, used during escalation filtering
- Structured error codes — StepResult.http_status_code captures the HTTP status from provider errors for reliable error classification
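The TTL and size cap on the failed-model set can be pictured with a small container like this one. The class name `FailedModels`, the injectable clock, and the oldest-first eviction policy are assumptions for illustration; only the 5-minute expiry and the 50-entry cap come from the notes above.

```python
import time
from typing import Callable


class FailedModels:
    """Set-like record of failed models with expiry and a size cap."""

    def __init__(
        self,
        ttl_seconds: float = 300.0,  # 5 minutes, per the release notes
        max_entries: int = 50,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._max = max_entries
        self._clock = clock
        self._failed_at: dict[str, float] = {}

    def add(self, model: str) -> None:
        self._purge()
        self._failed_at.pop(model, None)  # re-insert so it becomes the newest entry
        self._failed_at[model] = self._clock()
        # Assumed policy: evict the oldest entries once the cap is exceeded.
        while len(self._failed_at) > self._max:
            del self._failed_at[next(iter(self._failed_at))]

    def __contains__(self, model: str) -> bool:
        self._purge()
        return model in self._failed_at

    def _purge(self) -> None:
        cutoff = self._clock() - self._ttl
        for name in [m for m, t in self._failed_at.items() if t <= cutoff]:
            del self._failed_at[name]
```

Expiry gives a model that failed transiently another chance after the TTL, and the cap bounds memory even if many distinct models fail in a short window.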
Changed
- Escalation ordering — executor and proxy now escalate to stronger tiers first (FLAGSHIP → MID) instead of trying budget tier first; fallback candidates are filtered by the full set of required capabilities
- Error classification — split HTTP codes into unusable (402, 403, 404) vs transient (500, 502, 503, 504); _is_model_error checks the integer status code instead of brittle string matching
- Router — budget error path now picks the cheapest model by estimate_cost() (input + output) instead of input_price alone
- Router — budget is strict by default; planner uses budget_strict=False for its own internal routing
- Package name — PyPI distribution renamed to tokenwise-llm (import name tokenwise unchanged)
Fixed
- Proxy failed_models set no longer grows without bound across the server lifetime
- Removed HTTP 400 from retryable/fallback codes (400 is a request schema error, not a model outage)
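The status-code classification in this release can be summarized in a few lines. The `classify` helper and its string labels are hypothetical; the code sets themselves are taken from the notes above, including the removal of 400 from the fallback codes.

```python
from typing import Optional

UNUSABLE = frozenset({402, 403, 404})        # model cannot be used with this account/route
TRANSIENT = frozenset({500, 502, 503, 504})  # provider-side outage, worth a fallback


def classify(status_code: Optional[int]) -> str:
    """Classify a provider error by integer HTTP status, not by message text."""
    if status_code in UNUSABLE:
        return "unusable"
    if status_code in TRANSIENT:
        return "transient"
    # Everything else, including 400 (a request schema error), is not a model failure.
    return "other"
```

Comparing integers avoids false matches that string searches can produce, for example an error message that merely mentions "404" in a URL.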