Releases · itsarbit/tokenwise

README polish — removed duplicate install section, modular CLI subsections, one-line feature bullets, compact Pareto plot with model labels, escalation shown in demo output, sharper positioning before comparison table
Strategy benchmark plot — smaller figure (7×4), tighter Y-axis, model name labels, directional x-axis hints

22 Feb 15:57

itsarbit

v0.4.3

8073dd3

v0.4.3

What's New in v0.4.3

Total-cost-aware budget enforcement

_compute_max_tokens now subtracts estimated input token cost before computing the output cap, ensuring total cost (input + output) stays within the budget ceiling. Previously only output price was considered, allowing input cost to push total spend over budget. A 1.2x safety margin on input token estimates accounts for tokenizer variance.
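The arithmetic described above can be sketched as follows. This is a minimal illustration, not the library's actual `_compute_max_tokens`: the function name, parameter names, and per-token price units are assumptions, and the chars/4 input estimate is the heuristic mentioned in the README footnote.

```python
import math
from typing import Optional


def compute_max_tokens(
    remaining_budget: float,
    prompt_chars: int,
    input_price_per_token: float,
    output_price_per_token: float,
    safety_margin: float = 1.2,
    min_output_tokens: int = 100,
) -> Optional[int]:
    """Return an output-token cap, or None when the step should be skipped.

    Hypothetical helper: names and units are assumptions for illustration.
    """
    # Heuristic input estimate: roughly 4 characters per token, inflated by
    # the safety margin to absorb tokenizer variance.
    est_input_tokens = math.ceil(prompt_chars / 4 * safety_margin)
    input_cost = est_input_tokens * input_price_per_token

    # Only what is left after paying for the input may be spent on output.
    budget_for_output = remaining_budget - input_cost
    if budget_for_output <= 0:
        return None

    cap = int(budget_for_output / output_price_per_token)
    # Too little room for a useful completion: signal "skip this step".
    return cap if cap >= min_output_tokens else None
```

With a 4,000-character prompt, the input alone is estimated at 1,200 tokens, so a budget that only covers the output price would previously have been overshot by the input cost.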

Parallel reservation-based execution

Independent plan steps now run concurrently via async DAG scheduling. Each step reserves its estimated cost before launch, preventing parallel steps from collectively overshooting the budget. Deadlock detection catches cyclic dependencies.
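The reservation idea can be sketched with plain asyncio. This is an illustration under stated assumptions, not tokenwise's scheduler: `run_dag`, the `steps` dictionary shape, and the policy of skipping a step that cannot reserve its estimate are all hypothetical, and a skipped step is treated as resolved so its dependents still run.

```python
import asyncio
from typing import Any, Callable, Coroutine, Dict, Set, Tuple


async def run_dag(
    steps: Dict[str, dict],
    budget: float,
    run_step: Callable[[str], Coroutine[Any, Any, float]],
) -> Tuple[Set[str], Set[str], float]:
    """Run steps concurrently; return (completed, skipped, unreserved budget).

    steps maps a step id to {"deps": [ids], "cost": estimated cost}.
    run_step(step_id) is a coroutine function returning the actual cost.
    """
    available = budget
    pending = dict(steps)
    resolved: Set[str] = set()  # completed or skipped
    skipped: Set[str] = set()
    running: Dict[asyncio.Task, str] = {}

    while pending or running:
        ready = [s for s, spec in pending.items() if set(spec["deps"]) <= resolved]
        if not ready and not running:
            # Nothing can start and nothing is in flight: the remaining
            # steps depend on each other in a cycle.
            raise RuntimeError(f"deadlock among steps: {sorted(pending)}")

        for sid in ready:
            cost = pending.pop(sid)["cost"]
            if cost > available:
                skipped.add(sid)
                resolved.add(sid)
                continue
            available -= cost  # reserve the estimate before launching
            running[asyncio.create_task(run_step(sid))] = sid

        if not running:
            continue
        finished, _ = await asyncio.wait(
            set(running), return_when=asyncio.FIRST_COMPLETED
        )
        for task in finished:
            sid = running.pop(task)
            # Settle: release the reservation and charge the actual cost.
            available += steps[sid]["cost"] - task.result()
            resolved.add(sid)

    return resolved - skipped, skipped, available
```

Because each reservation is deducted before the task starts, two steps launched in the same scheduling pass cannot both claim the same remaining budget.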

Improved ledger coverage

Every LLM call — including failed attempts and escalation retries — is recorded in the structured CostLedger. The persistent JSONL ledger (tokenwise ledger --summary) now includes planner cost in aggregate spend.
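To make the aggregation concrete, here is a small sketch of summarizing JSONL ledger lines. The JSON keys (`cost`, `success`) and the `summarize_ledger` helper are assumptions for illustration; the real ledger schema and the output of tokenwise ledger --summary may differ.

```python
import json
from typing import Iterable


def summarize_ledger(lines: Iterable[str]) -> dict:
    """Aggregate spend from JSONL ledger lines (hypothetical schema)."""
    calls = 0
    total = 0.0
    wasted = 0.0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the file
        entry = json.loads(line)
        calls += 1
        total += entry["cost"]
        # Failed attempts still cost money; track them separately.
        if not entry["success"]:
            wasted += entry["cost"]
    return {"calls": calls, "total_cost": total, "wasted_cost": wasted}
```

Counting failed attempts and planner calls in the same total is what keeps the aggregate spend from understating the real bill.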

Configurable minimum output tokens

min_output_tokens is now a setting rather than a hardcoded constant. Configure via TOKENWISE_MIN_OUTPUT_TOKENS, config file, or Executor(min_output_tokens=N). Default remains 100. Set lower for workflows that need tiny outputs under tight budgets.
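One way such a setting might be resolved is sketched below. The precedence order (constructor argument, then environment variable, then config file, then default) is an assumption, as are the `resolve_min_output_tokens` helper and the `min_output_tokens` config key; only the environment variable name and the default of 100 come from the notes above.

```python
import os
from typing import Mapping, Optional

DEFAULT_MIN_OUTPUT_TOKENS = 100


def resolve_min_output_tokens(
    explicit: Optional[int] = None,
    file_config: Optional[Mapping[str, object]] = None,
    env: Optional[Mapping[str, str]] = None,
) -> int:
    """Pick the effective minimum; precedence here is an assumption."""
    env = os.environ if env is None else env
    file_config = file_config or {}

    if explicit is not None:
        value = int(explicit)
    elif "TOKENWISE_MIN_OUTPUT_TOKENS" in env:
        value = int(env["TOKENWISE_MIN_OUTPUT_TOKENS"])
    elif "min_output_tokens" in file_config:
        value = int(file_config["min_output_tokens"])  # hypothetical key
    else:
        value = DEFAULT_MIN_OUTPUT_TOKENS

    if value < 1:
        raise ValueError("min_output_tokens must be at least 1")
    return value
```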

Clarified async usage

executor.execute(plan) auto-detects an existing event loop (Jupyter, FastAPI) and falls back to sequential execution. For concurrent DAG scheduling in async code, use await executor.aexecute(plan) directly.

A budget accuracy footnote was added to the README: input token estimation is heuristic (character count / 4, scaled by the 1.2x safety margin), not tokenizer-based.
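The loop detection can be illustrated with the standard library alone. This is a sketch of the general pattern, not the Executor's code: `in_running_loop`, `execute`, and the `(sync_fn, async_fn)` job pairs are hypothetical stand-ins for plan steps.

```python
import asyncio
from typing import Any, Callable, Coroutine, List, Tuple

Job = Tuple[Callable[[], Any], Callable[[], Coroutine[Any, Any, Any]]]


def in_running_loop() -> bool:
    """True when called from inside a running asyncio event loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return False
    return True


def execute(jobs: List[Job]) -> Tuple[str, List[Any]]:
    """Run jobs concurrently when possible, else one after another."""
    if in_running_loop():
        # asyncio.run() would raise here (e.g. in Jupyter or FastAPI),
        # so use the synchronous variant of each job instead.
        return "sequential", [sync_fn() for sync_fn, _ in jobs]

    async def _gather() -> List[Any]:
        return list(await asyncio.gather(*(async_fn() for _, async_fn in jobs)))

    return "concurrent", asyncio.run(_gather())
```

Code that is already async gains nothing from the synchronous entry point, which is why awaiting the async method directly is the route to concurrency there.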

Also in this release

mypy strict: 0 errors across all 19 source files
types-PyYAML added to dev dependencies
LLMProvider.astream_completion protocol signature corrected for async generators

Benchmark it yourself — single command, produces benchmarks/results.csv and benchmarks/pareto.png:

uv sync --group benchmark && uv run python benchmarks/pareto.py \
  --models openai/gpt-4.1-nano openai/gpt-4.1-mini openai/gpt-4.1 \
    anthropic/claude-sonnet-4 anthropic/claude-opus-4.6 \
  --csv benchmarks/results.csv --output benchmarks/pareto.png

See the Benchmarks section in the README for full model list and sample results.

Full Changelog: v0.4.2...v0.4.3

22 Feb 06:32

itsarbit

v0.4.2

67f6e9e

v0.4.2 — Production Hardening

Fixed

Remove "creative" capability — removed unused "creative" capability from decomposition prompt, curated model mapping, and test fixtures; the router had no creative patterns so it was silently restricting fallback to Anthropic-only models
Budget consistency in async executor — aexecute() now passes step.estimated_cost (not estimated_cost * 2) as the budget for each step, matching the reservation amount
Multi-capability downgrade — _optimize_for_budget now checks all required capabilities when downgrading models, not just the first; prevents a step needing ["code", "reasoning"] from being downgraded to a code-only model
max_tokens budget guardrail — _execute_step and _aexecute_step now compute and pass max_tokens based on remaining budget, preventing a single completion from overshooting; steps with budget below 100 output tokens are skipped gracefully
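The multi-capability fix above amounts to a subset check rather than a first-element check. A minimal sketch, assuming models are plain dictionaries with hypothetical `capabilities` and `price` fields (the real registry types are not shown in these notes):

```python
from typing import Iterable, List, Optional


def cheapest_with_capabilities(
    models: List[dict], required: Iterable[str]
) -> Optional[dict]:
    """Return the cheapest model covering every required capability."""
    needed = set(required)
    # A candidate must cover the full set, not just the first entry.
    candidates = [m for m in models if needed <= set(m["capabilities"])]
    return min(candidates, key=lambda m: m["price"], default=None)
```

With this check, a step needing both "code" and "reasoning" cannot be downgraded to a cheaper code-only model.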

Changed

README restructured — moved blog post link to after Quick Start; softened absolute claims about cost ledger coverage
Registry — find_models() and cheapest() accept a capabilities list parameter for multi-capability filtering

Full changelog: https://github.com/itsarbit/tokenwise/blob/master/CHANGELOG.md

22 Feb 06:06

itsarbit

v0.4.1

d315bec

v0.4.1

Fixed

Strict budget enforcement at call time — sequential executor now checks step's estimated cost against remaining budget before making the LLM call; steps that would exceed budget are skipped
Ledger store summary — removed redundant load(limit=0) call; summary() now uses _load_all() directly

21 Feb 16:11

itsarbit

v0.4.0

f262535

v0.4.0

What's New

Planner cost budgeted — the LLM call used for task decomposition now has its cost tracked and deducted from the user's budget; Plan.planner_cost exposes the cost; CLI displays it when > 0
Parallel step execution — executor now runs independent steps concurrently via asyncio.gather(); steps declare depends_on indices in the decomposition prompt; the executor builds a DAG and launches ready steps in parallel; execute() delegates to asyncio.run(aexecute()) transparently
Persistent spend tracking — new LedgerStore class persists execution history to a JSONL file (~/.config/tokenwise/ledger.jsonl by default); tokenwise plan --execute auto-saves; new tokenwise ledger CLI command shows history and --summary aggregates
TOKENWISE_LEDGER_PATH — new env var / config field to customize the ledger file path

Changed

Decomposition prompt — now asks the LLM to produce depends_on (0-indexed step indices) for each step; planner parses these and falls back to sequential chain if missing
Executor — execute() now dispatches to async DAG-based scheduling; falls back to sequential when already inside an async event loop
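The depends_on parsing with its sequential fallback might look like the sketch below. The `parse_depends_on` helper and the step dictionary shape are hypothetical, and falling back for the whole plan (rather than per step) when any entry is missing or invalid is an assumption.

```python
from typing import List


def parse_depends_on(raw_steps: List[dict]) -> List[List[int]]:
    """Return 0-indexed dependency lists, one per step."""
    n = len(raw_steps)
    deps: List[List[int]] = []
    for i, step in enumerate(raw_steps):
        d = step.get("depends_on")
        valid = isinstance(d, list) and all(
            isinstance(j, int) and not isinstance(j, bool) and 0 <= j < n and j != i
            for j in d
        )
        if not valid:
            # Fallback: a sequential chain where each step depends on
            # the one before it.
            return [[] if k == 0 else [k - 1] for k in range(n)]
        deps.append(sorted(set(d)))
    return deps
```

The sequential chain is always a safe reading of a plan: it gives up parallelism but can never run a step before something it needs.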

Full Changelog: v0.3.0...v0.4.0

21 Feb 07:21

itsarbit

v0.3.0

41d99ec

v0.3.0

What's New in v0.3.0

Added

CostLedger — structured cost tracking across attempts and escalations; PlanResult.ledger records every LLM call with reason, model, tokens, cost, and success/failure
Strict budget ceiling — router.route() now raises ValueError when no model fits the budget (controlled via budget_strict parameter; default True)
Decomposition visibility — Plan now exposes decomposition_source ("llm" or "fallback") and decomposition_error so callers know when task decomposition fell back
TTL on failed models — proxy's failed-model set now expires entries after 5 minutes (configurable) and caps at 50 entries, preventing unbounded growth
Shared HTTP client — providers reuse the proxy's httpx.AsyncClient instead of creating a new client per request, reducing connection overhead
Ledger table in CLI — tokenwise plan --execute now prints a Rich cost breakdown table with wasted-cost summary
Step-level capabilities — Step.required_capabilities explicitly tracks what each step needs, used during escalation filtering
Structured error codes — StepResult.http_status_code captures the HTTP status from provider errors for reliable error classification
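The TTL-plus-cap behaviour described for the failed-model set can be sketched as a small container. The `FailedModels` class and its method names are hypothetical; only the 5-minute default and the 50-entry cap come from the notes above, and evicting the oldest entry when the cap is hit is an assumption.

```python
import time
from collections import OrderedDict
from typing import Callable


class FailedModels:
    """Set of model ids whose entries expire and whose size is capped."""

    def __init__(
        self,
        ttl_seconds: float = 300.0,
        max_entries: int = 50,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._max = max_entries
        self._clock = clock
        self._entries: "OrderedDict[str, float]" = OrderedDict()  # id -> expiry

    def _purge(self) -> None:
        now = self._clock()
        for key in [k for k, exp in self._entries.items() if exp <= now]:
            del self._entries[key]

    def add(self, model_id: str) -> None:
        self._purge()
        self._entries.pop(model_id, None)  # re-adding refreshes the entry
        self._entries[model_id] = self._clock() + self._ttl
        while len(self._entries) > self._max:
            self._entries.popitem(last=False)  # evict the oldest entry

    def __contains__(self, model_id: object) -> bool:
        self._purge()
        return model_id in self._entries

    def __len__(self) -> int:
        self._purge()
        return len(self._entries)
```

Injecting the clock keeps the expiry logic testable without sleeping.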

Changed

Escalation ordering — executor and proxy now escalate to stronger tiers first (FLAGSHIP → MID) instead of trying budget tier first; fallback candidates are filtered by the full set of required capabilities
Error classification — split HTTP codes into unusable (402, 403, 404) vs transient (500, 502, 503, 504); _is_model_error checks the integer status code instead of brittle string matching
Router — budget error path now picks the cheapest model by estimate_cost() (input + output) instead of input_price alone
Router — budget is strict by default; planner uses budget_strict=False for its own internal routing
Package name — PyPI distribution renamed to tokenwise-llm (import name tokenwise unchanged)

Fixed

Proxy failed_models set no longer grows without bound across the server lifetime
Removed HTTP 400 from retryable/fallback codes (400 is a request schema error, not a model outage)
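The status-code classification from the Changed and Fixed notes can be written as a lookup on the integer code. The two code sets are taken from the notes above; the `classify_status` and `should_fallback` helpers, the "other" label, and the assumption that both classes trigger a fallback are illustrative only.

```python
from typing import Optional

# Model cannot be used at all (payment, permission, not found).
UNUSABLE_CODES = frozenset({402, 403, 404})
# Provider-side failures that may clear up on their own.
TRANSIENT_CODES = frozenset({500, 502, 503, 504})


def classify_status(status_code: Optional[int]) -> str:
    """Classify a provider HTTP status as unusable, transient, or other."""
    if status_code in UNUSABLE_CODES:
        return "unusable"
    if status_code in TRANSIENT_CODES:
        return "transient"
    # Includes 400: a request schema error says nothing about the model.
    return "other"


def should_fallback(status_code: Optional[int]) -> bool:
    """Assumed policy: try another model only for model-side errors."""
    return classify_status(status_code) != "other"
```

Comparing integers avoids the brittleness of matching substrings in error messages, where "400" could appear inside an unrelated number or token count.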
