Releases: itsarbit/tokenwise

v0.5.0

23 Feb 16:36

v0.4.5

23 Feb 00:36

Full Changelog: v0.4.4...v0.4.5

v0.4.4

22 Feb 18:12

Changed

  • README polish — removed duplicate install section, modular CLI subsections, one-line feature bullets, compact Pareto plot with model labels, escalation shown in demo output, sharper positioning before comparison table
  • Strategy benchmark plot — smaller figure (7×4), tighter Y-axis, model name labels, directional x-axis hints

v0.4.3

22 Feb 15:57

What's New in v0.4.3

Total-cost-aware budget enforcement

_compute_max_tokens now subtracts estimated input token cost before computing the output cap, ensuring total cost (input + output) stays within the budget ceiling. Previously only output price was considered, allowing input cost to push total spend over budget. A 1.2x safety margin on input token estimates accounts for tokenizer variance.
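The arithmetic described above can be sketched roughly as follows. This is a minimal illustration, not the library's actual `_compute_max_tokens`: the function names, the per-million-token price units, and the `None` return for "skip this step" are all assumptions made for the example. Only the chars/4 heuristic, the 1.2x margin, and the default floor of 100 output tokens come from these notes.

```python
import math
from typing import Optional


def estimate_input_tokens(prompt: str, margin: float = 1.2) -> int:
    """Heuristic estimate: about 4 characters per token, inflated by a safety margin."""
    return math.ceil(len(prompt) / 4 * margin)


def compute_max_tokens(
    prompt: str,
    remaining_budget: float,      # dollars left for this step
    input_price: float,           # assumed unit: dollars per 1M input tokens
    output_price: float,          # assumed unit: dollars per 1M output tokens
    min_output_tokens: int = 100,
) -> Optional[int]:
    """Return an output-token cap that keeps input + output cost within budget.

    Returns None when the budget cannot cover the minimum output size
    (hypothetical convention meaning "skip this step").
    """
    # Charge the estimated input cost first, so it cannot push total spend over budget.
    input_cost = estimate_input_tokens(prompt) * input_price / 1_000_000
    output_budget = remaining_budget - input_cost
    if output_budget <= 0:
        return None
    # Whatever is left is converted into an output-token cap.
    cap = int(output_budget * 1_000_000 / output_price)
    return cap if cap >= min_output_tokens else None
```

With an output-only calculation, a $0.01 budget at $4 per million output tokens would allow 2,500 tokens; subtracting the estimated input cost first yields a smaller cap, which is the point of the change.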

Parallel reservation-based execution

Independent plan steps now run concurrently via async DAG scheduling. Each step reserves its estimated cost before launch, preventing parallel steps from collectively overshooting the budget. Deadlock detection catches cyclic dependencies.
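A rough sketch of reservation-based scheduling, under stated assumptions: the `Step` dataclass, `run_step`, and `run_plan` below are hypothetical stand-ins rather than tokenwise's own classes, and the wave-by-wave loop is a simplification of whatever scheduler the executor really uses. It shows the three ideas from the paragraph above: ready steps launch together, each one reserves its estimated cost before launch, and a round with no runnable steps is reported as a deadlock.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Step:
    id: int
    estimated_cost: float
    depends_on: list = field(default_factory=list)


async def run_step(step: Step) -> float:
    """Stand-in for an LLM call; returns the actual cost incurred."""
    await asyncio.sleep(0)
    return step.estimated_cost


async def run_plan(steps, budget: float):
    pending = {s.id: s for s in steps}
    completed, skipped = set(), set()
    remaining = budget
    while pending:
        # A step is ready once every dependency has been resolved (run or skipped).
        ready = [
            s for s in pending.values()
            if all(d in completed or d in skipped for d in s.depends_on)
        ]
        if not ready:
            raise RuntimeError("deadlock: cyclic or unsatisfiable dependencies")
        launch, reserved = [], 0.0
        for step in ready:
            del pending[step.id]
            blocked = any(d in skipped for d in step.depends_on)
            # Reserve before launch so parallel steps cannot jointly overshoot.
            if blocked or reserved + step.estimated_cost > remaining:
                skipped.add(step.id)
            else:
                reserved += step.estimated_cost
                launch.append(step)
        costs = await asyncio.gather(*(run_step(s) for s in launch))
        remaining -= sum(costs)
        completed.update(s.id for s in launch)
    return completed, skipped, budget - remaining
```

Skipping a step whose dependency was skipped is a design choice of this sketch; a real executor might instead fail the plan or retry with a cheaper model.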

Improved ledger coverage

Every LLM call — including failed attempts and escalation retries — is recorded in the structured CostLedger. The persistent JSONL ledger (tokenwise ledger --summary) now includes planner cost in aggregate spend.
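To illustrate how a JSONL ledger can fold planner cost into aggregate spend, here is a small sketch. The record fields (`planner_cost`, `execution_cost`) and the helper names are assumptions chosen for the example; the real ledger schema and the `LedgerStore` API may differ.

```python
import json
from pathlib import Path


def append_entry(path: Path, entry: dict) -> None:
    """Append one execution record as a single JSON line."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


def summarize(path: Path) -> dict:
    """Aggregate spend across all records, counting planner cost as well."""
    runs, total = 0, 0.0
    if not path.exists():
        return {"runs": runs, "total_cost": total}
    with path.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            runs += 1
            # Hypothetical field names; missing fields count as zero.
            total += record.get("planner_cost", 0.0) + record.get("execution_cost", 0.0)
    return {"runs": runs, "total_cost": total}
```

An append-only, one-record-per-line file keeps writes cheap and lets a summary be computed with a single streaming pass.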

Configurable minimum output tokens

min_output_tokens is now a setting rather than a hardcoded constant. Configure via TOKENWISE_MIN_OUTPUT_TOKENS, config file, or Executor(min_output_tokens=N). Default remains 100. Set lower for workflows that need tiny outputs under tight budgets.

Clarified async usage

executor.execute(plan) auto-detects an existing event loop (Jupyter, FastAPI) and falls back to sequential execution. For concurrent DAG scheduling in async code, use await executor.aexecute(plan) directly. Budget accuracy footnote added to README — input token estimation is heuristic (chars/4 + 1.2x margin), not tokenizer-based.
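The loop detection described above can be approximated with `asyncio.get_running_loop()`, which raises `RuntimeError` when no loop is running in the current thread. The sketch below is an assumption about the general pattern, not the executor's actual code; `execute`, `aexecute`, and `execute_sequential` are module-level stand-ins that only echo which path was taken.

```python
import asyncio


def in_running_loop() -> bool:
    """True when called from inside a running event loop (Jupyter, FastAPI, ...)."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return False
    return True


async def aexecute(plan: str) -> str:
    """Stand-in for concurrent DAG execution."""
    await asyncio.sleep(0)
    return f"concurrent:{plan}"


def execute_sequential(plan: str) -> str:
    """Stand-in for the sequential fallback."""
    return f"sequential:{plan}"


def execute(plan: str) -> str:
    # asyncio.run() cannot be called from a running loop, so fall back instead.
    if in_running_loop():
        return execute_sequential(plan)
    return asyncio.run(aexecute(plan))
```

Called from plain synchronous code, `execute` takes the concurrent path; called from inside a coroutine, it takes the sequential one, which is why async callers who want concurrency should await the async entry point directly.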

Also in this release

  • mypy strict: 0 errors across all 19 source files
  • types-PyYAML added to dev dependencies
  • LLMProvider.astream_completion protocol signature corrected for async generators

Benchmark it yourself — single command, produces benchmarks/results.csv and benchmarks/pareto.png:

uv sync --group benchmark && uv run python benchmarks/pareto.py \
  --models openai/gpt-4.1-nano openai/gpt-4.1-mini openai/gpt-4.1 \
    anthropic/claude-sonnet-4 anthropic/claude-opus-4.6 \
  --csv benchmarks/results.csv --output benchmarks/pareto.png

See the Benchmarks section in the README for full model list and sample results.

Full Changelog: v0.4.2...v0.4.3

v0.4.2 — Production Hardening

22 Feb 06:32

Fixed

  • Remove "creative" capability — removed unused "creative" capability from decomposition prompt, curated model mapping, and test fixtures; the router had no creative patterns so it was silently restricting fallback to Anthropic-only models
  • Budget consistency in async executor — aexecute() now passes step.estimated_cost (not estimated_cost * 2) as the budget for each step, matching the reservation amount
  • Multi-capability downgrade — _optimize_for_budget now checks all required capabilities when downgrading models, not just the first; prevents a step needing ["code", "reasoning"] from being downgraded to a code-only model
  • max_tokens budget guardrail — _execute_step and _aexecute_step now compute and pass max_tokens based on remaining budget, preventing a single completion from overshooting; steps with budget below 100 output tokens are skipped gracefully
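The multi-capability fix above amounts to a subset check over the full capability list instead of only its first element. A minimal sketch, assuming a hypothetical `Model` record and made-up model names (this is not the library's registry API):

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Model:
    name: str
    price: float              # illustrative relative price
    capabilities: frozenset


def cheapest_covering(models: Iterable[Model], required: Iterable[str]) -> Optional[Model]:
    """Cheapest model whose capabilities include every required one."""
    needed = set(required)
    # Subset test: all required capabilities must be present, not just the first.
    candidates = [m for m in models if needed <= m.capabilities]
    return min(candidates, key=lambda m: m.price, default=None)


# Hypothetical catalogue for illustration only.
CATALOGUE = [
    Model("small-coder", 1.0, frozenset({"code"})),
    Model("mid-generalist", 3.0, frozenset({"code", "reasoning"})),
    Model("flagship", 10.0, frozenset({"code", "reasoning", "math"})),
]
```

With this catalogue, a step requiring `["code", "reasoning"]` resolves to `mid-generalist`; checking only the first capability would have wrongly allowed `small-coder`.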

Changed

  • README restructured — moved blog post link to after Quick Start; softened absolute claims about cost ledger coverage
  • Registry — find_models() and cheapest() accept a capabilities list parameter for multi-capability filtering

Full changelog: https://github.com/itsarbit/tokenwise/blob/master/CHANGELOG.md

v0.4.1

22 Feb 06:06

Fixed

  • Strict budget enforcement at call time — sequential executor now checks step's estimated cost against remaining budget before making the LLM call; steps that would exceed budget are skipped
  • Ledger store summary — removed redundant load(limit=0) call; summary() now uses _load_all() directly

v0.4.0

21 Feb 16:11

What's New

  • Planner cost budgeted — the LLM call used for task decomposition now has its cost tracked and deducted from the user's budget; Plan.planner_cost exposes the cost; CLI displays it when > 0
  • Parallel step execution — executor now runs independent steps concurrently via asyncio.gather(); steps declare depends_on indices in the decomposition prompt; the executor builds a DAG and launches ready steps in parallel; execute() delegates to asyncio.run(aexecute()) transparently
  • Persistent spend tracking — new LedgerStore class persists execution history to a JSONL file (~/.config/tokenwise/ledger.jsonl by default); tokenwise plan --execute auto-saves; new tokenwise ledger CLI command shows history and --summary aggregates
  • TOKENWISE_LEDGER_PATH — new env var / config field to customize the ledger file path

Changed

  • Decomposition prompt — now asks the LLM to produce depends_on (0-indexed step indices) for each step; planner parses these and falls back to sequential chain if missing
  • Executor — execute() now dispatches to async DAG-based scheduling; falls back to sequential when already inside an async event loop
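The depends_on handling described in this release can be pictured as follows. The sketch assumes the decomposition output is a list of dicts and that any missing or malformed depends_on triggers the sequential-chain fallback for the whole plan; the notes only state the "missing" case, so the stricter validation here is an assumption, as is the function name.

```python
def resolve_dependencies(raw_steps: list) -> list:
    """Return, for each step, the 0-indexed steps it depends on.

    Falls back to a sequential chain (each step depends on the previous one)
    when any step lacks a usable depends_on list.
    """
    n = len(raw_steps)
    parsed = []
    for i, step in enumerate(raw_steps):
        deps = step.get("depends_on")
        valid = isinstance(deps, list) and all(
            isinstance(d, int) and not isinstance(d, bool) and 0 <= d < n and d != i
            for d in deps
        )
        if not valid:
            # Fallback: step 0 has no dependencies, step j depends on step j - 1.
            return [[] if j == 0 else [j - 1] for j in range(n)]
        parsed.append(sorted(set(deps)))
    return parsed
```

Falling back to a chain is conservative: it gives up parallelism but never runs a step before something it might have needed.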

Full Changelog: v0.3.0...v0.4.0

v0.3.0

21 Feb 07:21

What's New in v0.3.0

Added

  • CostLedger — structured cost tracking across attempts and escalations; PlanResult.ledger records every LLM call with reason, model, tokens, cost, and success/failure
  • Strict budget ceiling — router.route() now raises ValueError when no model fits the budget (controlled via budget_strict parameter; default True)
  • Decomposition visibility — Plan now exposes decomposition_source ("llm" or "fallback") and decomposition_error so callers know when task decomposition fell back
  • TTL on failed models — proxy's failed-model set now expires entries after 5 minutes (configurable) and caps at 50 entries, preventing unbounded growth
  • Shared HTTP client — providers reuse the proxy's httpx.AsyncClient instead of creating a new client per request, reducing connection overhead
  • Ledger table in CLI — tokenwise plan --execute now prints a Rich cost breakdown table with wasted-cost summary
  • Step-level capabilities — Step.required_capabilities explicitly tracks what each step needs, used during escalation filtering
  • Structured error codes — StepResult.http_status_code captures the HTTP status from provider errors for reliable error classification
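The "TTL on failed models" item in the list above describes a set whose entries expire after 5 minutes and that never holds more than 50 entries. One way to sketch such a structure is shown below. The class name, the injectable clock, and the evict-oldest policy are assumptions for illustration; the proxy's real implementation is not shown in these notes.

```python
import time


class FailedModels:
    """Set-like tracker for recently failed models with expiry and a size cap."""

    def __init__(self, ttl_seconds: float = 300.0, max_entries: int = 50,
                 clock=time.monotonic):
        self._ttl = ttl_seconds
        self._max = max_entries
        self._clock = clock          # injectable for testing
        self._entries = {}           # model id -> expiry time, in insertion order

    def add(self, model_id: str) -> None:
        # Re-adding refreshes both the expiry and the insertion position.
        self._entries.pop(model_id, None)
        self._entries[model_id] = self._clock() + self._ttl
        while len(self._entries) > self._max:
            oldest = next(iter(self._entries))
            del self._entries[oldest]

    def __contains__(self, model_id: str) -> bool:
        expiry = self._entries.get(model_id)
        if expiry is None:
            return False
        if self._clock() >= expiry:
            del self._entries[model_id]   # lazily drop expired entries
            return False
        return True

    def __len__(self) -> int:
        return len(self._entries)
```

Expiry is checked lazily on lookup, so no background task is needed, and the cap bounds memory even if expired entries are never looked up again.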

Changed

  • Escalation ordering — executor and proxy now escalate to stronger tiers first (FLAGSHIP → MID) instead of trying budget tier first; fallback candidates are filtered by the full set of required capabilities
  • Error classification — split HTTP codes into unusable (402, 403, 404) vs transient (500, 502, 503, 504); _is_model_error checks the integer status code instead of brittle string matching
  • Router — budget error path now picks the cheapest model by estimate_cost() (input + output) instead of input_price alone
  • Router — budget is strict by default; planner uses budget_strict=False for its own internal routing
  • Package name — PyPI distribution renamed to tokenwise-llm (import name tokenwise unchanged)

Fixed

  • Proxy failed_models set no longer grows without bound across the server lifetime
  • Removed HTTP 400 from retryable/fallback codes (400 is a request schema error, not a model outage)
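The status-code split from this release (unusable vs. transient, with 400 excluded from both) can be written as a small lookup on the integer code. The function name and the string labels are hypothetical; only the code groupings come from the notes above, and what the executor does with each class is not shown here.

```python
from typing import Optional

# Groupings as listed in these release notes.
UNUSABLE = frozenset({402, 403, 404})         # model cannot be used for this account/request
TRANSIENT = frozenset({500, 502, 503, 504})   # provider-side outage, may recover


def classify(status_code: Optional[int]) -> str:
    """Classify a provider error by integer HTTP status rather than message text."""
    if status_code in UNUSABLE:
        return "unusable"
    if status_code in TRANSIENT:
        return "transient"
    # Everything else, including 400 (request schema error), is not a model error.
    return "other"


def is_model_error(status_code: Optional[int]) -> bool:
    """True when trying a different model is a reasonable response."""
    return classify(status_code) != "other"
```

Matching on the integer avoids false positives from error messages that merely contain digits such as "400" or "404" in unrelated text.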