Skip to content

feat: capture provider cost details#520

Closed
harrisony wants to merge 7 commits into
mcowger:mainfrom
harrisony:feature/capture-provider-cost-details
Closed

feat: capture provider cost details#520
harrisony wants to merge 7 commits into
mcowger:mainfrom
harrisony:feature/capture-provider-cost-details

Conversation

@harrisony
Copy link
Copy Markdown
Contributor

No description provided.

harrisony and others added 7 commits May 22, 2026 16:34
…r total cost

The previous implementation used a single safeCost() call wrapping a ??
chain: safeCost(details.total_cost ?? usage?.cost ?? usage?.estimated_cost).
This had two problems:

- upstream_inference_cost was not considered at all
- The entire ?? chain was validated as one, so safeCost() could not
  distinguish which source provided the value

Restructure into an explicit three-step fallback, each validated
independently:
1. cost_details.total_cost (LLM gateway, most detailed)
2. usage.cost (direct cost for OpenRouter-style providers)
3. cost_details.upstream_inference_cost (upstream OpenRouter-style cost)

Steps 2→3 use || (falsy coalescing) to handle the OpenRouter quirk where
usage.cost and/or upstream_inference_cost may be populated depending on
the upstream provider and key type used (e.g. BYOK keys report 0 for
usage.cost, but upstream_inference_cost carries the actual provider cost).
… from standard fields

Previously upstream_inference_prompt_cost was aliased directly into
input_cost. However, these fields have different semantics:

  upstream_inference_prompt_cost = input_cost + cached_input_cost
  (i.e., the combined prompt cost including cached tokens)

Aliasing it into input_cost caused applyUsageCostDetails to zero out
costCached, silently merging the cached portion into costInput.

Changes:
- Stop aliasing upstream_inference_prompt_cost → input_cost and
  upstream_inference_completions_cost → output_cost
- Add same-tier aliasing: upstream_inference_input_cost →
  upstream_inference_prompt_cost and upstream_inference_output_cost →
  upstream_inference_completions_cost (same semantics, different
  provider naming conventions)
- Update extraction tests to assert upstream fields stay separate
- Add tests for BYOK fallback, Responses API input/output variants,
  and LLM Gateway field priority
Previously applyUsageCostDetails only had two branches: superset
(per-bucket breakdown) and minimal (proportional distribution).
Now that extractUsageCostDetails no longer aliases normal-tier
upstream_inference_prompt_cost into input_cost, the normal-tier case
needs explicit handling.

Three tiers:
1. Superset: input_cost/cached_input_cost/cache_write_input_cost present
   → use per-bucket breakdown directly
2. Normal: upstream_inference_prompt_cost/completions_cost present but
   no input-side superset fields → use upstream prompt/completions split,
   then distribute the prompt portion by Plexus's own cache ratio
3. Minimal: no breakdown at all → proportional distribution from
   previously calculated costs
Add tests for: zero-cost non-BYOK requests, OpenRouter markup (cost >> upstream
sum), zero prompt tokens, heavy-cache-hit ratio split, end-to-end BYOK and
non-BYOK extract+apply flows. Also normalise scientific notation literals and
add missing upstream_inference_* fields to existing superset fixtures.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…che token shape

Three fixes informed by real provider testdata:

- applyUsageCostDetails: tighten Normal-tier guard from != null to > 0 on
  upstream_inference_prompt/completions_cost. Vercel AI Gateway emits these
  as 0 (not null) when it has no upstream breakdown, causing the Normal tier
  to fire and zero out all sub-costs despite total_cost being correct. Now
  falls through to Minimal tier (proportional distribution) as intended.

- normalizeOpenAIChatUsage: add prompt_cache_hit_tokens as final fallback for
  cached token count. DeepSeek sends both prompt_tokens_details.cached_tokens
  and prompt_cache_hit_tokens (consistent values), so real responses already
  work; the fallback is defensive for stripped-details payloads.

- response-handler / usage-logging: warn when both SSE :cost and
  usage.cost_details are present on the same response. SSE priority is
  preserved but the conflict is now visible in logs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…st_details block

Handles two providers that report cost at the top level of usage instead
of inside a cost_details block:

- usage.cost without cost_details: OpenRouter-routed models (e.g. Kimi-k2.5)
  that surface cost at the top level but don't populate cost_details. These
  were previously invisible to extractUsageCostDetails which required the
  cost_details object to be present.

- usage.cost_in_usd_ticks: xAI grok models. 1 USD = 10^10 ticks per xAI
  API docs. Converted to USD and returned as total_cost with all sub-cost
  fields null, so applyUsageCostDetails uses the Minimal tier (proportional
  distribution from Plexus-calculated token costs).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@harrisony
Copy link
Copy Markdown
Contributor Author

sorry, fat fingers

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant