Design Spec: k8s-litellm-spend-governance
Parent: #87
Target: rw-cli-codecollection
Spec
codebundle_name: "k8s-litellm-spend-governance"
target_collection: "rw-cli-codecollection"
display_name: "Kubernetes LiteLLM Spend and Governance"
author: "rw-codebundle-agent"
purpose: |
  Surfaces LiteLLM operational and cost governance signals from proxy Admin APIs:
  spend logs, global spend reports, per-key/user/team budgets and rate limits, and
  aggregates that highlight failed or budget-blocked traffic, without relying on
  container log scraping alone.
tasks:
  - name: "Review Recent Spend Logs for Failures"
    description: "Queries /spend/logs (with date/window parameters) and flags rows indicating errors, budget_exceeded, rate_limited, or provider failures."
    script_name: "review-litellm-spend-logs.sh"
    expected_issue_severity: [2, 4]
    access_level: "read-only"
    data_type: "metrics"
  - name: "Check Global Spend Report Against Threshold"
    description: "Calls /global/spend/report over a configurable window and compares total spend to LITELLM_SPEND_THRESHOLD_USD (or the relative increase versus the prior window)."
    script_name: "check-litellm-global-spend.sh"
    expected_issue_severity: [2, 3]
    access_level: "read-only"
    data_type: "metrics"
  - name: "Inspect Virtual Key Spend and Remaining Budget"
    description: "Uses key metadata endpoints (for example /key/info or /key/list) to report keys near max_budget, expired keys, or anomalous spend velocity."
    script_name: "inspect-litellm-key-budgets.sh"
    expected_issue_severity: [2, 3]
    access_level: "read-only"
    data_type: "metrics"
  - name: "Review User Budget and Rate Limit Status"
    description: "Calls /user/info (and related routes) for each configured user_id to surface soft_budget_cooldown, tpm/rpm limits, and spend versus budget."
    script_name: "review-litellm-user-budgets.sh"
    expected_issue_severity: [2, 3]
    access_level: "read-only"
    data_type: "metrics"
  - name: "Summarize Team Budgets and Limits"
    description: "When team IDs or aliases are configured, queries team endpoints (for example /team/info) to verify rpm/tpm/max_budget settings and highlight teams at risk of blocking traffic."
    script_name: "summarize-litellm-team-budgets.sh"
    expected_issue_severity: [2, 3]
    access_level: "read-only"
    data_type: "metrics"
  - name: "Aggregate Error and Blocked Request Signals"
    description: "Derives a short summary of failure modes from spend logs and proxy error fields (for example budget_exceeded counts and provider 429/5xx responses) for quick triage."
    script_name: "aggregate-litellm-failure-signals.sh"
    expected_issue_severity: [3, 4]
    access_level: "read-only"
    data_type: "metrics"
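The row classification shared by "Review Recent Spend Logs for Failures" and "Aggregate Error and Blocked Request Signals" could look roughly like the Python sketch below. It only illustrates the logic: the task scripts are specified as shell (curl plus jq), and the row field names (`status`, `metadata.error_message`, `metadata.status_code`, `api_key`) are assumptions to be checked against the spend-log schema of the deployed LiteLLM version. The helpers `classify_row` and `summarize_failures` are hypothetical names. Returning counts and top keys instead of raw rows follows the spec's preference for summaries over full log payloads.

```python
from collections import Counter
from typing import Dict, List, Optional

# ASSUMPTION: the field names used here ("status", "metadata",
# "error_message", "status_code", "api_key") are illustrative. Check them
# against the spend-log rows returned by the deployed LiteLLM version.


def classify_row(row: Dict) -> Optional[str]:
    """Return a failure category for one spend-log row, or None if it looks healthy."""
    meta = row.get("metadata")
    if not isinstance(meta, dict):
        meta = {}
    message = str(meta.get("error_message", "")).lower()
    code = meta.get("status_code")
    if "budget" in message and "exceeded" in message:
        return "budget_exceeded"
    if code == 429 or "rate limit" in message or "rate_limited" in message:
        return "rate_limited"
    if isinstance(code, int) and 500 <= code <= 599:
        return "provider_5xx"
    if str(row.get("status", "")).lower() == "failure":
        return "other_failure"
    return None


def summarize_failures(rows: List[Dict], top_n: int = 3) -> Dict:
    """Count failure categories and the keys that hit each one most often."""
    categories: Counter = Counter()
    keys_by_category: Dict[str, Counter] = {}
    for row in rows:
        category = classify_row(row)
        if category is None:
            continue
        categories[category] += 1
        # The api_key value is treated as an opaque grouping label only.
        key = str(row.get("api_key", "unknown"))
        keys_by_category.setdefault(category, Counter())[key] += 1
    return {
        "total_rows": len(rows),
        "failed_rows": sum(categories.values()),
        "by_category": dict(categories),
        "top_keys": {c: k.most_common(top_n) for c, k in keys_by_category.items()},
    }
```

A budget_exceeded spike concentrated on one key, as in the budget_exceeded_spike test scenario, would show up as a high `by_category["budget_exceeded"]` count with a single dominant entry in `top_keys`.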
scope:
  level: "Resource"
  qualifiers:
    - CONTEXT
    - NAMESPACE
    - LITELLM_SERVICE_NAME
  iteration_pattern: |
    Same LiteLLM proxy Service resource as the health bundle; this bundle focuses on
    authenticated Admin/spend routes against PROXY_BASE_URL.
resource_types:
- "kubernetes_service"
generation_strategy: |
  Pair with k8s-litellm-proxy-health SLX generation: one spend-governance SLX per
  proxy instance (context + namespace + service). Optional additional qualifiers for
  LITELLM_TEAM_IDS or LITELLM_USER_IDS when operators want scoped governance reports.
env_vars:
  - name: CONTEXT
    description: "Kubernetes context for optional kubectl correlation"
    required: true
  - name: NAMESPACE
    description: "Namespace of the LiteLLM deployment"
    required: true
  - name: PROXY_BASE_URL
    description: "LiteLLM proxy base URL for API calls"
    required: true
  - name: LITELLM_SERVICE_NAME
    description: "Service name for labeling and docs"
    required: true
  - name: RW_LOOKBACK_WINDOW
    description: "Time window for spend logs and reports (for example 24h or 7d); the implementer maps it to the API date parameters"
    required: false
    default: "24h"
  - name: LITELLM_SPEND_THRESHOLD_USD
    description: "Alert when global or scoped spend in the window exceeds this USD amount (0 disables)"
    required: false
    default: "0"
  - name: LITELLM_USER_IDS
    description: "Comma-separated internal user_ids to check with /user/info; empty skips the user task"
    required: false
    default: ""
  - name: LITELLM_TEAM_IDS
    description: "Comma-separated team identifiers for the team budget task; empty skips it"
    required: false
    default: ""
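RW_LOOKBACK_WINDOW, LITELLM_SPEND_THRESHOLD_USD, and the two ID lists each carry a small parsing rule: map the window to API date parameters, treat 0 as disabled, and treat an empty list as "skip the task". The Python sketch below shows one way to apply those rules. It assumes the window is a number plus an `m`/`h`/`d`/`w` suffix and that the spend routes take day-granularity `start_date`/`end_date` values in YYYY-MM-DD form; both are assumptions to confirm against the deployed proxy, and all helper names are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional

# ASSUMPTION: accepted window values look like "30m", "24h", "7d", or "2w".
_WINDOW_RE = re.compile(r"^(\d+)([mhdw])$")
_UNIT_SECONDS = {"m": 60, "h": 3600, "d": 86400, "w": 7 * 86400}


def parse_lookback(window: str) -> timedelta:
    """Convert an RW_LOOKBACK_WINDOW value to a timedelta."""
    match = _WINDOW_RE.match(window.strip().lower())
    if not match:
        raise ValueError(f"unsupported lookback window: {window!r}")
    value, unit = match.groups()
    return timedelta(seconds=int(value) * _UNIT_SECONDS[unit])


def window_to_date_params(window: str, now: Optional[datetime] = None) -> Dict[str, str]:
    """Map a lookback window to day-granularity start_date/end_date (YYYY-MM-DD).

    ASSUMPTION: the spend routes take dates in this form. Whether end_date is
    inclusive should be checked per route; a "24h" window spans two dates.
    """
    now = now or datetime.now(timezone.utc)
    start = now - parse_lookback(window)
    return {"start_date": start.date().isoformat(), "end_date": now.date().isoformat()}


def spend_exceeds_threshold(total_spend: float, threshold_usd: str) -> bool:
    """LITELLM_SPEND_THRESHOLD_USD semantics: "0" (or empty) disables the check."""
    threshold = float(threshold_usd or "0")
    return threshold > 0 and total_spend > threshold


def parse_id_list(raw: str) -> List[str]:
    """Split LITELLM_USER_IDS / LITELLM_TEAM_IDS; an empty list means skip the task."""
    return [item.strip() for item in raw.split(",") if item.strip()]
```

Passing `now` explicitly keeps the date mapping deterministic for tests; in a real run it defaults to the current UTC time.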
secrets:
  - name: litellm_master_key
    description: "Master key, or a key with permissions for /spend/logs, /global/spend/report, and the key/user/team info routes"
    format: "Bearer token"
  - name: kubeconfig
    description: "kubeconfig for optional kubectl context"
    format: "kubeconfig YAML"
platform:
  name: "kubernetes"
  cli_tools:
    - "kubectl"
    - "curl"
    - "jq"
  auth_methods:
    - "Bearer master key with spend route permissions"
  api_docs: "https://docs.litellm.ai/docs/proxy/cost_tracking"
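Every task authenticates the same way: a Bearer master key against PROXY_BASE_URL plus a route path and optional query parameters. The Python sketch below builds such a request with the standard library so the URL and header can be checked without a live proxy; the shell scripts would pass the same header through curl's `-H` option. The helper name `build_admin_request` and the example host used in the test are hypothetical.

```python
import urllib.parse
import urllib.request
from typing import Dict, Optional


def build_admin_request(base_url: str, path: str, master_key: str,
                        params: Optional[Dict[str, str]] = None) -> urllib.request.Request:
    """Build a GET request to a LiteLLM Admin route with a Bearer key."""
    # Join base URL and path without doubling or dropping the slash.
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {master_key}", "Accept": "application/json"},
        method="GET",
    )
```

Sending the request with `urllib.request.urlopen(req, timeout=30)` and passing the response to `json.load` yields the parsed body; a 403 is raised as `urllib.error.HTTPError`, whose `code` attribute carries the status for graceful-degradation handling.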
related_bundles:
  - name: "k8s-litellm-proxy-health"
    relationship: "complements"
    notes: "Health bundle validates proxy availability; this bundle validates financial and policy pressure (budgets, failures)."
  - name: "k8s-prometheus-healthcheck"
    relationship: "complements"
    notes: "If Prometheus scrapes LiteLLM metrics, that bundle complements API-derived spend and error summaries."
test_scenarios:
  - name: "nominal_spend"
    description: "Spend within threshold, no budget_exceeded in logs"
    expected_issues: 0
  - name: "budget_exceeded_spike"
    description: "Spend logs show repeated budget_exceeded for a key"
    expected_issues: 2
    expected_severities: [3, 3]
notes: |
  Some spend and team report routes are Enterprise-only or require specific key
  permissions; degrade gracefully and raise a clear issue when HTTP 403 indicates
  missing scope. Database-backed spend tracking must be enabled on the proxy for
  full spend logs. Prefer summarization over dumping full log payloads to stay
  within report size limits. Document port-forward access to the proxy Service in
  the same way as k8s-litellm-proxy-health.
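The notes call for clear issues on HTTP 403 and for bounded output. The Python sketch below turns a non-2xx Admin API response into one issue record with truncated details. The title/details/next_steps shape, the 2000-character cap, and the helper names are assumptions for illustration; the issue format used by the real scripts may differ.

```python
from typing import Dict, Optional

# ASSUMPTION: arbitrary cap on issue detail size, to respect report size limits.
MAX_DETAIL_CHARS = 2000


def truncate(text: str, limit: int = MAX_DETAIL_CHARS) -> str:
    """Keep at most `limit` characters and note how much was dropped."""
    if len(text) <= limit:
        return text
    return text[:limit] + f"... [truncated {len(text) - limit} chars]"


def issue_for_http_status(route: str, status: int, body: str) -> Optional[Dict[str, str]]:
    """Turn a non-2xx Admin API response into one bounded issue record (None on success)."""
    if 200 <= status < 300:
        return None
    if status in (401, 403):
        # Missing scope: report it clearly instead of failing the whole task.
        return {
            "title": f"LiteLLM route {route} not permitted for the configured key (HTTP {status})",
            "details": truncate(body),
            "next_steps": (
                "Use the master key or a key with spend/admin permissions; "
                "this route may also require LiteLLM Enterprise."
            ),
        }
    return {
        "title": f"LiteLLM route {route} returned HTTP {status}",
        "details": truncate(body),
        "next_steps": "Check proxy health and whether database-backed spend tracking is enabled.",
    }
```

Each task can call this once per route and continue with the remaining routes, so one forbidden Enterprise route does not hide the results of the others.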