[design-spec] k8s-litellm-proxy-health #88

@rw-codebundle-agent

Design Spec: k8s-litellm-proxy-health

Parent: #87
Target: rw-cli-codecollection

Spec

codebundle_name: "k8s-litellm-proxy-health"
target_collection: "rw-cli-codecollection"
display_name: "Kubernetes LiteLLM Proxy API Health"
author: "rw-codebundle-agent"

purpose: |
  Exposes LiteLLM proxy health beyond pod logs by calling the proxy HTTP API:
  liveness/readiness, configured models/routes, optional deep model probes, and
  integration health. The API is typically reached through the in-cluster Service
  URL or a kubectl port-forward to the proxy Service.

tasks:
  - name: "Check LiteLLM Liveness Endpoint"
    description: "Calls GET /health/liveliness (or equivalent) to confirm the proxy process responds without invoking upstream LLMs."
    script_name: "check-litellm-liveness.sh"
    expected_issue_severity: [3, 4]
    access_level: "read-only"
    data_type: "metrics"

  - name: "Check LiteLLM Readiness and Dependencies"
    description: "Calls GET /health/readiness and surfaces DB/cache connectivity, version, and callback registration—detects misconfigured persistence or startup failures."
    script_name: "check-litellm-readiness.sh"
    expected_issue_severity: [2, 3]
    access_level: "read-only"
    data_type: "metrics"

  - name: "List Configured Models and Routes"
    description: "Uses /v1/models and/or /model/info to verify expected providers/models are registered; flags empty or unexpected routing relative to config intent."
    script_name: "list-litellm-models.sh"
    expected_issue_severity: [2, 3]
    access_level: "read-only"
    data_type: "logs-config"

  - name: "Optional Deep Model Health Probe"
    description: "When enabled, calls GET /health with the master key to run upstream health checks. Disabled by default and run with a short timeout, because it makes real LLM API calls and can be costly."
    script_name: "check-litellm-deep-health.sh"
    expected_issue_severity: [3, 4]
    access_level: "read-only"
    data_type: "metrics"

  - name: "Check External Integration Service Health"
    description: "Calls GET /health/services for named integrations (for example observability exporters) when configured; reports unhealthy sidecar integrations."
    script_name: "check-litellm-integration-health.sh"
    expected_issue_severity: [2, 3]
    access_level: "read-only"
    data_type: "metrics"

  - name: "Verify Kubernetes Service Reachability Context"
    description: "Optional kubectl-based checks (Service/Endpoints presence, port) to correlate API failures with cluster networking or missing endpoints—not a substitute for API checks."
    script_name: "verify-litellm-k8s-service.sh"
    expected_issue_severity: [2, 3]
    access_level: "read-only"
    data_type: "metrics"

scope:
  level: "Resource"
  qualifiers:
    - CONTEXT
    - NAMESPACE
    - LITELLM_SERVICE_NAME
  iteration_pattern: |
    One SLX per LiteLLM proxy Service (or Deployment) in a namespace, discovered by
    label or name qualifier. PROXY_BASE_URL may point to the ClusterIP/DNS name, or
    to localhost when using the kubectl port-forward workflow documented in the README.

resource_types:
  - "kubernetes_service"
generation_strategy: |
  Generate one SLX per matched Service (or annotated Deployment) in the target namespace
  representing the LiteLLM proxy. Qualifiers: cluster context, namespace, service name.
  Resource match: kubernetes_service with optional label selector for litellm proxy.

env_vars:
  - name: CONTEXT
    description: "Kubernetes context to use for optional kubectl checks"
    required: true

  - name: NAMESPACE
    description: "Namespace where LiteLLM proxy runs"
    required: true

  - name: PROXY_BASE_URL
    description: "Base URL for LiteLLM HTTP API (for example http://127.0.0.1:4000 after kubectl port-forward svc/...)"
    required: true

  - name: LITELLM_SERVICE_NAME
    description: "Kubernetes Service name for the LiteLLM proxy (for discovery and kubectl helpers)"
    required: true

  - name: LITELLM_HTTP_PORT
    description: "Service port number for the proxy HTTP listener"
    required: false
    default: "4000"

  - name: LITELLM_RUN_DEEP_HEALTH
    description: "Set to true to enable the expensive GET /health upstream probes"
    required: false
    default: "false"

  - name: LITELLM_INTEGRATION_SERVICES
    description: "Comma-separated integration names for /health/services checks, or empty to skip"
    required: false
    default: ""

secrets:
  - name: litellm_master_key
    description: "LiteLLM master or admin API key (Bearer) for protected endpoints such as /health, /health/services, and some model info routes"
    format: "Plain text or secret reference compatible with RunWhen secret import"

  - name: kubeconfig
    description: "Standard kubeconfig for kubectl-backed optional tasks"
    format: "kubeconfig YAML"

platform:
  name: "kubernetes"
  cli_tools:
    - "kubectl"
    - "curl"
    - "jq"
  auth_methods:
    - "Bearer token (LITELLM master key)"
    - "kubeconfig for cluster context"
  api_docs: "https://docs.litellm.ai/docs/proxy/health"

related_bundles:
  - name: "k8s-litellm-spend-governance"
    relationship: "complements"
    notes: "Proxy-health covers availability and routing; spend-governance covers budgets, spend logs, and failure-oriented usage analytics."

  - name: "k8s-redis-healthcheck"
    relationship: "complements"
    notes: "When Redis backs the proxy, Redis bundle validates datastore health; this bundle validates the proxy API layer."

  - name: "k8s-jaeger-http-query"
    relationship: "complements"
    notes: "Similar pattern of HTTP API diagnostics against a Kubernetes-scoped workload."

test_scenarios:
  - name: "healthy_proxy"
    description: "Readiness connected, liveness OK, models listed"
    expected_issues: 0

  - name: "db_disconnected"
    description: "Readiness shows DB error"
    expected_issues: 1
    expected_severities: [3]

notes: |
  Document kubectl port-forward for internal-only Services (for example
  kubectl port-forward -n NAMESPACE svc/LITELLM_SERVICE_NAME 4000:4000) and warn
  that GET /health performs real upstream LLM calls; gate it behind LITELLM_RUN_DEEP_HEALTH.
  LiteLLM response shapes differ slightly between versions, so prefer tolerant JSON
  parsing and explicit HTTP status handling. Keep timeouts within RunWhen task limits
  (under 1 minute for the whole suite).
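
The gating and status-handling guidance in the notes could be sketched in shell roughly as follows. This is an illustration under assumptions, not the bundle's implementation: the function names, the result labels, and the LITELLM_MASTER_KEY variable (standing in for the litellm_master_key secret) are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and result labels are hypothetical, not part of the spec.

# Return 0 only when the deep probe has been explicitly enabled.
deep_health_enabled() {
  case "${LITELLM_RUN_DEEP_HEALTH:-false}" in
    true|TRUE|True|1) return 0 ;;
    *) return 1 ;;
  esac
}

# Map an HTTP status code (as printed by curl -w '%{http_code}') to a coarse
# label, so scripts branch on the status before trusting the response body.
classify_http_status() {
  code="${1:-000}"
  case "$code" in
    2??) echo "ok" ;;
    401|403) echo "auth_error" ;;
    404) echo "endpoint_missing" ;;
    5??) echo "server_error" ;;
    000) echo "unreachable" ;;
    *) echo "unexpected" ;;
  esac
}

# Fetch PROXY_BASE_URL + path with a bounded timeout.
# Prints "<status> <path-to-body-file>"; status is 000 when curl itself fails.
# LITELLM_MASTER_KEY is an assumed variable name for the master-key secret.
probe_endpoint() {
  path="$1"
  timeout="${2:-10}"
  body="$(mktemp)"
  status="$(curl -sS -o "$body" -w '%{http_code}' --max-time "$timeout" \
    -H "Authorization: Bearer ${LITELLM_MASTER_KEY:-}" \
    "${PROXY_BASE_URL%/}${path}" 2>/dev/null)" || status="000"
  echo "$status $body"
}
```

A task script might call deep_health_enabled first and exit cleanly when it returns non-zero, then pass the status from probe_endpoint to classify_http_status before attempting any jq parsing of the body.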

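The env_vars in the spec (LITELLM_SERVICE_NAME, NAMESPACE, LITELLM_HTTP_PORT, LITELLM_INTEGRATION_SERVICES) might be consumed by small helpers like the ones below. This is a sketch under assumptions: the function names and the CLUSTER_DOMAIN variable are hypothetical, the URL helper assumes plain HTTP and the default cluster.local DNS domain, and the spec still requires PROXY_BASE_URL to be set explicitly.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and CLUSTER_DOMAIN are hypothetical.

# Print a candidate in-cluster URL for the proxy Service.
# Assumes plain HTTP and the default "cluster.local" DNS domain.
in_cluster_base_url() {
  printf 'http://%s.%s.svc.%s:%s\n' \
    "$LITELLM_SERVICE_NAME" "$NAMESPACE" \
    "${CLUSTER_DOMAIN:-cluster.local}" "${LITELLM_HTTP_PORT:-4000}"
}

# Print (not run) the port-forward command an operator would use to reach
# an internal-only Service from localhost.
port_forward_command() {
  port="${LITELLM_HTTP_PORT:-4000}"
  printf 'kubectl --context %s port-forward -n %s svc/%s %s:%s\n' \
    "$CONTEXT" "$NAMESPACE" "$LITELLM_SERVICE_NAME" "$port" "$port"
}

# Split LITELLM_INTEGRATION_SERVICES on commas, trim whitespace, and drop
# empty entries. Prints one name per line; prints nothing when unset or empty,
# which callers can treat as "skip the /health/services task".
list_integration_services() {
  printf '%s' "${LITELLM_INTEGRATION_SERVICES:-}" \
    | tr ',' '\n' \
    | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
    | grep -v '^$' || true
}
```

For example, with LITELLM_INTEGRATION_SERVICES set to "exporter-a, exporter-b" (made-up names), list_integration_services prints the two names on separate lines, and a task script could loop over them to issue one /health/services check per integration.
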
Labels

completed: Agent work completed
design-spec: Architect has produced a design spec
new-codebundle: Scoped issue for SRE to implement a new CodeBundle