Design Spec: k8s-karpenter-control-plane-health
Parent: #90
Target: rw-cli-codecollection
Spec
codebundle_name: "k8s-karpenter-control-plane-health"
target_collection: "rw-cli-codecollection"
display_name: "Kubernetes Karpenter Control Plane Health"
author: "rw-codebundle-agent"
purpose: |
  Monitors the health of the Karpenter controller installation in a cluster:
  workload readiness, admission webhooks, and high-signal Kubernetes events in
  the Karpenter namespace. This bundle answers "Is Karpenter itself running and
  wired correctly?" before investigating provisioning behavior.
tasks:
  - name: "Check Karpenter Controller Workload Health in Cluster `${CONTEXT}`"
    description: "Verifies that Karpenter controller pods (typically labeled app.kubernetes.io/name=karpenter or with the Helm release name) are Ready, counts restarts, and surfaces CrashLoopBackOff or missing replicas."
    script_name: "check-karpenter-controller-pods.sh"
    expected_issue_severity: [2, 3]
    access_level: "read-only"
    data_type: "config"
  - name: "Verify Karpenter Admission Webhooks in Cluster `${CONTEXT}`"
    description: "Lists ValidatingWebhookConfiguration and MutatingWebhookConfiguration objects that reference Karpenter and checks for a misconfigured caBundle, missing endpoints, or webhook failures in recent events."
    script_name: "check-karpenter-webhooks.sh"
    expected_issue_severity: [2, 3]
    access_level: "read-only"
    data_type: "config"
  - name: "Inspect Warning Events in Karpenter Namespace `${KARPENTER_NAMESPACE}`"
    description: "Aggregates Warning-type events involving Karpenter pods, services, or webhook resources within RW_LOOKBACK_WINDOW and groups them by involved object for triage."
    script_name: "karpenter-namespace-warning-events.sh"
    expected_issue_severity: [3, 4]
    access_level: "read-only"
    data_type: "events"
  - name: "Summarize Installed Karpenter API Versions and CRDs in Cluster `${CONTEXT}`"
    description: "Detects installed Karpenter CRD groups/versions (e.g. karpenter.sh, karpenter.k8s.aws) to support version-aware checks and to flag mixed or deprecated APIs."
    script_name: "check-karpenter-crds.sh"
    expected_issue_severity: [4]
    access_level: "read-only"
    data_type: "config"
  - name: "Check Karpenter Service and Metrics Endpoints in Namespace `${KARPENTER_NAMESPACE}`"
    description: "Validates that a Service exists for the metrics/monitoring port and optionally probes /metrics or health endpoints where exposed, surfacing misconfiguration that breaks observability integrations."
    script_name: "check-karpenter-service-metrics.sh"
    expected_issue_severity: [3, 4]
    access_level: "read-only"
    data_type: "metrics"
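The readiness/restart triage behind check-karpenter-controller-pods.sh could be sketched as below. This is an illustrative assumption, not the shipped script: the `check_pods` helper, the restart threshold, and the canned rows are placeholders for real output from `kubectl --context "$CONTEXT" -n "$KARPENTER_NAMESPACE" get pods -l app.kubernetes.io/name=karpenter --no-headers`.

```shell
#!/usr/bin/env bash
# Illustrative sketch: reads "NAME READY STATUS RESTARTS ..." rows on stdin,
# as produced by `kubectl get pods --no-headers`, and emits one issue line
# per unhealthy pod. The severity mapping mirrors expected_issue_severity.
check_pods() {
  while read -r name ready status restarts _; do
    [ -z "$name" ] && continue
    if [ "$status" != "Running" ] || [ "${ready%/*}" != "${ready#*/}" ]; then
      # Pod not Running, or ready count below desired replicas: severity 2.
      echo "SEV2: pod $name not ready (status=$status, ready=$ready)"
    elif [ "$restarts" -gt 3 ]; then
      # Running but restarting repeatedly: lower-severity issue (threshold
      # of 3 is an assumption for the sketch).
      echo "SEV3: pod $name has restarted $restarts times"
    fi
  done
}

# Example with canned rows standing in for live kubectl output:
check_pods <<'EOF'
karpenter-6f7d9c 1/1 Running 0
karpenter-8b2e4a 0/1 CrashLoopBackOff 12
EOF
```

Parsing the tabular `--no-headers` form keeps the sketch free of jq; the real script may prefer `-o json` for robustness.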
scope:
  level: "Resource"
  qualifiers:
    - CONTEXT
    - KARPENTER_NAMESPACE
  iteration_pattern: |
    One taskset per Kubernetes cluster context. KARPENTER_NAMESPACE defaults to
    `karpenter` but must be overridable for custom installs.
  resource_types:
    - "kubernetes_cluster"
generation_strategy: |
  Generation rule matches cluster-level SLXs where Karpenter is installed.
  Qualifier: cluster context name. Optional label or annotation on the workspace
  can indicate Karpenter presence to avoid generating for clusters without it.
env_vars:
  - name: CONTEXT
    description: "kubectl context name for the target cluster"
    required: true
  - name: KARPENTER_NAMESPACE
    description: "Namespace where Karpenter controller runs"
    required: false
    default: "karpenter"
  - name: KUBERNETES_DISTRIBUTION_BINARY
    description: "kubectl-compatible binary"
    required: false
    default: "kubectl"
  - name: RW_LOOKBACK_WINDOW
    description: "Time window for events (inherits platform convention)"
    required: false
    default: "30m"
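Under the defaults above, each script's preamble might apply the convention like this sketch. Note one deliberate deviation so the fragment runs standalone: the spec marks CONTEXT as required with no default, so a real script would fail fast instead of defaulting it.

```shell
#!/usr/bin/env bash
# Sketch of a shared preamble applying the env_vars defaults above.
# In production CONTEXT has no default and should fail fast, e.g. via
#   : "${CONTEXT:?CONTEXT is required}"
# It is given a placeholder default here only for illustration.
CONTEXT="${CONTEXT:-demo-cluster}"
KARPENTER_NAMESPACE="${KARPENTER_NAMESPACE:-karpenter}"
KUBERNETES_DISTRIBUTION_BINARY="${KUBERNETES_DISTRIBUTION_BINARY:-kubectl}"
RW_LOOKBACK_WINDOW="${RW_LOOKBACK_WINDOW:-30m}"

echo "context=$CONTEXT namespace=$KARPENTER_NAMESPACE window=$RW_LOOKBACK_WINDOW"
```

The `${VAR:-default}` form respects caller overrides, matching the requirement that KARPENTER_NAMESPACE be overridable for custom installs.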
secrets:
  - name: kubeconfig
    description: "Kubeconfig with read-only cluster access"
    format: "Standard kubeconfig file"
platform:
  name: "kubernetes"
  cli_tools:
    - "kubectl"
    - "jq"
  auth_methods:
    - "kubeconfig"
  api_docs: "https://karpenter.sh/docs/"
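With the tooling above, the grouping step of the warning-events task can be sketched with plain coreutils. The `group_events` helper and its input rows are assumptions: in the real script the rows would be extracted from `kubectl get events -o json` with a jq filter selecting `type == "Warning"`.

```shell
#!/usr/bin/env bash
# Sketch of the "group by involved object" step from the events task.
# stdin: "KIND/NAME REASON" rows, already filtered to Warning events.
# stdout: "count KIND/NAME REASON" rows, busiest objects first.
group_events() {
  sort | uniq -c | sort -rn
}

# Example rows standing in for filtered kubectl event output:
group_events <<'EOF'
Pod/karpenter-8b2e4a BackOff
Pod/karpenter-8b2e4a BackOff
Service/karpenter FailedToUpdateEndpoint
EOF
```

Sorting by count surfaces the noisiest object first, which is the triage ordering the task description asks for.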
related_bundles:
  - name: "k8s-namespace-healthcheck"
    relationship: "complements"
    notes: "General namespace triage; this bundle is Karpenter-scoped and adds webhook and controller-specific checks."
  - name: "k8s-karpenter-autoscaling-health"
    relationship: "complements"
    notes: "Provisioning and log-level autoscaling signals live in the sibling bundle; run control-plane checks first."
test_scenarios:
  - name: "healthy_karpenter_install"
    description: "Controller pods ready, webhooks configured, no warning events"
    expected_issues: 0
  - name: "controller_crashloop"
    description: "Karpenter pod in CrashLoopBackOff"
    expected_issues: 1
    expected_severities: [2]
notes: |
  Karpenter Helm chart layouts differ; implementors should discover controller
  workloads via common labels (app.kubernetes.io/name, app.kubernetes.io/instance)
  and document any required overrides. Do not assume EKS only; keep checks
  portable. Cloud-specific CRDs belong to the autoscaling-health bundle.
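The label-based discovery recommended above might look like the following sketch. The `find_karpenter_workload` helper and its sample rows are hypothetical; the label values are the common conventions named in the note, and real input would come from something like `kubectl -n "$KARPENTER_NAMESPACE" get deploy --no-headers --show-labels`.

```shell
#!/usr/bin/env bash
# Hypothetical discovery helper for the note above. Scans "NAME LABELS" rows
# (as from `kubectl get deploy --no-headers --show-labels`) and prints the
# first workload carrying a recognized Karpenter label.
find_karpenter_workload() {
  while read -r name labels; do
    case "$labels" in
      *app.kubernetes.io/name=karpenter*|*app.kubernetes.io/instance=karpenter*)
        echo "$name"
        return 0
        ;;
    esac
  done
  return 1  # nothing matched; caller should raise a configuration issue
}

# Example rows standing in for live kubectl output:
find_karpenter_workload <<'EOF'
coredns k8s-app=kube-dns
karpenter app.kubernetes.io/instance=karpenter,app.kubernetes.io/name=karpenter
EOF
```

Trying the name label first and falling back to the Helm instance label keeps the check portable across chart layouts, per the note.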