OpenClaw Kubernetes Operator

Self-host OpenClaw AI agents on Kubernetes with production-grade security, observability, and lifecycle management.

OpenClaw is an AI agent platform that acts on your behalf across Telegram, Discord, WhatsApp, and Signal. It manages your inbox, calendar, smart home, and more through 50+ integrations. While OpenClaw.rocks offers fully managed hosting, this operator lets you run OpenClaw on your own infrastructure with the same operational rigor.

Why an Operator?

Deploying AI agents to Kubernetes involves more than a Deployment and a Service. You need network isolation, secret management, persistent storage, health monitoring, optional browser automation, and config rollouts, all wired correctly. This operator encodes those concerns into a single OpenClawInstance custom resource so you can go from zero to production in minutes:

apiVersion: openclaw.rocks/v1alpha1
kind: OpenClawInstance
metadata:
  name: my-agent
spec:
  envFrom:
    - secretRef:
        name: openclaw-api-keys
  storage:
    persistence:
      enabled: true
      size: 10Gi

The operator reconciles this into a fully managed stack of 9+ Kubernetes resources: secured, monitored, and self-healing.

Agents That Adapt Themselves

Agents can autonomously install skills, patch their config, add environment variables, and seed workspace files - all through the Kubernetes API, validated by the operator on every request.

# 1. Enable self-configure on the instance
spec:
  selfConfigure:
    enabled: true
    allowedActions: [skills, config, envVars, workspaceFiles]

# 2. The agent creates this to install a skill at runtime
apiVersion: openclaw.rocks/v1alpha1
kind: OpenClawSelfConfig
metadata:
  name: add-fetch-skill
spec:
  instanceRef: my-agent
  addSkills:
    - "@anthropic/mcp-server-fetch"

Every request is validated against the instance's allowlist policy. Protected config keys cannot be overwritten, and denied requests are logged with a reason. See Self-configure for details.

Note: Without selfConfigure enabled, config or skill changes made by the agent inside the container won't trigger a pod restart. You'll need to restart the pod manually (e.g. kubectl delete pod <pod-name>) for changes to take effect.

Features

	Feature	Details
Declarative	Single CRD	One resource defines the entire stack: StatefulSet, Service, RBAC, NetworkPolicy, PVC, PDB, Ingress, and more
Adaptive	Agent self-configure	Agents autonomously install skills, patch config, and adapt their environment via the K8s API - every change validated against an allowlist policy
Secure	Hardened by default	Non-root (UID 1000), read-only root filesystem, all capabilities dropped, seccomp RuntimeDefault, default-deny NetworkPolicy, validating webhook
Observable	Built-in metrics	Prometheus metrics, ServiceMonitor integration, structured JSON logging, Kubernetes events
Flexible	Provider-agnostic config	Use any AI provider (Anthropic, OpenAI, or others) via environment variables and inline or external config
Config Modes	Merge or overwrite	`overwrite` replaces config on restart; `merge` deep-merges with PVC config, preserving runtime changes. Config is restored on every container restart via init container.
Skills	Declarative install	Install ClawHub skills, npm packages, or GitHub-hosted skill packs via `spec.skills` - supports `npm:` and `pack:` prefixes
Plugins	Declarative install	Install OpenClaw plugins via `spec.plugins` - npm packages installed in a secure init container
Runtime Deps	pnpm & Python/uv	Built-in init containers install pnpm (via corepack) or Python 3.12 + uv for MCP servers and skills
Auto-Update	OCI registry polling	Opt-in version tracking: checks the registry for new semver releases, backs up first, rolls out, and auto-rolls back if the new version fails health checks
Scalable	Auto-scaling	HPA integration with CPU and memory metrics, min/max replica bounds, automatic StatefulSet replica management
Resilient	Self-healing lifecycle	PodDisruptionBudgets, health probes, automatic config rollouts via content hashing, 5-minute drift detection
Backup/Restore	S3-backed snapshots	Automatic backup to S3-compatible storage on deletion, pre-update, and on a cron schedule; restore into a new instance from any snapshot
Workspace Seeding	Initial files & dirs	Pre-populate the workspace with files and directories before the agent starts; reference an external ConfigMap for GitOps workflows
Gateway Auth	Auto-generated tokens	Automatic gateway token Secret per instance, bypassing mDNS pairing (unusable in k8s)
Tailscale	Tailnet access	Expose via Tailscale Serve or Funnel with SSO auth - no Ingress needed
Extensible	Sidecars & init containers	Chromium for browser automation, Ollama for local LLMs, Tailscale for tailnet access, plus custom init containers and sidecars
Cloud Native	SA annotations & CA bundles	AWS IRSA / GCP Workload Identity via ServiceAccount annotations; CA bundle injection for corporate proxies

Architecture

+-----------------------------------------------------------------+
|  OpenClawInstance CR          OpenClawSelfConfig CR              |
|  (your declarative config)   (agent self-modification requests) |
+---------------+-------------------------------------------------+
                | watch
                v
+-----------------------------------------------------------------+
|  OpenClaw Operator                                              |
|  +-----------+  +-------------+  +----------------------------+ |
|  | Reconciler|  |   Webhooks  |  |   Prometheus Metrics       | |
|  |           |  |  (validate  |  |  (reconcile count,         | |
|  |  creates ->  |   & default)|  |   duration, phases)        | |
|  +-----------+  +-------------+  +----------------------------+ |
+---------------+-------------------------------------------------+
                | manages
                v
+-----------------------------------------------------------------+
|  Managed Resources (per instance)                               |
|                                                                 |
|  ServiceAccount -> Role -> RoleBinding    NetworkPolicy         |
|  ConfigMap        PVC      PDB            ServiceMonitor        |
|  GatewayToken Secret                                            |
|                                                                 |
|  StatefulSet                                                    |
|  +------------------------------------------------------------+ |
|  | Init: config -> pnpm* -> python* -> skills* -> custom      | |
|  |                                        (* = opt-in)        | |
|  +------------------------------------------------------------+ |
|  | OpenClaw Container  Gateway Proxy (nginx)                  | |
|  |                     Chromium (opt) / Ollama (opt)          | |
|  |                     Tailscale (opt) + custom sidecars      | |
|  +------------------------------------------------------------+ |
|                                                                 |
|  Service (default: 18789, 18793 or custom) -> Ingress (opt)     |
+-----------------------------------------------------------------+

Quick Start

Prerequisites

Kubernetes 1.28+
Helm 3

1. Install the operator

helm install openclaw-operator \
  oci://ghcr.io/openclaw-rocks/charts/openclaw-operator \
  --namespace openclaw-operator-system \
  --create-namespace

Alternative: install with Kustomize

# Install CRDs
make install

# Deploy the operator
make deploy IMG=ghcr.io/openclaw-rocks/openclaw-operator:latest

2. Create a secret with your API keys

apiVersion: v1
kind: Secret
metadata:
  name: openclaw-api-keys
type: Opaque
stringData:
  ANTHROPIC_API_KEY: "sk-ant-..."

3. Deploy an OpenClaw instance

apiVersion: openclaw.rocks/v1alpha1
kind: OpenClawInstance
metadata:
  name: my-agent
spec:
  envFrom:
    - secretRef:
        name: openclaw-api-keys
  storage:
    persistence:
      enabled: true
      size: 10Gi

kubectl apply -f secret.yaml -f openclawinstance.yaml

4. Verify

kubectl get openclawinstances
# NAME       PHASE     AGE
# my-agent   Running   2m

kubectl get pods
# NAME         READY   STATUS    AGE
# my-agent-0   1/1     Running   2m

Configuration

Inline config (openclaw.json)

spec:
  config:
    raw:
      agents:
        defaults:
          model:
            primary: "anthropic/claude-sonnet-4-20250514"
          sandbox: true
      session:
        scope: "per-sender"

External ConfigMap reference

spec:
  config:
    configMapRef:
      name: my-openclaw-config
      key: openclaw.json

Config changes are detected via SHA-256 hashing and automatically trigger a rolling update. No manual restart needed.
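
As a minimal sketch, the ConfigMap referenced by configMapRef might look like the following. The name and key match the example above, and the JSON body simply mirrors the inline config shown earlier. Placing the ConfigMap in the same namespace as the instance is an assumption.

```yaml
# Illustrative ConfigMap for spec.config.configMapRef.
# Assumption: it lives in the same namespace as the OpenClawInstance.
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-openclaw-config
data:
  openclaw.json: |
    {
      "agents": {
        "defaults": {
          "model": { "primary": "anthropic/claude-sonnet-4-20250514" },
          "sandbox": true
        }
      },
      "session": { "scope": "per-sender" }
    }
```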

Gateway proxy

By default, each pod includes an nginx reverse proxy sidecar that forwards traffic to the OpenClaw gateway on loopback. Set spec.gateway.enabled: false to disable it. When the proxy is disabled:

Health probes and Service ports target the gateway directly on port 18789
gateway.bind is set to 0.0.0.0 instead of loopback
The gateway-proxy container and its tmp volume are omitted from the pod
To replace the built-in proxy with your own (e.g., Envoy, a signing proxy), disable it and add your proxy via spec.sidecars
Warning: Do not set gateway.bind: loopback in your config JSON when the proxy is disabled - the gateway will only listen on 127.0.0.1 with nothing forwarding external traffic, making the pod unreachable. The operator emits a GatewayBindConflict warning event if this misconfiguration is detected.
TLS: When the proxy is disabled, the gateway serves plaintext ws:// on 0.0.0.0. Ensure your replacement proxy or Ingress handles TLS termination to avoid exposing unencrypted WebSocket traffic (CWE-319).
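
A rough sketch of swapping the built-in proxy for your own sidecar, using only the fields mentioned above. The Envoy image tag and the container port are hypothetical placeholders. The replacement proxy still needs its own configuration to forward to the gateway on port 18789 and to terminate TLS.

```yaml
spec:
  gateway:
    enabled: false                       # omit the built-in gateway-proxy container
  sidecars:
    - name: envoy                        # hypothetical replacement proxy
      image: envoyproxy/envoy:v1.31.0    # placeholder tag, pin your own
      ports:
        - containerPort: 8443            # placeholder listener port
```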

Gateway authentication

The operator automatically generates a gateway token Secret for each instance and injects it into both the config JSON (gateway.auth.mode: token) and the OPENCLAW_GATEWAY_TOKEN env var. This bypasses Bonjour/mDNS pairing, which is unusable in Kubernetes.

The token is generated once and never overwritten - rotate it by editing the Secret directly
If you set gateway.auth.token in your config or OPENCLAW_GATEWAY_TOKEN in spec.env, your value takes precedence
To bring your own token Secret, set spec.gateway.existingSecret - the operator will use it instead of auto-generating one (the Secret must have a key named token)
The operator automatically sets gateway.controlUi.dangerouslyDisableDeviceAuth: true - device pairing is incompatible with Kubernetes (users cannot approve pairing from inside a container, connections are always proxied, and mDNS is unavailable)
Do not set gateway.mode: local in your config - this mode is for desktop installs and enforces device identity checks that cannot work behind a reverse proxy in Kubernetes
When connecting to the Control UI through an Ingress, pass the gateway token in the URL fragment: https://openclaw.example.com/#token=<your-token>
Since v2026.2.24, OpenClaw restricts gateway.allowedOrigins to same-origin by default - if accessing via a non-default hostname (e.g. Ingress), set gateway.allowedOrigins: ["*"] in your config
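
For the bring-your-own-token case, a minimal sketch could look like this. The Secret name my-gateway-token is a placeholder, and treating spec.gateway.existingSecret as a plain Secret name is an assumption. Check the API reference for the exact field shape.

```yaml
# 1. A Secret you manage yourself
apiVersion: v1
kind: Secret
metadata:
  name: my-gateway-token               # placeholder name
type: Opaque
stringData:
  token: "<your-gateway-token>"        # the key must be named "token"
---
# 2. On the OpenClawInstance
spec:
  gateway:
    existingSecret: my-gateway-token   # assumed to take the Secret name
```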

Control UI allowed origins

The operator auto-injects gateway.controlUi.allowedOrigins so the Control UI works through reverse proxies without CORS errors. Origins are derived from:

Localhost (always): http://localhost:18789, http://127.0.0.1:18789 for port-forwarding
Ingress hosts: scheme determined from TLS config (https:// if TLS, http:// otherwise)
Explicit extras: spec.gateway.controlUiOrigins for custom proxy URLs

If you set gateway.controlUi.allowedOrigins directly in your config JSON, the operator will not override it.
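
As an illustration, an extra origin for a custom proxy might be declared like this. The URL is a placeholder, and the list-of-strings shape of spec.gateway.controlUiOrigins is an assumption based on the description above.

```yaml
spec:
  gateway:
    controlUiOrigins:
      - "https://agent.internal.example.com"   # placeholder proxy origin
```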

Chromium sidecar

Enable headless browser automation for web scraping, screenshots, and browser-based integrations:

spec:
  chromium:
    enabled: true
    image:
      repository: chromedp/headless-shell  # default
      tag: "stable"
    resources:
      requests:
        cpu: "250m"
        memory: "512Mi"
      limits:
        cpu: "1000m"
        memory: "2Gi"
    # Pass extra flags to the Chromium process (appended to built-in anti-bot defaults)
    extraArgs:
      - "--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
    # Inject extra environment variables into the sidecar
    extraEnv:
      - name: DISPLAY
        value: ":99"

When enabled, the operator automatically:

Injects a CHROMIUM_URL environment variable into the main container
Configures browser profiles in the OpenClaw config - both "default" and "chrome" profiles are set to point at the sidecar's CDP endpoint, so browser tool calls work regardless of which profile name the LLM passes
Sets up shared memory, security contexts, and health probes for the sidecar
Applies anti-bot-detection flags by default (--disable-blink-features=AutomationControlled, --disable-features=AutomationControlled, --no-first-run)

Persistent browser profiles

By default, all browser state (cookies, localStorage, session tokens) is lost on pod restart. Enable persistence to retain browser profiles across restarts:

spec:
  chromium:
    enabled: true
    persistence:
      enabled: true           # default: false
      storageClass: ""        # optional - uses cluster default if empty
      size: "1Gi"             # default: 1Gi
      existingClaim: ""       # optional - use a pre-existing PVC

When persistence is enabled, the operator creates a dedicated PVC and passes --user-data-dir=/chromium-data to Chrome so that cookies, localStorage, IndexedDB, cached credentials, and session tokens survive pod restarts. This is useful for authenticated browser automation, MFA-protected services, and long-running browser workflows.

Security note: Persistent browser profiles contain sensitive session tokens. The PVC has the same security posture as other instance volumes. Ensure your StorageClass supports encryption at rest for sensitive workloads.

Ollama sidecar

Run local LLMs alongside your agent for private, low-latency inference without external API calls:

spec:
  ollama:
    enabled: true
    models:
      - llama3.2
      - nomic-embed-text
    gpu: 1
    storage:
      sizeLimit: 30Gi
    resources:
      requests:
        cpu: "1"
        memory: "4Gi"
      limits:
        cpu: "4"
        memory: "16Gi"

When enabled, the operator:

Injects an OLLAMA_HOST environment variable into the main container
Pre-pulls specified models via an init container before the agent starts
Configures GPU resource limits when gpu is set (nvidia.com/gpu)
Mounts a model cache volume (emptyDir by default, or an existing PVC via storage.existingClaim)
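
Because an emptyDir is discarded whenever the pod is recreated, pointing the model cache at an existing PVC avoids re-pulling models. A small sketch, where ollama-models is a hypothetical PVC created beforehand:

```yaml
spec:
  ollama:
    enabled: true
    models:
      - llama3.2
    storage:
      existingClaim: ollama-models   # hypothetical pre-created PVC
```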

See Custom AI Providers for configuring OpenClaw to use Ollama models via environment variables.

Web terminal sidecar

Provide browser-based shell access to running instances for debugging and inspection without requiring kubectl exec:

spec:
  webTerminal:
    enabled: true
    readOnly: false
    credential:
      secretRef:
        name: my-terminal-creds
    resources:
      requests:
        cpu: "50m"
        memory: "64Mi"
      limits:
        cpu: "200m"
        memory: "128Mi"

When enabled, the operator:

Injects a ttyd sidecar container on port 7681
Mounts the instance data volume at /home/openclaw/.openclaw so you can inspect config, logs, and data files
Adds the web terminal port to the Service and NetworkPolicy for external access
Supports basic auth via a Secret with username and password keys
Supports read-only mode (readOnly: true) for production environments where shell input should be disabled
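
The credential Secret referenced in the example could be sketched as follows. The username and password keys come from the description above, and the values are placeholders.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-terminal-creds
type: Opaque
stringData:
  username: "admin"                          # placeholder
  password: "<choose-a-strong-password>"     # placeholder
```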

Tailscale integration

Expose your instance via Tailscale Serve (tailnet-only) or Funnel (public internet) - no Ingress or LoadBalancer needed:

spec:
  tailscale:
    enabled: true
    mode: serve          # "serve" (tailnet only) or "funnel" (public internet)
    authKeySecretRef:
      name: tailscale-auth
    authSSO: true        # allow passwordless login for tailnet members
    hostname: my-agent   # defaults to instance name
    image:
      repository: ghcr.io/tailscale/tailscale  # default
      tag: latest
    resources:
      requests:
        cpu: 50m
        memory: 64Mi
      limits:
        cpu: 200m
        memory: 256Mi

When enabled, the operator runs a Tailscale sidecar (tailscaled) that handles serve/funnel declaratively via TS_SERVE_CONFIG. An init container copies the tailscale CLI binary to a shared volume so the main container can call tailscale whois for SSO authentication. The sidecar runs in userspace mode (TS_USERSPACE=true) - no NET_ADMIN capability needed.

State persistence: Tailscale node identity and TLS certificates are automatically persisted to a Kubernetes Secret (<instance>-ts-state) via TS_KUBE_SECRET. This prevents hostname incrementing (device-1, device-2, ...) and Let's Encrypt certificate re-issuance across pod restarts. The operator pre-creates the state Secret, grants the pod's ServiceAccount get/update/patch access to it, and mounts the SA token automatically.

Use ephemeral+reusable auth keys from the Tailscale admin console. When authSSO is enabled, tailnet members can authenticate without a gateway token.

Config merge mode

By default, the operator overwrites the config file on every pod restart. Set mergeMode: merge to deep-merge operator config with existing PVC config, preserving runtime changes made by the agent:

spec:
  config:
    mergeMode: merge
    raw:
      agents:
        defaults:
          model:
            primary: "anthropic/claude-sonnet-4-20250514"

Caveat: In merge mode, removing a key from the CR does not remove it from the PVC config - the old value persists because deep-merge only adds or updates keys. If you need to remove stale config keys (e.g., after removing gateway.mode: local), temporarily switch to mergeMode: overwrite, apply, wait for the pod to restart, then switch back to merge.
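
To make the caveat concrete, here is a hand-worked illustration (not operator output) of what a deep merge leaves behind. It assumes gateway.mode: local was set by an earlier revision of the CR and has since been removed. The config file itself is JSON; it is shown as YAML here for readability, and any keys the operator injects are left out.

```yaml
# 1. Config already on the PVC (written by an earlier CR revision)
gateway:
  mode: local
agents:
  defaults:
    model:
      primary: "anthropic/claude-sonnet-4-20250514"
---
# 2. Config now in the CR (gateway.mode has been removed)
agents:
  defaults:
    model:
      primary: "anthropic/claude-sonnet-4-20250514"
---
# 3. Result with mergeMode: merge - the stale key survives
gateway:
  mode: local
agents:
  defaults:
    model:
      primary: "anthropic/claude-sonnet-4-20250514"
```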

Skill installation

Install skills declaratively. The operator runs an init container that fetches each skill before the agent starts. Entries use ClawHub by default, or prefix with npm: to install from npmjs.com. ClawHub installs are idempotent - if a skill is already installed (e.g., when using persistent storage), it is skipped rather than failing:

spec:
  skills:
    - "@anthropic/mcp-server-fetch"       # ClawHub (default)
    - "npm:@openclaw/matrix"              # npm package from npmjs.com

npm lifecycle scripts are disabled globally on the init container (NPM_CONFIG_IGNORE_SCRIPTS=true) to mitigate supply chain attacks.

Skill packs

Skill packs bundle multiple files (SKILL.md, scripts, config) into a single installable unit hosted on GitHub. Use the pack: prefix with owner/repo/path format:

spec:
  skills:
    - "pack:openclaw-rocks/skills/image-gen"            # latest from default branch
    - "pack:openclaw-rocks/skills/image-gen@v1.0.0"     # pinned to tag
    - "pack:myorg/private-skills/custom-tool@main"       # private repo (requires GITHUB_TOKEN)

Each pack directory must contain a skillpack.json manifest:

{
  "files": {
    "skills/image-gen/SKILL.md": "SKILL.md",
    "skills/image-gen/scripts/generate.py": "scripts/generate.py"
  },
  "directories": ["skills/image-gen/scripts"],
  "config": {
    "image-gen": {"enabled": true}
  }
}

The operator resolves packs via the GitHub Contents API (cached for 5 minutes), seeds files into the workspace via the init container, and injects config entries into config.raw.skills.entries (user overrides take precedence). Set GITHUB_TOKEN on the operator deployment for private repo access.

Plugin installation

Install plugins declaratively. The operator runs a dedicated init container that installs each plugin via npm install before the agent starts:

spec:
  plugins:
    - "@martian-engineering/lossless-claw"
    - "some-other-plugin"

npm lifecycle scripts are disabled globally on the init container (NPM_CONFIG_IGNORE_SCRIPTS=true) to mitigate supply chain attacks. Plugins are installed into the PVC-backed ~/.openclaw/node_modules directory and persist across pod restarts.

Workspace seeding

Pre-populate the agent workspace with files and directories before the agent starts. Files can be provided inline or referenced from an external ConfigMap -- ideal for GitOps workflows where workspace content is managed alongside your manifests.

Inline files:

spec:
  workspace:
    initialDirectories:
      - tools/scripts
    initialFiles:
      README.md: |
        # My Workspace
        This workspace is managed by OpenClaw.

External ConfigMap reference:

spec:
  workspace:
    configMapRef:
      name: my-workspace-files      # all keys become workspace files
    initialFiles:                    # inline files (override configMapRef)
      EXTRA.md: "additional content"

All keys in the referenced ConfigMap are written as files into the workspace directory. When both configMapRef and initialFiles are specified, inline files take precedence over ConfigMap entries with the same filename.

Merge priority (highest wins): operator-injected files > inline initialFiles > external configMapRef > skill packs.

The operator sets a WorkspaceReady status condition to False when the referenced ConfigMap is missing or contains invalid filenames, and True once workspace files are seeded successfully. The controller watches external ConfigMaps for changes and re-reconciles automatically.

How it works: Workspace files are seeded once via an init container. The init container copies files from a read-only ConfigMap volume to the PVC. The main container only sees the PVC (writable), so agents can modify their workspace files and changes persist across pod restarts. ConfigMaps are never mounted directly on the main container.

GitOps example with Kustomize:

# kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: my-namespace              # must match the instance namespace

generatorOptions:
  disableNameSuffixHash: true        # required - operator looks up by exact name

configMapGenerator:
  - name: my-workspace-files
    files:
      - workspace/SOUL.md
      - workspace/AGENT.md

Important: Two Kustomize settings are required when using configMapGenerator with configMapRef:

disableNameSuffixHash: true -- The operator looks up ConfigMaps by exact name. Kustomize's default hash suffix (e.g. -57k7g4dthc) would cause a ConfigMapNotFound error.

namespace -- Generated ConfigMaps must be in the same namespace as the instance. Without this, the generated ConfigMap carries no namespace and ends up in whichever namespace it is applied to (typically default).

Additional workspaces (multi-agent):

When running multiple agents with isolated workspaces, use additionalWorkspaces to seed files for each agent. Each entry seeds to ~/.openclaw/workspace-<name>/ -- set matching paths in spec.config.raw.agents.list[].workspace.

spec:
  workspace:
    configMapRef:
      name: main-agent-workspace
    additionalWorkspaces:
      - name: scheduler
        configMapRef:
          name: scheduler-workspace
        initialFiles:
          SOUL.md: "I am the scheduler agent"
        initialDirectories:
          - tools
  config:
    raw:
      agents:
        list:
          - id: main
            name: "Main Agent"
          - id: scheduler
            name: "Scheduler Agent"
      bindings:
        - agentId: scheduler
          match:
            channel: discord
            peer:
              kind: channel
              id: "123456789"        # bind to a specific channel

Each additional workspace supports the same configMapRef, initialFiles, and initialDirectories as the default workspace. Operator-injected ENVIRONMENT.md is included; BOOTSTRAP.md is not (only the default agent runs onboarding). Max 10 additional workspaces.
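
The multi-agent example above does not show the workspace paths. A sketch of the matching agent entry, assuming agents.list[].workspace accepts the ~/.openclaw/workspace-<name>/ path described earlier (whether the path may use ~ or must be absolute is not stated here):

```yaml
spec:
  config:
    raw:
      agents:
        list:
          - id: scheduler
            name: "Scheduler Agent"
            # Assumed path format; matches additionalWorkspaces[].name "scheduler"
            workspace: "~/.openclaw/workspace-scheduler"
```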

Seed-once behavior: Workspace files (both default and additional) are only written on first boot when they don't already exist on the PVC. If an agent modifies its own SOUL.md or AGENT.md at runtime, those changes persist across pod restarts and are never overwritten by the ConfigMap content. To re-seed a file, delete it from the PVC first.

Full GitOps example with multiple agents:

# kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: my-namespace

generatorOptions:
  disableNameSuffixHash: true

resources:
  - instance.yaml

configMapGenerator:
  - name: main-agent-workspace
    files:
      - agents/main/SOUL.md
      - agents/main/AGENT.md
  - name: scheduler-workspace
    files:
      - agents/scheduler/SOUL.md
      - agents/scheduler/TOOLS.md

Self-configure

Allow agents to modify their own configuration by creating OpenClawSelfConfig resources via the K8s API. The operator validates each request against the instance's allowedActions policy before applying changes:

spec:
  selfConfigure:
    enabled: true
    allowedActions:
      - skills        # add/remove skills
      - config        # patch openclaw.json
      - workspaceFiles # add/remove workspace files
      - envVars       # add/remove environment variables

When enabled, the operator:

Grants the instance's ServiceAccount RBAC permissions to read its own CRD and create OpenClawSelfConfig resources
Enables SA token automounting so the agent can authenticate with the K8s API
Injects a SELFCONFIG.md skill file and selfconfig.sh helper script into the workspace
Opens port 6443 egress in the NetworkPolicy for K8s API access

The agent creates a request like:

apiVersion: openclaw.rocks/v1alpha1
kind: OpenClawSelfConfig
metadata:
  name: add-fetch-skill
spec:
  instanceRef: my-agent
  addSkills:
    - "@anthropic/mcp-server-fetch"

The operator validates the request, applies it to the parent OpenClawInstance, and sets the request's status to Applied, Denied, or Failed. Once a request reaches one of these terminal states, it is auto-deleted after 1 hour.

See the API reference for the full OpenClawSelfConfig CRD spec and spec.selfConfigure fields.

Persistent storage

By default the operator creates a 10Gi PVC and retains it when the CR is deleted (orphan behavior). Override size, storage class, or retention:

spec:
  storage:
    persistence:
      size: 20Gi
      storageClass: fast-ssd
      orphan: true   # default -- PVC is RETAINED when the CR is deleted
      # orphan: false  -- PVC is deleted with the CR (garbage collected)

To reuse an existing PVC (e.g., after restoring from a backup):

spec:
  storage:
    persistence:
      existingClaim: my-agent-data

Retention protects stateful data. Because agent workspaces contain irreplaceable data such as memory, notebooks, and conversation history, the default is orphan: true. To re-attach a retained PVC to a new instance, set existingClaim to its name.

Runtime dependencies

Enable built-in init containers that install pnpm or Python/uv to the data PVC for MCP servers and skills:

spec:
  runtimeDeps:
    pnpm: true    # Installs pnpm via corepack
    python: true  # Installs Python 3.12 + uv

Custom init containers and sidecars

Add custom init containers (run after operator-managed ones) and sidecar containers:

spec:
  initContainers:
    - name: fetch-models
      image: curlimages/curl:8.5.0
      command: ["sh", "-c", "curl -o /data/model.bin https://..."]
      volumeMounts:
        - name: data
          mountPath: /data
  sidecars:
    - name: cloud-sql-proxy
      image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.14.3
      args: ["--structured-logs", "my-project:us-central1:my-db"]
      ports:
        - containerPort: 5432
  sidecarVolumes:
    - name: proxy-creds
      secret:
        secretName: cloud-sql-proxy-sa

Reserved init container names (init-config, init-pnpm, init-python, init-skills, init-ollama) are rejected by the webhook. If your sidecar replaces the built-in gateway proxy, set spec.gateway.enabled: false to avoid running both.

Extra volumes and mounts

Mount additional ConfigMaps, Secrets, or CSI volumes into the main container:

spec:
  extraVolumes:
    - name: shared-data
      persistentVolumeClaim:
        claimName: shared-pvc
  extraVolumeMounts:
    - name: shared-data
      mountPath: /shared

Ingress Basic Auth

Add HTTP Basic Authentication to the Ingress. The operator auto-generates a random password and stores it in a managed Secret:

spec:
  networking:
    ingress:
      enabled: true
      className: nginx
      hosts:
        - host: my-agent.example.com
      security:
        basicAuth:
          enabled: true
          username: admin          # default: "openclaw"
          realm: "My Agent"        # default: "OpenClaw"

The generated Secret is named <name>-basic-auth and contains three keys: auth (htpasswd format for ingress controllers), username, and password (plaintext, for retrieving the auto-generated credentials). It is tracked in status.managedResources.basicAuthSecret. To use your own credentials, provide a pre-formatted htpasswd Secret:

spec:
  networking:
    ingress:
      security:
        basicAuth:
          enabled: true
          existingSecret: my-htpasswd-secret  # must contain key "auth"

For Traefik ingress, a Middleware CRD resource is created automatically (requires Traefik CRDs installed).
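
When supplying your own credentials through existingSecret, the Secret might look like the sketch below. The hash is deliberately left as a placeholder; generate a real entry with a tool such as htpasswd.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-htpasswd-secret
type: Opaque
stringData:
  auth: "admin:<htpasswd-hash>"   # htpasswd format: user:hash
```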

Custom service ports

By default the operator creates a Service with the gateway (18789) and canvas (18793) ports. To expose custom ports instead (e.g., for a non-default application), set spec.networking.service.ports:

spec:
  networking:
    service:
      type: ClusterIP
      ports:
        - name: http
          port: 3978
          targetPort: 3978

When ports is set, it fully replaces the default ports -- including the Chromium port if the sidecar is enabled. To keep the defaults alongside custom ports, include them explicitly. If targetPort is omitted it defaults to port. See the API reference for all fields.
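
For example, to keep the default gateway and canvas ports while adding a custom one, list all three explicitly. The port names are illustrative choices, not names the operator is known to require.

```yaml
spec:
  networking:
    service:
      type: ClusterIP
      ports:
        - name: gateway      # default gateway port, targetPort defaults to port
          port: 18789
        - name: canvas       # default canvas port
          port: 18793
        - name: http         # custom port from the example above
          port: 3978
          targetPort: 3978
```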

CA bundle injection

Inject a custom CA certificate bundle for environments with TLS-intercepting proxies or private CAs:

spec:
  security:
    caBundle:
      configMapName: corporate-ca-bundle  # or secretName
      key: ca-bundle.crt                  # default key name

The bundle is mounted into all containers and the SSL_CERT_FILE / NODE_EXTRA_CA_CERTS environment variables are set automatically.
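
The referenced ConfigMap could be as simple as the following sketch. The certificate body is a placeholder for your own PEM-encoded CA chain.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: corporate-ca-bundle
data:
  ca-bundle.crt: |
    -----BEGIN CERTIFICATE-----
    <base64-encoded CA certificate>
    -----END CERTIFICATE-----
```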

ServiceAccount annotations

Add annotations to the managed ServiceAccount for cloud provider integrations:

spec:
  security:
    rbac:
      serviceAccountAnnotations:
        # AWS IRSA
        eks.amazonaws.com/role-arn: "arn:aws:iam::123456789012:role/openclaw"
        # GCP Workload Identity
        # iam.gke.io/gcp-service-account: "openclaw@project.iam.gserviceaccount.com"

Auto-update

Opt into automatic version tracking so the operator detects new releases and rolls them out without manual intervention:

spec:
  autoUpdate:
    enabled: true
    checkInterval: "24h"         # how often to poll the registry (1h-168h)
    backupBeforeUpdate: true     # back up the PVC before applying an update
    rollbackOnFailure: true      # auto-rollback if the new version fails health checks
    healthCheckTimeout: "10m"    # how long to wait for the pod to become ready (2m-30m)

When enabled, the operator resolves latest to the highest stable semver tag on creation, then polls for newer versions on each checkInterval. Before updating, it optionally runs an S3 backup, then patches the image tag and monitors the rollout. If the pod fails to become ready within healthCheckTimeout, it reverts the image tag and (optionally) restores the PVC from the pre-update snapshot.

Safety mechanisms include failed-version tracking (skips versions that failed health checks), a circuit breaker (pauses after 3 consecutive rollbacks), and full data restore when backupBeforeUpdate is enabled. Auto-update is a no-op for digest-pinned images (spec.image.digest).

See status.autoUpdate for update progress: kubectl get openclawinstance my-agent -o jsonpath='{.status.autoUpdate}'

Backup and restore

The operator uses rclone to back up and restore PVC data to/from S3-compatible storage. All backup operations require a Secret named s3-backup-credentials in the operator namespace:

apiVersion: v1
kind: Secret
metadata:
  name: s3-backup-credentials
  namespace: openclaw-operator-system
stringData:
  S3_ENDPOINT: "https://s3.us-east-1.amazonaws.com"
  S3_BUCKET: "my-openclaw-backups"
  S3_ACCESS_KEY_ID: "<key-id>"            # optional - omit for workload identity
  S3_SECRET_ACCESS_KEY: "<secret-key>"    # optional - omit for workload identity
  # S3_PROVIDER: "Other"    # optional - set to "AWS", "GCS", etc. for native credential chains
  # S3_REGION: "us-east-1"  # optional - needed for MinIO or providers with custom regions

Compatible with AWS S3, Backblaze B2, Cloudflare R2, MinIO, Wasabi, and any S3-compatible API.

Cloud workload identity: Omit S3_ACCESS_KEY_ID and S3_SECRET_ACCESS_KEY and set S3_PROVIDER (e.g., AWS, GCS) to use the provider's native credential chain. Set spec.backup.serviceAccountName to a workload identity-enabled ServiceAccount (IRSA, GKE Workload Identity, AKS Workload Identity) so backup Jobs inherit the cloud IAM role. See the Workload Identity section in the API reference for a full example.
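A minimal sketch of the workload identity setup on AWS, assuming IRSA. The ServiceAccount name, namespace, and role ARN are hypothetical, and the assumption that the ServiceAccount lives in the instance's namespace is not confirmed by the text above.

```yaml
# Credentials Secret without static keys; S3_PROVIDER selects the native credential chain.
apiVersion: v1
kind: Secret
metadata:
  name: s3-backup-credentials
  namespace: openclaw-operator-system
stringData:
  S3_ENDPOINT: "https://s3.us-east-1.amazonaws.com"
  S3_BUCKET: "my-openclaw-backups"
  S3_PROVIDER: "AWS"
---
# Hypothetical ServiceAccount bound to an IAM role through IRSA.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: openclaw-backup            # hypothetical name
  namespace: default               # assumed: same namespace as the instance
  annotations:
    eks.amazonaws.com/role-arn: "arn:aws:iam::123456789012:role/openclaw-backup"  # hypothetical role
---
# Instance fragment: backup Jobs run under the ServiceAccount above.
apiVersion: openclaw.rocks/v1alpha1
kind: OpenClawInstance
metadata:
  name: my-agent
  namespace: default
spec:
  backup:
    serviceAccountName: openclaw-backup
```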

When backups run automatically:

On delete - the operator backs up the PVC before removing any resources. The backup is subject to spec.backup.timeout (default: 30m): if it does not complete in time, it is skipped automatically. Add openclaw.rocks/skip-backup: "true" to skip the backup immediately.
Before auto-update - when spec.autoUpdate.backupBeforeUpdate: true (the default).
On a schedule - when spec.backup.schedule is set (cron expression).

If the Secret does not exist, backups are silently skipped and operations proceed normally.
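The pre-delete behavior can be tuned per instance. The fragment below is a sketch that shortens the timeout and sets the skip key. It assumes the key is read from metadata.annotations, which the text above does not state explicitly, so check the API reference before relying on it.

```yaml
apiVersion: openclaw.rocks/v1alpha1
kind: OpenClawInstance
metadata:
  name: my-agent
  annotations:
    # Assumed to be an annotation; skips the pre-delete backup immediately.
    openclaw.rocks/skip-backup: "true"
spec:
  backup:
    timeout: "10m"   # upper bound for the pre-delete backup (min: 5m, max: 24h)
```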

Periodic scheduled backups:

spec:
  backup:
    schedule: "0 2 * * *"   # Daily at 2 AM UTC
    retentionDays: 7         # Keep 7 days of daily snapshots (default)
    historyLimit: 3          # Successful job runs to retain (default: 3)
    failedHistoryLimit: 1    # Failed job runs to retain (default: 1)
    timeout: "30m"           # Max time for pre-delete backup (default: 30m, min: 5m, max: 24h)
    serviceAccountName: ""   # Optional: IRSA/Pod Identity SA for backup Jobs

The operator creates a Kubernetes CronJob that runs rclone to sync PVC data to S3. The CronJob uses pod affinity to co-locate on the same node as the StatefulSet pod (required for RWO PVCs). Backups use an incremental sync strategy: data is synced to a fixed latest path (only changed files uploaded), a daily snapshot is taken, and snapshots older than retentionDays are automatically pruned.

Restoring from backup:

spec:
  # Path recorded in status.lastBackupPath of the source instance
  restoreFrom: "backups/my-tenant/my-agent/2026-01-15T10:30:00Z"

The operator runs a restore job to populate the PVC before starting the StatefulSet, then clears restoreFrom automatically. Backup paths follow the format backups/<tenantId>/<instanceName>/<timestamp>.

Clone / migrate an instance: restoreFrom works on both existing and brand-new instances. To clone an instance across namespaces, create a new OpenClawInstance with spec.restoreFrom pointing to the source's backup path - the operator creates the PVC, runs the restore Job, then starts the StatefulSet. The new instance gets a fresh gateway token; the source is unaffected. The restore Job uses spec.backup.serviceAccountName when set, so workload identity (IRSA/Pod Identity) works for cross-namespace clones. For ArgoCD users, add spec.restoreFrom to ignoreDifferences since the operator auto-clears it after restore.
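The clone flow might look like the following sketch. The clone name, target namespace, and Argo CD Application name are hypothetical. The Application is a fragment that shows only ignoreDifferences and omits the required project, source, and destination fields.

```yaml
# New instance in a different namespace, seeded from the source instance's backup.
apiVersion: openclaw.rocks/v1alpha1
kind: OpenClawInstance
metadata:
  name: my-agent-clone        # hypothetical name
  namespace: staging          # hypothetical target namespace
spec:
  restoreFrom: "backups/my-tenant/my-agent/2026-01-15T10:30:00Z"
  storage:
    persistence:
      enabled: true
      size: 10Gi
---
# Argo CD Application fragment: ignore the field the operator clears after restore.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: openclaw-staging      # hypothetical name
spec:
  ignoreDifferences:
    - group: openclaw.rocks
      kind: OpenClawInstance
      jsonPointers:
        - /spec/restoreFrom
```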

For full details see the Backup and Restore section in the API reference.

What the operator manages automatically

These behaviors are always applied - no configuration needed:

Behavior	Details
`gateway.bind`	When the gateway proxy sidecar is enabled (default), binds to loopback and an nginx reverse proxy handles external access. When disabled (`spec.gateway.enabled: false`), binds to `0.0.0.0` so the gateway is reachable directly.
Gateway auth token	Auto-generated Secret per instance; injected into config and env
Control UI origins	`gateway.controlUi.allowedOrigins` auto-injected from localhost + ingress hosts + `spec.gateway.controlUiOrigins`
`OPENCLAW_GATEWAY_HANDSHAKE_TIMEOUT_MS`	`10000` (10s) to work around upstream timeout regression in v2026.3.12 (#46892)
`OPENCLAW_DISABLE_BONJOUR=1`	Always set (mDNS does not work in Kubernetes)
Browser profiles	When Chromium is enabled, `"default"` and `"chrome"` profiles are auto-configured with the sidecar's CDP endpoint
Tailscale serve config	When Tailscale is enabled, a `tailscale-serve.json` key is added to the ConfigMap for the sidecar's `TS_SERVE_CONFIG`
Tailscale state persistence	When Tailscale is enabled, node identity and TLS certs are persisted to a `<instance>-ts-state` Secret via `TS_KUBE_SECRET`
Config hash rollouts	Config changes trigger rolling updates via SHA-256 hash annotation
Config restoration	The init container restores config on every pod restart (overwrite or merge mode)

For the full list of configuration options, see the API reference and the full sample YAML.

Security

The operator follows a secure-by-default philosophy. Every instance ships with hardened settings out of the box, with no extra configuration needed.

Defaults

Non-root execution: containers run as UID 1000; root (UID 0) is blocked by the validating webhook (exception: Ollama sidecar requires root per the official image)
Read-only root filesystem: enabled by default for the main container and the Chromium sidecar; the PVC at ~/.openclaw/ provides a writable home, and a /tmp emptyDir handles temp files
All capabilities dropped: no ambient Linux capabilities
Seccomp RuntimeDefault: syscall filtering enabled
Default-deny NetworkPolicy: only DNS (53) and HTTPS (443) egress allowed; ingress limited to same namespace
Minimal RBAC: each instance gets its own ServiceAccount with read-only access to its own ConfigMap; operator can create/update Secrets only for operator-managed gateway tokens
No automatic token mounting: automountServiceAccountToken: false on both ServiceAccounts and pod specs (enabled only when selfConfigure is active)
Secret validation: the operator checks that all referenced Secrets exist and sets a SecretsReady condition
Security context propagation: when podSecurityContext.runAsNonRoot is set to false, the operator propagates this to init containers and applicable sidecars (tailscale, web terminal) so there is no contradiction between pod-level and container-level settings. Self-consistent sidecars (gateway-proxy, chromium, ollama) retain their own security contexts. The containerSecurityContext.runAsNonRoot and containerSecurityContext.runAsUser fields allow granular control over the main container independently of the pod level.
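To make the default-deny NetworkPolicy concrete, here is a hand-written policy that expresses the same rules in plain Kubernetes terms. It is a sketch, not the operator's rendered output: the policy name and the pod label used in the selector are assumptions.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: my-agent-default-deny                 # hypothetical name
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/instance: my-agent    # assumed pod label
  policyTypes: [Ingress, Egress]
  ingress:
    - from:
        - podSelector: {}                     # any pod in the same namespace
  egress:
    - ports:                                  # DNS
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    - ports:                                  # HTTPS
        - protocol: TCP
          port: 443
```

Because both policy types are listed, any traffic not matched by these rules is dropped for the selected pods.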

Validating webhook

Check	Severity	Behavior
`runAsUser: 0`	Error	Blocked: root execution not allowed
Reserved init container name	Error	`init-config`, `init-pnpm`, `init-python`, `init-skills`, `init-ollama` are reserved
Invalid skill name	Error	Only alphanumeric, `-`, `_`, `/`, `.`, `@` allowed (max 128 chars). `npm:` prefix for npm packages, `pack:` prefix for skill packs; bare `npm:` or `pack:` is rejected
Invalid CA bundle config	Error	Exactly one of `configMapName` or `secretName` must be set
JSON5 with inline raw config	Error	JSON5 requires `configMapRef` (inline must be valid JSON)
JSON5 with merge mode	Error	JSON5 is not compatible with `mergeMode: merge`
Invalid `checkInterval`	Error	Must be a valid Go duration between 1h and 168h
Invalid `healthCheckTimeout`	Error	Must be a valid Go duration between 2m and 30m
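For example, going by the duration ranges in the table, a spec like this sketch would be rejected at admission time. The exact error message is not shown here because it is not documented in this section.

```yaml
spec:
  autoUpdate:
    enabled: true
    checkInterval: "30m"        # rejected: below the 1h minimum
    healthCheckTimeout: "45m"   # rejected: above the 30m maximum
```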

Warning-level checks (deployment proceeds with a warning)

Check	Behavior
NetworkPolicy disabled	Deployment proceeds with a warning
Ingress without TLS	Deployment proceeds with a warning
Chromium without digest pinning	Deployment proceeds with a warning
Ollama without digest pinning	Deployment proceeds with a warning
Web terminal without digest pinning	Deployment proceeds with a warning
Ollama runs as root	Required by official image; informational
Auto-update with digest pin	Digest overrides auto-update; updates won't apply
`readOnlyRootFilesystem` disabled	Proceeds with a security recommendation
No AI provider keys detected	Scans `env`/`envFrom` for known provider env vars
Unknown config keys	Warns on unrecognized top-level keys in `spec.config.raw`

Observability

Prometheus metrics

Metric	Type	Description
`openclaw_reconcile_total`	Counter	Reconciliations by result (success/error)
`openclaw_reconcile_duration_seconds`	Histogram	Reconciliation latency
`openclaw_instance_phase`	Gauge	Current phase per instance
`openclaw_instance_info`	Gauge	Instance metadata for PromQL joins (always 1)
`openclaw_instance_ready`	Gauge	Whether instance pod is ready (1/0)
`openclaw_managed_instances`	Gauge	Total number of managed instances
`openclaw_resource_creation_failures_total`	Counter	Resource creation failures
`openclaw_autoupdate_checks_total`	Counter	Auto-update version checks by result
`openclaw_autoupdate_applied_total`	Counter	Successful auto-updates applied
`openclaw_autoupdate_rollbacks_total`	Counter	Auto-update rollbacks triggered

When spec.observability.metrics.enabled is true (the default), the operator automatically configures a full metrics pipeline: it injects diagnostics.otel config into OpenClaw so that it pushes OTLP metrics to a lightweight OTel Collector sidecar (otel/opentelemetry-collector), which exposes a Prometheus scrape endpoint on the configured port (default 9090). No manual OpenClaw configuration is needed. If you already set diagnostics.otel in your instance config, the operator preserves your settings.

ServiceMonitor

spec:
  observability:
    metrics:
      enabled: true
      serviceMonitor:
        enabled: true
        interval: 15s
        labels:
          release: prometheus

OTLP metrics export (operator)

The operator can push its own metrics (reconciliation counters, workqueue stats, client latencies, etc.) to any OTLP-compatible backend via gRPC. This bridges all Prometheus metrics to OpenTelemetry, running alongside the existing Prometheus scrape endpoint.

# values.yaml
otlp:
  enabled: true
  endpoint: "otel-collector.observability.svc:4317"
  insecure: true  # set to false for TLS

The endpoint can also be configured via the OTEL_EXPORTER_OTLP_ENDPOINT environment variable. Metrics are pushed every 30 seconds. If the OTLP endpoint is unreachable, the operator logs a warning and continues operating normally.
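On the receiving side, a minimal OpenTelemetry Collector configuration that accepts these pushes could look like the sketch below. This is standard Collector configuration, not something the operator ships, and the debug exporter is only a stand-in for your real backend.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317   # matches the port in the values.yaml endpoint above
exporters:
  debug:
    verbosity: basic             # prints a summary of received metrics to the collector log
service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [debug]
```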

PrometheusRule (alerts)

Auto-provisions a PrometheusRule with 7 alerts including runbook URLs:

spec:
  observability:
    metrics:
      prometheusRule:
        enabled: true
        labels:
          release: kube-prometheus-stack  # must match Prometheus ruleSelector
        runbookBaseURL: https://openclaw.rocks/docs/runbooks  # default

Alerts: OpenClawReconcileErrors, OpenClawInstanceDegraded, OpenClawSlowReconciliation, OpenClawPodCrashLooping, OpenClawPodOOMKilled, OpenClawPVCNearlyFull, OpenClawAutoUpdateRollback
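The exported metrics can also back your own rules alongside the built-in ones. The rule below is a hypothetical example, not one of the seven provisioned alerts: its name, threshold, and duration are arbitrary choices for illustration.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: openclaw-custom-alerts               # hypothetical name
  labels:
    release: kube-prometheus-stack           # must match Prometheus ruleSelector
spec:
  groups:
    - name: openclaw-custom
      rules:
        - alert: OpenClawInstanceNotReady    # hypothetical alert
          expr: openclaw_instance_ready == 0
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "OpenClaw instance pod has not been ready for 10 minutes"
```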

Grafana dashboards

Auto-provisions two Grafana dashboard ConfigMaps (discovered via the grafana_dashboard: "1" label):

spec:
  observability:
    metrics:
      grafanaDashboard:
        enabled: true
        folder: OpenClaw  # Grafana folder (default)
        labels:
          grafana_dashboard_instance: my-grafana  # optional extra labels

Dashboards:

OpenClaw Operator - fleet overview with reconciliation metrics, instance table, workqueue, and auto-update panels
OpenClaw Instance - per-instance detail with CPU, memory, storage, network, and pod health panels

Auto-Scaling (HPA)

Enable horizontal pod auto-scaling to automatically adjust the number of replicas based on CPU and memory utilization:

spec:
  availability:
    autoScaling:
      enabled: true
      minReplicas: 1
      maxReplicas: 10
      targetCPUUtilization: 80
      targetMemoryUtilization: 70  # optional

When enabled, the operator creates a HorizontalPodAutoscaler targeting the StatefulSet and sets the StatefulSet's replica count to nil so the HPA manages scaling. The HPA is deleted when auto-scaling is disabled.

When auto-scaling is combined with persistent storage:

Each replica gets its own PVC via StatefulSet VolumeClaimTemplates (named data-<instance>-<ordinal>)
PVCs inherit size, storageClass, and accessModes from spec.storage.persistence
Retention policy is Retain for both scale-down and deletion, so data is preserved
If auto-scaling is later disabled, per-replica PVCs become orphaned and must be cleaned up manually
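For orientation, the autoScaling settings shown earlier correspond roughly to the following HorizontalPodAutoscaler. This is a sketch of an equivalent object, not the operator's exact output; in particular, the HPA and StatefulSet names are assumed to match the instance name.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-agent                 # assumed name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: my-agent               # assumed StatefulSet name
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80   # targetCPUUtilization
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 70   # targetMemoryUtilization
```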

Topology Spread Constraints

Spread pods across topology domains (zones, nodes) for improved availability:

spec:
  availability:
    topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app.kubernetes.io/instance: my-instance

Runtime Class

Schedule pods on alternative container runtimes (Kata Containers, gVisor, etc.) for VM-level isolation or security hardening:

spec:
  availability:
    runtimeClassName: kata-fc

A matching RuntimeClass resource must exist in the cluster. If unset, the default container runtime is used.
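A matching RuntimeClass for the example above might look like this sketch. The handler value is an assumption: it must match the runtime handler name configured in the container runtime on your nodes.

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata-fc      # referenced by spec.availability.runtimeClassName
handler: kata-fc     # assumed handler name; depends on node CRI configuration
```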

Pod Annotations

Merge extra annotations into the StatefulSet pod template. Operator-managed keys (openclaw.rocks/config-hash, openclaw.rocks/secret-hash) always take precedence and cannot be overridden.

Useful for cloud-provider hints, such as preventing GKE Autopilot from evicting long-running agent pods:

spec:
  podAnnotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"

Deployment Guides

Platform-specific deployment guides are available for:

Development

# Clone and set up
git clone https://github.com/OpenClaw-rocks/openclaw-operator.git
cd openclaw-operator
go mod download

# Generate code and manifests
make generate manifests

# Run tests
make test

# Run linter
make lint

# Run locally against a Kind cluster
kind create cluster
make install
make run

See CONTRIBUTING.md for the full development guide.

Roadmap

v1.0.0: API graduation to v1, conformance test suite, semver constraints for auto-update, HPA integration, cert-manager integration, multi-cluster support

See the full roadmap for details.

Don't Want to Self-Host?

OpenClaw.rocks offers fully managed hosting starting at EUR 15/mo. No Kubernetes cluster required. Setup, updates, and 24/7 uptime handled for you.

Contributing

Contributions are welcome. Please open an issue to discuss significant changes before submitting a PR. See CONTRIBUTING.md for guidelines.

Disclaimer: AI-Assisted Development

This repository is developed and maintained collaboratively by a human and Claude Code. This includes writing code, reviewing and commenting on issues, triaging bugs, and merging pull requests. The human reads everything and acts as the final guard, but Claude does the heavy lifting - from diagnosis to implementation to CI.

In the future, this repo may be fully autonomously operated, whether we humans like that or not.

License

Apache License 2.0, the same license used by Kubernetes, Prometheus, and most CNCF projects. See LICENSE for details.
