space.cloudnative.nz down :: Disk Pressure Eviction #1

@hh

Description

Zach noted that space.cloudnative.nz was down.

When available storage on the image filesystem (imagefs) drops below 15%, the kubelet evicts (terminates) pods.

This affected us because the OS-level Ubuntu root filesystem, which also serves as the imagefs on this node, had reached 85% utilization.

See https://kubernetes.io/docs/concepts/scheduling-eviction/node-pressure-eviction/#hard-eviction-thresholds

The temporary fix was to double the size of the root Ubuntu logical volume (from 100 GiB to 200 GiB), using free space in the ~500 GB physical volume, and then grow the filesystem to match.

The long-term fix will be to set up our nodes with a dedicated imagefs volume and to monitor its utilization.

Website down:

curl https://space.cloudnative.nz --head | grep HTTP
HTTP/2 503 

Storage issue at 85%:

ssh root@k8s.cloudnative.nz df -h -t ext4
Filesystem                                              Size  Used Avail Use% Mounted on
/dev/mapper/ubuntu--vg-ubuntu--lv                        98G   79G   15G  85% /
/dev/sda2                                               2.0G  253M  1.6G  14% /boot
/dev/longhorn/pvc-c995ff5d-f177-4d8c-a88c-bbc830375827  7.8G  233M  7.6G   3% /var/lib/kubelet/pods/73537501-f49d-4a63-a07c-436bf71b5d5b/volumes/kubernetes.io~csi/pvc-c995ff5d-f177-4d8c-a88c-bbc830375827/mount

Doubled the storage; usage is now at 43%:

ssh root@k8s.cloudnative.nz df -h -t ext4 /
Filesystem                         Size  Used Avail Use% Mounted on
/dev/mapper/ubuntu--vg-ubuntu--lv  197G   79G  109G  43% /

Website up:

curl https://space.cloudnative.nz --head | grep HTTP
HTTP/2 200 

Background Reading

Ephemeral storage

https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#local-ephemeral-storage
https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#configurations-for-local-ephemeral-storage

Eviction

https://kubernetes.io/docs/concepts/scheduling-eviction/node-pressure-eviction/

Node-pressure eviction is the process by which the kubelet proactively terminates pods to reclaim resources on nodes.

https://kubernetes.io/docs/concepts/scheduling-eviction/node-pressure-eviction/#hard-eviction-thresholds

A hard eviction threshold has no grace period. When a hard eviction threshold is met, the kubelet kills pods immediately without graceful termination to reclaim the starved resource.

The kubelet has the following default hard eviction thresholds:

  • memory.available<100Mi
  • nodefs.available<10%
  • imagefs.available<15%
  • nodefs.inodesFree<5% (Linux nodes)

These default values of hard eviction thresholds will only be set if none of the parameters is changed. If you changed the value of any parameter, then the values of other parameters will not be inherited as the default values and will be set to zero. In order to provide custom values, you should provide all the thresholds respectively.
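
Since overriding one threshold zeroes the rest, a custom `evictionHard` block should restate all four defaults. The following is a minimal sketch that only prints such a fragment. It assumes the kubelet is driven by a `KubeletConfiguration` file. On kubeadm-built nodes that file is usually `/var/lib/kubelet/config.yaml`, but this was not verified for srv1. The helper name `eviction_hard_fragment` is hypothetical, and applying the change would typically need a kubelet restart.

```shell
# eviction_hard_fragment [IMAGEFS_PCT]
# Hypothetical helper: prints a KubeletConfiguration evictionHard block that
# restates all four Linux defaults, overriding only imagefs.available if asked.
# Nothing is written to disk; merging it into the kubelet config is left manual.
eviction_hard_fragment() {
  local imagefs_pct=${1:-15}   # default hard threshold: imagefs.available<15%
  cat <<EOF
evictionHard:
  memory.available: "100Mi"
  nodefs.available: "10%"
  imagefs.available: "${imagefs_pct}%"
  nodefs.inodesFree: "5%"
EOF
}
```

For example, `eviction_hard_fragment 20` raises only the imagefs threshold and keeps the other three at their documented defaults.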

https://kubernetes.io/docs/concepts/scheduling-eviction/node-pressure-eviction/#node-conditions

The kubelet reports node conditions to reflect that the node is under pressure because hard or soft eviction threshold is met, independent of configured grace periods.

  • DiskPressure
    • nodefs.available, nodefs.inodesFree, imagefs.available, or imagefs.inodesFree
    • Available disk space and inodes on either the node’s root filesystem or image filesystem has satisfied an eviction threshold

Check that it’s down

curl https://space.cloudnative.nz --head | grep HTTP
HTTP/2 503 
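
Here is a variant of the check that prints only the status code, using curl's `-w '%{http_code}'`. This makes it easy to poll until the site comes back. It is a sketch: the function names and the defaults (30 attempts, 10 s apart) are arbitrary assumptions, not part of the original investigation.

```shell
# site_status URL
# Prints only the HTTP status code of a HEAD request (000 if no connection).
site_status() {
  curl -s -o /dev/null -I -w '%{http_code}' "$1"
}

# wait_until_up URL [ATTEMPTS] [DELAY_SECONDS]
# Polls URL until it answers 200; returns 1 if ATTEMPTS run out first.
wait_until_up() {
  local url=$1 attempts=${2:-30} delay=${3:-10} i=1 code
  while [ "$i" -le "$attempts" ]; do
    code=$(site_status "$url")
    echo "attempt $i: HTTP $code"
    [ "$code" = "200" ] && return 0
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

Usage: `wait_until_up https://space.cloudnative.nz` after applying a fix.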

check on coder ingress

kubectl -n coder get ingress
NAME    CLASS   HOSTS                                   ADDRESS           PORTS     AGE
coder   nginx   space.cloudnative.nz,*.cloudnative.nz   123.253.178.101   80, 443   10d

check on coder ingress.spec.rules[0].http.paths

Here we look for the http paths that route / to a backend service

kubectl -n coder get ingress coder -o yaml \
    | yq '.spec.rules[0].http.paths'
- backend:
    service:
      name: coder
      port:
        name: http
  path: /
  pathType: Prefix

check on coder svc

kubectl -n coder get svc coder
NAME    TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
coder   ClusterIP   10.104.202.123   <none>        80/TCP    10d

determine coder svc ports

kubectl -n coder get svc coder -o yaml \
    | yq .spec.ports
- name: http
  port: 80
  protocol: TCP
  targetPort: http

determine coder svc selector

kubectl -n coder get svc coder -o yaml \
    | yq .spec.selector
app.kubernetes.io/instance: coder
app.kubernetes.io/name: coder
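
The pod search below filters on `app.kubernetes.io/name` only, while the Service selects on both labels. The following sketch turns the Service's whole `.spec.selector` into a `-l` string using kubectl's `go-template` output. `svc_selector` is a hypothetical helper.

```shell
# svc_selector NAMESPACE SERVICE
# Prints the Service's .spec.selector as "k1=v1,k2=v2" for use with kubectl -l.
svc_selector() {
  local ns=$1 svc=$2 sel
  sel=$(kubectl -n "$ns" get svc "$svc" \
    -o go-template='{{range $k, $v := .spec.selector}}{{$k}}={{$v}},{{end}}')
  printf '%s\n' "${sel%,}"   # strip the trailing comma left by the template
}
```

Usage: `kubectl -n coder get pods -l "$(svc_selector coder coder)"`.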

search for coder svc target pods

kubectl -n coder get pods -l app.kubernetes.io/name=coder
NAME                     READY   STATUS                   RESTARTS       AGE
coder-7996486845-6cph8   0/1     ContainerStatusUnknown   1              75m
coder-7996486845-bkffz   0/1     ContainerStatusUnknown   1              114m
coder-7996486845-bqmqp   0/1     ContainerStatusUnknown   1              30m
coder-7996486845-cf577   0/1     ContainerStatusUnknown   1              121m
coder-7996486845-dqnn8   1/1     Running                  0              14m
coder-7996486845-dsrbr   0/1     ContainerStatusUnknown   1              46m
coder-7996486845-ptc6n   0/1     ContainerStatusUnknown   1              107m
coder-7996486845-rtgcj   0/1     ContainerStatusUnknown   1              153m
coder-7996486845-rvkjx   0/1     ContainerStatusUnknown   1              92m
coder-7996486845-sdz9n   0/1     ContainerStatusUnknown   1              70m
coder-7996486845-vdgr9   0/1     ContainerStatusUnknown   1              137m
coder-7996486845-x5cvp   0/1     ContainerStatusUnknown   6 (2d8h ago)   4d11h
coder-7996486845-xz6b7   0/1     ContainerStatusUnknown   1              101m

inspect Events for pods that seem to be having issues

kubectl -n coder events --for=pod/coder-7996486845-bqmqp
LAST SEEN           TYPE      REASON                OBJECT                       MESSAGE
30m (x2 over 35m)   Warning   FailedScheduling      Pod/coder-7996486845-bqmqp   0/1 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/disk-pressure: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling..
29m                 Normal    Scheduled             Pod/coder-7996486845-bqmqp   Successfully assigned coder/coder-7996486845-bqmqp to srv1
29m                 Normal    Pulling               Pod/coder-7996486845-bqmqp   Pulling image "ghcr.io/coder/coder:v0.27.1"
28m                 Normal    Pulled                Pod/coder-7996486845-bqmqp   Successfully pulled image "ghcr.io/coder/coder:v0.27.1" in 14.957685446s (14.957810454s including waiting)
28m                 Normal    Created               Pod/coder-7996486845-bqmqp   Created container coder
28m                 Normal    Started               Pod/coder-7996486845-bqmqp   Started container coder
28m (x2 over 28m)   Warning   Unhealthy             Pod/coder-7996486845-bqmqp   Readiness probe failed: Get "http://10.0.0.119:8080/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
19m                 Warning   Evicted               Pod/coder-7996486845-bqmqp   The node was low on resource: ephemeral-storage. Threshold quantity: 15763389861, available: 14492980Ki. Container coder was using 421700Ki, request is 0, has larger consumption of ephemeral-storage.
19m                 Normal    Killing               Pod/coder-7996486845-bqmqp   Stopping container coder
19m                 Warning   ExceededGracePeriod   Pod/coder-7996486845-bqmqp   Container runtime did not kill the pod within specified grace period.
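
The eviction message mixes units: the threshold is given in bytes, while `available` is in KiB. The following small sketch pulls both numbers out of the message and shows how far below the threshold the node was. It assumes the message keeps the exact wording shown above (`Threshold quantity: <bytes>, available: <n>Ki`). `eviction_shortfall` is a hypothetical name.

```shell
# eviction_shortfall MESSAGE
# Extracts the threshold (bytes) and available (KiB) figures from a kubelet
# eviction message and prints both in bytes, plus the shortfall.
eviction_shortfall() {
  local msg=$1 threshold avail_ki avail
  threshold=$(printf '%s\n' "$msg" | sed -n 's/.*Threshold quantity: \([0-9]*\),.*/\1/p')
  avail_ki=$(printf '%s\n' "$msg" | sed -n 's/.*available: \([0-9]*\)Ki\..*/\1/p')
  [ -n "$threshold" ] && [ -n "$avail_ki" ] || return 1   # unexpected format
  avail=$((avail_ki * 1024))
  printf 'threshold=%s available=%s short_by=%s\n' \
    "$threshold" "$avail" "$((threshold - avail))"
}
```

Fed the message above, it prints `threshold=15763389861 available=14840811520 short_by=922578341`, so the node was roughly 880 MiB short of the threshold.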

inspect status for pods that seem to be having issues

kubectl -n coder get pod/coder-7996486845-bqmqp -o yaml \
    | yq .status \
    | grep -E '^(message|phase|reason):'
message: 'The node was low on resource: ephemeral-storage. Threshold quantity: 15763389861, available: 14492980Ki. Container coder was using 421700Ki, request is 0, has larger consumption of ephemeral-storage. '
phase: Failed
reason: Evicted
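
Evicted pods stay behind in phase `Failed`, which is why the pod listing shows a dozen `ContainerStatusUnknown` replicas next to the single `Running` one. The following sketch lists those pods with kubectl's `status.phase` field selector and, optionally, deletes them once the cause is understood. The function name is hypothetical, and deletion is opt-in.

```shell
# failed_pods NAMESPACE [--delete]
# Lists pods in phase Failed (evicted pods end up here); with --delete, removes them.
failed_pods() {
  local ns=$1
  if [ "${2:-}" = "--delete" ]; then
    kubectl -n "$ns" delete pods --field-selector=status.phase=Failed
  else
    kubectl -n "$ns" get pods --field-selector=status.phase=Failed -o name
  fi
}
```

Deleting only removes the leftover pod objects. The ReplicaSet has already replaced them, so the running replica is unaffected.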

inspect status.containerStatuses for pods that seem to be having issues

kubectl -n coder get pod/coder-7996486845-bqmqp -o yaml \
    | yq .status.containerStatuses.0
image: ghcr.io/coder/coder:v0.27.1
imageID: ""
lastState:
  terminated:
    exitCode: 137
    finishedAt: null
    message: The container could not be located when the pod was deleted.  The container used to be Running
    reason: ContainerStatusUnknown
    startedAt: null
name: coder
ready: false
restartCount: 1
started: false
state:
  terminated:
    exitCode: 137
    finishedAt: null
    message: The container could not be located when the pod was terminated
    reason: ContainerStatusUnknown
    startedAt: null

inspect status.conditions for pods that seem to be having issues

kubectl -n coder get pod/coder-7996486845-bqmqp -o yaml \
    | yq .status.conditions.0
lastProbeTime: null
lastTransitionTime: "2023-07-28T06:44:40Z"
message: 'The node was low on resource: ephemeral-storage. Threshold quantity: 15763389861, available: 14492980Ki. Container coder was using 421700Ki, request is 0, has larger consumption of ephemeral-storage. '
reason: TerminationByKubelet
status: "True"
type: DisruptionTarget

figure out node for broken pod

kubectl -n coder get pod/coder-7996486845-bqmqp -o jsonpath="{.spec.nodeName}"
srv1

get nodes

kubectl get nodes
NAME   STATUS   ROLES           AGE   VERSION
srv1   Ready    control-plane   10d   v1.27.3

events for node

kubectl events -A --for=node/srv1
NAMESPACE   LAST SEEN                  TYPE      REASON                  OBJECT      MESSAGE
default     60m                        Warning   FreeDiskSpaceFailed     Node/srv1   Failed to garbage collect required amount of images. Attempted to free 5100226969 bytes, but only found 4423240768 bytes eligible to free.
longhorn    52m                        Warning   Schedulable             Node/srv1   the disk default-disk-e4eb62364051e56c(/var/lib/longhorn/) on the node srv1 has 25585254400 available, but requires reserved 31526778470, minimal 25% to schedule more replicas
default     44m                        Warning   FreeDiskSpaceFailed     Node/srv1   Failed to garbage collect required amount of images. Attempted to free 5104617881 bytes, but only found 4423240768 bytes eligible to free.
longhorn    38m (x2 over 44h)          Warning   Schedulable             Node/srv1   the disk default-disk-e4eb62364051e56c(/var/lib/longhorn/) on the node srv1 has 26109542400 available, but requires reserved 31526778470, minimal 25% to schedule more replicas
default     34m                        Warning   FreeDiskSpaceFailed     Node/srv1   Failed to garbage collect required amount of images. Attempted to free 5029218713 bytes, but only found 301773 bytes eligible to free.
default     29m                        Warning   FreeDiskSpaceFailed     Node/srv1   Failed to garbage collect required amount of images. Attempted to free 5111343513 bytes, but only found 4423240768 bytes eligible to free.
longhorn    21m (x2 over 84m)          Warning   Schedulable             Node/srv1   the disk default-disk-e4eb62364051e56c(/var/lib/longhorn/) on the node srv1 has 26214400000 available, but requires reserved 31526778470, minimal 25% to schedule more replicas
default     17m (x16 over 24h)         Normal    NodeHasDiskPressure     Node/srv1   Node srv1 status is now: NodeHasDiskPressure
longhorn    17m (x929 over 24h)        Warning   Ready                   Node/srv1   Kubernetes node srv1 has pressure: KubeletHasDiskPressure, kubelet has disk pressure
longhorn    5m (x1037 over 2d9h)       Normal    Ready                   Node/srv1   Node srv1 is ready
default     4m18s (x2379 over 2d16h)   Normal    NodeHasNoDiskPressure   Node/srv1   Node srv1 status is now: NodeHasNoDiskPressure
default     2m11s (x72 over 24h)       Warning   EvictionThresholdMet    Node/srv1   Attempting to reclaim ephemeral-storage

node.spec.taints

kubectl get node srv1 -o yaml \
    | yq .spec.taints
- effect: NoSchedule
  key: node.kubernetes.io/disk-pressure
  timeAdded: "2023-07-28T07:38:40Z"

node.status.allocatable

kubectl get node srv1 -o yaml \
    | yq .status.allocatable
cpu: "24"
ephemeral-storage: "94580335255"
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 197909196Ki
pods: "110"

node.status.capacity

kubectl get node srv1 -o yaml \
    | yq .status.capacity
cpu: "24"
ephemeral-storage: 102626232Ki
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 198011596Ki
pods: "110"
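
Kubernetes computes node allocatable as capacity minus kube-reserved, system-reserved and the hard eviction threshold. For ephemeral storage, the gap between the allocatable and capacity figures comes out at almost exactly 10% of capacity, which matches the default `nodefs.available<10%`. That reading assumes no kube-reserved or system-reserved storage is configured on srv1, which was not checked. The following sketch shows the arithmetic. Capacity is reported in KiB and allocatable in bytes; `reserved_ephemeral` is a hypothetical name.

```shell
# reserved_ephemeral CAPACITY_KI ALLOCATABLE_BYTES
# Prints capacity in bytes, the bytes withheld from allocatable, and that
# amount as a percentage of capacity (one decimal place, truncated).
reserved_ephemeral() {
  local cap=$(( $1 * 1024 )) alloc=$2 reserved tenths
  reserved=$((cap - alloc))
  tenths=$((reserved * 1000 / cap))
  printf 'capacity=%s reserved=%s pct=%s.%s\n' \
    "$cap" "$reserved" "$((tenths / 10))" "$((tenths % 10))"
}
```

`reserved_ephemeral 102626232 94580335255` prints `capacity=105089261568 reserved=10508926313 pct=10.0`. The capacity also matches `capacityBytes` in the kubelet stats.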

node.status.conditions

kubectl get node srv1 -o yaml \
    | yq .status.conditions
- lastHeartbeatTime: "2023-07-17T14:56:32Z"
  lastTransitionTime: "2023-07-17T14:56:32Z"
  message: Cilium is running on this node
  reason: CiliumIsUp
  status: "False"
  type: NetworkUnavailable
- lastHeartbeatTime: "2023-07-28T07:45:28Z"
  lastTransitionTime: "2023-07-25T22:14:35Z"
  message: kubelet has sufficient memory available
  reason: KubeletHasSufficientMemory
  status: "False"
  type: MemoryPressure
- lastHeartbeatTime: "2023-07-28T07:45:28Z"
  lastTransitionTime: "2023-07-28T07:44:38Z"
  message: kubelet has no disk pressure
  reason: KubeletHasNoDiskPressure
  status: "False"
  type: DiskPressure
- lastHeartbeatTime: "2023-07-28T07:45:28Z"
  lastTransitionTime: "2023-07-25T22:14:35Z"
  message: kubelet has sufficient PID available
  reason: KubeletHasSufficientPID
  status: "False"
  type: PIDPressure
- lastHeartbeatTime: "2023-07-28T07:45:28Z"
  lastTransitionTime: "2023-07-25T22:14:35Z"
  message: kubelet is posting ready status. AppArmor enabled
  reason: KubeletReady
  status: "True"
  type: Ready

node.status.condition of interest (DiskPressure)

kubectl get node srv1 -o yaml \
    | yq .status.conditions.2
lastHeartbeatTime: "2023-07-28T08:04:53Z"
lastTransitionTime: "2023-07-28T08:00:28Z"
message: kubelet has disk pressure
reason: KubeletHasDiskPressure
status: "True"
type: DiskPressure
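
Indexing `.status.conditions.2` depends on the order in which the kubelet happens to report conditions. Filtering by `type` with kubectl's JSONPath filter support is sturdier. The following is a minimal sketch; the helper names are hypothetical.

```shell
# disk_pressure NODE
# Prints the DiskPressure condition status ("True" or "False") for NODE,
# selecting the condition by type rather than by list position.
disk_pressure() {
  kubectl get node "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="DiskPressure")].status}'
}

# node_has_disk_pressure NODE
# Succeeds (exit 0) only while the node reports DiskPressure=True.
node_has_disk_pressure() {
  [ "$(disk_pressure "$1")" = "True" ]
}
```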

node.stats.runtime

kubectl get --raw "/api/v1/nodes/srv1/proxy/stats/summary" \
    | yq -P .node.runtime
imageFs:
  time: "2023-07-28T08:01:03Z"
  availableBytes: 15480643584
  capacityBytes: 105089261568
  usedBytes: 46888923136
  inodesFree: 4701743
  inodes: 6553600
  inodesUsed: 1577494

node.stats.fs

kubectl get --raw "/api/v1/nodes/srv1/proxy/stats/summary" \
    | yq -P .node.fs
time: "2023-07-28T08:01:33Z"
availableBytes: 15624622080
capacityBytes: 105089261568
usedBytes: 84079153152
inodesFree: 4703165
inodes: 6553600
inodesUsed: 1850435
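
imagefs and nodefs report the same capacity here because both live on the root volume. Applying the default percentage thresholds to the two `availableBytes` figures shows why eviction fired. imagefs had about 14.7% free, under its 15% threshold. nodefs had about 14.8% free, well over its 10% threshold. The following sketch uses integer shell arithmetic; `threshold_met` and `free_pct` are hypothetical helpers.

```shell
# threshold_met AVAILABLE_BYTES CAPACITY_BYTES PERCENT
# Succeeds when AVAILABLE is below PERCENT of CAPACITY, i.e. when a hard
# eviction threshold such as imagefs.available<15% would be met.
threshold_met() {
  [ $(( $1 * 100 )) -lt $(( $2 * $3 )) ]
}

# free_pct AVAILABLE_BYTES CAPACITY_BYTES
# Prints free space as a percentage of capacity (one decimal place, truncated).
free_pct() {
  local tenths=$(( $1 * 1000 / $2 ))
  printf '%s.%s\n' "$((tenths / 10))" "$((tenths % 10))"
}
```

Live values can be pulled the same way as above, e.g. `kubectl get --raw "/api/v1/nodes/srv1/proxy/stats/summary" | yq '.node.runtime.imageFs.availableBytes'`.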

Take a look at the node’s ext4 filesystems from the OS level

The root filesystem is at 85% used, leaving only about 15% free; that is the point where the default imagefs.available<15% hard threshold starts evicting pods.

ssh root@k8s.cloudnative.nz df -h -t ext4
Filesystem                                              Size  Used Avail Use% Mounted on
/dev/mapper/ubuntu--vg-ubuntu--lv                        98G   79G   15G  85% /
/dev/sda2                                               2.0G  253M  1.6G  14% /boot
/dev/longhorn/pvc-c995ff5d-f177-4d8c-a88c-bbc830375827  7.8G  233M  7.6G   3% /var/lib/kubelet/pods/73537501-f49d-4a63-a07c-436bf71b5d5b/volumes/kubernetes.io~csi/pvc-c995ff5d-f177-4d8c-a88c-bbc830375827/mount
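
For the "monitor utilization" part of the long-term fix, here is a minimal sketch that reads `df` output and warns once a mount point reaches 85% used, roughly where the default 15% imagefs threshold starts evicting pods. The 85 default and the function names are assumptions. `df -P` guarantees one line per filesystem, so taking the last two columns also holds for long device names.

```shell
# use_pct MOUNT
# Reads df output (df -P or df -h) on stdin and prints MOUNT's Use% without the % sign.
use_pct() {
  awk -v m="$1" 'NR > 1 && $NF == m { sub(/%/, "", $(NF - 1)); print $(NF - 1) }'
}

# check_disk MOUNT [LIMIT_PCT]
# Prints a warning and fails when MOUNT is at or above LIMIT_PCT (default 85) used.
check_disk() {
  local mount=$1 limit=${2:-85} pct
  pct=$(df -P "$mount" | use_pct "$mount")
  [ -n "$pct" ] || { echo "no df entry for $mount" >&2; return 2; }
  if [ "$pct" -ge "$limit" ]; then
    echo "WARNING: $mount is ${pct}% used (limit ${limit}%)"
    return 1
  fi
  echo "OK: $mount is ${pct}% used"
}
```

Usage on the node: `check_disk /` (for example from cron), or `ssh root@k8s.cloudnative.nz df -P / | use_pct /` from a workstation.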

extend the root logical volume

lvextend -L200G /dev/mapper/ubuntu--vg-ubuntu--lv
  Size of logical volume ubuntu-vg/ubuntu-lv changed from 100.00 GiB (25600 extents) to 200.00 GiB (51200 extents).
  Logical volume ubuntu-vg/ubuntu-lv successfully resized.

Inspect resized logical volumes

ssh root@k8s.cloudnative.nz lvs
  LV        VG        Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  ubuntu-lv ubuntu-vg -wi-ao---- 200.00g

Inspect physical volumes allocation

ssh root@k8s.cloudnative.nz pvs
  PV         VG        Fmt  Attr PSize    PFree
  /dev/sda3  ubuntu-vg lvm2 a--  <463.73g <263.73g

Resize the root filesystem (on top of the now larger Logical Volume)

resize2fs /dev/mapper/ubuntu--vg-ubuntu--lv
resize2fs 1.46.5 (30-Dec-2021)
Filesystem at /dev/mapper/ubuntu--vg-ubuntu--lv is mounted on /; on-line resizing required
old_desc_blocks = 13, new_desc_blocks = 25
The filesystem on /dev/mapper/ubuntu--vg-ubuntu--lv is now 52428800 (4k) blocks long.
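
The two steps of growing the LV and then the ext4 filesystem can also be done in one command, because `lvextend` accepts `-r`/`--resizefs` and resizes the filesystem after extending. The following sketch assumes the same volume path and that it runs as root on the node. `grow_root` and `blocks_to_gib` are hypothetical helpers. Checking free extents with `pvs` first, as above, is still advisable.

```shell
# grow_root SIZE
# Extends the root LV to SIZE (e.g. 200G) and, via -r/--resizefs, grows the
# filesystem on it in the same step (online resize for the mounted /).
grow_root() {
  lvextend -r -L "$1" /dev/mapper/ubuntu--vg-ubuntu--lv
}

# blocks_to_gib BLOCKS [BLOCK_SIZE]
# Converts a block count reported by resize2fs into whole GiB.
blocks_to_gib() {
  echo $(( $1 * ${2:-4096} / 1073741824 ))
}
```

As a sanity check, `blocks_to_gib 52428800` gives 200, matching the 200 GiB logical volume.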

check free space at OS now that volume is extended

ssh root@k8s.cloudnative.nz df -h -t ext4
Filesystem                                              Size  Used Avail Use% Mounted on
/dev/mapper/ubuntu--vg-ubuntu--lv                       197G   79G  109G  42% /
/dev/sda2                                               2.0G  253M  1.6G  14% /boot
/dev/longhorn/pvc-c995ff5d-f177-4d8c-a88c-bbc830375827  7.8G  233M  7.6G   3% /var/lib/kubelet/pods/73537501-f49d-4a63-a07c-436bf71b5d5b/volumes/kubernetes.io~csi/pvc-c995ff5d-f177-4d8c-a88c-bbc830375827/mount
