Zach noted that space.cloudnative.nz was down.
The cause was 85% utilization of the Ubuntu root (/) filesystem, which the kubelet also uses as its imagefs.
When available space on the imagefs drops below 15%, the kubelet evicts (terminates) pods.
See https://kubernetes.io/docs/concepts/scheduling-eviction/node-pressure-eviction/#hard-eviction-thresholds
Temporary fix was to double the space (100GB to 200GB) allocated to the root Ubuntu logical volume from the physical 500GB volume.
Long term fix will be to set up our nodes with a dedicated imagefs volume and monitor utilization.
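As a starting point for the monitoring part of the long term fix, here is a minimal sketch of a usage check. The function names and the 80% warning level are assumptions made for illustration. The 85% critical level is taken from the usage seen during this incident. The Use% column of df is only a proxy for the kubelet's own available/capacity calculation.

```shell
# Hypothetical helper, not part of any existing tooling.
# WARN_AT is an assumed early-warning level, not a Kubernetes default.
WARN_AT=80
# CRITICAL_AT matches the df Use% observed when pods were being evicted.
CRITICAL_AT=85

# Print the Use% of a mount point as a bare integer (GNU df).
used_pct() {
  df --output=pcent "$1" | tail -n 1 | tr -dc '0-9'
}

# Classify a usage percentage as ok, warn, or critical.
classify_usage() {
  if [ "$1" -ge "$CRITICAL_AT" ]; then
    echo critical
  elif [ "$1" -ge "$WARN_AT" ]; then
    echo warn
  else
    echo ok
  fi
}

classify_usage 85   # usage during the incident: prints critical
classify_usage 43   # usage after doubling the volume: prints ok
```

On a node this could be run as classify_usage "$(used_pct /)" from cron or a similar scheduler.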
Website down:
curl https://space.cloudnative.nz --head | grep HTTP
Storage issue at 85%:
ssh root@k8s.cloudnative.nz df -h -t ext4
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/ubuntu--vg-ubuntu--lv 98G 79G 15G 85% /
/dev/sda2 2.0G 253M 1.6G 14% /boot
/dev/longhorn/pvc-c995ff5d-f177-4d8c-a88c-bbc830375827 7.8G 233M 7.6G 3% /var/lib/kubelet/pods/73537501-f49d-4a63-a07c-436bf71b5d5b/volumes/kubernetes.io~csi/pvc-c995ff5d-f177-4d8c-a88c-bbc830375827/mount
Doubled Storage... usage now at 43%:
ssh root@k8s.cloudnative.nz df -h -t ext4 /
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/ubuntu--vg-ubuntu--lv 197G 79G 109G 43% /
Website up:
curl https://space.cloudnative.nz --head | grep HTTP
Background Reading
Ephemeral storage
https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#local-ephemeral-storage
https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#configurations-for-local-ephemeral-storage
Eviction
https://kubernetes.io/docs/concepts/scheduling-eviction/node-pressure-eviction/
Node-pressure eviction is the process by which the kubelet proactively terminates pods to reclaim resources on nodes.
A hard eviction threshold has no grace period. When a hard eviction threshold is met, the kubelet kills pods immediately without graceful termination to reclaim the starved resource.
The kubelet has the following default hard eviction thresholds:
- memory.available<100Mi
- nodefs.available<10%
- imagefs.available<15%
- nodefs.inodesFree<5% (Linux nodes)
These default values of hard eviction thresholds will only be set if none of the parameters is changed. If you changed the value of any parameter, then the values of other parameters will not be inherited as the default values and will be set to zero. In order to provide custom values, you should provide all the thresholds respectively.
https://kubernetes.io/docs/concepts/scheduling-eviction/node-pressure-eviction/#node-conditions
The kubelet reports node conditions to reflect that the node is under pressure because hard or soft eviction threshold is met, independent of configured grace periods.
- DiskPressure
- nodefs.available, nodefs.inodesFree, imagefs.available, or imagefs.inodesFree
- Available disk space and inodes on either the node’s root filesystem or image filesystem has satisfied an eviction threshold
Check that it’s down
curl https://space.cloudnative.nz --head | grep HTTP
check on coder ingress
kubectl -n coder get ingress
NAME CLASS HOSTS ADDRESS PORTS AGE
coder nginx space.cloudnative.nz,*.cloudnative.nz 123.253.178.101 80, 443 10d
check on coder ingress.spec.rules[0].http.paths
Here we look for the http paths that route / to a backend service
kubectl -n coder get ingress coder -o yaml \
| yq '.spec.rules[0].http.paths'
- backend:
    service:
      name: coder
      port:
        name: http
  path: /
  pathType: Prefix
check on coder svc
kubectl -n coder get svc coder
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
coder ClusterIP 10.104.202.123 <none> 80/TCP 10d
determine coder svc ports
kubectl -n coder get svc coder -o yaml \
| yq .spec.ports
- name: http
  port: 80
  protocol: TCP
  targetPort: http
determine coder svc selector
kubectl -n coder get svc coder -o yaml \
| yq .spec.selector
app.kubernetes.io/instance: coder
app.kubernetes.io/name: coder
search for coder svc target pods
kubectl -n coder get pods -l app.kubernetes.io/name=coder
NAME READY STATUS RESTARTS AGE
coder-7996486845-6cph8 0/1 ContainerStatusUnknown 1 75m
coder-7996486845-bkffz 0/1 ContainerStatusUnknown 1 114m
coder-7996486845-bqmqp 0/1 ContainerStatusUnknown 1 30m
coder-7996486845-cf577 0/1 ContainerStatusUnknown 1 121m
coder-7996486845-dqnn8 1/1 Running 0 14m
coder-7996486845-dsrbr 0/1 ContainerStatusUnknown 1 46m
coder-7996486845-ptc6n 0/1 ContainerStatusUnknown 1 107m
coder-7996486845-rtgcj 0/1 ContainerStatusUnknown 1 153m
coder-7996486845-rvkjx 0/1 ContainerStatusUnknown 1 92m
coder-7996486845-sdz9n 0/1 ContainerStatusUnknown 1 70m
coder-7996486845-vdgr9 0/1 ContainerStatusUnknown 1 137m
coder-7996486845-x5cvp 0/1 ContainerStatusUnknown 6 (2d8h ago) 4d11h
coder-7996486845-xz6b7 0/1 ContainerStatusUnknown 1 101m
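With this many pods in the list, a per-status count is quicker to read than the table. The sketch below assumes the default kubectl get pods column layout, with STATUS as the third column. The function name count_by_status is made up for illustration, and the sample input is a three-row excerpt of the output above.

```shell
# Hypothetical helper: count pods per STATUS from `kubectl get pods` output.
# Skips the header row and keys on the third column.
count_by_status() {
  awk 'NR > 1 { n[$3]++ } END { for (s in n) print s, n[s] }' | sort
}

# Excerpt of the listing above, fed in as sample input.
count_by_status <<'EOF'
NAME                     READY   STATUS                   RESTARTS   AGE
coder-7996486845-6cph8   0/1     ContainerStatusUnknown   1          75m
coder-7996486845-bkffz   0/1     ContainerStatusUnknown   1          114m
coder-7996486845-dqnn8   1/1     Running                  0          14m
EOF
# prints:
# ContainerStatusUnknown 2
# Running 1
```

Against the live cluster the same function can take the output of kubectl -n coder get pods -l app.kubernetes.io/name=coder on stdin.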
inspect Events for pods that seem to be having issues
kubectl -n coder events --for=pod/coder-7996486845-bqmqp
LAST SEEN TYPE REASON OBJECT MESSAGE
30m (x2 over 35m) Warning FailedScheduling Pod/coder-7996486845-bqmqp 0/1 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/disk-pressure: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling..
29m Normal Scheduled Pod/coder-7996486845-bqmqp Successfully assigned coder/coder-7996486845-bqmqp to srv1
29m Normal Pulling Pod/coder-7996486845-bqmqp Pulling image "ghcr.io/coder/coder:v0.27.1"
28m Normal Pulled Pod/coder-7996486845-bqmqp Successfully pulled image "ghcr.io/coder/coder:v0.27.1" in 14.957685446s (14.957810454s including waiting)
28m Normal Created Pod/coder-7996486845-bqmqp Created container coder
28m Normal Started Pod/coder-7996486845-bqmqp Started container coder
28m (x2 over 28m) Warning Unhealthy Pod/coder-7996486845-bqmqp Readiness probe failed: Get "http://10.0.0.119:8080/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
19m Warning Evicted Pod/coder-7996486845-bqmqp The node was low on resource: ephemeral-storage. Threshold quantity: 15763389861, available: 14492980Ki. Container coder was using 421700Ki, request is 0, has larger consumption of ephemeral-storage.
19m Normal Killing Pod/coder-7996486845-bqmqp Stopping container coder
19m Warning ExceededGracePeriod Pod/coder-7996486845-bqmqp Container runtime did not kill the pod within specified grace period.
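The Evicted event reports the threshold in bytes and the available space in Ki, so the two numbers are not directly comparable. A small worked conversion, using only the values from that event (the helper name kib_to_bytes is an illustration):

```shell
# Values copied from the Evicted event above.
threshold_bytes=15763389861
available_kib=14492980

# Ki is a binary unit: 1 Ki = 1024 bytes.
kib_to_bytes() {
  echo $(( $1 * 1024 ))
}

available_bytes=$(kib_to_bytes "$available_kib")
echo "available: $available_bytes bytes"
# prints: available: 14840811520 bytes

if [ "$available_bytes" -lt "$threshold_bytes" ]; then
  echo "short of the threshold by $(( threshold_bytes - available_bytes )) bytes"
fi
# prints: short of the threshold by 922578341 bytes
```

So the node was a little under 1 GB short of the eviction threshold when this pod was evicted.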
inspect status for pods that seem to be having issues
kubectl -n coder get pod/coder-7996486845-bqmqp -o yaml \
| yq .status \
| grep -E '^(message|phase|reason):'
message: 'The node was low on resource: ephemeral-storage. Threshold quantity: 15763389861, available: 14492980Ki. Container coder was using 421700Ki, request is 0, has larger consumption of ephemeral-storage. '
phase: Failed
reason: Evicted
inspect status.containerStatuses for pods that seem to be having issues
kubectl -n coder get pod/coder-7996486845-bqmqp -o yaml \
| yq .status.containerStatuses.0
image: ghcr.io/coder/coder:v0.27.1
imageID: ""
lastState:
  terminated:
    exitCode: 137
    finishedAt: null
    message: The container could not be located when the pod was deleted. The container used to be Running
    reason: ContainerStatusUnknown
    startedAt: null
name: coder
ready: false
restartCount: 1
started: false
state:
  terminated:
    exitCode: 137
    finishedAt: null
    message: The container could not be located when the pod was terminated
    reason: ContainerStatusUnknown
    startedAt: null
inspect status.conditions for pods that seem to be having issues
kubectl -n coder get pod/coder-7996486845-bqmqp -o yaml \
| yq .status.conditions.0
lastProbeTime: null
lastTransitionTime: "2023-07-28T06:44:40Z"
message: 'The node was low on resource: ephemeral-storage. Threshold quantity: 15763389861, available: 14492980Ki. Container coder was using 421700Ki, request is 0, has larger consumption of ephemeral-storage. '
reason: TerminationByKubelet
status: "True"
type: DisruptionTarget
figure out node for broken pod
kubectl -n coder get pod/coder-7996486845-bqmqp -o jsonpath="{.spec.nodeName}"
srv1
get nodes
kubectl get nodes
NAME STATUS ROLES AGE VERSION
srv1 Ready control-plane 10d v1.27.3
events for node
kubectl events -A --for=node/srv1
NAMESPACE LAST SEEN TYPE REASON OBJECT MESSAGE
default 60m Warning FreeDiskSpaceFailed Node/srv1 Failed to garbage collect required amount of images. Attempted to free 5100226969 bytes, but only found 4423240768 bytes eligible to free.
longhorn 52m Warning Schedulable Node/srv1 the disk default-disk-e4eb62364051e56c(/var/lib/longhorn/) on the node srv1 has 25585254400 available, but requires reserved 31526778470, minimal 25% to schedule more replicas
default 44m Warning FreeDiskSpaceFailed Node/srv1 Failed to garbage collect required amount of images. Attempted to free 5104617881 bytes, but only found 4423240768 bytes eligible to free.
longhorn 38m (x2 over 44h) Warning Schedulable Node/srv1 the disk default-disk-e4eb62364051e56c(/var/lib/longhorn/) on the node srv1 has 26109542400 available, but requires reserved 31526778470, minimal 25% to schedule more replicas
default 34m Warning FreeDiskSpaceFailed Node/srv1 Failed to garbage collect required amount of images. Attempted to free 5029218713 bytes, but only found 301773 bytes eligible to free.
default 29m Warning FreeDiskSpaceFailed Node/srv1 Failed to garbage collect required amount of images. Attempted to free 5111343513 bytes, but only found 4423240768 bytes eligible to free.
longhorn 21m (x2 over 84m) Warning Schedulable Node/srv1 the disk default-disk-e4eb62364051e56c(/var/lib/longhorn/) on the node srv1 has 26214400000 available, but requires reserved 31526778470, minimal 25% to schedule more replicas
default 17m (x16 over 24h) Normal NodeHasDiskPressure Node/srv1 Node srv1 status is now: NodeHasDiskPressure
longhorn 17m (x929 over 24h) Warning Ready Node/srv1 Kubernetes node srv1 has pressure: KubeletHasDiskPressure, kubelet has disk pressure
longhorn 5m (x1037 over 2d9h) Normal Ready Node/srv1 Node srv1 is ready
default 4m18s (x2379 over 2d16h) Normal NodeHasNoDiskPressure Node/srv1 Node srv1 status is now: NodeHasNoDiskPressure
default 2m11s (x72 over 24h) Warning EvictionThresholdMet Node/srv1 Attempting to reclaim ephemeral-storage
node.spec.taints
kubectl get node srv1 -o yaml \
| yq .spec.taints
- effect: NoSchedule
  key: node.kubernetes.io/disk-pressure
  timeAdded: "2023-07-28T07:38:40Z"
node.status.allocatable
kubectl get node srv1 -o yaml \
| yq .status.allocatable
cpu: "24"
ephemeral-storage: "94580335255"
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 197909196Ki
pods: "110"
node.status.capacity
kubectl get node srv1 -o yaml \
| yq .status.capacity
cpu: "24"
ephemeral-storage: 102626232Ki
hugepages-1Gi: "0"
hugepages-2Mi: "0"
memory: 198011596Ki
pods: "110"
node.status.conditions
kubectl get node srv1 -o yaml \
| yq .status.conditions
- lastHeartbeatTime: "2023-07-17T14:56:32Z"
  lastTransitionTime: "2023-07-17T14:56:32Z"
  message: Cilium is running on this node
  reason: CiliumIsUp
  status: "False"
  type: NetworkUnavailable
- lastHeartbeatTime: "2023-07-28T07:45:28Z"
  lastTransitionTime: "2023-07-25T22:14:35Z"
  message: kubelet has sufficient memory available
  reason: KubeletHasSufficientMemory
  status: "False"
  type: MemoryPressure
- lastHeartbeatTime: "2023-07-28T07:45:28Z"
  lastTransitionTime: "2023-07-28T07:44:38Z"
  message: kubelet has no disk pressure
  reason: KubeletHasNoDiskPressure
  status: "False"
  type: DiskPressure
- lastHeartbeatTime: "2023-07-28T07:45:28Z"
  lastTransitionTime: "2023-07-25T22:14:35Z"
  message: kubelet has sufficient PID available
  reason: KubeletHasSufficientPID
  status: "False"
  type: PIDPressure
- lastHeartbeatTime: "2023-07-28T07:45:28Z"
  lastTransitionTime: "2023-07-25T22:14:35Z"
  message: kubelet is posting ready status. AppArmor enabled
  reason: KubeletReady
  status: "True"
  type: Ready
node.status.condition of interest (DiskPressure)
kubectl get node srv1 -o yaml \
| yq .status.conditions.2
lastHeartbeatTime: "2023-07-28T08:04:53Z"
lastTransitionTime: "2023-07-28T08:00:28Z"
message: kubelet has disk pressure
reason: KubeletHasDiskPressure
status: "True"
type: DiskPressure
node.stats.runtime
kubectl get --raw "/api/v1/nodes/srv1/proxy/stats/summary" \
| yq -P .node.runtime
imageFs:
  time: "2023-07-28T08:01:03Z"
  availableBytes: 15480643584
  capacityBytes: 105089261568
  usedBytes: 46888923136
  inodesFree: 4701743
  inodes: 6553600
  inodesUsed: 1577494
node.stats.fs
kubectl get --raw "/api/v1/nodes/srv1/proxy/stats/summary" \
| yq -P .node.fs
time: "2023-07-28T08:01:33Z"
availableBytes: 15624622080
capacityBytes: 105089261568
usedBytes: 84079153152
inodesFree: 4703165
inodes: 6553600
inodesUsed: 1850435
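To relate these byte counts to the percentage-based eviction thresholds, the available share of capacity can be worked out directly. This is a sketch using only the numbers from the two stats outputs above. The helper names avail_pct and below_pct are made up for illustration, and the comparison is a simplification of what the kubelet does internally.

```shell
# Percentage of capacity still available, to two decimals.
avail_pct() {
  awk -v a="$1" -v c="$2" 'BEGIN { printf "%.2f\n", a * 100 / c }'
}

# Succeeds (exit status 0) when the available share is below the given percentage.
below_pct() {
  awk -v a="$1" -v c="$2" -v t="$3" 'BEGIN { exit !(a * 100 / c < t) }'
}

avail_pct 15480643584 105089261568   # runtime imageFs: prints 14.73
avail_pct 15624622080 105089261568   # node fs:         prints 14.87

# imagefs.available default hard threshold is 15%.
if below_pct 15480643584 105089261568 15; then
  echo "imagefs.available is below 15%"
fi

# nodefs.available default hard threshold is 10%.
if ! below_pct 15624622080 105089261568 10; then
  echo "nodefs.available is still above 10%"
fi
```

Both messages print, which fits the summary: with the images on the root filesystem, it is the 15% imagefs threshold that is crossed first, not the 10% nodefs one.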
Take a look at node ext4 filesystem from OS level
The root filesystem is about 85% full, which is when pods start getting evicted.
ssh root@k8s.cloudnative.nz df -h -t ext4
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/ubuntu--vg-ubuntu--lv 98G 79G 15G 85% /
/dev/sda2 2.0G 253M 1.6G 14% /boot
/dev/longhorn/pvc-c995ff5d-f177-4d8c-a88c-bbc830375827 7.8G 233M 7.6G 3% /var/lib/kubelet/pods/73537501-f49d-4a63-a07c-436bf71b5d5b/volumes/kubernetes.io~csi/pvc-c995ff5d-f177-4d8c-a88c-bbc830375827/mount
extend the root logical volume
lvextend -L200G /dev/mapper/ubuntu--vg-ubuntu--lv
Size of logical volume ubuntu-vg/ubuntu-lv changed from 100.00 GiB (25600 extents) to 200.00 GiB (51200 extents).
Logical volume ubuntu-vg/ubuntu-lv successfully resized.
Inspect resized logical volumes
ssh root@k8s.cloudnative.nz lvs
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
ubuntu-lv ubuntu-vg -wi-ao---- 200.00g
Inspect physical volumes allocation
ssh root@k8s.cloudnative.nz pvs
PV VG Fmt Attr PSize PFree
/dev/sda3 ubuntu-vg lvm2 a-- <463.73g <263.73g
Resize the root filesystem (on top of the now larger Logical Volume)
resize2fs /dev/mapper/ubuntu--vg-ubuntu--lv
resize2fs 1.46.5 (30-Dec-2021)
Filesystem at /dev/mapper/ubuntu--vg-ubuntu--lv is mounted on /; on-line resizing required
old_desc_blocks = 13, new_desc_blocks = 25
The filesystem on /dev/mapper/ubuntu--vg-ubuntu--lv is now 52428800 (4k) blocks long.
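As a sanity check that the filesystem now fills the whole logical volume, the block and extent counts reported above can be converted to GiB. This is a sketch: the helper names are made up, and the 4 MiB extent size is inferred from the lvextend output (100.00 GiB across 25600 extents) rather than read from the volume group.

```shell
# 4 KiB filesystem blocks (as reported by resize2fs) to GiB.
blocks_to_gib() {
  echo $(( $1 * 4096 / 1024 / 1024 / 1024 ))
}

# LVM extents to GiB, assuming 4 MiB extents (inferred, see above).
extents_to_gib() {
  echo $(( $1 * 4 / 1024 ))
}

blocks_to_gib 52428800   # filesystem size after resize2fs: prints 200
extents_to_gib 51200     # logical volume size after lvextend: prints 200
extents_to_gib 25600     # logical volume size before lvextend: prints 100
```

Both sizes come out at 200 GiB, so no space on the logical volume is left unused by the filesystem. lvextend also accepts -r (--resizefs), which resizes the filesystem as part of the same command.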
check free space at OS now that volume is extended
ssh root@k8s.cloudnative.nz df -h -t ext4
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/ubuntu--vg-ubuntu--lv 197G 79G 109G 42% /
/dev/sda2 2.0G 253M 1.6G 14% /boot
/dev/longhorn/pvc-c995ff5d-f177-4d8c-a88c-bbc830375827 7.8G 233M 7.6G 3% /var/lib/kubelet/pods/73537501-f49d-4a63-a07c-436bf71b5d5b/volumes/kubernetes.io~csi/pvc-c995ff5d-f177-4d8c-a88c-bbc830375827/mount