Describe the bug
When running a Nomad system job with Docker containers that use NBD (Network Block Device),
the Linux kernel fails to clean up nbd devices properly once the remote NBD connection is interrupted.
Kernel errors then flood dmesg continuously:
- block nbdXX: Receive control failed (result -32)
- block nbdXX: Send disconnect failed -32
- I/O error, dev nbdX, sector XXX (READ)
- Buffer I/O error on dev nbdX
Error code -32 is EPIPE (broken pipe): the NBD socket connection is already closed,
but the kernel nbd driver still tries to send disconnect commands, which fail.
Zombie /dev/nbd* devices remain on the system and are not cleaned up automatically.
Manual cleanup via sysfs (force_offline + device/delete) works temporarily,
but the issue reappears once Nomad tasks restart and create new NBD devices.
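For anyone who wants to double-check the errno mapping mentioned above, here is a minimal Python sketch. The helper name `describe_kernel_errno` is made up for illustration; it only assumes that the kernel prints the error as a negated errno value, as the log lines suggest.

```python
import errno
import os


def describe_kernel_errno(code: int) -> tuple[str, str]:
    """Map a negative kernel result code (e.g. -32) to its errno name and text."""
    number = abs(code)
    name = errno.errorcode.get(number, "UNKNOWN")
    return name, os.strerror(number)


# -32 is the value reported by the nbd driver in the log lines of this report.
name, text = describe_kernel_errno(-32)
print(f"-32 -> {name} ({text})")
```

On Linux this resolves to EPIPE, which matches the interpretation above.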
Environment
- OS: Ubuntu 24.04
- Kernel: 6.8.0-110-generic
- Nomad version: v1.10.5
- Docker version: 28.2.2
- Job type: system job, runs on all client nodes
- Container: timberio/vector (log collector)
Reproduction steps
- Deploy the attached Nomad job (logs-collector)
- Let the containers create and connect NBD devices
- Interrupt the remote NBD network/storage service
- Check dmesg: continuous nbd errors appear
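To make the last step easier to quantify, the following Python sketch counts err-level kernel messages per nbd device. It assumes the log format shown under "Additional logs" (which looks like `dmesg -T -x` output); the function name `count_nbd_errors` and the sample lines are only illustrative.

```python
import re
from collections import Counter

# Matches lines such as:
#   kern :err : [Tue May 26 15:21:50 2026] block nbd31: Receive control failed (result -32)
LINE = re.compile(r"^kern\s*:(\w+)\s*:\s*\[[^\]]*\]\s*(.*)$")
DEVICE = re.compile(r"\bnbd\d+\b")


def count_nbd_errors(lines):
    """Return a Counter of err-level messages per nbd device."""
    counts = Counter()
    for line in lines:
        match = LINE.match(line.strip())
        if not match or match.group(1) != "err":
            continue  # skip info/warn lines and anything that is not a kernel line
        device = DEVICE.search(match.group(2))
        if device:
            counts[device.group(0)] += 1
    return counts


sample = [
    "kern :err : [Tue May 26 15:21:50 2026] block nbd31: Receive control failed (result -32)",
    "kern :info : [Tue May 26 15:21:50 2026] block nbd31: NBD_DISCONNECT",
    "kern :err : [Tue May 26 15:21:50 2026] block nbd31: Send disconnect failed -32",
    "kern :warn : [Tue May 26 15:21:50 2026] block nbd31: shutting down sockets",
]
for device, total in sorted(count_nbd_errors(sample).items()):
    print(device, total)
```

The same function can be fed the lines of a saved dmesg dump to see which devices produce the most errors.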
Expected behavior
- The NBD driver should handle a broken pipe (EPIPE) gracefully
- Zombie nbd devices should be released automatically
- No endless dmesg log spam after the NBD connection is lost
Temporary workaround
We've tried the standard workaround of cleaning up stale NBD devices via sysfs, but it is not effective: the errors return as soon as Nomad tasks restart and create new NBD devices. Does anyone in the community have a proven temporary workaround or a kernel-level fix for this issue?
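When checking whether a cleanup attempt actually released anything, it may help to list the nbd devices the kernel still treats as connected. The Python sketch below is read-only and rests on one assumption that should be verified on the affected kernel: that the nbd driver exposes a `pid` attribute under /sys/block/nbdN only while the device is connected. The function name `connected_nbd_devices` is made up for illustration.

```python
from pathlib import Path


def connected_nbd_devices(sysfs_block="/sys/block"):
    """Return {device name: owning pid} for nbd devices that expose a pid file.

    Assumption: /sys/block/nbdN/pid exists only while nbdN is connected.
    """
    found = {}
    for device_dir in sorted(Path(sysfs_block).glob("nbd*")):
        pid_file = device_dir / "pid"
        if pid_file.is_file():
            found[device_dir.name] = int(pid_file.read_text().strip())
    return found


for name, pid in connected_nbd_devices().items():
    print(f"{name}: held by pid {pid}")
```

Running it before and after the sysfs cleanup shows whether any device is still held, and by which process.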
Additional logs
kern :err : [Tue May 26 15:21:50 2026] block nbd31: Receive control failed (result -32)
kern :info : [Tue May 26 15:21:50 2026] block nbd31: NBD_DISCONNECT
kern :err : [Tue May 26 15:21:50 2026] block nbd31: Send disconnect failed -32
kern :warn : [Tue May 26 15:21:50 2026] block nbd31: shutting down sockets
kern :info : [Tue May 26 15:21:53 2026] nbd5: detected capacity change from 0 to 9588336
kern :err : [Tue May 26 15:22:53 2026] block nbd5: Receive control failed (result -32)
kern :info : [Tue May 26 15:22:53 2026] block nbd5: NBD_DISCONNECT
kern :err : [Tue May 26 15:22:53 2026] block nbd5: Send disconnect failed -32
kern :warn : [Tue May 26 15:22:53 2026] block nbd5: shutting down sockets
kern :err : [Tue May 26 15:22:54 2026] I/O error, dev nbd5, sector 40704 op 0x0:(READ) flags 0x84700 phys_seg 16 prio class 2
kern :err : [Tue May 26 15:22:54 2026] I/O error, dev nbd5, sector 40960 op 0x0:(READ) flags 0x80700 phys_seg 16 prio class 2
kern :err : [Tue May 26 15:22:54 2026] I/O error, dev nbd5, sector 40704 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
kern :err : [Tue May 26 15:22:54 2026] Buffer I/O error on dev nbd5, logical block 5088, async page read
kern :info : [Tue May 26 15:25:14 2026] nbd32: detected capacity change from 0 to 14211536
kern :info : [Tue May 26 15:25:14 2026] nbd33: detected capacity change from 0 to 14396816
kern :info : [Tue May 26 15:25:14 2026] nbd34: detected capacity change from 0 to 10060264
kern :err : [Tue May 26 15:26:14 2026] block nbd32: Receive control failed (result -32)
kern :info : [Tue May 26 15:26:14 2026] block nbd32: NBD_DISCONNECT
kern :err : [Tue May 26 15:26:14 2026] block nbd32: Send disconnect failed -32
kern :warn : [Tue May 26 15:26:14 2026] block nbd32: shutting down sockets
kern :err : [Tue May 26 15:26:14 2026] block nbd33: Receive control failed (result -32)
kern :info : [Tue May 26 15:26:14 2026] block nbd33: NBD_DISCONNECT
kern :err : [Tue May 26 15:26:14 2026] block nbd33: Send disconnect failed -32
kern :err : [Tue May 26 15:26:14 2026] block nbd34: Receive control failed (result -32)
kern :warn : [Tue May 26 15:26:14 2026] block nbd33: shutting down sockets
kern :info : [Tue May 26 15:26:14 2026] block nbd34: NBD_DISCONNECT
kern :err : [Tue May 26 15:26:14 2026] block nbd34: Send disconnect failed -32
kern :warn : [Tue May 26 15:26:14 2026] block nbd34: shutting down sockets
kern :info : [Tue May 26 15:26:45 2026] nbd35: detected capacity change from 0 to 9945672
kern :info : [Tue May 26 15:27:09 2026] nbd36: detected capacity change from 0 to 9588984
kern :info : [Tue May 26 15:27:13 2026] nbd37: detected capacity change from 0 to 9944360
kern :err : [Tue May 26 15:27:45 2026] block nbd35: Receive control failed (result -32)
kern :info : [Tue May 26 15:27:45 2026] block nbd35: NBD_DISCONNECT
kern :err : [Tue May 26 15:27:45 2026] block nbd35: Send disconnect failed -32
kern :warn : [Tue May 26 15:27:45 2026] block nbd35: shutting down sockets
kern :err : [Tue May 26 15:28:09 2026] block nbd36: Receive control failed (result -32)
kern :info : [Tue May 26 15:28:09 2026] block nbd36: NBD_DISCONNECT
kern :err : [Tue May 26 15:28:09 2026] block nbd36: Send disconnect failed -32
kern :warn : [Tue May 26 15:28:09 2026] block nbd36: shutting down sockets
kern :err : [Tue May 26 15:28:13 2026] block nbd37: Receive control failed (result -32)
kern :info : [Tue May 26 15:28:13 2026] block nbd37: NBD_DISCONNECT
kern :err : [Tue May 26 15:28:13 2026] block nbd37: Send disconnect failed -32
kern :warn : [Tue May 26 15:28:13 2026] block nbd37: shutting down sockets
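One detail in the excerpt above may be worth noting: for every device whose connection is visible (nbd5 and nbd32 through nbd37), "Receive control failed" appears 60 seconds after "detected capacity change". The Python sketch below computes that interval from log lines in this format. It is only a log-analysis aid and does not establish a cause; the function name `seconds_until_failure` is illustrative.

```python
import re
from datetime import datetime

# Captures the timestamp, the device name and the message of nbd driver lines.
LINE = re.compile(r"\[(?P<ts>[^\]]+)\]\s+(?:block\s+)?(?P<dev>nbd\d+): (?P<msg>.*)$")
TS_FORMAT = "%a %b %d %H:%M:%S %Y"  # e.g. "Tue May 26 15:21:53 2026"


def seconds_until_failure(lines):
    """Return {device: seconds between capacity change and first receive failure}."""
    connected_at = {}
    intervals = {}
    for line in lines:
        match = LINE.search(line)
        if not match:
            continue
        when = datetime.strptime(match.group("ts"), TS_FORMAT)
        device, message = match.group("dev"), match.group("msg")
        if message.startswith("detected capacity change from 0"):
            connected_at[device] = when
        elif message.startswith("Receive control failed") and device in connected_at:
            intervals[device] = (when - connected_at.pop(device)).total_seconds()
    return intervals


sample = [
    "kern :info : [Tue May 26 15:21:53 2026] nbd5: detected capacity change from 0 to 9588336",
    "kern :err : [Tue May 26 15:22:53 2026] block nbd5: Receive control failed (result -32)",
    "kern :info : [Tue May 26 15:26:45 2026] nbd35: detected capacity change from 0 to 9945672",
    "kern :err : [Tue May 26 15:27:45 2026] block nbd35: Receive control failed (result -32)",
]
print(seconds_until_failure(sample))
```

If the interval stays constant across a longer capture, that might hint at a timeout or fixed connection lifetime on one side of the NBD connection rather than a random network drop.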