Skip to content

fix(utils): use parent directory fd for bind on long Unix socket paths#2050

Merged
giuseppe merged 1 commit intocontainers:mainfrom
jnovy:fix/long-unix-socket-bind
Mar 16, 2026
Merged

fix(utils): use parent directory fd for bind on long Unix socket paths#2050
giuseppe merged 1 commit intocontainers:mainfrom
jnovy:fix/long-unix-socket-bind

Conversation

@jnovy
Copy link
Contributor

@jnovy jnovy commented Mar 16, 2026

Problem

open_unix_domain_socket() fails with EADDRINUSE (or silently binds to the wrong path) when the socket path exceeds the 108-byte sun_path limit of struct sockaddr_un.

The current code attempts to work around long paths by using /proc/self/fd/N where N is the socket file descriptor itself:

if (strlen(path) >= sizeof(addr.sun_path)) {
    get_proc_self_fd_path(name_buf, fd);  // fd = the socket fd
    path = name_buf;
}

This is incorrect because /proc/self/fd/<socket_fd> points to socket:[inode], not to a filesystem path. The kernel cannot resolve this to a bindable address, so bind() fails. This affects the sd-notify socket path which is constructed as <state_dir>/notify/notify and can easily exceed 108 bytes with long container IDs or deep XDG_RUNTIME_DIR paths.

Fix

Instead of using the socket's own fd, open the parent directory of the target path with O_PATH | O_DIRECTORY, then construct the bind address as /proc/self/fd/<dirfd>/<basename>. The kernel resolves /proc/self/fd/<dirfd> to the actual directory path via procfs, keeping the final sun_path well within the 108-byte limit (typically ~30 bytes for the /proc/self/fd/N/ prefix plus the basename).

The fix:

  1. Splits the long path into parent directory + basename using dirname()/strrchr()
  2. Opens the parent directory with open(parent, O_PATH | O_DIRECTORY | O_CLOEXEC)
  3. Constructs sun_path as /proc/self/fd/<dirfd>/<basename>
  4. Validates the constructed path still fits in sun_path before calling bind()

Short paths (< 108 bytes) are unaffected and follow the original direct copy path.

@gemini-code-assist
Copy link

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a critical issue where Unix domain sockets failed to bind or bound incorrectly when their paths exceeded the struct sockaddr_un's sun_path limit of 108 bytes. The solution replaces an ineffective workaround with a robust procfs-based approach, leveraging a file descriptor to the parent directory to shorten the effective bind path, thereby enhancing the reliability of socket creation for long paths without affecting short ones.

Highlights

  • Addressed EADDRINUSE for long Unix socket paths: The open_unix_domain_socket function previously failed or bound incorrectly when socket paths exceeded the 108-byte sun_path limit, particularly for sd-notify sockets.
  • Corrected long path handling mechanism: The prior workaround using /proc/self/fd/<socket_fd> was ineffective as it pointed to an inode, not a bindable filesystem path.
  • Implemented robust procfs-based path resolution: The fix now opens the parent directory with O_PATH | O_DIRECTORY and constructs the sun_path as /proc/self/fd/<dirfd>/<basename>, ensuring the path fits within the sun_path limit.
  • Maintained compatibility for short paths: Socket paths shorter than 108 bytes are handled as before, without changes to their binding mechanism.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog
  • src/libcrun/utils.c
    • Included libgen.h for path manipulation functions.
    • Refactored open_unix_domain_socket to correctly handle Unix socket paths exceeding sun_path length by using a procfs-based approach involving the parent directory's file descriptor.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a clever fix for handling long Unix domain socket paths by using a file descriptor to the parent directory. The overall approach is sound. However, the new implementation for parsing the path introduces a bug when handling paths with trailing slashes, which could lead to incorrect behavior. I have provided a suggestion to fix this correctness issue. Additionally, I've included a recommendation to refactor the path construction logic for improved safety and readability.

Comment on lines +1105 to +1108
cleanup_free char *path_copy = xstrdup (path);
const char *parent_dir = dirname (path_copy);
const char *base = strrchr (path, '/');
base = base ? base + 1 : path;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The logic to extract the basename using strrchr on the original path is flawed when the path has trailing slashes. For a path like /tmp/socket/, strrchr finds the trailing /, and base becomes an empty string, which is not the intended socket name.

To correctly handle all path variations, it's safer to use basename() from <libgen.h>, which is designed to handle these edge cases. Since both dirname() and basename() can modify their string arguments, you should operate on separate copies of the path to ensure correctness.

      cleanup_free char *dpath = xstrdup (path);
      cleanup_free char *bpath = xstrdup (path);
      const char *parent_dir = dirname (dpath);
      const char *base = basename (bpath);

Comment on lines +1116 to +1123
size_t proc_len = strlen (name_buf);
size_t base_len = strlen (base);
if (proc_len + 1 + base_len >= sizeof (addr.sun_path))
return crun_make_error (err, ENAMETOOLONG, "socket path too long: `%s`", path);

memcpy (addr.sun_path, name_buf, proc_len);
addr.sun_path[proc_len] = '/';
memcpy (addr.sun_path + proc_len + 1, base, base_len + 1);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The manual construction of the socket path using strlen and memcpy can be simplified and made safer by using snprintf. This function handles buffer boundaries and null termination automatically, reducing the risk of off-by-one errors and improving readability.

      int n = snprintf (addr.sun_path, sizeof (addr.sun_path), "%s/%s", name_buf, base);
      if (n < 0 || (size_t) n >= sizeof (addr.sun_path))
        return crun_make_error (err, ENAMETOOLONG, "socket path too long: `%s`", path);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jnovy what do you think about this suggestion though?

@giuseppe
Copy link
Member

This is incorrect because /proc/self/fd/<socket_fd> points to socket:[inode], not to a filesystem path. The kernel cannot resolve this to a bindable address, so bind() fails. This affects the sd-notify socket path which is constructed as <state_dir>/notify/notify and can easily exceed 108 bytes with long container IDs or deep XDG_RUNTIME_DIR paths.

do we have any issue or reproducer for this? I've never seen it before

@jnovy
Copy link
Contributor Author

jnovy commented Mar 16, 2026

@giuseppe There's no filed issue - I found this while running the test suite from a deep working directory. The reproducer is straightforward:

When crun is built and tested from a long path like /home/jnovy/tests/rh/crun/testsuite/1234567890/workspace/crun (61 chars), the notify socket path becomes:

<cwd>/.testsuite-run-<pid>/root/<container-id>/notify/notify

which is ~117 bytes - exceeding the 108-byte sun_path limit.

The existing long-path fallback in open_unix_domain_socket() does get_proc_self_fd_path(name_buf, fd) using the socket fd itself. But /proc/self/fd/N for an unbound socket resolves to socket:[inode], not a filesystem path, so bind() fails with EADDRINUSE.

Note that open_unix_domain_client_socket() already has a correct implementation - it opens the parent directory with O_PATH | O_DIRECTORY, then binds via /proc/self/fd/<dirfd>/<basename>. This PR applies the same pattern to the server-side open_unix_domain_socket().

To reproduce:

mkdir -p /tmp/deep/nested/path/that/is/long/enough/to/exceed/the/limit/for/sun_path
cd /tmp/deep/nested/path/that/is/long/enough/to/exceed/the/limit/for/sun_path
git clone https://github.com/containers/crun.git
cd crun
./autogen.sh && ./configure && make
cd tests
./tests_libcrun_utils

The test_open_unix_domain_socket test case will fail without this fix.

Copy link
Member

@giuseppe giuseppe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

When the Unix domain socket path exceeds sun_path limit (108 bytes),
open_unix_domain_socket() was incorrectly using /proc/self/fd/N of the
socket fd itself as the bind address. This caused "Address already in
use" errors because /proc/self/fd/N for a socket does not resolve to
a filesystem path suitable for bind().

Fix by opening the parent directory with O_PATH and constructing the
bind address as /proc/self/fd/N/basename, matching the approach already
used in open_unix_domain_client_socket() for connect().

This fixes the sd-notify, sd-notify-file, and sd-notify-env test
failures that occurred when the test working directory path was long
enough to exceed the sun_path limit.

Signed-off-by: Jindrich Novy <jnovy@redhat.com>
@jnovy jnovy force-pushed the fix/long-unix-socket-bind branch from a70f55a to cf27a14 Compare March 16, 2026 14:51
Copy link
Member

@giuseppe giuseppe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@giuseppe giuseppe merged commit 25b0107 into containers:main Mar 16, 2026
48 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants