fix symlink resolution for noble #200

Open
dudejas wants to merge 3 commits into cloudfoundry:master from dudejas:fix/noble-symlink-resolution
Conversation

dudejas commented Jan 9, 2026

Fixes #199

Member

@aramprice aramprice left a comment

Thank you for the changes; this looks good code-wise. I would appreciate some other folks looking at it as well, especially to think over any potential security changes from this.

Before this change, we allowed a mount inside the container that pointed at a symlink outside the container to be resolved at container runtime, from inside the container, to the symlink's target outside the container.

With this change, bpm pre-resolves the outside-the-container symlink to its outside-the-container target before container runtime (not mediated by the container subsystem), and then creates an inside-the-container mount that points directly at that target.

I think this change introduces a difference in the time at which a symlink is resolved (container-creation vs. ad hoc during container runtime)... and I'm curious if this is consequential to the security posture of bpm. I don't think this is the case for the issue mentioned in #199...

I'm wondering if this could have implications in arbitrary cases?

@aramprice aramprice requested review from a team, lnguyen and ragaskar and removed request for a team January 20, 2026 19:25
@dudejas
Author

dudejas commented Jan 27, 2026

Summary

This fix resolves symlink handling for bind mounts in BPM containers on Noble (Ubuntu 24.04) stemcells. While initially observed on GCP, this affects all major cloud providers when using instance types without ephemeral storage, which is becoming increasingly common due to cost optimization.

The Problem

When BOSH builds stemcells for instances without ephemeral disks (CreatePartitionIfNoEphemeralDisk=true), it creates:

/var/vcap/packages → /var/vcap/data/packages

Noble's stricter mount namespace handling causes BPM bind mounts to fail on symlinked paths, resulting in permission denied errors.

The Fix

Use filepath.EvalSymlinks() to resolve symlinks in the mount source before creating bind mounts. Paths without symlinks pass through unchanged, and the cost is a one-time resolution at container creation.

Multi-Cloud Evidence

Instance Types Without Ephemeral Storage

| Cloud | Instance Types | Evidence | Source |
| --- | --- | --- | --- |
| GCP | n1, n2, e2 (standard) | Local SSDs must be explicitly requested, not default | Docs |
| AWS | T3, T4g, M6i, M7i, C6i | Marked "EBS-Only" - no instance store | Docs |
| Azure | Dv5, Dsv5, Ev5, Esv5 | "Local Storage: None" | Docs |
| OpenStack | Flavors with ephemeral_gb=0 | Optional parameter, defaults to 0 | Docs |
| AliCloud | g9i, g9a, g8i, g7, g6 | Cloud storage only | Docs |

All Stemcell Builders Use This Pattern

| Cloud | Stemcell Builder Config |
| --- | --- |
| GCP | bosh_google_agent_settings/apply.sh#L11 |
| AWS | bosh_aws_agent_settings/apply.sh#L13 |
| Azure | bosh_azure_agent_settings/apply.sh#L11 |
| OpenStack | bosh_openstack_agent_settings/apply.sh#L14 |
| AliCloud | bosh_alicloud_agent_settings/apply.sh#L12 |

Why Merge This

  1. Multi-cloud impact: Affects GCP, AWS, Azure, OpenStack, and AliCloud with commonly-used instance types
  2. Safe: filepath.EvalSymlinks() returns paths that contain no symlinks unchanged (apart from filepath.Clean normalization), so there is no impact on traditional configurations
  3. Preventative: Noble stemcells are rolling out now; this prevents widespread failures
  4. Best practice: Resolving paths before mounting is standard in container runtimes
  5. No cost: Negligible performance impact (one-time at container creation)

Conclusion

This isn't GCP-specific; it happens across all clouds as they move toward storage-optimized instances. The fix applies to all cloud providers, is safe, and prevents failures as we roll out Noble stemcells.

@aramprice aramprice requested a review from rkoster January 29, 2026 16:13
@aramprice aramprice moved this from Inbox to Pending Review | Discussion in Foundational Infrastructure Working Group Jan 29, 2026
@aramprice
Member

Per @rkoster: possibly look into eBPF (https://ebpf.io/) changes between Jammy and Noble as a root cause for this (new in Noble?) symlink resolution problem.

@dudejas
Author

dudejas commented Feb 5, 2026

Thanks @aramprice! I looked into this and it's actually kernel 6.8 mount namespace changes, not eBPF.

From the LWN article on mounting images in user namespaces:

Christian Brauner pointed out that the superblock is not owned by the user namespace where the mount is being done, "which means that all of the destructive ioctl()s" that exist for Btrfs or XFS are not available to the container. But the container does own the mount, which means it can unmount it. The ownership of the mount is separate from the ownership of the superblock, he said, which is a nice side effect.

This separation of mount vs superblock ownership in kernel 6.8 is probably what's preventing symlink resolution across namespace boundaries. This PR pre-resolves symlinks on the host side.

@aramprice aramprice requested a review from mariash February 12, 2026 16:06
Comment on lines +39 to +41
resolvedFrom := from
if resolved, err := filepath.EvalSymlinks(from); err == nil {
resolvedFrom = resolved
Member

If EvalSymlinks fails (e.g., path doesn't exist yet, permission denied), the error is silently ignored and the original path is used. This works for backward compatibility, but it means symlink resolution failure might make mount issues hard to diagnose.

I'm not sure the tradeoff of adding logging down at this level is the right move though....

@aramprice
Member

Perhaps symlinks could be resolved in BuildSpec just before specbuilder.Build() is called (runc/adapter/adapter.go#L260-L274).

This would make it possible to log issues with symlink evaluation and, potentially, to limit symlink resolution to a set of allow-listed paths (like packageDir and dataPackageDir). This method also already returns errors, which would allow a symlink resolution error like the one in runc/adapter/mount.go#L40-L42 to be surfaced rather than swallowed.

@aramprice aramprice moved this from Pending Review | Discussion to Waiting for Changes | Open for Contribution in Foundational Infrastructure Working Group Feb 26, 2026
Projects

Status: Waiting for Changes | Open for Contribution

Development

Successfully merging this pull request may close these issues.

Fix Symlink Resolution in Bind Mounts for Noble Stemcell Compatibility

2 participants