[occm] Tag load balancers with cluster identity to prevent name collisions#3103
enginrect wants to merge 4 commits into kubernetes:master
Conversation
Welcome @enginrect!

Hi @enginrect. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. Regular contributors should join the org to skip this step. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

[APPROVALNOTIFIER] This PR is NOT APPROVED. The full list of commands accepted by this bot can be found here.
Hi @kayrus @stephenfin @zetaab — first-time contributor here. This PR addresses the long-standing cross-cluster LB collision issue (refs #2241, #2571, #2624) with an additive, backward-compatible safeguard. Could one of you take a look and add `/ok-to-test`? Thanks!
[occm] Tag load balancers with cluster identity to prevent name collisions

OCCM constructs Octavia load balancer names as `kube_service_<cluster-name>_<namespace>_<service>`. When two Kubernetes clusters share the same OpenStack project and use the same `--cluster-name` (default "kubernetes"), services with identical namespace/name produce identical load balancer names. Octavia does not enforce uniqueness on load balancer names, so OCCM's first-time name-based lookup can adopt and overwrite a load balancer that actually belongs to a different cluster (see issues kubernetes#2241, kubernetes#2571, kubernetes#2624).

This commit adds a stable Kubernetes cluster identifier - the UID of the kube-system namespace - as a load balancer tag of the form `kube_cluster_id_<uid>`. `getLoadbalancerByName` now ignores load balancers that carry a cluster-id tag for a different cluster and falls back to the legacy behaviour for load balancers without any cluster-id tag, so existing deployments keep working unchanged. Pre-existing load balancers gain the new tag during the next reconciliation.

The cluster UID is read once at controller-manager start-up via the kube-system namespace; failure to read it (RBAC denial, missing namespace) is non-fatal and disables the safeguard, falling back to legacy name-based lookup. The cloud-controller-manager ClusterRole and the helm chart gain "get" on namespaces.

Made-with: Cursor
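The filtering rule the commit message describes can be sketched as a small pure function. This is a minimal, self-contained illustration: `loadBalancer` and `filterByClusterID` are simplified stand-ins, not the actual gophercloud or OCCM types; only the tag format `kube_cluster_id_<uid>` comes from the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// clusterIDTagPrefix mirrors the tag format from the commit message.
const clusterIDTagPrefix = "kube_cluster_id_"

// loadBalancer is a simplified stand-in for the gophercloud LB type.
type loadBalancer struct {
	Name string
	Tags []string
}

// filterByClusterID keeps LBs tagged with our cluster UID and legacy LBs
// that carry no cluster-id tag at all; LBs tagged only for a different
// cluster are dropped so OCCM never adopts them.
func filterByClusterID(lbs []loadBalancer, ourUID string) []loadBalancer {
	if ourUID == "" {
		// Safeguard disabled (UID lookup failed): legacy behaviour, keep all.
		return lbs
	}
	ourTag := clusterIDTagPrefix + ourUID
	var kept []loadBalancer
	for _, lb := range lbs {
		foreign := false
		for _, t := range lb.Tags {
			if t == ourTag {
				foreign = false
				break
			}
			if strings.HasPrefix(t, clusterIDTagPrefix) {
				foreign = true
			}
		}
		if !foreign {
			kept = append(kept, lb)
		}
	}
	return kept
}

func main() {
	lbs := []loadBalancer{
		{Name: "ours", Tags: []string{clusterIDTagPrefix + "uid-a"}},
		{Name: "legacy"},
		{Name: "foreign", Tags: []string{clusterIDTagPrefix + "uid-b"}},
	}
	for _, lb := range filterByClusterID(lbs, "uid-a") {
		fmt.Println(lb.Name) // prints "ours" then "legacy"
	}
}
```

An empty `ourUID` returning the full list is what makes the degradation non-fatal: a failed UID lookup simply reproduces the legacy name-based behaviour.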
force-pushed from 123ffe4 to bbf5f4f
force-pushed from 1b0cbe1 to 9bb737f
The previous commit added a "get" on "namespaces" rule to the Helm chart's ClusterRole template. chart-testing requires a version bump on any chart modification. Bumping the patch version since the change is additive and backward-compatible.
force-pushed from 9bb737f to b620ccf
Quick update: I've rebased on top of master (which now has the SHA-pinned actions from #3100) and bumped the helm chart patch version to 2.35.1.
```go
// identifier on OpenStack load balancer tags. May be empty if the lookup
// failed or RBAC does not allow it; in that case OCCM falls back to the
// legacy name-based load balancer identification.
clusterUID string
```
clusterUID can be defined only in LoadBalancer struct, please remove it from here.
Good catch — thanks! You're absolutely right, that was a cohesion miss on my part. clusterUID is only consumed by the LB code path so it shouldn't pollute the global OpenStack struct. Removed in 56756ea; the field now lives only on LoadBalancer where it belongs.
```diff
 klog.V(1).Info("Claiming to support LoadBalancer")

-return &LbaasV2{LoadBalancer{secret, network, lb, os.lbOpts, os.kclient, os.eventRecorder}}, true
+return &LbaasV2{LoadBalancer{secret, network, lb, os.lbOpts, os.kclient, os.eventRecorder, os.clusterUID}}, true
```
you can set clusterUID value here with:

```go
clusterID := fetchClusterUID(os.kclient)
```
Thank you, that's a much cleaner pattern. Fetching lazily inside LoadBalancer() keeps the kube-system namespace lookup out of the global Initialize() path and limits the change to the LB construction site. As a nice side effect, clusters that disable LB now skip the lookup entirely. Done in 56756ea.
```go
opts := loadbalancers.ListOpts{
	Name: name,
}
```
you can filter by tags using ListOpts.Tags. filterLoadBalancersByClusterID doesn't make sense.
UPD: please ignore this comment
```go
// balancer with a matching name that belongs to a different Kubernetes
// cluster (different cluster-id tag). The lookup is treated as NotFound
// so OCCM creates a new load balancer instead of stealing an existing one.
eventLBStolen = "LoadBalancerNameCollision"
```
Great question — and the honest answer is: no, it's not used anywhere, sorry about the leftover. The original intent was to emit a Warning event via eventRecorder from getLoadbalancerByName when we drop a foreign-tagged LB, but plumbing the event recorder and *v1.Service into that free-standing function felt out of scope for this PR (the existing warning log already covers operator visibility), so I dropped the emission and forgot to remove the constant. Cleaned up in 56756ea.
- Remove duplicate clusterUID field from the OpenStack struct so the identifier lives only on the LoadBalancer struct that actually uses it (better cohesion).
- Drop fetchClusterUID() out of Initialize() and call it lazily inside the LoadBalancer() factory instead. Clusters that disable LB now skip the kube-system namespace lookup entirely, and the change touches only the LB construction path.
- Remove the unused eventLBStolen constant. It was a leftover from an earlier draft that intended to emit a Warning event from getLoadbalancerByName(); plumbing the eventRecorder + *v1.Service into that free-standing function felt out of scope, so the emission was dropped but the constant was left behind.
Thanks for the review @kayrus! Pushed the changes in 56756ea:
/ok-to-test |
The pull-cloud-provider-openstack-check prow job runs golangci-lint v2.3.1 with staticcheck enabled, which flags fake.NewSimpleClientset as SA1019 (deprecated). TestFetchClusterUID, added earlier in this PR, used the deprecated function. Swap it for fake.NewClientset; the signature is identical (objects ...runtime.Object) and the unit tests still pass.
The check job was tripping on SA1019 because the new `TestFetchClusterUID` used the deprecated `fake.NewSimpleClientset`; it now uses `fake.NewClientset`. Verified locally:
What this PR does / why we need it:

OCCM identifies an existing Octavia load balancer for a Service by name on the first reconcile (via `getLoadbalancerByName`). The name format `kube_service_<cluster-name>_<namespace>_<service>` defaults to a `<cluster-name>` of `kubernetes`, so two Kubernetes clusters in the same OpenStack project that happen to use the default cluster-name and have Services with identical namespace/name produce identical load balancer names. Octavia does not enforce uniqueness of names, so OCCM in cluster B ends up adopting and overwriting cluster A's load balancer. This has been reported repeatedly (see #2241, #2571, #2624) and the standing guidance "set a unique `--cluster-name`" is correct but does not actually defend against the failure mode.
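The collision is mechanical and easy to demonstrate. In this sketch, `lbName` and `clusterIDTag` are illustrative helper names (not the actual OCCM functions); only the name and tag formats come from the PR.

```go
package main

import "fmt"

// lbName reproduces the Octavia LB name format from the PR description.
func lbName(clusterName, namespace, service string) string {
	return fmt.Sprintf("kube_service_%s_%s_%s", clusterName, namespace, service)
}

// clusterIDTag builds the new disambiguating tag.
func clusterIDTag(uid string) string {
	return "kube_cluster_id_" + uid
}

func main() {
	// Two clusters left on the default --cluster-name collide on name alone...
	a := lbName("kubernetes", "default", "web")
	b := lbName("kubernetes", "default", "web")
	fmt.Println(a == b) // true: identical names, the failure mode

	// ...but their kube-system namespace UIDs still tell them apart.
	fmt.Println(clusterIDTag("uid-a") == clusterIDTag("uid-b")) // false
}
```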
This PR adds a stable Kubernetes cluster identifier - the UID of the `kube-system` namespace - as a load balancer tag of the form `kube_cluster_id_<uid>`. Lookup behaviour:

- Load balancers carrying our `kube_cluster_id_<our-uid>` tag are kept.
- Load balancers without any `kube_cluster_id_*` tag fall back to the legacy behaviour (preserves existing deployments and externally-created LBs).
- Load balancers carrying only a foreign `kube_cluster_id_*` tag are treated as NotFound, with a warning. OCCM will then create its own load balancer rather than overwriting one that belongs to another cluster.

The cluster UID is read once at controller-manager start-up. If the lookup fails (RBAC denial, missing namespace, etc.) the safeguard is disabled and OCCM falls back to the legacy name-based behaviour, so the change is strictly additive. Pre-existing load balancers also gain the `kube_cluster_id_*` tag during the next reconciliation.

Which issue this PR fixes (if applicable):

fixes #3102

Special notes for reviewers:

- Load balancers without the `kube_cluster_id_*` tag keep the previous behaviour. They are tagged on the next successful reconcile.
- Services that already carry the `loadbalancer.openstack.org/load-balancer-id` annotation (i.e. on every reconcile after the first one) go through `GetLoadbalancerByID`, which is unaffected.
- `get` on `namespaces` is added to both the manifest ClusterRole (`manifests/controller-manager/cloud-controller-manager-roles.yaml`) and the helm chart (`charts/openstack-cloud-controller-manager/templates/clusterrole.yaml`). If the verb is unavailable the safeguard simply degrades to the legacy behaviour with a warning log; OCCM does not refuse to start.
- Tagging is already gated by `svcConf.supportLBTags` and behaves as before on older clouds.
- `TestFilterLoadBalancersByClusterID` covers the matching, legacy, foreign-only, and mixed cases.
- `TestFetchClusterUID` covers the happy path and graceful degradation (missing namespace, forbidden) with a fake clientset.

How to verify manually:

`go test ./pkg/openstack/...`

A reproduction of the original failure mode (two clusters in the same project, same `--cluster-name`, same Service ns/name) is described in #3102.

Release note: