Skip to content

OCPSTRAT-1986: Enhancement: API-driven Azure topology and Swift private connectivity for HyperShift#1985

Open
muraee wants to merge 3 commits intoopenshift:masterfrom
muraee:hypershift-azure-topology-api-driven
Open

OCPSTRAT-1986: Enhancement: API-driven Azure topology and Swift private connectivity for HyperShift#1985
muraee wants to merge 3 commits intoopenshift:masterfrom
muraee:hypershift-azure-topology-api-driven

Conversation

@muraee
Copy link
Copy Markdown

@muraee muraee commented Apr 27, 2026

Summary

  • Replace hardcoded IsAroHCP() / env-var checks with API-driven per-cluster decisions using Topology and Private fields on HostedCluster/HostedControlPlane
  • Extend AzurePrivateType enum with Swift variant and introduce AzureSwiftSpec to replace the SwiftPodNetworkInstanceAnnotation annotation
  • Enable ARO HCP clusters to express fully private intent (Topology=Private), causing shared ingress to drop all public endpoint exposure

Test plan

  • Unit tests for all topology/private-type combinations in visibility, router, infra, and shared ingress controllers
  • Integration tests for day-2 topology transitions (PublicAndPrivate ↔ Private)
  • E2E tests verifying private clusters have no public DNS/routes
  • Serialization compatibility tests for N-1/N+1 scenarios

🤖 Generated with Claude Code

@openshift-ci openshift-ci Bot requested a review from csrwng April 27, 2026 12:18
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Apr 27, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign csrwng for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot requested a review from enxebre April 27, 2026 12:18
…yperShift

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@muraee muraee force-pushed the hypershift-azure-topology-api-driven branch from 00b0101 to f7111cd Compare April 27, 2026 12:21
@muraee muraee changed the title Enhancement: API-driven Azure topology and private connectivity for HyperShift Enhancement: API-driven Azure topology and Swift private connectivity for HyperShift Apr 27, 2026
@muraee muraee changed the title Enhancement: API-driven Azure topology and Swift private connectivity for HyperShift OCPSTRAT-1986: Enhancement: API-driven Azure topology and Swift private connectivity for HyperShift Apr 27, 2026
@openshift-ci-robot
Copy link
Copy Markdown

openshift-ci-robot commented Apr 27, 2026

@muraee: This pull request references OCPSTRAT-1986 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the feature to target either version "5.0." or "openshift-5.0.", but it targets "openshift-4.22" instead.

Details

In response to this:

Summary

  • Replace hardcoded IsAroHCP() / env-var checks with API-driven per-cluster decisions using Topology and Private fields on HostedCluster/HostedControlPlane
  • Extend AzurePrivateType enum with Swift variant and introduce AzureSwiftSpec to replace the SwiftPodNetworkInstanceAnnotation annotation
  • Enable ARO HCP clusters to express fully private intent (Topology=Private), causing shared ingress to drop all public endpoint exposure

Test plan

  • Unit tests for all topology/private-type combinations in visibility, router, infra, and shared ingress controllers
  • Integration tests for day-2 topology transitions (PublicAndPrivate ↔ Private)
  • E2E tests verifying private clusters have no public DNS/routes
  • Serialization compatibility tests for N-1/N+1 scenarios

🤖 Generated with Claude Code

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 27, 2026
Comment thread enhancements/hypershift/api-driven-azure-topology-and-private-connectivity.md Outdated
Comment thread enhancements/hypershift/api-driven-azure-topology-and-private-connectivity.md Outdated
Comment thread enhancements/hypershift/api-driven-azure-topology-and-private-connectivity.md Outdated
Comment thread enhancements/hypershift/api-driven-azure-topology-and-private-connectivity.md Outdated
Comment thread enhancements/hypershift/api-driven-azure-topology-and-private-connectivity.md Outdated
`omitzero`; old code ignores unknown fields during
deserialization. The `Type` value "Swift" is preserved as
a string.
- **N-1 round-trip risk:** Old code re-serializing an object
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't an API version be the way to mitigate this? Old client writing old API version doesn't have their intent squash fields they don't know about

Copy link
Copy Markdown
Author

@muraee muraee Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify what you mean here? HyperShift currently has a single API version (v1beta1) with no conversion webhooks — are you suggesting this enhancement should introduce a new API version?

Comment thread enhancements/hypershift/api-driven-azure-topology-and-private-connectivity.md Outdated
Comment thread enhancements/hypershift/api-driven-azure-topology-and-private-connectivity.md Outdated
Comment thread enhancements/hypershift/api-driven-azure-topology-and-private-connectivity.md Outdated
Comment thread enhancements/hypershift/api-driven-azure-topology-and-private-connectivity.md Outdated
- Clarify "private-only" scope (control-plane only, not openshift-ingress)
- Expand scope to include all per-cluster isAroHCP() call sites (CNO, CCM, storage, registry)
- Add exact CEL validation rules as kubebuilder markers on types
- Add AzurePlatformSpec-level CEL rule for topology→private.type requirement
- Change validation from operator-side to CEL admission
- Add PublicEndpointExposed status condition with full API spec
- Expand Day-2 transition section with HO→HCP propagation, reverse flow, and transition scope
- Add CPO rollout considerations including backport scenario
- Add Phase 2 verification query and Phase 3 CPO version prerequisite
- Fix annotation precedence inconsistency (API field takes precedence, annotation is fallback)
- Fix CRD schema ordering to consistently say "before" (not "before or simultaneously")
- Update graduation criteria for expanded scope

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment thread enhancements/hypershift/api-driven-azure-topology-and-private-connectivity.md Outdated
type AzureSwiftSpec struct {
// +required
// +kubebuilder:validation:MinLength=1
PodNetworkInstance string `json:"podNetworkInstance"`
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would changing this on a live cluster break Swift networking? Should it be immutable?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing it doesn't permanently break the cluster — the router pod restarts and gets new IPs from the new pod network instance. If the new one is valid, connectivity resumes after the restart. It causes a temporary disruption but is recoverable, so we're keeping it mutable.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's easier to make it immutable later, if we decide to.

Copy link
Copy Markdown
Member

@enxebre enxebre May 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's easier to make it immutable later, if we decide to.

This would be a breaking change, while relaxing from immutable to mutable is allowed. Let's remove ambiguity and decide now what it needs to be.
@bennerv is there a known case for this to be mutable? let's make it immutable otherwise.

Please include CEL for this here, make sure to include prevention for unsetting at the parent level. We can also include the field docs already here for clarity

@bryan-cox
Copy link
Copy Markdown
Member

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label May 5, 2026
version carrying the updated CPO. Until then, the N-1 CPO
continues to function via the legacy annotation fallback.
If clusters cannot be upgraded to the new minor stream in a
reasonable timeframe, a CPO backport to the previous minor
Copy link
Copy Markdown
Member

@enxebre enxebre May 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we please specify concrete ocp versions and timelines we need to support in ARO?

Also would it be possible we express intent in ARO-RP/Maestro for new HCs to be created without the annotation and no-op on existing so they remain on the old path until they upgrade and we don't care?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Phase 1 is developed in 4.23 and backported to 4.22. Phase 3 cleanup proceeds once all clusters are running CPO 4.22+.

For the migration strategy: yes, for new HCs ARO-RP/Maestro sets the API fields from day one and skips the annotation. For existing HCs, ARO-RP/Maestro can set the API fields at any time — the old CPO ignores them and continues using the annotation. Once the cluster upgrades to 4.22+ the new CPO starts consuming the API fields. No active coordinated migration needed.

Comment thread enhancements/hypershift/api-driven-azure-topology-and-private-connectivity.md Outdated
Comment thread enhancements/hypershift/api-driven-azure-topology-and-private-connectivity.md Outdated
@muraee muraee force-pushed the hypershift-azure-topology-api-driven branch from e264200 to 766513c Compare May 6, 2026 16:53
@openshift-ci openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label May 6, 2026
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 6, 2026

New changes are detected. LGTM label has been removed.

- Call out CEL immutability rule change as deliberate validation loosening
- Clarify UseSharedIngressHCP/HC naming distinct from isAroHCP() in main.go
- Add envtests for CEL rule validation to integration test plan

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@muraee muraee force-pushed the hypershift-azure-topology-api-driven branch from 766513c to bfa8ff7 Compare May 6, 2026 16:59
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 6, 2026

@muraee: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants