
feat(HCCO): stop reconciling over componentRoutes and appsDomain on guest Ingress config#8471

Draft
joshbranham wants to merge 1 commit into openshift:main from joshbranham:OCPSTRAT-2884-preserve-ingress-guest-fields


@joshbranham
Contributor

@joshbranham joshbranham commented May 9, 2026

What this PR does / why we need it:

The HCCO was overwriting the entire Ingress.Spec on every reconciliation, which prevented customers from setting componentRoutes or appsDomain directly on the guest cluster. Preserve these fields so the Ingress Operator can act on customer-configured custom routes (e.g. Console, Downloads) and alternative apps domains.

Which issue(s) this PR fixes:

Fixes

Special notes for your reviewer:

Checklist:

  • Subject and description added to both the commit and the PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

Summary by CodeRabbit

  • Bug Fixes

    • Preserve existing guest cluster component routes and apps domain when reconciling the ingress configuration, so that cluster settings are not unintentionally lost or overwritten; default the domain if it is missing.
  • Tests

    • Added a comprehensive test suite for ingress reconciliation covering defaulting, overrides, and preservation/merge semantics for cluster-specific fields.

@openshift-merge-bot
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after the lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will use /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To manually trigger all second-stage jobs, use the /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 9, 2026
@openshift-ci
Contributor

openshift-ci Bot commented May 9, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@coderabbitai
Contributor

coderabbitai Bot commented May 9, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: a04f26fc-c822-4baa-88ca-8aa1fbe64f2d

📥 Commits

Reviewing files that changed from the base of the PR and between efef27a and 7b2d435.

📒 Files selected for processing (2)
  • support/globalconfig/ingress.go
  • support/globalconfig/ingress_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • support/globalconfig/ingress_test.go
  • support/globalconfig/ingress.go

📝 Walkthrough


The ReconcileIngressConfig function in support/globalconfig/ingress.go was changed to snapshot cfg.Spec.ComponentRoutes and cfg.Spec.AppsDomain from the guest cluster before copying cfg.Spec from hcp.Spec.Configuration.Ingress. After applying the hosted control plane spec and defaulting cfg.Spec.Domain when empty, the function restores the preserved ComponentRoutes and AppsDomain into the reconciled ingress config. A new table-driven unit test TestReconcileIngressConfig was added to verify defaulting, copying/overriding, and preservation/merge behaviors.
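The snapshot, copy, default, restore order described in the walkthrough can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not the PR's code: IngressSpec and ComponentRouteSpec are simplified stand-ins for the configv1 types, and reconcileIngressSpec with its defaultDomain parameter is hypothetical (the real ReconcileIngressConfig does its own defaulting).

```go
package main

import "fmt"

// ComponentRouteSpec is a simplified, hypothetical stand-in for
// configv1.ComponentRouteSpec; only the fields used here are modeled.
type ComponentRouteSpec struct {
	Namespace string
	Name      string
	Hostname  string
}

// IngressSpec is a simplified, hypothetical stand-in for configv1.IngressSpec.
type IngressSpec struct {
	Domain          string
	AppsDomain      string
	ComponentRoutes []ComponentRouteSpec
}

// reconcileIngressSpec (hypothetical) snapshots the guest-owned fields,
// overwrites the spec with the HCP configuration, defaults the domain,
// and then restores the snapshot.
func reconcileIngressSpec(guest *IngressSpec, hcp *IngressSpec, defaultDomain string) {
	// 1. Snapshot the fields the guest cluster owns.
	routes := guest.ComponentRoutes
	appsDomain := guest.AppsDomain

	// 2. Overwrite the guest spec with the HCP spec (shallow struct copy).
	if hcp != nil {
		*guest = *hcp
	} else {
		*guest = IngressSpec{}
	}

	// 3. Default the domain when the HCP did not provide one.
	if guest.Domain == "" {
		guest.Domain = defaultDomain
	}

	// 4. Restore the guest-owned fields, discarding any HCP values for them.
	guest.ComponentRoutes = routes
	guest.AppsDomain = appsDomain
}

func main() {
	guest := &IngressSpec{
		AppsDomain: "apps.custom.example.com",
		ComponentRoutes: []ComponentRouteSpec{{
			Namespace: "openshift-console", Name: "console", Hostname: "console.custom.example.com",
		}},
	}
	hcp := &IngressSpec{AppsDomain: "hcp-apps.example.com"}
	reconcileIngressSpec(guest, hcp, "apps.cluster.example.com")
	// prints: apps.cluster.example.com apps.custom.example.com 1
	fmt.Println(guest.Domain, guest.AppsDomain, len(guest.ComponentRoutes))
}
```

Because the restore runs after the copy, componentRoutes or appsDomain set only on the HostedControlPlane are dropped, which is the "guest empty, HCP populated" contract the nitpick below asks the tests to lock in.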

🚥 Pre-merge checks | ✅ 10 | ❌ 2

❌ Failed checks (2 warnings)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Test Structure And Quality (⚠️ Warning): An assertion lacks a meaningful failure message. Line 467 has g.Expect(...).To(BeEquivalentTo(...)) without diagnostic context, so the test case name is not included in the failure output. Resolution: add an assertion message, e.g. g.Expect(tc.inputIngressConfig).To(BeEquivalentTo(tc.expectedIngressConfig), "test case: %s", tc.name), to identify the failing test case and improve debugging.
✅ Passed checks (10 passed)
  • Description Check (✅ Passed): Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title check (✅ Passed): The title directly and specifically describes the main change: preserving componentRoutes and appsDomain fields during ingress config reconciliation for guest clusters.
  • Linked Issues check (✅ Passed): Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check (✅ Passed): Check skipped because no linked issues were found for this pull request.
  • Stable And Deterministic Test Names (✅ Passed): The test file uses standard Go testing (t.Run), not the Ginkgo framework, and this custom check targets Ginkgo tests. All test case names are stable and deterministic, with no dynamic values.
  • Microshift Test Compatibility (✅ Passed): The PR adds a standard Go unit test, not a Ginkgo e2e test. The check applies only to Ginkgo tests using It(), Describe(), Context(), or When(), so it does not apply here.
  • Single Node Openshift (Sno) Test Compatibility (✅ Passed): The PR adds a standard Go unit test (TestReconcileIngressConfig with t.Run), not a Ginkgo e2e test. The custom check applies only to new Ginkgo e2e tests, so it does not apply.
  • Topology-Aware Scheduling Compatibility (✅ Passed): The PR modifies cluster Ingress configuration reconciliation, not deployment or controller manifests. No scheduling constraints or topology-awareness issues are introduced.
  • Ote Binary Stdout Contract (✅ Passed): No OTE stdout violations. ingress.go uses only fmt.Sprintf (safe). The test file is a standard Go test with no process-level output code or Ginkgo suite setup.
  • Ipv6 And Disconnected Network Test Compatibility (✅ Passed): The test file contains a standard Go unit test, not Ginkgo e2e tests, and the custom check applies only to Ginkgo e2e tests. No IPv4 or external connectivity issues are present.


@openshift-ci
Contributor

openshift-ci Bot commented May 9, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: joshbranham
Once this PR has been reviewed and has the lgtm label, please assign devguyio for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release and removed do-not-merge/needs-area labels May 9, 2026
@joshbranham joshbranham force-pushed the OCPSTRAT-2884-preserve-ingress-guest-fields branch from 6ac0288 to efef27a May 9, 2026 04:39
@joshbranham joshbranham changed the title HCCO: Stop reconciling over componentRoutes and appsDomain on guest Ingress config feat(HCCO): stop reconciling over componentRoutes and appsDomain on guest Ingress config May 9, 2026
Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
support/globalconfig/ingress_test.go (1)

250-349: ⚡ Quick win

Add explicit “guest empty, HCP populated” cases for preserved fields.

Please add two cases to lock in that ComponentRoutes and AppsDomain remain guest-owned even when only HCP sets them. This makes the new contract unambiguous and prevents accidental regression.

✅ Suggested test additions
+		{
+			name:               "When guest cluster has no componentRoutes and HCP has componentRoutes it should keep componentRoutes nil",
+			inputIngressConfig: IngressConfig(),
+			inputHCP: &hyperv1.HostedControlPlane{
+				ObjectMeta: metav1.ObjectMeta{Name: "cluster"},
+				Spec: hyperv1.HostedControlPlaneSpec{
+					DNS: hyperv1.DNSSpec{BaseDomain: "example.com"},
+					Configuration: &hyperv1.ClusterConfiguration{
+						Ingress: &configv1.IngressSpec{
+							Domain: "apps.cluster.example.com",
+							ComponentRoutes: []configv1.ComponentRouteSpec{{
+								Namespace: "openshift-console",
+								Name:      "console",
+								Hostname:  "hcp-console.example.com",
+							}},
+						},
+					},
+				},
+			},
+			expectedIngressConfig: &configv1.Ingress{
+				ObjectMeta: metav1.ObjectMeta{Name: "cluster"},
+				Spec: configv1.IngressSpec{
+					Domain: "apps.cluster.example.com",
+				},
+			},
+		},
+		{
+			name:               "When guest cluster has no appsDomain and HCP has appsDomain it should keep appsDomain empty",
+			inputIngressConfig: IngressConfig(),
+			inputHCP: &hyperv1.HostedControlPlane{
+				ObjectMeta: metav1.ObjectMeta{Name: "cluster"},
+				Spec: hyperv1.HostedControlPlaneSpec{
+					DNS: hyperv1.DNSSpec{BaseDomain: "example.com"},
+					Configuration: &hyperv1.ClusterConfiguration{
+						Ingress: &configv1.IngressSpec{
+							Domain:     "apps.cluster.example.com",
+							AppsDomain: "hcp-apps.example.com",
+						},
+					},
+				},
+			},
+			expectedIngressConfig: &configv1.Ingress{
+				ObjectMeta: metav1.ObjectMeta{Name: "cluster"},
+				Spec: configv1.IngressSpec{
+					Domain: "apps.cluster.example.com",
+				},
+			},
+		},
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@support/globalconfig/ingress_test.go` around lines 250 - 349, Add two
table-driven test cases in ingress_test.go mirroring the existing patterns that
assert guest-owned fields remain when HCP alone populates them: (1) a case where
inputIngressConfig has nil/empty Spec.ComponentRoutes and
inputHCP.Spec.Configuration.Ingress has ComponentRoutes populated, and
expectedIngressConfig must keep ComponentRoutes nil (verify ComponentRoutes is
preserved from guest); (2) a case where inputIngressConfig has empty
Spec.AppsDomain and inputHCP.Spec.Configuration.Ingress.AppsDomain is set, and
expectedIngressConfig must keep AppsDomain empty. Use the same structure and
names as other tests (IngressConfig(), &hyperv1.HostedControlPlane{...},
expectedIngressConfig: &configv1.Ingress{...}) so the reconciliation behavior
for ComponentRoutes and AppsDomain is explicitly locked in.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 7c807022-16b6-4b3b-b95a-b30385432b50

📥 Commits

Reviewing files that changed from the base of the PR and between bded456 and 6ac0288.

📒 Files selected for processing (2)
  • support/globalconfig/ingress.go
  • support/globalconfig/ingress_test.go

…uest Ingress config

The HCCO was overwriting the entire Ingress.Spec on every reconciliation,
which prevented customers from setting componentRoutes or appsDomain
directly on the guest cluster. Preserve these fields so the Ingress
Operator can act on customer-configured custom routes (e.g. Console,
Downloads) and alternative apps domains.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@joshbranham joshbranham force-pushed the OCPSTRAT-2884-preserve-ingress-guest-fields branch from efef27a to 7b2d435 May 9, 2026 04:43
@codecov

codecov Bot commented May 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 37.55%. Comparing base (bded456) to head (7b2d435).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8471      +/-   ##
==========================================
+ Coverage   37.53%   37.55%   +0.02%     
==========================================
  Files         751      751              
  Lines       92026    92035       +9     
==========================================
+ Hits        34544    34568      +24     
+ Misses      54841    54825      -16     
- Partials     2641     2642       +1     
Files with missing lines (Coverage Δ):
  • support/globalconfig/ingress.go: 77.41% <100.00%> (+77.41%) ⬆️
Flags (Coverage Δ):
  • cmd-support: 32.83% <100.00%> (+0.06%) ⬆️
  • cpo-hostedcontrolplane: 36.77% <ø> (ø)
  • cpo-other: 37.76% <ø> (ø)
  • hypershift-operator: 47.93% <ø> (ø)
  • other: 27.77% <ø> (ø)

Flags with carried forward coverage won't be shown.
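Codecov's project percentage is covered lines divided by all tracked lines, so when a shard's files disappear from a report, both the numerator and the denominator change. The Go sketch below recomputes the base and head figures from the coverage diff; projectCoverage is a hypothetical helper, and counting partials as uncovered and truncating to two decimals are assumptions about Codecov's default settings.

```go
package main

import (
	"fmt"
	"math"
)

// projectCoverage (hypothetical) returns hits / (hits + misses + partials)
// as a percentage, truncated to two decimal places. Treating partials as
// uncovered and rounding down are assumed Codecov defaults.
func projectCoverage(hits, misses, partials int) float64 {
	lines := hits + misses + partials
	if lines == 0 {
		return 0
	}
	pct := 100 * float64(hits) / float64(lines)
	return math.Floor(pct*100) / 100
}

func main() {
	base := projectCoverage(34544, 54841, 2641) // main @ bded456, 92026 lines
	head := projectCoverage(34568, 54825, 2642) // PR @ 7b2d435, 92035 lines
	// prints: base=37.53 head=37.55
	fmt.Printf("base=%.2f head=%.2f\n", base, head)
}
```

Under these assumptions the sketch reproduces the reported 37.53% and 37.55% (+0.02%). Rounding to nearest would give 37.56% for head, which is consistent with the report truncating.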


@hypershift-jira-solve-ci


Test Failure Analysis Complete

Job Information

  • Check: codecov/project (GitHub check run)
  • Check run ID: 75131809307
  • PR: #8471 — feat(HCCO): stop reconciling over componentRoutes and appsDomain on guest Ingress config
  • Commit: 7b2d435
  • Base: bded456 (main)

Test Failure Analysis

Error

codecov/project: 34.93% (-2.61%) compared to bded456
State: failure — project coverage dropped from 37.53% to 34.93%

Summary

This is not a code coverage problem introduced by the PR. Codecov itself confirms "All modified and coverable lines are covered by tests", with 100% coverage on changed lines. The failure is caused by 2 of 5 unit test shards (hypershift-operator and other) being stuck in the GitHub Actions queued state on arc-runner-set self-hosted runners. Because codecov.yml is configured with wait_for_ci: false, Codecov computed the project coverage report from only 3 of 5 shards (cmd-support, cpo-other, cpo-hostedcontrolplane). As a result, 177 files and 33,212 lines appeared to vanish from the report, producing an artificial 2.61-point coverage drop.

Root Cause

The root cause is a GitHub Actions runner infrastructure issue combined with a Codecov race condition:

  1. Stuck runners: The Unit Tests workflow (run ID 25592008109, created 2026-05-09T04:43:45Z) uses a matrix strategy with 5 shards. Two shards, hypershift-operator and other, entered queued status at 04:47:29Z but never transitioned to in_progress. At the time of this analysis they were still queued on the arc-runner-set self-hosted runners, which suggests the runner pool lacks capacity or is hitting a scheduling failure.

  2. Premature coverage report: The codecov.yml configuration sets wait_for_ci: false, which tells Codecov to finalize the coverage report without waiting for all CI jobs to complete. Codecov finalized its report at 04:56:26Z — after only 3 of 5 shards had uploaded their coverage profiles. The two missing shards (hypershift-operator covering ./hypershift-operator/... and other covering 20+ package paths) account for the 177 missing files and 33,212 missing lines.

  3. Apparent coverage drop: With only partial data, Codecov computed project coverage as 34.93% instead of the true value of roughly 37.5%. Because Codecov's default project status fails when coverage decreases relative to the base commit, the apparent 2.61-point drop triggered the failure status.

  4. The PR itself is clean: The only files changed are support/globalconfig/ingress.go and support/globalconfig/ingress_test.go, both with 100% diff coverage. The cmd-support shard (which covers ./support/...) ran successfully and uploaded its coverage profile showing +0.06% coverage increase.

Recommendations
  1. Re-run the Unit Tests workflow: The most immediate fix is to re-trigger the Unit Tests GitHub Actions workflow. This should schedule fresh runners for the stuck hypershift-operator and other shards. Once all 5 shards upload their coverage profiles, Codecov will recompute with complete data and the check should pass.

  2. Investigate arc-runner-set capacity: The self-hosted runners are failing to pick up jobs. This could be a runner auto-scaling issue, pod scheduling problem in the ARC (Actions Runner Controller) cluster, or resource exhaustion. Check the ARC controller logs and runner pod status.

  3. Consider enabling wait_for_ci: true: Changing the codecov.yml to wait_for_ci: true would prevent Codecov from reporting partial coverage results. This trades faster feedback for accuracy — the codecov check would remain pending until all CI jobs complete rather than reporting a false failure.

  4. Add carryforward flags: As an alternative to wait_for_ci, configure carryforward flags in codecov.yml for each shard. This tells Codecov to use the last-known coverage for a flag if its upload is missing, preventing partial-data drops.

Evidence
  • Codecov report: "All modified and coverable lines are covered by tests" (100% diff coverage)
  • Coverage drop: 37.53% → 34.93% (-2.61%), artificial, caused by missing shard data
  • Files tracked: 751 → 574 (177 files missing); the missing files belong to the shards that never uploaded
  • Lines tracked: 92,026 → 58,814 (33,212 lines missing); same root cause
  • Stuck shards: hypershift-operator and other, both queued since 04:47:29Z
  • Completed shards: cmd-support, cpo-other, cpo-hostedcontrolplane; all succeeded
  • Missing flag coverage: hypershift-operator: ?, other: ?; Codecov received no data for these flags
  • Received flag coverage: cmd-support 32.83% (+0.06%), cpo-hostedcontrolplane 36.77% (ø), cpo-other 37.76% (ø); all healthy
  • Codecov config: wait_for_ci: false, so Codecov reported before all shards completed
  • Codecov timing: report finalized at 04:56:26Z while 2 shards were still queued
  • Runner type: arc-runner-set (self-hosted Actions Runner Controller)
  • PR files changed: only support/globalconfig/ingress.go and support/globalconfig/ingress_test.go


Labels

area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress.
