OCPBUGS-54790: Move packageserver PDB from guest cluster to management cluster#8459
dhgautam99 wants to merge 2 commits into openshift:main
Conversation
Move packageserver PDB from guest cluster to management cluster

The packageserver PDB was being created in the guest cluster's openshift-operator-lifecycle-manager namespace by CVO, but packageserver pods run on the management cluster in the clusters-<hosted-cluster> namespace. This moves the PDB to the correct location.

- Add the packageserver PDB manifest to manifestsToOmit to prevent CVO from creating it in guest clusters
- Add packageserver-pdb to resourcesToRemove for all platforms to clean up the orphaned PDB on existing clusters during upgrade
- Register a PDB manifest adapter in the packageserver component to create the PDB in the management cluster namespace
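For reference, a minimal sketch of the PDB that the new manifest adapter renders into the management cluster namespace. The name and minAvailable: 1 come from the review walkthrough below; the namespace placeholder and the selector labels are illustrative assumptions, not the exact manifest from the PR:

```yaml
# Sketch of the packageserver PDB, now created in the management
# cluster (clusters-<hosted-cluster>) instead of the guest cluster.
# Selector labels are assumed for illustration.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: packageserver-pdb
  namespace: clusters-<hosted-cluster>
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: packageserver
```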
@dhgautam99: This pull request references Jira Issue OCPBUGS-54790, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
The bug has been updated to refer to the pull request using the external bug tracker.
📝 Walkthrough

A PodDisruptionBudget manifest for the packageserver workload was added (packageserver-pdb with minAvailable: 1). The packageserver component now adapts this PDB manifest. The CVO/deployment logic was updated to add a metrics-access label/annotation and set the pod ServiceAccountName when metrics access is enabled, to omit the PDB from generated release payloads, and to include the packageserver PDB in the resources-to-remove list for the default platform branch.

Sequence diagram:

```mermaid
sequenceDiagram
    participant Operator as HostedControlPlane Operator
    participant Component as packageserver Component
    participant CVO as CVO / payload generator
    participant Kube as Kubernetes API
    Operator->>Component: NewComponent()
    Component->>Component: WithManifestAdapter(pdb.yaml -> AdaptPodDisruptionBudget)
    Operator->>CVO: adaptDeployment (metrics access enabled)
    CVO->>CVO: add label/annotation config.NeedMetricsServerAccessLabel="true"
    CVO->>CVO: set PodTemplate.Spec.ServiceAccountName=ComponentName
    CVO->>CVO: preparePayloadScript (manifestsToOmit += packageserver.pdb.yaml)
    CVO->>CVO: resourcesToRemove includes packageserver-pdb (default case)
    Operator->>Kube: apply manifests (excluding omitted pdb in payload)
    Operator->>Kube: ensure packageserver-pdb exists in cleanup/remove operations
```
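The payload-omission step in the diagram above can be modeled with a small shell fragment. This is a sketch, not the actual preparePayloadScript from the PR: the payload directory layout and the second manifest name are assumptions; only the packageserver.pdb.yaml file name comes from the walkthrough.

```shell
#!/usr/bin/env bash
# Sketch: strip omitted manifests from an extracted release payload
# before it is handed to the CVO. Only packageserver.pdb.yaml is
# taken from the PR; the other file name is illustrative.
set -euo pipefail

payload_dir=$(mktemp -d)
touch "$payload_dir/some-other-manifest.yaml"
touch "$payload_dir/packageserver.pdb.yaml"

manifests_to_omit=("packageserver.pdb.yaml")

for m in "${manifests_to_omit[@]}"; do
  rm -f "$payload_dir/$m"   # omitted: must not be applied in the guest cluster
done

ls "$payload_dir"   # the PDB manifest is gone; everything else survives
```

Because the omission happens at payload-preparation time, the guest cluster's CVO never sees the PDB manifest at all, rather than creating and then deleting it.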
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: dhgautam99. Needs approval from an approver in each of these files:
Codecov Report: ✅ All modified and coverable lines are covered by tests.

```
@@            Coverage Diff             @@
##             main    #8459      +/-   ##
==========================================
+ Coverage   37.50%   37.52%    +0.01%
==========================================
  Files         751      751
  Lines       91992    91998        +6
==========================================
+ Hits        34505    34524       +19
+ Misses      54844    54831       -13
  Partials     2643     2643
```

Flags with carried forward coverage won't be shown.
Regenerate CVO deployment and packageserver component test fixtures to reflect the packageserver PDB being omitted from the guest cluster CVO payload and added to the management cluster namespace.
dhgautam99 force-pushed from 46bdd14 to ffe96be
/lgtm
Test Results: e2e-aws, e2e-aks
AI Test Failure Analysis — Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6
/test e2e-aws

AI Test Failure Analysis — Generated by hypershift-analyze-e2e-failure post-step using Claude claude-opus-4-6
Test Failure Analysis Complete
Summary

All 5 test failures are exclusively in teardown phases (TestKarpenter/Teardown and TestNodePool/HostedCluster2/Teardown) — every functional test passed. The teardown failures are caused by AWS infrastructure resources (EC2 volumes and NLBs) not being deleted within the timeout window. This is a pre-existing, flaky infrastructure timing issue unrelated to the PR's packageserver PDB changes. The same teardown logic succeeded for other shards in the same run (e.g., TestNodePool/HostedCluster0/Teardown passed at 1549s).

Root Cause

The failures are caused by AWS infrastructure resource cleanup exceeding the teardown timeout. The teardown sequence works as follows:
TestKarpenter is particularly prone because it creates multiple Karpenter NodePools (…). TestNodePool/HostedCluster2 had 4 remaining resources (3 EC2 volumes from …).

This is NOT related to the PR changes. The PR modifies packageserver PDB placement and CVO deployment testdata — none of which affect AWS resource lifecycle, HostedCluster finalization, or the teardown framework. The identical teardown logic succeeded in other tests in the same run (TestNodePool/HostedCluster0 at 1549s, TestAutoscaling at 477s, TestUpgradeControlPlane at 499s).

Recommendations
Evidence
/test e2e-aws |
@dhgautam99: all tests passed!
What this PR does / why we need it:
The packageserver PodDisruptionBudget was being created in the guest cluster's openshift-operator-lifecycle-manager namespace by CVO. However, packageserver pods run on the management cluster in the clusters-<hosted-cluster> namespace, making the guest cluster PDB ineffective.

This PR:

- Omits the PDB manifest from the guest cluster payload (manifestsToOmit)
- Removes the orphaned PDB from existing clusters during upgrade (resourcesToRemove)
- Registers a PDB manifest adapter in the packageserver component so the PDB is created in the management cluster namespace

Which issue(s) this PR fixes:
Fixes OCPBUGS-54790
Special notes for your reviewer:
The PDB cleanup applies to all platforms (both IBM/PowerVS and default) since
packageserver runs on the management cluster regardless of platform.
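The all-platform cleanup can be illustrated with a filesystem stand-in: deleting a file models deleting the API object, and rm -f models a delete that tolerates NotFound. This is a sketch under those assumptions, not HyperShift's actual resourcesToRemove implementation:

```shell
#!/usr/bin/env bash
# Model of the upgrade-time cleanup: remove the orphaned guest-cluster
# PDB, tolerating "not found" so that repeated upgrades stay safe.
set -euo pipefail

guest=$(mktemp -d)
touch "$guest/packageserver-pdb"   # orphan left by pre-fix CVO payloads

remove_orphaned_pdb() {
  # rm -f succeeds even if the file is already gone (NotFound tolerated)
  rm -f "$guest/packageserver-pdb"
}

remove_orphaned_pdb   # first upgrade: deletes the orphan
remove_orphaned_pdb   # subsequent upgrade: no-op, still succeeds
echo "cleanup ok"
```

Tolerating NotFound is what lets the same removal list run on every platform and every upgrade without conditional logic for clusters that were created after the fix.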
Checklist: