AUTOSCALE-615: include Karpenter node vCPUs in billing metric by maxcao13 · Pull Request #8265 · openshift/hypershift

maxcao13 · 2026-04-16T19:09:06Z

What this PR does / why we need it:

Add VCPUs field to AutoNodeStatus, computed by karpenter-operator from NodeClaim capacity and cross-referenced against live Node objects. The metrics collector seeds per-cluster vCPU count from this field before accumulating native NodePool vCPUs on top.

NodeClaims and karpenter NodePools exist directly in the guest cluster, so we cannot source this info directly from the hypershift nodepool metrics package using the existing code/clients, which only has access to the management cluster.

We need this because ROSA HCP uses the hypershift_cluster_vcpus metric for billing our customers. This metric is not correctly reflecting the vCPUs for Nodes that are provisoned by Karpenter. This will result in under-billing of customers that are using Karpenter to provision Nodes within their cluster.

Which issue(s) this PR fixes:

Fixes https://redhat.atlassian.net/browse/AUTOSCALE-615

Special notes for your reviewer:

Alternatively, we could have karpenter-operator source its own billing metric and ask ROSA to aggregate using this new exposed metric, but considering that we already expose information about nodes and nodeclaims in the AutoNodeStatus, I think it makes sense to be consistent here.

Made-with: Cursor

Checklist:

Subject and description added to both, commit and PR.
Relevant issues have been referenced.
This change includes docs.
This change includes unit tests.

Summary by CodeRabbit

New Features
- HostedCluster status now includes an optional AutoNode vCPU count; metrics now seed and aggregate AutoNode plus native node vCPUs for improved billing and capacity visibility.
- Karpenter now sums live NodeClaim CPU capacity when computing AutoNode vCPUs and reacts to NodeClaim changes that affect CPU or assignment.
Tests
- Added unit and e2e tests covering vCPU aggregation, error reporting, metric convergence, and billing/vCPU validation during provisioning and consolidation.

openshift-merge-bot · 2026-04-16T19:09:09Z

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

openshift-ci · 2026-04-16T19:09:10Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

openshift-ci-robot · 2026-04-16T19:09:10Z

@maxcao13: This pull request references AUTOSCALE-615 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "5.0.0" version, but no target version was set.

Details

In response to this:

What this PR does / why we need it:

Add VCPUs field to AutoNodeStatus, computed by karpenter-operator from NodeClaim capacity and cross-referenced against live Node objects. The metrics collector seeds per-cluster vCPU count from this field before accumulating native NodePool vCPUs on top.

NodeClaims and karpenter NodePools exist directly in the guest cluster, so we cannot source this info directly from the hypershift nodepool metrics package using the existing code/clients, which only has access to the management cluster.

We need this because ROSA HCP uses the hypershift_cluster_vcpus metric for billing our customers. This metric is not correctly reflecting the vCPUs for Nodes that are provisoned by Karpenter. This will result in under-billing of customers that are using Karpenter to provision Nodes within their cluster.

Which issue(s) this PR fixes:

Fixes https://redhat.atlassian.net/browse/AUTOSCALE-615

Special notes for your reviewer:

Alternatively, we could have karpenter-operator source its own billing metric and ask ROSA to aggregate using this new exposed metric, but considering that we already expose information about nodes and nodeclaims in the AutoNodeStatus, I think it makes sense to be consistent here.

Made-with: Cursor

Checklist:

Subject and description added to both, commit and PR.

Relevant issues have been referenced.

This change includes docs.

This change includes unit tests.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

coderabbitai · 2026-04-16T19:09:24Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds vCPU capacity tracking for Karpenter-managed AutoNodes: a new optional VCPUs *int32 on HostedCluster.Status.AutoNode; the guest-side Karpenter controller now watches NodeClaim create/delete and updates that change CPU capacity or Status.NodeName, filters NodeClaims to those whose Status.NodeName corresponds to a live corev1.Node, sums their CPU capacity, and writes the total to Status.AutoNode.VCPUs. The nodepool metrics collector seeds per-cluster vCPU totals from AutoNode.VCPUs (when set), aggregates native NodePool vCPUs, and emits a computation-error metric on lookup failures. Unit and e2e tests validate summation, nil behavior, lookup errors, metric emission, and billing integration.

Sequence Diagram(s)

sequenceDiagram
    participant KC as Karpenter Controller
    participant NC as NodeClaim Resources
    participant N as corev1.Node List
    participant HCP as HostedCluster (Status.AutoNode)
    participant MC as Metrics Collector
    participant Prom as Prometheus

    Note over KC,NC: NodeClaim watch + vCPU computation
    KC->>NC: Receive create/update/delete events
    NC-->>KC: NodeClaim.Status.Capacity[CPU], Status.NodeName
    KC->>N: Query list of Nodes (verify backing node exists)
    N-->>KC: List of node names

    Note over KC: Sum only NodeClaims with NodeName in Nodes
    KC->>KC: sumNodeClaimVCPUs(nodeClaims, liveNodes)

    Note over KC,HCP: Status update
    KC->>HCP: Update Status.AutoNode.VCPUs with computed sum
    HCP-->>HCP: Persist AutoNode.VCPUs

    Note over MC,Prom: Metrics aggregation
    MC->>HCP: Read Status.AutoNode.VCPUs
    MC->>MC: Seed hclusterData.vCpusCount with AutoNode.VCPUs (if present)
    MC->>MC: Add native NodePool vCPUs (from instance-type lookup)
    MC->>Prom: Emit hypershift_cluster_vcpus and error metrics
    Prom-->>Prom: Store/serve metrics

🚥 Pre-merge checks | ✅ 9 | ❌ 3

❌ Failed checks (3 warnings)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 26.67% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Single Node Openshift (Sno) Test Compatibility	⚠️ Warning	The testBillingConsolidationAndPDB test uses pod anti-affinity requiring different nodes, which is incompatible with Single Node OpenShift deployments that have only one node.	Add [Skipped:SingleReplicaTopology] label to the test or guard with runtime topology checks using exutil.IsSingleNode() to skip on SNO clusters.
Ipv6 And Disconnected Network Test Compatibility	⚠️ Warning	The new testBillingConsolidationAndPDB e2e test requires pulling container image quay.io/openshift/origin-pod:4.22.0 from public Quay.io registry, which will fail in IPv6-only disconnected CI environments without internet access. The test lacks the [Skipped:Disconnected] annotation.	Add the [Skipped:Disconnected] annotation to the test name to skip it on disconnected clusters, or use an internally-available container image instead of quay.io.

✅ Passed checks (9 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically describes the main change: adding Karpenter node vCPUs to the billing metric, which is the core objective of this multi-file changeset.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names	✅ Passed	All test names in the pull request are stable and deterministic without any dynamic information. The test titles clearly describe what is being validated using static, descriptive strings.
Test Structure And Quality	✅ Passed	Tests demonstrate solid Ginkgo and Go testing practices with proper timeout specifications, meaningful assertions, and cleanup patterns.
Microshift Test Compatibility	✅ Passed	The new e2e tests in test/e2e/karpenter_test.go are protected by a platform guard that prevents execution on MicroShift.
Topology-Aware Scheduling Compatibility	✅ Passed	Pull request adds vCPU tracking for Karpenter nodes through API types and metrics without introducing pod scheduling constraints incompatible with SNO, Two-Node, or HyperShift topologies.
Ote Binary Stdout Contract	✅ Passed	PR's e2e test code properly uses crzap logger (outputs to stderr by default) for all process-level logging with no direct fmt.Print calls to stdout.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

maxcao13 · 2026-04-16T19:09:32Z

/test e2e-aws-techpreview

openshift-ci-robot · 2026-04-16T19:10:46Z

@maxcao13: This pull request references AUTOSCALE-615 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "5.0.0" version, but no target version was set.

Details

In response to this:

What this PR does / why we need it:

Add VCPUs field to AutoNodeStatus, computed by karpenter-operator from NodeClaim capacity and cross-referenced against live Node objects. The metrics collector seeds per-cluster vCPU count from this field before accumulating native NodePool vCPUs on top.

NodeClaims and karpenter NodePools exist directly in the guest cluster, so we cannot source this info directly from the hypershift nodepool metrics package using the existing code/clients, which only has access to the management cluster.

We need this because ROSA HCP uses the hypershift_cluster_vcpus metric for billing our customers. This metric is not correctly reflecting the vCPUs for Nodes that are provisoned by Karpenter. This will result in under-billing of customers that are using Karpenter to provision Nodes within their cluster.

Which issue(s) this PR fixes:

Fixes https://redhat.atlassian.net/browse/AUTOSCALE-615

Special notes for your reviewer:

Alternatively, we could have karpenter-operator source its own billing metric and ask ROSA to aggregate using this new exposed metric, but considering that we already expose information about nodes and nodeclaims in the AutoNodeStatus, I think it makes sense to be consistent here.

Made-with: Cursor

Checklist:

Subject and description added to both, commit and PR.

Relevant issues have been referenced.

This change includes docs.

This change includes unit tests.

Summary by CodeRabbit

New Features

Added vCPU capacity tracking and reporting for Karpenter-managed nodes, enabling better visibility into resource allocation and billing metrics.

Tests

Added comprehensive test coverage for vCPU metric aggregation and reporting across various scenarios.

Updated end-to-end tests to validate vCPU metric convergence during node provisioning and consolidation workflows.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

karpenter-operator/controllers/karpenter/karpenter_controller.go (1)

398-414: Precompute the live-node set before summing vCPUs.

This helper is currently quadratic because every NodeClaim re-scans the full node list. On larger clusters that turns each Node/NodeClaim event into an avoidable hot path in the billing reconcile loop.

♻️ Suggested refactor

 func sumNodeClaimVCPUs(nodeClaims []karpenterv1.NodeClaim, allNodes []corev1.Node) int32 {
+	liveNodes := make(map[string]struct{}, len(allNodes))
+	for i := range allNodes {
+		liveNodes[allNodes[i].Name] = struct{}{}
+	}
+
 	var total int64
 	for i := range nodeClaims {
 		nc := &nodeClaims[i]
-		if !slices.ContainsFunc(allNodes, func(n corev1.Node) bool {
-			return n.Name == nc.Status.NodeName
-		}) {
+		if _, ok := liveNodes[nc.Status.NodeName]; !ok {
 			continue
 		}
 		if cpu, ok := nc.Status.Capacity[corev1.ResourceCPU]; ok {
 			total += cpu.Value()
 		}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@karpenter-operator/controllers/karpenter/karpenter_controller.go` around
lines 398 - 414, The sumNodeClaimVCPUs helper is O(N*M) because it calls
slices.ContainsFunc(allNodes, ...) for each NodeClaim; precompute a set of live
node names from allNodes (e.g., map[string]struct{} or map[string]bool) once at
the start of sumNodeClaimVCPUs, then iterate nodeClaims and do O(1) lookups
against that map (check nc.Status.NodeName exists) before summing
nc.Status.Capacity[corev1.ResourceCPU].Value(); update references in
sumNodeClaimVCPUs accordingly to use the precomputed map instead of
slices.ContainsFunc.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e/karpenter_test.go`:
- Around line 1368-1421: The predicate in e2eutil.EventuallyObject currently
treats a nil hc.Status.AutoNode.VCPUs as zero which can falsely pass; change the
predicate (inside the anonymous func passed to e2eutil.EventuallyObject that
reads hc.Status.AutoNode.VCPUs) to explicitly require the pointer be non-nil
before considering the value (return false if nil), and only succeed when
*hc.Status.AutoNode.VCPUs == expectedVCPUs. Likewise, tighten the metric check
in the wait.PollUntilContextTimeout loop (the loop that reads
npmetrics.VCpusCountByHClusterMetricName) to match the metric series by both the
"name" label and the "namespace" label (require l.GetName()=="name" &&
l.GetValue()==hostedCluster.Name and also require a label "namespace" with value
hostedCluster.Namespace) so the correct series is selected.

---

Nitpick comments:
In `@karpenter-operator/controllers/karpenter/karpenter_controller.go`:
- Around line 398-414: The sumNodeClaimVCPUs helper is O(N*M) because it calls
slices.ContainsFunc(allNodes, ...) for each NodeClaim; precompute a set of live
node names from allNodes (e.g., map[string]struct{} or map[string]bool) once at
the start of sumNodeClaimVCPUs, then iterate nodeClaims and do O(1) lookups
against that map (check nc.Status.NodeName exists) before summing
nc.Status.Capacity[corev1.ResourceCPU].Value(); update references in
sumNodeClaimVCPUs accordingly to use the precomputed map instead of
slices.ContainsFunc.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: 41073511-5238-4fd7-b7a0-bb8022ab9509

📥 Commits

Reviewing files that changed from the base of the PR and between 533aad6 and 29df349.

⛔ Files ignored due to path filters (12)

api/hypershift/v1beta1/zz_generated.deepcopy.go is excluded by !**/zz_generated*.go, !**/zz_generated*
api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/AutoNodeKarpenter.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/AutoNodeKarpenter.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
client/applyconfiguration/hypershift/v1beta1/autonodestatus.go is excluded by !client/**
cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/hostedclusters-Hypershift-CustomNoUpgrade.crd.yaml is excluded by !**/zz_generated.crd-manifests/**, !cmd/install/assets/**/*.yaml
cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/hostedclusters-Hypershift-TechPreviewNoUpgrade.crd.yaml is excluded by !**/zz_generated.crd-manifests/**, !cmd/install/assets/**/*.yaml
cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/hostedcontrolplanes-Hypershift-CustomNoUpgrade.crd.yaml is excluded by !**/zz_generated.crd-manifests/**, !cmd/install/assets/**/*.yaml
cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/hostedcontrolplanes-Hypershift-TechPreviewNoUpgrade.crd.yaml is excluded by !**/zz_generated.crd-manifests/**, !cmd/install/assets/**/*.yaml
docs/content/reference/aggregated-docs.md is excluded by !docs/content/reference/aggregated-docs.md
docs/content/reference/api.md is excluded by !docs/content/reference/api.md
vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/hostedcluster_types.go is excluded by !vendor/**, !**/vendor/**
vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/zz_generated.deepcopy.go is excluded by !vendor/**, !**/vendor/**, !**/zz_generated*.go, !**/zz_generated*

📒 Files selected for processing (6)

api/hypershift/v1beta1/hostedcluster_types.go
hypershift-operator/controllers/nodepool/metrics/metrics.go
hypershift-operator/controllers/nodepool/metrics/metrics_test.go
karpenter-operator/controllers/karpenter/karpenter_controller.go
karpenter-operator/controllers/karpenter/karpenter_controller_test.go
test/e2e/karpenter_test.go

codecov · 2026-04-16T19:23:55Z

Codecov Report

❌ Patch coverage is 43.58974% with 22 lines in your changes missing coverage. Please review.
✅ Project coverage is 36.42%. Comparing base (a9b5118) to head (8147c32).
⚠️ Report is 20 commits behind head on main.

Files with missing lines	Patch %	Lines
...ator/controllers/karpenter/karpenter_controller.go	31.25%	22 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #8265   +/-   ##
=======================================
  Coverage   36.42%   36.42%           
=======================================
  Files         765      765           
  Lines       93300    93336   +36     
=======================================
+ Hits        33981    33996   +15     
- Misses      56604    56625   +21     
  Partials     2715     2715

Files with missing lines	Coverage Δ
...t-operator/controllers/nodepool/metrics/metrics.go	`70.20% <100.00%> (+0.43%)`	⬆️
...ator/controllers/karpenter/karpenter_controller.go	`26.76% <31.25%> (+0.55%)`	⬆️

Flag	Coverage Δ
cmd-support	`30.37% <ø> (ø)`
cpo-hostedcontrolplane	`37.08% <ø> (ø)`
cpo-other	`35.69% <ø> (ø)`
hypershift-operator	`47.89% <100.00%> (+0.01%)`	⬆️
other	`27.77% <31.25%> (+0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

maxcao13 · 2026-04-16T22:23:06Z

/test e2e-aws-techpreview

openshift-ci-robot · 2026-04-16T22:29:16Z

@maxcao13: This pull request references AUTOSCALE-615 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "5.0.0" version, but no target version was set.

Details

In response to this:

What this PR does / why we need it:

Add VCPUs field to AutoNodeStatus, computed by karpenter-operator from NodeClaim capacity and cross-referenced against live Node objects. The metrics collector seeds per-cluster vCPU count from this field before accumulating native NodePool vCPUs on top.

NodeClaims and karpenter NodePools exist directly in the guest cluster, so we cannot source this info directly from the hypershift nodepool metrics package using the existing code/clients, which only has access to the management cluster.

We need this because ROSA HCP uses the hypershift_cluster_vcpus metric for billing our customers. This metric is not correctly reflecting the vCPUs for Nodes that are provisoned by Karpenter. This will result in under-billing of customers that are using Karpenter to provision Nodes within their cluster.

Which issue(s) this PR fixes:

Fixes https://redhat.atlassian.net/browse/AUTOSCALE-615

Special notes for your reviewer:

Alternatively, we could have karpenter-operator source its own billing metric and ask ROSA to aggregate using this new exposed metric, but considering that we already expose information about nodes and nodeclaims in the AutoNodeStatus, I think it makes sense to be consistent here.

Made-with: Cursor

Checklist:

Subject and description added to both, commit and PR.

Relevant issues have been referenced.

This change includes docs.

This change includes unit tests.

Summary by CodeRabbit

New Features

HostedCluster status now exposes AutoNode vCPU counts; cluster vCPU metrics include both AutoNode and native node contributions for improved billing and capacity visibility.

Karpenter reconciliation now accounts for live NodeClaim vCPU capacity when computing AutoNode vCPUs.

Tests

Added unit and integration tests validating vCPU aggregation, error handling, and end-to-end metric convergence during provisioning and consolidation.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

maxcao13 · 2026-04-17T16:59:38Z

/test e2e-aws-techpreview

openshift-ci-robot · 2026-04-17T17:00:52Z

@maxcao13: This pull request references AUTOSCALE-615 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "5.0.0" version, but no target version was set.

Details

In response to this:

What this PR does / why we need it:

Add VCPUs field to AutoNodeStatus, computed by karpenter-operator from NodeClaim capacity and cross-referenced against live Node objects. The metrics collector seeds per-cluster vCPU count from this field before accumulating native NodePool vCPUs on top.

NodeClaims and karpenter NodePools exist directly in the guest cluster, so we cannot source this info directly from the hypershift nodepool metrics package using the existing code/clients, which only has access to the management cluster.

We need this because ROSA HCP uses the hypershift_cluster_vcpus metric for billing our customers. This metric is not correctly reflecting the vCPUs for Nodes that are provisoned by Karpenter. This will result in under-billing of customers that are using Karpenter to provision Nodes within their cluster.

Which issue(s) this PR fixes:

Fixes https://redhat.atlassian.net/browse/AUTOSCALE-615

Special notes for your reviewer:

Alternatively, we could have karpenter-operator source its own billing metric and ask ROSA to aggregate using this new exposed metric, but considering that we already expose information about nodes and nodeclaims in the AutoNodeStatus, I think it makes sense to be consistent here.

Made-with: Cursor

Checklist:

Subject and description added to both, commit and PR.

Relevant issues have been referenced.

This change includes docs.

This change includes unit tests.

Summary by CodeRabbit

New Features

HostedCluster status now reports AutoNode vCPU counts; cluster vCPU metrics aggregate AutoNode and native node contributions for improved billing and capacity visibility.

Karpenter now includes live NodeClaim CPU capacity when computing AutoNode vCPUs.

Tests

Added unit and end-to-end tests validating vCPU aggregation, error reporting, and billing-metric convergence during provisioning and consolidation.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e/karpenter_test.go`:
- Around line 1171-1178: The baseline vCPU snapshot is taken before AutoNode
convergence so it may include transient Karpenter vCPUs; change the order so you
call waitForAutoNodeStatusVCPUs(t, ctx, mgtClient, hostedCluster, 0) first to
ensure Karpenter vCPUs are zero, then call getVCPUsMetric(t, ctx, mgtClient,
hostedCluster) to set baseline and assert found—update the code around
getVCPUsMetric and waitForAutoNodeStatusVCPUs to reflect this ordering and keep
the baseline variable logic the same.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: ad81a3ff-ac87-4709-8c6f-23548d604839

📥 Commits

Reviewing files that changed from the base of the PR and between 5cd4832 and 3d550fe.

⛔ Files ignored due to path filters (12)

api/hypershift/v1beta1/zz_generated.deepcopy.go is excluded by !**/zz_generated*.go, !**/zz_generated*
api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/AutoNodeKarpenter.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/AutoNodeKarpenter.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
client/applyconfiguration/hypershift/v1beta1/autonodestatus.go is excluded by !client/**
cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/hostedclusters-Hypershift-CustomNoUpgrade.crd.yaml is excluded by !**/zz_generated.crd-manifests/**, !cmd/install/assets/**/*.yaml
cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/hostedclusters-Hypershift-TechPreviewNoUpgrade.crd.yaml is excluded by !**/zz_generated.crd-manifests/**, !cmd/install/assets/**/*.yaml
cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/hostedcontrolplanes-Hypershift-CustomNoUpgrade.crd.yaml is excluded by !**/zz_generated.crd-manifests/**, !cmd/install/assets/**/*.yaml
cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/hostedcontrolplanes-Hypershift-TechPreviewNoUpgrade.crd.yaml is excluded by !**/zz_generated.crd-manifests/**, !cmd/install/assets/**/*.yaml
docs/content/reference/aggregated-docs.md is excluded by !docs/content/reference/aggregated-docs.md
docs/content/reference/api.md is excluded by !docs/content/reference/api.md
vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/hostedcluster_types.go is excluded by !vendor/**, !**/vendor/**
vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/zz_generated.deepcopy.go is excluded by !vendor/**, !**/vendor/**, !**/zz_generated*.go, !**/zz_generated*

📒 Files selected for processing (6)

api/hypershift/v1beta1/hostedcluster_types.go
hypershift-operator/controllers/nodepool/metrics/metrics.go
hypershift-operator/controllers/nodepool/metrics/metrics_test.go
karpenter-operator/controllers/karpenter/karpenter_controller.go
karpenter-operator/controllers/karpenter/karpenter_controller_test.go
test/e2e/karpenter_test.go

✅ Files skipped from review due to trivial changes (1)

karpenter-operator/controllers/karpenter/karpenter_controller_test.go

🚧 Files skipped from review as they are similar to previous changes (2)

hypershift-operator/controllers/nodepool/metrics/metrics.go
api/hypershift/v1beta1/hostedcluster_types.go

joshbranham · 2026-04-18T00:15:30Z

For HyperShift NodePools we return -1 if we hit an error calculating a vCPU count. We are not doing that here, as proposed, but I think that is fine since that error/calculation happens downstream in karpenter.

/lgtm
/test e2e-aws-techpreview

enxebre · 2026-04-30T07:34:22Z

/lgtm

openshift-merge-bot · 2026-04-30T07:34:36Z

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aks-4-22
/test e2e-aws-4-22
/test e2e-aks
/test e2e-aws
/test e2e-aws-upgrade-hypershift-operator
/test e2e-azure-self-managed
/test e2e-kubevirt-aws-ovn-reduced
/test e2e-v2-aws

maxcao13 · 2026-04-30T13:17:57Z

/retest-required
/verified by e2e

openshift-ci-robot · 2026-04-30T13:18:10Z

@maxcao13: This PR has been marked as verified by e2e.

Details

In response to this:

/retest-required
/verified by e2e

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

…us billing metric Add VCPUs field to AutoNodeStatus, computed by karpenter-operator from NodeClaim capacity and cross-referenced against live Node objects. The metrics collector seeds per-cluster vCPU count from this field before accumulating native NodePool vCPUs on top. - NodeClaim is the authority for Karpenter ownership (no label dependency) - e2e tests validate vCPU status + metric at 0, scale-up, and consolidation Made-with: Cursor Signed-off-by: Max Cao <macao@redhat.com>

joshbranham · 2026-04-30T15:35:50Z

/lgtm

openshift-merge-bot · 2026-04-30T15:36:02Z

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aks-4-22
/test e2e-aws-4-22
/test e2e-aks
/test e2e-aws
/test e2e-aws-upgrade-hypershift-operator
/test e2e-azure-self-managed
/test e2e-kubevirt-aws-ovn-reduced
/test e2e-v2-aws

maxcao13 · 2026-04-30T15:38:44Z

this has gone through so many rebases now, that it would be a disservice not to check if autonode tests still pass

/test e2e-aws-autonode

maxcao13 · 2026-04-30T18:56:31Z

/test e2e-aws-4-22

maxcao13 · 2026-04-30T22:05:16Z

/verified by e2e-tests

openshift-ci-robot · 2026-04-30T22:05:29Z

@maxcao13: This PR has been marked as verified by e2e-tests.

Details

In response to this:

/verified by e2e-tests

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

maxcao13 · 2026-04-30T22:07:16Z

/test e2e-aws-4-22

openshift-merge-bot · 2026-05-01T00:31:37Z

/retest-required

Remaining retests: 0 against base HEAD 0ee6567 and 2 for PR HEAD 8147c32 in total

openshift-ci · 2026-05-01T00:35:50Z

@maxcao13: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/verify-workflows	`8147c32`	link	true	`/test verify-workflows`

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

hypershift-jira-solve-ci · 2026-05-01T11:05:52Z

✅ Analysis complete!
📄 Report: .work/prow-job-analyze-test-failure/2049925899784032256/analysis.md

💡 Tip: The Markdown report can be copied directly into a JIRA Description field.

openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 16, 2026

openshift-ci Bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/needs-area labels Apr 16, 2026

coderabbitai Bot reviewed Apr 16, 2026

View reviewed changes

Comment thread test/e2e/karpenter_test.go

maxcao13 force-pushed the karpenter-node-billing branch from 29df349 to 5cd4832 Compare April 16, 2026 22:22

maxcao13 marked this pull request as ready for review April 17, 2026 16:53

openshift-ci Bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 17, 2026

openshift-ci Bot requested review from bryan-cox and jparrill April 17, 2026 16:55

maxcao13 force-pushed the karpenter-node-billing branch from 5cd4832 to 3d550fe Compare April 17, 2026 16:59

coderabbitai Bot reviewed Apr 17, 2026

View reviewed changes

Comment thread test/e2e/karpenter_test.go Outdated

openshift-ci-robot removed the verified Signifies that the PR passed pre-merge verification criteria label Apr 30, 2026

openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label Apr 30, 2026

openshift-ci Bot assigned enxebre Apr 30, 2026

openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Apr 30, 2026

openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Apr 30, 2026

maxcao13 force-pushed the karpenter-node-billing branch from d75fe34 to 8147c32 Compare April 30, 2026 15:31

openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label Apr 30, 2026

openshift-ci-robot removed the verified Signifies that the PR passed pre-merge verification criteria label Apr 30, 2026

openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Apr 30, 2026

openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Apr 30, 2026

openshift-merge-bot Bot merged commit fb6955d into openshift:main May 1, 2026
52 checks passed

maxcao13 deleted the karpenter-node-billing branch May 1, 2026 15:34

Conversation

maxcao13 commented Apr 16, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What this PR does / why we need it:

Which issue(s) this PR fixes:

Special notes for your reviewer:

Checklist:

Summary by CodeRabbit

Uh oh!

openshift-merge-bot Bot commented Apr 16, 2026

Uh oh!

openshift-ci Bot commented Apr 16, 2026

Uh oh!

openshift-ci-robot commented Apr 16, 2026 • edited by openshift-ci Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What this PR does / why we need it:

Which issue(s) this PR fixes:

Special notes for your reviewer:

Checklist:

Uh oh!

coderabbitai Bot commented Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Sequence Diagram(s)

❌ Failed checks (3 warnings)

Uh oh!

maxcao13 commented Apr 16, 2026

Uh oh!

openshift-ci-robot commented Apr 16, 2026 • edited by openshift-ci Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What this PR does / why we need it:

Which issue(s) this PR fixes:

Special notes for your reviewer:

Checklist:

Summary by CodeRabbit

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

codecov Bot commented Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

maxcao13 commented Apr 16, 2026

Uh oh!

openshift-ci-robot commented Apr 16, 2026 • edited by openshift-ci Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What this PR does / why we need it:

Which issue(s) this PR fixes:

Special notes for your reviewer:

Checklist:

Summary by CodeRabbit

Uh oh!

maxcao13 commented Apr 17, 2026

Uh oh!

openshift-ci-robot commented Apr 17, 2026 • edited by openshift-ci Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What this PR does / why we need it:

Which issue(s) this PR fixes:

Special notes for your reviewer:

Checklist:

Summary by CodeRabbit

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

joshbranham commented Apr 18, 2026

Uh oh!

enxebre commented Apr 30, 2026

Uh oh!

openshift-merge-bot Bot commented Apr 30, 2026

Uh oh!

maxcao13 commented Apr 30, 2026

Uh oh!

openshift-ci-robot commented Apr 30, 2026

Uh oh!

joshbranham commented Apr 30, 2026

maxcao13 commented Apr 16, 2026 •

edited by coderabbitai Bot

Loading

openshift-ci-robot commented Apr 16, 2026 •

edited by openshift-ci Bot

Loading

coderabbitai Bot commented Apr 16, 2026 •

edited

Loading

openshift-ci-robot commented Apr 16, 2026 •

edited by openshift-ci Bot

Loading

codecov Bot commented Apr 16, 2026 •

edited

Loading

openshift-ci-robot commented Apr 16, 2026 •

edited by openshift-ci Bot

Loading

openshift-ci-robot commented Apr 17, 2026 •

edited by openshift-ci Bot

Loading