OCPCLOUD-1401: refactor to run same test against both mapi and capi#442
OCPCLOUD-1401: refactor to run same test against both mapi and capi#442sunzhaohua2 wants to merge 4 commits intoopenshift:masterfrom
Conversation
|
@sunzhaohua2: This pull request references OCPCLOUD-1401 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
90f9021 to
e62ff3b
Compare
|
/lgtm |
|
/assign @theobarberbany @mdbooth |
50ca874 to
932e595
Compare
932e595 to
a01d339
Compare
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: RadekManak The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
| } | ||
|
|
||
| // Read configuration from environment variables | ||
| if backendType := os.Getenv("TEST_BACKEND_TYPE"); backendType != "" { |
There was a problem hiding this comment.
Do we have CI jobs setting this? How do we plan on using this? I reckon a follow up PR would be good to demonstrate :)
|
Code looks OK. I'm not sure i get the why of why we've done this. I think I understand the desire to de-duplicate a lot of code between our CAPI and MAPI tests. Is doing this in this way a good idea? To add additional abstraction to e2es? I'm struggling to understand how we'll use this, in order to review it - e.g - how do I know which backend I'm using, the logs look identical across CAPI and MAPI? Unified MachineSet creation on aws creates a new MachineSet via unified backend [disruptive]
/Users/zhsun/go/src/github.com/openshift/cluster-api-actuator-pkg/pkg/unified/e2e/aws_machineset.go:44
• [175.443 seconds]
------------------------------
Unified MachineSet creation on aws creates a spot instance MachineSet via unified backend [disruptive]
/Users/zhsun/go/src/github.com/openshift/cluster-api-actuator-pkg/pkg/unified/e2e/aws_machineset.go:55
STEP: Verifying spot instance configuration contains spot-related fields @ 10/24/25 16:37:05.831
• [178.894 seconds]I also wonder, do we then need tests for our testing framework? Does it reduce confidence in the tests built this way, or make debugging harder? |
| "sigs.k8s.io/controller-runtime/pkg/envtest/komega" | ||
| ) | ||
|
|
||
| var _ = Describe("Unified MachineSet creation on aws", testframework.LabelDisruptive, Ordered, func() { |
There was a problem hiding this comment.
I don't think these Describes contain enough context for someone without context to understand what's going on.
a01d339 to
31b84a6
Compare
|
New changes are detected. LGTM label has been removed. |
|
Hi @theobarberbany thanks for your questions.
This idea comes from Joel: https://redhat-internal.slack.com/archives/GE2HQ9QP4/p1727430915984039?thread_ts=1727419633.062849&cid=GE2HQ9QP4. Based on my previous understanding from @shellyyang1989 , MAPI and CAPI will coexist for multiple releases during the migration period and may not even be droped. This way can ensure same test coverage for both APIs and we can test mixed authority cases, such as MachineSet is MAPI-authoritative and a Machine is CAPI-authoritative, these cases are real production cases we must support. Now QE are required to migrate all the cases from OpenShift-test-private repo to public repo, almost all the mapi cases are in OpenShift-test-private repo For some complex cases, if this way is not suitable, we can use the original method. when selecting cases in the Makefile, we can choose the unified||mapi and unified||capi cases.
I updated the Describe description, do you think this is ok?
yes, good concern. I think if MAPI and CAPI both pass the same test, this prove the framework is working and we could add unit tests for some functions (like configureAWSMAPIProviderSpec and configureAWSCAPIProviderSpec) to increase confidence. |
|
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle stale |
|
Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle rotten |
WalkthroughAdds a unified E2E testing framework with pluggable MAPI/CAPI backends, two backend implementations, template configuration and test config loading, AWS-specific e2e tests and helpers, Make targets for backend-specific test runs, a small .gitignore update, and test bootstrap wiring for controller-runtime komega. Changes
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
/remove-lifecycle rotten |
|
@coderabbitai full |
|
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Actionable comments posted: 8
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/unified/backends/capi_backend.go`:
- Around line 25-28: The function getDefaultAWSCAPIMachineSpec is currently
non-deterministic because it picks the first AWSMachineTemplate from a list and
falls back to a hardcoded DefaultAMIID when none are found; change it to
deterministically select a known worker template (e.g., look up by a specific
name or label/annotation) instead of using the list order, and if no matching
AWSMachineTemplate exists return an explicit error rather than substituting
DefaultAMIID; update the code paths that reference AWSMachineTemplate and
DefaultAMIID to either use the selected template's AMI/fields or propagate the
failure so callers can handle the missing template rather than relying on a
region-specific hardcoded AMI.
- Line 82: The Eventually(...) calls are invoking client.Create(...) immediately
so transient failures aren't retried; wrap the call in a closure so Eventually
re-executes it. Replace patterns like Eventually(client.Create(ctx, machineSet),
framework.WaitMedium, framework.RetryMedium).Should(...) with Eventually(func()
error { return client.Create(ctx, machineSet) }, framework.WaitMedium,
framework.RetryMedium).Should(Succeed(), ...). Do the same for the other
client.Create(...) occurrence (the create for the other resource), keeping the
original wait/retry args and message.
In `@pkg/unified/backends/mapi_backend.go`:
- Around line 147-150: CreateMachineTemplate currently drops ctx and client when
calling createAWSMachineTemplate, causing getDefaultAWSMAPIProviderSpec to rely
on global komega state (komega.List / komega.SetClient) and to pick the first
MachineSet arbitrarily; update the call chain to pass ctx and client through:
change createAWSMachineTemplate(params) to createAWSMachineTemplate(ctx, client,
params) and update getDefaultAWSMAPIProviderSpec to accept (ctx, client, ...)
and use the provided runtimeclient.Client to list/fetch MachineSets (instead of
komega.List) and choose a deliberate source (e.g., a named or filtered
MachineSet) for defaults; apply the same ctx/client pass-through refactor for
the other platform-specific creation paths that currently discard ctx/client
(the other create*MachineTemplate callers in this file).
- Line 79: The Eventually call currently evaluates client.Create(ctx,
machineSet) immediately, so wrap the create in a closure like Eventually(func()
error { ... }) to ensure it is executed on each poll; inside the closure call
client.Create(ctx, machineSet) and treat k8s apierrors.IsAlreadyExists(err) as a
nil/success return (i.e., return nil when AlreadyExists) so transient failures
are retried and a late-created object doesn’t fail the test; update the call
site that references machineSet and client to this pattern and keep the same
timeout/assertion (Should(Succeed(), ...)).
In `@pkg/unified/config/template_config.go`:
- Around line 23-32: The MAPI path ignores AWSMachineConfig fields
NonRootVolumes and KMSKey causing inconsistent machine specs; update
configureAWSMAPIProviderSpec to map AWSMachineConfig.KMSKey and NonRootVolumes
into the MAPI provider spec (or explicitly return an error when those fields are
set) so behavior matches the CAPI path; locate the configureAWSMAPIProviderSpec
function and add handling to translate KMSKey into the MAPI KMS/CSE fields and
convert each VolumeConfig from NonRootVolumes into the equivalent MAPI block, or
add validation that rejects unsupported fields when present.
In `@pkg/unified/config/test_config.go`:
- Around line 29-31: The TestConfig loader currently defaults BackendType and
AuthoritativeAPI to BackendTypeMAPI when env values are unknown; change it to
trim incoming TEST_BACKEND_TYPE and TEST_AUTHORITATIVE_API, validate against
allowed backend enum values, and return an error (fail fast) if the trimmed
value is empty or unrecognized instead of silently assigning BackendTypeMAPI;
update the TestConfig construction/loader (fields BackendType and
AuthoritativeAPI and any function that reads those env vars) to perform
strings.TrimSpace on inputs and explicit validation against the allowed backend
constants, returning an error to the caller on invalid input so CI will fail for
typos or space-padded values.
In `@pkg/unified/e2e/test_helpers.go`:
- Around line 77-93: The check in WaitForMachinesRunningOrSkipOnCapacityError
currently marshals the original machineSet (parameter machineSet) which is stale
and misses per-machine InsufficientInstanceCapacity errors; instead, poll the
backing Machine and AWSMachine resources for updated status/errors before
deciding to Skip: inside WaitForMachinesRunningOrSkipOnCapacityError (which
currently calls helper.framework.WaitForMachinesRunning), query the cluster for
the current Machines (and their provider-specific AWSMachine status) that belong
to the MachineSet, inspect each machine/AWSMachine status/conditions/messages
for "InsufficientInstanceCapacity" (or equivalent provider error), and only call
Skip when such an error is observed on a current backing resource; otherwise
continue to call helper.framework.WaitForMachinesRunning as the success
condition.
- Around line 158-166: The test currently marshals the entire awsTemplate and
searches that string, which can match metadata/status; change it to marshal and
search only awsTemplate.Spec.Template.Spec (or directly assert the structured
fields) so the assertions verify the applied machine spec; update the code
around the fullTemplateBytes/configString and the loop over searchStrings to use
the Spec.Template.Spec object (or its marshalled string) when calling
Expect(...).To(ContainSubstring(...)) so matches come from the actual machine
spec rather than the whole AWSMachineTemplate.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 29bc2f36-ac76-4ff6-bcbb-e385264af812
📒 Files selected for processing (13)
.gitignoreMakefilepkg/e2e_test.gopkg/framework/ginkgo-labels.gopkg/framework/utils.gopkg/unified/backends/backend_interface.gopkg/unified/backends/capi_backend.gopkg/unified/backends/mapi_backend.gopkg/unified/config/template_config.gopkg/unified/config/test_config.gopkg/unified/e2e/aws_machineset.gopkg/unified/e2e/test_helpers.gopkg/unified/framework.go
| const ( | ||
| // DefaultAMIID is a example AMI ID for testing. | ||
| DefaultAMIID = "ami-0c55b159cbfafe1d0" | ||
| ) |
There was a problem hiding this comment.
getDefaultAWSCAPIMachineSpec is non-deterministic and region-specific.
Line 215 blindly copies the first listed AWSMachineTemplate, and Line 223 falls back to a single hardcoded AMI when none exist. That makes the CAPI path depend on API list order and AWS region/account state, which will make these unified tests flaky across environments. Prefer selecting a known worker template explicitly, or fail fast if no suitable template can be discovered.
Also applies to: 203-247
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/unified/backends/capi_backend.go` around lines 25 - 28, The function
getDefaultAWSCAPIMachineSpec is currently non-deterministic because it picks the
first AWSMachineTemplate from a list and falls back to a hardcoded DefaultAMIID
when none are found; change it to deterministically select a known worker
template (e.g., look up by a specific name or label/annotation) instead of using
the list order, and if no matching AWSMachineTemplate exists return an explicit
error rather than substituting DefaultAMIID; update the code paths that
reference AWSMachineTemplate and DefaultAMIID to either use the selected
template's AMI/fields or propagate the failure so callers can handle the missing
template rather than relying on a region-specific hardcoded AMI.
There was a problem hiding this comment.
@sunzhaohua2 I think we may want to look at this? It's surprising behaviour. If we're selecting this default we should let the user / test viewer know.
There was a problem hiding this comment.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
| machineSet.Spec.Template.Spec.ProviderSpec = *providerSpec | ||
| } | ||
|
|
||
| Eventually(client.Create(ctx, machineSet), framework.WaitMedium, framework.RetryMedium).Should(Succeed(), "Should create MAPI MachineSet %s", machineSet.Name) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -nP 'Eventually\(\s*client\.(Create|Update|Patch|Delete)\s*\(' pkg/unified/backendsRepository: openshift/cluster-api-actuator-pkg
Length of output: 699
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Check imports in mapi_backend.go
echo "=== Checking imports in mapi_backend.go ==="
head -30 pkg/unified/backends/mapi_backend.go | grep -E "^import|apierrors"
echo ""
echo "=== Full imports section ==="
sed -n '/^import/,/^)/p' pkg/unified/backends/mapi_backend.go
echo ""
echo "=== Checking for apierrors usage in the file ==="
rg -n "apierrors" pkg/unified/backends/mapi_backend.goRepository: openshift/cluster-api-actuator-pkg
Length of output: 1057
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Check if there are any correct Eventually patterns with function closures in the codebase
echo "=== Looking for Eventually patterns with closures (correct usage) ==="
rg -B2 -A2 'Eventually\(func\(\)' pkg/unified/backends --max-count=5Repository: openshift/cluster-api-actuator-pkg
Length of output: 850
Fix Eventually to actually retry the create.
client.Create(ctx, machineSet) is evaluated immediately before Eventually starts polling, so transient API failures are never retried. Wrap the call in a closure so it executes on each poll attempt. Also handle AlreadyExists as success to avoid failing if the object was created but the response was lost.
The codebase already demonstrates the correct pattern—see the client.Delete calls that use Eventually(func() error { ... }).
Suggested fix
- Eventually(client.Create(ctx, machineSet), framework.WaitMedium, framework.RetryMedium).Should(Succeed(), "Should create MAPI MachineSet %s", machineSet.Name)
+ Eventually(func() error {
+ err := client.Create(ctx, machineSet)
+ if apierrors.IsAlreadyExists(err) {
+ return nil
+ }
+ return err
+ }, framework.WaitMedium, framework.RetryMedium).Should(Succeed(), "Should create MAPI MachineSet %s", machineSet.Name)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| Eventually(client.Create(ctx, machineSet), framework.WaitMedium, framework.RetryMedium).Should(Succeed(), "Should create MAPI MachineSet %s", machineSet.Name) | |
| Eventually(func() error { | |
| err := client.Create(ctx, machineSet) | |
| if apierrors.IsAlreadyExists(err) { | |
| return nil | |
| } | |
| return err | |
| }, framework.WaitMedium, framework.RetryMedium).Should(Succeed(), "Should create MAPI MachineSet %s", machineSet.Name) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/unified/backends/mapi_backend.go` at line 79, The Eventually call
currently evaluates client.Create(ctx, machineSet) immediately, so wrap the
create in a closure like Eventually(func() error { ... }) to ensure it is
executed on each poll; inside the closure call client.Create(ctx, machineSet)
and treat k8s apierrors.IsAlreadyExists(err) as a nil/success return (i.e.,
return nil when AlreadyExists) so transient failures are retried and a
late-created object doesn’t fail the test; update the call site that references
machineSet and client to this pattern and keep the same timeout/assertion
(Should(Succeed(), ...)).
There was a problem hiding this comment.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
|
@sunzhaohua2 Can we address these please? :) We need to cover all the things coderabbit flagged (i've confirmed above) plus this: (claude assisted review with the test standards skill from CCAPIO) Existing Feedback
New Issues1. Pervasive use of
|
| Standard | Status |
|---|---|
| Right test level | Good — these are e2e tests requiring real infrastructure |
| Existing helpers | Mixed — framework.WaitForMachineSet, framework.DeleteCAPIMachineSets etc. are reused, but WaitForSpotMachineSet / WaitForCAPIMachinesRunningWithRetry are ignored in favour of a broken hand-rolled replacement (see issue 3) |
| Debuggable failures | Mixed — most Expect calls have messages, but helpers call Expect without GinkgoHelper() so failures will point at the wrong line (see issue 4) |
| BDD names | It blocks use action descriptions ("creates a new MachineSet") rather than behaviour specs ("should create...") — minor style nit |
| Concise assertions | Fine |
| No sleeps | No time.Sleep found |
Summary
Issue 3 is the most serious — it's a combination of dead code (capacity skip never fires), misleading structure (outer Eventually never retries), and reinvention of existing helpers that already handle this correctly. Issue 1 will cause increasing
maintenance pain as more backends and tests are added. The rest are cleanup.
|
@coderabbitai full |
|
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Actionable comments posted: 4
♻️ Duplicate comments (3)
pkg/unified/backends/mapi_backend.go (2)
214-221:⚠️ Potential issue | 🟠 MajorDon’t derive defaults from an arbitrary
MachineSet.Kubernetes list order is not stable, and different worker MachineSets can carry different subnet, security-group, or zone defaults. Basing new provider specs on
Items[0]makes these unified tests nondeterministic and can stamp out an invalid config on multi-set clusters. Select a deliberate source MachineSet before unmarshalling its ProviderSpec.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/unified/backends/mapi_backend.go` around lines 214 - 221, Don't pick an arbitrary MachineSet by using machineSetList.Items[0]; instead choose a deterministic source MachineSet before reading its ProviderSpec. Update the code in mapi_backend.go so that after listing (machineSetList) you iterate and select a MachineSet by explicit criteria (e.g., matching a known label/role, matching cluster infra ID/worker pool name, or a specific zone/subnet tag) and assign that element to the machineSet variable; then unmarshal machineSet.Spec.Template.Spec.ProviderSpec from that selected MachineSet. Ensure any helper or selector logic is clearly referenced (machineSetList, machineSet, ProviderSpec) so the ProviderSpec is derived from a deliberate, deterministic MachineSet rather than Items[0].
81-83:⚠️ Potential issue | 🟠 MajorTreat
AlreadyExistsas a successful retry outcome.If the create succeeds server-side and the client retries, the next poll sees
AlreadyExistsand thisEventuallywill fail until timeout. Handle that as success inside the closure.Suggested change
Eventually(func() error { - return client.Create(ctx, machineSet) + err := client.Create(ctx, machineSet) + if apierrors.IsAlreadyExists(err) { + return nil + } + return err }, framework.WaitMedium, framework.RetryMedium).Should(Succeed(), "Should create MAPI MachineSet %s", machineSet.Name)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/unified/backends/mapi_backend.go` around lines 81 - 83, The Eventually closure that calls client.Create(ctx, machineSet) should treat an AlreadyExists response as success; update the closure used in Eventually(...) so it captures the error from client.Create, returns nil when the error satisfies apierrors.IsAlreadyExists(err) and otherwise returns the original error, ensuring the Eventually succeeds on idempotent create retries; also add or ensure the import for "k8s.io/apimachinery/pkg/api/errors" (commonly aliased apierrors) is available for the IsAlreadyExists check.pkg/unified/backends/capi_backend.go (1)
217-231:⚠️ Potential issue | 🟠 MajorPick a deterministic source for the default AWS machine spec.
Choosing
Items[0]/workers[0]makes the generated template depend on Kubernetes list order. In clusters with multipleAWSMachineTemplates or workerMachineSets, the same test can inherit a different spec across runs. Select a source by stable criteria, or fail when the selection is ambiguous.Also applies to: 235-243
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/unified/backends/capi_backend.go` around lines 217 - 231, The code currently picks Items[0] (in getDefaultAWSCAPIMachineSpec) which is non-deterministic; change selection to a stable rule (e.g., sort awsTemplateList.Items by a stable key such as metadata.Name or by a specific label value) and then return the spec from the first item after sorting, or explicitly fail when more than one candidate exists to avoid ambiguity; apply the same deterministic-selection or ambiguity-fail logic to the analogous selection at the other block (the createDefaultAWSCAPIMachineSpec/worker MachineSet selection referenced in the comment) so both functions use the same stable criteria.
🧹 Nitpick comments (1)
pkg/unified/backends/backend_interface.go (1)
15-25: Tighten this backend contract before it spreads further.The
interface{}inputs/outputs here force every implementation to downcast at runtime and turn backend mismatches into helper-side assertions instead of ordinary returned errors. A typed boundary here (or at leastruntimeclient.Objectplus backend-specific param types) would make the unified layer much safer to evolve.Also applies to: 29-45
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/unified/backends/backend_interface.go` around lines 15 - 25, The MachineBackend interface uses interface{} for template and machines which forces unsafe downcasts; change the signatures to use typed Kubernetes objects (runtimeclient.Object) and concrete param/result types instead of interface{}: update MachineBackend, specifically CreateMachineTemplate, DeleteMachineTemplate, CreateMachineSet, DeleteMachineSet, WaitForMachineSetDeleted, WaitForMachinesRunning, GetMachineSetStatus and GetNodesFromMachineSet to accept/return runtimeclient.Object (or a backend-specific typed struct) and keep BackendMachineTemplateParams/BackendMachineSetParams and *MachineSetStatus as the typed parameter/result types; then update all implementing types to return/accept these typed objects and handle conversion at the backend implementation boundary rather than in the unified layer.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@Makefile`:
- Around line 96-109: The test-matrix is inconsistent because test-mapi uses
--label-filter='unified||mapi' while test-capi and test-mapi-with-capi-auth use
'unified', causing mismatched suites; change the Makefile so all three matrix
targets (test-mapi, test-capi, test-mapi-with-capi-auth) use the same
--label-filter='unified' to ensure parity, and if you still need legacy-only
MAPI runs extract the 'unified||mapi' filter into a separate PHONY target (e.g.,
legacy-mapi) instead of including it in the matrix.
In `@pkg/unified/backends/capi_backend.go`:
- Around line 120-135: GetMachineSetStatus currently reads status from the
original machineSet object instead of the live API object; update
capiBackend.GetMachineSetStatus to re-fetch the MachineSet from the API using
the provided runtimeclient.Client before reading
AvailableReplicas/ReadyReplicas: perform the type assertion on the input
(machineSet.(*clusterv1beta1.MachineSet)) to obtain name/namespace, then use
client.Get to load the current clusterv1beta1.MachineSet into a new variable and
build MachineSetStatus from that live object (keep existing handling of
ms.Spec.Replicas).
In `@pkg/unified/config/template_config.go`:
- Around line 120-143: The template currently replaces spec.NonRootVolumes (and
elsewhere similar code at the other block) instead of appending, which causes
inconsistent semantics between CAPI and MAPI; change the merge to append
semantics and ensure tag merging is consistent: for AdditionalTags keep the
current per-key assignment into spec.AdditionalTags (creating the map if nil)
and for NonRootVolumes, build the slice from config.NonRootVolumes then if
spec.NonRootVolumes is nil set it to that slice else append the new volumes to
the existing spec.NonRootVolumes (do the same fix in the analogous block around
lines 207-233).
In `@pkg/unified/e2e/test_helpers.go`:
- Around line 141-147: The current TestHelper.DeleteMachineSet only issues the
delete and returns immediately; modify it so after calling
helper.framework.DeleteMachineSet(helper.ctx, helper.client, machineSet) it
waits for the MachineSet and its machines to be fully cleaned up before
returning. Implement a poll-with-timeout using helper.ctx/helper.client (or
existing framework wait helpers) to repeatedly check that the MachineSet
resource no longer exists and that there are no associated Machine/Node objects
(or until helper.framework reports cleanup complete), failing if the timeout
elapses; keep references to DeleteMachineSet, helper.framework.DeleteMachineSet,
helper.ctx and helper.client to locate and update the code.
---
Duplicate comments:
In `@pkg/unified/backends/capi_backend.go`:
- Around line 217-231: The code currently picks Items[0] (in
getDefaultAWSCAPIMachineSpec) which is non-deterministic; change selection to a
stable rule (e.g., sort awsTemplateList.Items by a stable key such as
metadata.Name or by a specific label value) and then return the spec from the
first item after sorting, or explicitly fail when more than one candidate exists
to avoid ambiguity; apply the same deterministic-selection or ambiguity-fail
logic to the analogous selection at the other block (the
createDefaultAWSCAPIMachineSpec/worker MachineSet selection referenced in the
comment) so both functions use the same stable criteria.
In `@pkg/unified/backends/mapi_backend.go`:
- Around line 214-221: Don't pick an arbitrary MachineSet by using
machineSetList.Items[0]; instead choose a deterministic source MachineSet before
reading its ProviderSpec. Update the code in mapi_backend.go so that after
listing (machineSetList) you iterate and select a MachineSet by explicit
criteria (e.g., matching a known label/role, matching cluster infra ID/worker
pool name, or a specific zone/subnet tag) and assign that element to the
machineSet variable; then unmarshal machineSet.Spec.Template.Spec.ProviderSpec
from that selected MachineSet. Ensure any helper or selector logic is clearly
referenced (machineSetList, machineSet, ProviderSpec) so the ProviderSpec is
derived from a deliberate, deterministic MachineSet rather than Items[0].
- Around line 81-83: The Eventually closure that calls client.Create(ctx,
machineSet) should treat an AlreadyExists response as success; update the
closure used in Eventually(...) so it captures the error from client.Create,
returns nil when the error satisfies apierrors.IsAlreadyExists(err) and
otherwise returns the original error, ensuring the Eventually succeeds on
idempotent create retries; also add or ensure the import for
"k8s.io/apimachinery/pkg/api/errors" (commonly aliased apierrors) is available
for the IsAlreadyExists check.
---
Nitpick comments:
In `@pkg/unified/backends/backend_interface.go`:
- Around line 15-25: The MachineBackend interface uses interface{} for template
and machines which forces unsafe downcasts; change the signatures to use typed
Kubernetes objects (runtimeclient.Object) and concrete param/result types
instead of interface{}: update MachineBackend, specifically
CreateMachineTemplate, DeleteMachineTemplate, CreateMachineSet,
DeleteMachineSet, WaitForMachineSetDeleted, WaitForMachinesRunning,
GetMachineSetStatus and GetNodesFromMachineSet to accept/return
runtimeclient.Object (or a backend-specific typed struct) and keep
BackendMachineTemplateParams/BackendMachineSetParams and *MachineSetStatus as
the typed parameter/result types; then update all implementing types to
return/accept these typed objects and handle conversion at the backend
implementation boundary rather than in the unified layer.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d28df0c9-e108-46a0-a61d-055905d8ae07
⛔ Files ignored due to path filters (2)
vendor/k8s.io/apimachinery/pkg/util/rand/rand.gois excluded by!vendor/**,!**/vendor/**vendor/modules.txtis excluded by!vendor/**,!**/vendor/**
📒 Files selected for processing (13)
.gitignoreMakefilepkg/e2e_test.gopkg/framework/ginkgo-labels.gopkg/framework/utils.gopkg/unified/backends/backend_interface.gopkg/unified/backends/capi_backend.gopkg/unified/backends/mapi_backend.gopkg/unified/config/template_config.gopkg/unified/config/test_config.gopkg/unified/e2e/aws_machineset.gopkg/unified/e2e/test_helpers.gopkg/unified/framework.go
| .PHONY: test-mapi | ||
| test-mapi: ## Run MAPI authoritative testing (MAPI auth, CAPI mirrors) | ||
| TEST_BACKEND_TYPE=MAPI TEST_AUTHORITATIVE_API=MAPI hack/ci-integration.sh $(GINKGO_ARGS) --label-filter='unified||mapi' | ||
|
|
||
| .PHONY: test-capi | ||
| test-capi: ## Run pure CAPI testing (no MAPI involvement) | ||
| TEST_BACKEND_TYPE=CAPI TEST_AUTHORITATIVE_API=CAPI hack/ci-integration.sh $(GINKGO_ARGS) --label-filter='unified' | ||
|
|
||
| .PHONY: test-mapi-with-capi-auth | ||
| test-mapi-with-capi-auth: ## Run tests with MAPI backend and CAPI authority (conversion layer) | ||
| TEST_BACKEND_TYPE=MAPI TEST_AUTHORITATIVE_API=CAPI hack/ci-integration.sh $(GINKGO_ARGS) --label-filter='unified' | ||
|
|
||
| .PHONY: test-all | ||
| test-all: test-mapi test-capi test-mapi-with-capi-auth ## Run all unified framework test scenarios |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's confirm the exact content of the Makefile at lines 96-109
sed -n '96,109p' MakefileRepository: openshift/cluster-api-actuator-pkg
Length of output: 840
🏁 Script executed:
# Search for references to these test targets and label filters
rg 'test-mapi|test-capi|label-filter' -A 2 -B 2Repository: openshift/cluster-api-actuator-pkg
Length of output: 8592
🏁 Script executed:
# Check the hack/ci-integration.sh to understand how label-filter is used
head -50 hack/ci-integration.shRepository: openshift/cluster-api-actuator-pkg
Length of output: 502
🏁 Script executed:
# Search for test definitions to understand what unified, mapi labels mean
fd -e go | xargs grep -l 'Label.*unified\|Label.*mapi' 2>/dev/null | head -5Repository: openshift/cluster-api-actuator-pkg
Length of output: 180
🏁 Script executed:
# Look for any documentation or comments about the test matrix strategy
rg 'test matrix|parity|backend.*authority' -iRepository: openshift/cluster-api-actuator-pkg
Length of output: 20847
🏁 Script executed:
# Check the ginkgo-labels.go file to understand test labels
cat pkg/framework/ginkgo-labels.goRepository: openshift/cluster-api-actuator-pkg
Length of output: 1762
🏁 Script executed:
# Search for tests with "mapi" label to see what's included
rg 'Label.*mapi|Label.*unified' -A 2 -B 2Repository: openshift/cluster-api-actuator-pkg
Length of output: 2555
🏁 Script executed:
# Check if there's any documentation or comments about the test matrix design
rg 'test-mapi|test-capi|test-all' -B 5 -A 5 | head -100Repository: openshift/cluster-api-actuator-pkg
Length of output: 3860
Keep the backend matrix on the same suite.
test-mapi runs unified||mapi while test-capi and test-mapi-with-capi-auth both run only unified, so test-all does not compare the same test suite across backends. This breaks parity interpretation and allows legacy MAPI-only tests to skew the matrix results.
Suggested change
.PHONY: test-mapi
test-mapi: ## Run MAPI authoritative testing (MAPI auth, CAPI mirrors)
- TEST_BACKEND_TYPE=MAPI TEST_AUTHORITATIVE_API=MAPI hack/ci-integration.sh $(GINKGO_ARGS) --label-filter='unified||mapi'
+ TEST_BACKEND_TYPE=MAPI TEST_AUTHORITATIVE_API=MAPI hack/ci-integration.sh $(GINKGO_ARGS) --label-filter='unified'If legacy MAPI coverage is needed, keep it under a separate target instead of including it in this parity matrix.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| .PHONY: test-mapi | |
| test-mapi: ## Run MAPI authoritative testing (MAPI auth, CAPI mirrors) | |
| TEST_BACKEND_TYPE=MAPI TEST_AUTHORITATIVE_API=MAPI hack/ci-integration.sh $(GINKGO_ARGS) --label-filter='unified||mapi' | |
| .PHONY: test-capi | |
| test-capi: ## Run pure CAPI testing (no MAPI involvement) | |
| TEST_BACKEND_TYPE=CAPI TEST_AUTHORITATIVE_API=CAPI hack/ci-integration.sh $(GINKGO_ARGS) --label-filter='unified' | |
| .PHONY: test-mapi-with-capi-auth | |
| test-mapi-with-capi-auth: ## Run tests with MAPI backend and CAPI authority (conversion layer) | |
| TEST_BACKEND_TYPE=MAPI TEST_AUTHORITATIVE_API=CAPI hack/ci-integration.sh $(GINKGO_ARGS) --label-filter='unified' | |
| .PHONY: test-all | |
| test-all: test-mapi test-capi test-mapi-with-capi-auth ## Run all unified framework test scenarios | |
| .PHONY: test-mapi | |
| test-mapi: ## Run MAPI authoritative testing (MAPI auth, CAPI mirrors) | |
| TEST_BACKEND_TYPE=MAPI TEST_AUTHORITATIVE_API=MAPI hack/ci-integration.sh $(GINKGO_ARGS) --label-filter='unified' | |
| .PHONY: test-capi | |
| test-capi: ## Run pure CAPI testing (no MAPI involvement) | |
| TEST_BACKEND_TYPE=CAPI TEST_AUTHORITATIVE_API=CAPI hack/ci-integration.sh $(GINKGO_ARGS) --label-filter='unified' | |
| .PHONY: test-mapi-with-capi-auth | |
| test-mapi-with-capi-auth: ## Run tests with MAPI backend and CAPI authority (conversion layer) | |
| TEST_BACKEND_TYPE=MAPI TEST_AUTHORITATIVE_API=CAPI hack/ci-integration.sh $(GINKGO_ARGS) --label-filter='unified' | |
| .PHONY: test-all | |
| test-all: test-mapi test-capi test-mapi-with-capi-auth ## Run all unified framework test scenarios |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Makefile` around lines 96 - 109, The test-matrix is inconsistent because
test-mapi uses --label-filter='unified||mapi' while test-capi and
test-mapi-with-capi-auth use 'unified', causing mismatched suites; change the
Makefile so all three matrix targets (test-mapi, test-capi,
test-mapi-with-capi-auth) use the same --label-filter='unified' to ensure
parity, and if you still need legacy-only MAPI runs extract the 'unified||mapi'
filter into a separate PHONY target (e.g., legacy-mapi) instead of including it
in the matrix.
There was a problem hiding this comment.
I chose to make test-capi and test-mapi both running unified and backend-specific tests in order to support full backend testing.
There was a problem hiding this comment.
@sunzhaohua2, understood — that's a valid design goal. One thing worth noting for consistency: if the intent is full backend coverage per target, then test-capi would also need --label-filter='unified||capi' (assuming a capi label exists or gets added) to achieve the same depth of coverage on the CAPI side. As it stands, MAPI gets more test coverage than CAPI in test-all, which is an asymmetry. If that's intentional (e.g., CAPI doesn't have backend-specific tests yet), it's fine to leave as-is — just worth documenting.
Happy to leave this as-is given your explanation.
| func (c *capiBackend) GetMachineSetStatus(ctx context.Context, client runtimeclient.Client, machineSet interface{}) (*MachineSetStatus, error) { | ||
| GinkgoHelper() | ||
|
|
||
| ms, ok := machineSet.(*clusterv1beta1.MachineSet) | ||
| Expect(ok).To(BeTrue(), "Should be CAPI MachineSet, got %T", machineSet) | ||
|
|
||
| status := &MachineSetStatus{ | ||
| Replicas: 0, | ||
| AvailableReplicas: ms.Status.AvailableReplicas, | ||
| ReadyReplicas: ms.Status.ReadyReplicas, | ||
| } | ||
| if ms.Spec.Replicas != nil { | ||
| status.Replicas = *ms.Spec.Replicas | ||
| } | ||
|
|
||
| return status, nil |
There was a problem hiding this comment.
Re-read the MachineSet before reporting status.
This builds status from the original object passed in, not the live object in the API. After CreateMachineSet, AvailableReplicas and ReadyReplicas here can stay stale unless the caller refreshes first, so backend assertions can report zero even after the wait helper succeeds.
Suggested fix
func (c *capiBackend) GetMachineSetStatus(ctx context.Context, client runtimeclient.Client, machineSet interface{}) (*MachineSetStatus, error) {
GinkgoHelper()
ms, ok := machineSet.(*clusterv1beta1.MachineSet)
Expect(ok).To(BeTrue(), "Should be CAPI MachineSet, got %T", machineSet)
+
+ current := &clusterv1beta1.MachineSet{}
+ if err := client.Get(ctx, runtimeclient.ObjectKeyFromObject(ms), current); err != nil {
+ return nil, err
+ }
status := &MachineSetStatus{
Replicas: 0,
- AvailableReplicas: ms.Status.AvailableReplicas,
- ReadyReplicas: ms.Status.ReadyReplicas,
+ AvailableReplicas: current.Status.AvailableReplicas,
+ ReadyReplicas: current.Status.ReadyReplicas,
}
- if ms.Spec.Replicas != nil {
- status.Replicas = *ms.Spec.Replicas
+ if current.Spec.Replicas != nil {
+ status.Replicas = *current.Spec.Replicas
}
return status, nil
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| func (c *capiBackend) GetMachineSetStatus(ctx context.Context, client runtimeclient.Client, machineSet interface{}) (*MachineSetStatus, error) { | |
| GinkgoHelper() | |
| ms, ok := machineSet.(*clusterv1beta1.MachineSet) | |
| Expect(ok).To(BeTrue(), "Should be CAPI MachineSet, got %T", machineSet) | |
| status := &MachineSetStatus{ | |
| Replicas: 0, | |
| AvailableReplicas: ms.Status.AvailableReplicas, | |
| ReadyReplicas: ms.Status.ReadyReplicas, | |
| } | |
| if ms.Spec.Replicas != nil { | |
| status.Replicas = *ms.Spec.Replicas | |
| } | |
| return status, nil | |
| func (c *capiBackend) GetMachineSetStatus(ctx context.Context, client runtimeclient.Client, machineSet interface{}) (*MachineSetStatus, error) { | |
| GinkgoHelper() | |
| ms, ok := machineSet.(*clusterv1beta1.MachineSet) | |
| Expect(ok).To(BeTrue(), "Should be CAPI MachineSet, got %T", machineSet) | |
| current := &clusterv1beta1.MachineSet{} | |
| if err := client.Get(ctx, runtimeclient.ObjectKeyFromObject(ms), current); err != nil { | |
| return nil, err | |
| } | |
| status := &MachineSetStatus{ | |
| Replicas: 0, | |
| AvailableReplicas: current.Status.AvailableReplicas, | |
| ReadyReplicas: current.Status.ReadyReplicas, | |
| } | |
| if current.Spec.Replicas != nil { | |
| status.Replicas = *current.Spec.Replicas | |
| } | |
| return status, nil | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/unified/backends/capi_backend.go` around lines 120 - 135,
GetMachineSetStatus currently reads status from the original machineSet object
instead of the live API object; update capiBackend.GetMachineSetStatus to
re-fetch the MachineSet from the API using the provided runtimeclient.Client
before reading AvailableReplicas/ReadyReplicas: perform the type assertion on
the input (machineSet.(*clusterv1beta1.MachineSet)) to obtain name/namespace,
then use client.Get to load the current clusterv1beta1.MachineSet into a new
variable and build MachineSetStatus from that live object (keep existing
handling of ms.Spec.Replicas).
| // Configure additional tags | ||
| if len(config.AdditionalTags) > 0 { | ||
| if spec.AdditionalTags == nil { | ||
| spec.AdditionalTags = make(map[string]string) | ||
| } | ||
|
|
||
| for k, v := range config.AdditionalTags { | ||
| spec.AdditionalTags[k] = v | ||
| } | ||
| } | ||
|
|
||
| // Configure non-root volumes | ||
| if len(config.NonRootVolumes) > 0 { | ||
| volumes := make([]awsv1.Volume, len(config.NonRootVolumes)) | ||
| for i, v := range config.NonRootVolumes { | ||
| volumes[i] = awsv1.Volume{ | ||
| DeviceName: v.DeviceName, | ||
| Size: v.Size, | ||
| Type: awsv1.VolumeType(v.Type), | ||
| } | ||
| } | ||
|
|
||
| spec.NonRootVolumes = volumes | ||
| } |
There was a problem hiding this comment.
Normalize merge semantics across CAPI and MAPI.
The same MachineTemplateConfig does not mean the same thing on both paths today: CAPI merges/replaces existing tags and non-root volumes, while MAPI appends new entries. If the base template/provider spec already contains tags or block devices, unified tests will create materially different machine specs depending on backend.
Also applies to: 207-233
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/unified/config/template_config.go` around lines 120 - 143, The template
currently replaces spec.NonRootVolumes (and elsewhere similar code at the other
block) instead of appending, which causes inconsistent semantics between CAPI
and MAPI; change the merge to append semantics and ensure tag merging is
consistent: for AdditionalTags keep the current per-key assignment into
spec.AdditionalTags (creating the map if nil) and for NonRootVolumes, build the
slice from config.NonRootVolumes then if spec.NonRootVolumes is nil set it to
that slice else append the new volumes to the existing spec.NonRootVolumes (do
the same fix in the analogous block around lines 207-233).
| // DeleteMachineSet safely deletes a machine set. | ||
| func (helper *TestHelper) DeleteMachineSet(machineSet interface{}) { | ||
| GinkgoHelper() | ||
|
|
||
| err := helper.framework.DeleteMachineSet(helper.ctx, helper.client, machineSet) | ||
| Expect(err).NotTo(HaveOccurred(), "Should delete MachineSet") | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, verify the file exists and check the code at lines 141-147
cat -n pkg/unified/e2e/test_helpers.go | sed -n '141,147p'Repository: openshift/cluster-api-actuator-pkg
Length of output: 408
🏁 Script executed:
# Search for WaitForMachineSetDeleted or similar wait methods in the framework
rg "WaitForMachineSetDeleted|WaitForMachineSet|DeleteMachineSet" --type go -A 3Repository: openshift/cluster-api-actuator-pkg
Length of output: 18562
🏁 Script executed:
# Check the test_helpers.go file structure to understand the TestHelper type
grep -n "type TestHelper" pkg/unified/e2e/test_helpers.go -A 10Repository: openshift/cluster-api-actuator-pkg
Length of output: 551
🏁 Script executed:
# Search for framework interface/struct definition to understand available methods
fd "framework" pkg/unified/e2e --type go -exec grep -l "DeleteMachineSet\|WaitForMachineSet" {} \;Repository: openshift/cluster-api-actuator-pkg
Length of output: 247
Wait for MachineSet cleanup to finish before returning.
This helper only issues the delete. In the Ordered AWS suite, the next spec can start while the previous machines are still terminating, which leaves quota/capacity behind and makes the later spot/EFA cases flaky.
Suggested change
func (helper *TestHelper) DeleteMachineSet(machineSet interface{}) {
GinkgoHelper()
err := helper.framework.DeleteMachineSet(helper.ctx, helper.client, machineSet)
Expect(err).NotTo(HaveOccurred(), "Should delete MachineSet")
+ err = helper.framework.WaitForMachineSetDeleted(helper.ctx, helper.client, machineSet)
+ Expect(err).NotTo(HaveOccurred(), "Should wait for MachineSet deletion")
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/unified/e2e/test_helpers.go` around lines 141 - 147, The current
TestHelper.DeleteMachineSet only issues the delete and returns immediately;
modify it so after calling helper.framework.DeleteMachineSet(helper.ctx,
helper.client, machineSet) it waits for the MachineSet and its machines to be
fully cleaned up before returning. Implement a poll-with-timeout using
helper.ctx/helper.client (or existing framework wait helpers) to repeatedly
check that the MachineSet resource no longer exists and that there are no
associated Machine/Node objects (or until helper.framework reports cleanup
complete), failing if the timeout elapses; keep references to DeleteMachineSet,
helper.framework.DeleteMachineSet, helper.ctx and helper.client to locate and
update the code.
|
/test e2e-aws-operator |
|
@sunzhaohua2: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Test result:
For backend MAPI, authority CAPI
For MAPI
For CAPI