Merge https://github.com/vmware-tanzu/velero:main (e6bdff6) into oadp-dev#483
Merge https://github.com/vmware-tanzu/velero:main (e6bdff6) into oadp-dev#483oadp-rebasebot-app[bot] wants to merge 75 commits intoopenshift:oadp-devfrom
Conversation
Signed-off-by: dongqingcc <dongqingcc@vmware.com>
…estore exec hooks. Signed-off-by: dongqingcc <dongqingcc@vmware.com>
Signed-off-by: dongqingcc <dongqingcc@vmware.com>
Revert PR 6105. Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
…-PR9366 Add e2e test case for PR 9366
…mware-tanzu#9581) * Fix DBR stuck when CSI snapshot no longer exists in cloud provider During backup deletion, VolumeSnapshotContentDeleteItemAction creates a new VSC with the snapshot handle from the backup and polls for readiness. If the underlying snapshot no longer exists (e.g., deleted externally), the CSI driver reports Status.Error but checkVSCReadiness() only checks ReadyToUse, causing it to poll for the full 10-minute timeout instead of failing fast. Additionally, the newly created VSC is never cleaned up on failure, leaving orphaned resources in the cluster. This commit: - Adds Status.Error detection in checkVSCReadiness() to fail immediately on permanent CSI driver errors (e.g., InvalidSnapshot.NotFound) - Cleans up the dangling VSC when readiness polling fails Fixes vmware-tanzu#9579 Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com> * Add changelog for PR vmware-tanzu#9581 Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com> * Fix typo in pod_volume_test.go: colume -> volume Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com> --------- Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
The itemOperationTimeout field was missing from the Schedule API type documentation even though it is supported in the Schedule CRD template. This led users to believe the field was not available per-schedule. Fixes vmware-tanzu#9598 Signed-off-by: Shubham Pampattiwar <spampatt@redhat.com>
…disable_search_in_site Disable Algolia docs search
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds fail-fast and cleanup for CSI VolumeSnapshotContent handling (including deletion of legacy original VSC), extends unit tests, removes Algolia/DocSearch site integration, documents Schedule.spec.template.itemOperationTimeout, adds two E2E tests, updates install command execution order and tests, tweaks node-agent test helpers, and bumps Go module deps. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 1 | ❌ 4❌ Failed checks (4 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
Hi @oadp-rebasebot-app[bot]. Thanks for your PR. I'm waiting for a openshift member to verify that this patch is reasonable to test. If it is, they should reply with Regular contributors should join the org to skip this step. Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
internal/delete/actions/csi/volumesnapshotcontent_action_test.go (2)
97-109: Test coverage could be more explicit about cleanup verification.The test case validates that an error is returned when CSI reports an error, but it doesn't explicitly verify that the dangling VSC was cleaned up (deleted). Consider adding an assertion to check that the VSC no longer exists in the fake client after
Executereturns, to fully validate the cleanup behavior described in the test name.💡 Suggested enhancement to verify cleanup
After
p.Execute(...)returns, you could add:// Verify VSC was cleaned up vscList := &snapshotv1api.VolumeSnapshotContentList{} require.NoError(t, crClient.List(context.Background(), vscList)) require.Empty(t, vscList.Items, "Expected VSC to be cleaned up")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/delete/actions/csi/volumesnapshotcontent_action_test.go` around lines 97 - 109, The test should explicitly verify that the dangling VolumeSnapshotContent was deleted after Execute; after calling p.Execute(...) use the test client (crClient) to List VolumeSnapshotContent objects (snapshotv1api.VolumeSnapshotContentList) and assert no items remain by calling require.NoError for the List result and require.Empty on the list.Items (referencing the existing vsc variable and the Execute call) so the test confirms cleanup behavior rather than only checking expectErr.
242-248: Consider reusing existing pointer utilities.The codebase already has
pkg/util/boolptr(used in the main implementation file). For consistency, consider reusing that package forboolPtr. ForstringPtr, this local helper is acceptable since there doesn't appear to be a corresponding utility in the codebase.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/delete/actions/csi/volumesnapshotcontent_action_test.go` around lines 242 - 248, The local test helper boolPtr should be replaced with the existing utility to ensure consistency: remove the local boolPtr function and use the package pkg/util/boolptr (used in the implementation) wherever boolPtr() is referenced in this test (e.g., in volumesnapshotcontent_action_test.go); keep the local stringPtr helper as-is since no shared string pointer utility exists. Ensure the import for the util boolptr package is added and update references from boolPtr(...) to the util's API.test/e2e/basic/restore_exec_hooks.go (1)
51-54: Unusual format string for namespace numbering.The format
%00000dis unusual and likely not producing the intended result. The sequence00000means the minimum width is 0 (the rightmost digit), so there's no zero-padding. If the intent is 5-digit zero-padded numbers, use%05dinstead.💡 Suggested fix
for nsNum := 0; nsNum < r.NamespacesTotal; nsNum++ { - createNSName := fmt.Sprintf("%s-%00000d", r.CaseBaseName, nsNum) + createNSName := fmt.Sprintf("%s-%05d", r.CaseBaseName, nsNum) *r.NSIncluded = append(*r.NSIncluded, createNSName) }Note: This also appears at lines 78 and 142 in
CreateResources()andVerify().🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/basic/restore_exec_hooks.go` around lines 51 - 54, The namespace name formatting in the loop that builds createNSName uses the incorrect format string "%00000d" which doesn't produce zero-padded numbers; change it to "%05d" in the loop that constructs createNSName (inside CreateResources()/the namespace-building loop) and update the other occurrences of the same format (the instances referenced in Verify() and the second CreateResources() occurrence) to use "%05d" so namespace numbers are zero-padded to 5 digits.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@site/content/docs/main/api-types/schedule.md`:
- Around line 66-69: Update the grammatical errors in the documentation values:
change "The default value is 4 hour." to "The default value is 4 hours." for the
itemOperationTimeout entry (referenced by the symbol itemOperationTimeout) and
also change "10 minute" to "10 minutes" in the related schedule/config entry
mentioned earlier (ensure the corresponding field name remains correct in the
same doc), keeping plural "minutes" and "hours" for the default value
descriptions.
---
Nitpick comments:
In `@internal/delete/actions/csi/volumesnapshotcontent_action_test.go`:
- Around line 97-109: The test should explicitly verify that the dangling
VolumeSnapshotContent was deleted after Execute; after calling p.Execute(...)
use the test client (crClient) to List VolumeSnapshotContent objects
(snapshotv1api.VolumeSnapshotContentList) and assert no items remain by calling
require.NoError for the List result and require.Empty on the list.Items
(referencing the existing vsc variable and the Execute call) so the test
confirms cleanup behavior rather than only checking expectErr.
- Around line 242-248: The local test helper boolPtr should be replaced with the
existing utility to ensure consistency: remove the local boolPtr function and
use the package pkg/util/boolptr (used in the implementation) wherever boolPtr()
is referenced in this test (e.g., in volumesnapshotcontent_action_test.go); keep
the local stringPtr helper as-is since no shared string pointer utility exists.
Ensure the import for the util boolptr package is added and update references
from boolPtr(...) to the util's API.
In `@test/e2e/basic/restore_exec_hooks.go`:
- Around line 51-54: The namespace name formatting in the loop that builds
createNSName uses the incorrect format string "%00000d" which doesn't produce
zero-padded numbers; change it to "%05d" in the loop that constructs
createNSName (inside CreateResources()/the namespace-building loop) and update
the other occurrences of the same format (the instances referenced in Verify()
and the second CreateResources() occurrence) to use "%05d" so namespace numbers
are zero-padded to 5 digits.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 9dacb3df-e509-4af3-8614-195975c0171b
📒 Files selected for processing (9)
changelogs/unreleased/9581-shubham-pampattiwarinternal/delete/actions/csi/volumesnapshotcontent_action.gointernal/delete/actions/csi/volumesnapshotcontent_action_test.gosite/algolia-crawler.jsonsite/content/docs/main/api-types/schedule.mdsite/layouts/docs/docs.htmlsite/layouts/partials/head-docs.htmltest/e2e/basic/restore_exec_hooks.gotest/e2e/e2e_suite_test.go
💤 Files with no reviewable changes (3)
- site/layouts/docs/docs.html
- site/layouts/partials/head-docs.html
- site/algolia-crawler.json
| # ItemOperationTimeout specifies the time used to wait for | ||
| # asynchronous BackupItemAction operations | ||
| # The default value is 4 hour. | ||
| itemOperationTimeout: 4h |
There was a problem hiding this comment.
Fix grammatical error in documentation.
Line 68 contains a grammatical error: "The default value is 4 hour." should be "The default value is 4 hours." (plural).
Note: The same grammatical issue exists in line 65 ("10 minute" should be "10 minutes"). Consider fixing both for consistency.
📝 Proposed fix
# ItemOperationTimeout specifies the time used to wait for
# asynchronous BackupItemAction operations
- # The default value is 4 hour.
+ # The default value is 4 hours.
itemOperationTimeout: 4hIf fixing line 65 as well:
# CSISnapshotTimeout specifies the time used to wait for
# CSI VolumeSnapshot status turns to ReadyToUse during creation, before
- # returning error as timeout. The default value is 10 minute.
+ # returning error as timeout. The default value is 10 minutes.
csiSnapshotTimeout: 10m📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # ItemOperationTimeout specifies the time used to wait for | |
| # asynchronous BackupItemAction operations | |
| # The default value is 4 hour. | |
| itemOperationTimeout: 4h | |
| # ItemOperationTimeout specifies the time used to wait for | |
| # asynchronous BackupItemAction operations | |
| # The default value is 4 hours. | |
| itemOperationTimeout: 4h |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@site/content/docs/main/api-types/schedule.md` around lines 66 - 69, Update
the grammatical errors in the documentation values: change "The default value is
4 hour." to "The default value is 4 hours." for the itemOperationTimeout entry
(referenced by the symbol itemOperationTimeout) and also change "10 minute" to
"10 minutes" in the related schedule/config entry mentioned earlier (ensure the
corresponding field name remains correct in the same doc), keeping plural
"minutes" and "hours" for the default value descriptions.
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: oadp-rebasebot-app[bot], sseago The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
From 1.18.1, Velero adds some default affinity in the backup/restore pod, so we can't directly compare the whole affinity, but we can verify if the expected affinity is contained in the pod affinity. Signed-off-by: Xun Jiang <xun.jiang@broadcom.com>
…r-PR9255 Add e2e test case for PR 9255
Signed-off-by: Priyansh Choudhary <im1706@gmail.com>
Signed-off-by: Priyansh Choudhary <im1706@gmail.com>
…process Signed-off-by: Priyansh Choudhary <im1706@gmail.com>
3e35877 to
26e68c0
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/delete/actions/csi/volumesnapshotcontent_action_test.go`:
- Around line 97-109: The test never asserts the expectErr==true case and it
globally overrides the checkVSCReadiness stub without restoring it, which breaks
isolation; update the table-driven test in volumesnapshotcontent_action_test.go
so each case checks both success and failure paths (assert when expectErr is
true as well as false) and ensure any override of the package-level
checkVSCReadiness function is saved and restored (e.g., save original :=
checkVSCReadiness and defer restoring it) around the test case where function is
replaced; reference the table entry fields expectErr and function, and the
global symbol checkVSCReadiness (and TestCheckVSCReadiness) when making these
changes.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: db69be39-38dd-4fad-88ca-3a059b6b5379
📒 Files selected for processing (10)
changelogs/unreleased/9581-shubham-pampattiwarinternal/delete/actions/csi/volumesnapshotcontent_action.gointernal/delete/actions/csi/volumesnapshotcontent_action_test.gosite/algolia-crawler.jsonsite/content/docs/main/api-types/schedule.mdsite/layouts/docs/docs.htmlsite/layouts/partials/head-docs.htmltest/e2e/basic/restore_exec_hooks.gotest/e2e/e2e_suite_test.gotest/e2e/resource-filtering/wildcard_namespaces.go
💤 Files with no reviewable changes (3)
- site/layouts/partials/head-docs.html
- site/layouts/docs/docs.html
- site/algolia-crawler.json
✅ Files skipped from review due to trivial changes (1)
- changelogs/unreleased/9581-shubham-pampattiwar
🚧 Files skipped from review as they are similar to previous changes (2)
- site/content/docs/main/api-types/schedule.md
- test/e2e/basic/restore_exec_hooks.go
| { | ||
| name: "Error case with CSI error, dangling VSC should be cleaned up", | ||
| vsc: builder.ForVolumeSnapshotContent("bar").ObjectMeta(builder.WithLabelsMap(map[string]string{velerov1api.BackupNameLabel: "backup"})).Status(&snapshotv1api.VolumeSnapshotContentStatus{SnapshotHandle: &snapshotHandleStr}).Result(), | ||
| backup: builder.ForBackup("velero", "backup").ObjectMeta(builder.WithAnnotationsMap(map[string]string{velerov1api.ResourceTimeoutAnnotation: "5s"})).Result(), | ||
| expectErr: true, | ||
| function: func( | ||
| ctx context.Context, | ||
| vsc *snapshotv1api.VolumeSnapshotContent, | ||
| client crclient.Client, | ||
| ) (bool, error) { | ||
| return false, errors.Errorf("VolumeSnapshotContent %s has error: InvalidSnapshot.NotFound", vsc.Name) | ||
| }, | ||
| }, |
There was a problem hiding this comment.
Test assertions and global stub isolation are currently broken.
Two issues weaken the new coverage:
expectErr: truepaths are never asserted (onlyexpectErr == falseis checked at Line 133-Line 135).checkVSCReadinessis globally overridden at Line 116 and not restored, soTestCheckVSCReadinesscan execute a leftover stub instead of the real implementation.
This can let the new fast-fail/cleanup cases pass without actually validating behavior.
✅ Suggested fix
func TestVSCExecute(t *testing.T) {
@@
for _, test := range tests {
t.Run(test.name, func(t *testing.T) {
crClient := velerotest.NewFakeControllerRuntimeClient(t)
logger := logrus.StandardLogger()
- checkVSCReadiness = test.function
+ if test.function != nil {
+ originalCheck := checkVSCReadiness
+ checkVSCReadiness = test.function
+ t.Cleanup(func() {
+ checkVSCReadiness = originalCheck
+ })
+ }
@@
- if test.expectErr == false {
- require.NoError(t, err)
- }
+ if test.expectErr {
+ require.Error(t, err)
+ } else {
+ require.NoError(t, err)
+ }
})
}
}Also applies to: 116-117, 133-136, 206-223
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/delete/actions/csi/volumesnapshotcontent_action_test.go` around
lines 97 - 109, The test never asserts the expectErr==true case and it globally
overrides the checkVSCReadiness stub without restoring it, which breaks
isolation; update the table-driven test in volumesnapshotcontent_action_test.go
so each case checks both success and failure paths (assert when expectErr is
true as well as false) and ensure any override of the package-level
checkVSCReadiness function is saved and restored (e.g., save original :=
checkVSCReadiness and defer restoring it) around the test case where function is
replaced; reference the table entry fields expectErr and function, and the
global symbol checkVSCReadiness (and TestCheckVSCReadiness) when making these
changes.
Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.77.0 to 1.79.3. - [Release notes](https://github.com/grpc/grpc-go/releases) - [Commits](grpc/grpc-go@v1.77.0...v1.79.3) --- updated-dependencies: - dependency-name: google.golang.org/grpc dependency-version: 1.79.3 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>
…modules/google.golang.org/grpc-1.79.3 Bump google.golang.org/grpc from 1.77.0 to 1.79.3
…NodeAgentConfig_e2e_error [main][cherry-pick] Compare affinity by string instead of exactly same compare.
(cherry picked from commit ccb545f) Update PR-BZ automation mapping (openshift#84) (cherry picked from commit aa2b019) Update PR-BZ automation (openshift#92) Co-authored-by: Rayford Johnson <rjohnson@redhat.com> (cherry picked from commit ecc563f) Add publish workflow (openshift#108) (cherry picked from commit f87b779)
Code-gen no longer required on verify due to vmware-tanzu#6039 Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com> oadp-1.2: Update Makefile.prow to velero-restore-helper
…nshift#280) Signed-off-by: Scott Seago <sseago@redhat.com>
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
Signed-off-by: Mateus Oliveira <msouzaol@redhat.com>
* fix: ARM images Signed-off-by: Mateus Oliveira <msouzaol@redhat.com> * fixup! fix: ARM images Signed-off-by: Mateus Oliveira <msouzaol@redhat.com> --------- Signed-off-by: Mateus Oliveira <msouzaol@redhat.com>
…#336) Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>
…openshift#334) (openshift#338) add missing unit test for kopia hashing algo (openshift#337) Introduction of downstream only option to override Kopia default: - hashing algorithm - splitting algorithm - encryption algorithm With introduction of 3 environment variables it is possible to override Kopia algorithms used by Velero: KOPIA_HASHING_ALGORITHM KOPIA_SPLITTER_ALGORITHM KOPIA_ENCRYPTION_ALGORITHM If the env algorithms are not set or they are not within Kopia SupportedAlgorithms, the default algorithm will be used. This behavior is consistent with current behavior without this change. Signed-off-by: Michal Pryc <mpryc@redhat.com> Signed-off-by: Shubham Pampattiwar <shubhampampattiwar7@gmail.com>
The rework of Makefile to make it more readable and inclusion of lint as a target as well extract golangci-lint version from the upstream Dockerfile, so we test in PROW or locally on the same version as upstream. Signed-off-by: Michal Pryc <mpryc@redhat.com>
This fixes the PR openshift#334 where one additional line was in the code. This was not exposed previously as we did not had downstream CI Lint jobs. Signed-off-by: Michal Pryc <mpryc@redhat.com>
* run oadp-operator e2e test from the velero repo execute openshift/oadp-operator e2e tests directly against the velero repo locally or via prow ci Signed-off-by: Wesley Hayutin <weshayutin@gmail.com> * update variable names, add a cleanup * make sure env variable overrides default velero_image Signed-off-by: Wesley Hayutin <weshayutin@gmail.com> * add options to build, push, and only test Signed-off-by: Wesley Hayutin <weshayutin@gmail.com> * add arch to name Signed-off-by: Wesley Hayutin <weshayutin@gmail.com> * remove duplicated clean/rm operator checkout * simplify by dropping export var and use a oneliner Co-authored-by: Tiger Kaovilai <passawit.kaovilai@gmail.com> * drop export and use oneliner Co-authored-by: Tiger Kaovilai <passawit.kaovilai@gmail.com> * just in case, allow oadp to be deployed from makefile Signed-off-by: Wesley Hayutin <weshayutin@gmail.com> * Update Makefile.prow Co-authored-by: Tiger Kaovilai <passawit.kaovilai@gmail.com> --------- Signed-off-by: Wesley Hayutin <weshayutin@gmail.com> Co-authored-by: Tiger Kaovilai <passawit.kaovilai@gmail.com>
…t#436) Signed-off-by: Scott Seago <sseago@redhat.com>
Fixes linting error. Signed-off-by: oadp-team-rebase-bot <oadp-maintainers@redhat.com>
Fix golangci-lint version extraction and disable concat-loop check. Signed-off-by: Michal Pryc <mpryc@redhat.com>
d262e00 to
8fd776b
Compare
Summary by CodeRabbit
Bug Fixes
Documentation
itemOperationTimeoutfield to Schedule API type documentation.Tests