Skip to content

CNTRLPLANE-2685, CNTRLPLANE-2707: inject pre-signed etcd snapshot URL during OADP restore#239

Merged
openshift-merge-bot[bot] merged 3 commits intoopenshift:mainfrom
jparrill:CNTRLPLANE-2685-restore
Apr 13, 2026
Merged

CNTRLPLANE-2685, CNTRLPLANE-2707: inject pre-signed etcd snapshot URL during OADP restore#239
openshift-merge-bot[bot] merged 3 commits intoopenshift:mainfrom
jparrill:CNTRLPLANE-2685-restore

Conversation

@jparrill
Copy link
Copy Markdown
Contributor

@jparrill jparrill commented Apr 9, 2026

Summary

  • During restore, reads the etcd snapshot S3 URL from the hypershift.openshift.io/etcd-snapshot-url annotation (persisted by the backup plugin, since Velero strips status fields during restore)
  • Converts s3:// URLs to pre-signed HTTPS GET URLs using AWS SigV4 (pure Go stdlib, no AWS SDK dependency) so the etcd-init container can download the snapshot via curl
  • Injects the pre-signed URL into spec.etcd.managed.storage.restoreSnapshotURL on the HostedCluster
  • Supports virtual-hosted/path-style S3 addressing, custom endpoints (MinIO, RHOCS), and STS session tokens

Dependencies

This PR depends on the following PRs being merged first:

Note for reviewers

Only the last commit (b168ebd8) needs review. The previous commits belong to PR #238 and will be rebased out once that PR is merged.

New files

File Purpose
pkg/s3presign/presign.go AWS SigV4 pre-signed URL generation (stdlib only)
pkg/s3presign/presign_test.go Unit tests (21 tests)
pkg/core/restore_test.go Restore plugin unit tests (4 tests)
tests/integration/s3presign/presign_integration_test.go Integration tests with real S3 (ephemeral buckets)

Test plan

  • go test ./pkg/s3presign/ — 21 unit tests pass
  • go test ./pkg/core/ -run TestPresign — 4 unit tests pass
  • go test -tags=integration ./tests/integration/s3presign/ — 4 integration tests pass against real S3
  • End-to-end: OADP backup with etcdSnapshot method → delete HC → OADP restore → etcd-init downloads snapshot via pre-signed URL → etcd restores successfully

Known issues

  • OCPBUGS-82166etcd-init.sh fails on OCP 4.21+ because etcd 3.6 removed etcdctl snapshot restore (must use etcdutl). This is a HyperShift issue, not in this plugin.

🤖 Generated with Claude Code

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 9, 2026
@openshift-ci-robot
Copy link
Copy Markdown

openshift-ci-robot commented Apr 9, 2026

@jparrill: This pull request references CNTRLPLANE-2685 which is a valid jira issue.

Details

In response to this:

Summary

  • During restore, reads the etcd snapshot S3 URL from the hypershift.openshift.io/etcd-snapshot-url annotation (persisted by the backup plugin, since Velero strips status fields during restore)
  • Converts s3:// URLs to pre-signed HTTPS GET URLs using AWS SigV4 (pure Go stdlib, no AWS SDK dependency) so the etcd-init container can download the snapshot via curl
  • Injects the pre-signed URL into spec.etcd.managed.storage.restoreSnapshotURL on the HostedCluster
  • Supports virtual-hosted/path-style S3 addressing, custom endpoints (MinIO, RHOCS), and STS session tokens

Dependencies

This PR depends on the following PRs being merged first:

Note for reviewers

Only the last commit (b168ebd8) needs review. The previous commits belong to PR #238 and will be rebased out once that PR is merged.

New files

File Purpose
pkg/s3presign/presign.go AWS SigV4 pre-signed URL generation (stdlib only)
pkg/s3presign/presign_test.go Unit tests (21 tests)
pkg/core/restore_test.go Restore plugin unit tests (4 tests)
tests/integration/s3presign/presign_integration_test.go Integration tests with real S3 (ephemeral buckets)

Test plan

  • go test ./pkg/s3presign/ — 21 unit tests pass
  • go test ./pkg/core/ -run TestPresign — 4 unit tests pass
  • go test -tags=integration ./tests/integration/s3presign/ — 4 integration tests pass against real S3
  • End-to-end: OADP backup with etcdSnapshot method → delete HC → OADP restore → etcd-init downloads snapshot via pre-signed URL → etcd restores successfully

Known issues

  • OCPBUGS-82166etcd-init.sh fails on OCP 4.21+ because etcd 3.6 removed etcdctl snapshot restore (must use etcdutl). This is a HyperShift issue, not in this plugin.

🤖 Generated with Claude Code

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 9, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds HCPEtcdBackup orchestration for etcd snapshot backups/restores, a new S3 SigV4 presign package, credential copy/remap logic, plugin config and scheme/type updates, many unit/integration tests, design docs, and removal of a legacy AWS restore helper.

Changes

Cohort / File(s) Summary
Documentation
docs/references/HCPEtcdBackup/HCPEtcdBackup-implementation.md
New implementation/design doc describing HCPEtcdBackup backup/restore control flow, credential handling, polling semantics, exclusions, snapshot URL annotation semantics, and troubleshooting/testing notes.
Module deps
go.mod
Bumped Hypershift and OpenShift API dependency versions.
Common: scheme/types/utils
pkg/common/scheme.go, pkg/common/types.go, pkg/common/utils.go, pkg/common/utils_test.go
Registered apiextensions v1 in scheme; added exported constants for HCPEtcdBackup, HO namespace and config keys, etcd backup method names, PVC/name helpers; added thread-safe SA file path setter and GetHostedCluster helper plus its unit test.
Core: backup plugin & tests
pkg/core/backup.go, pkg/core/backup_test.go, pkg/core/types/types.go, pkg/core/validation/backup.go, pkg/core/validation/backup_test.go
Extended BackupPlugin to support etcdSnapshot flow: initialize orchestrator, idempotent HCPEtcdBackup CR creation, verify/wait lifecycle, inject snapshot URL into HostedCluster (annotation/status), exclude etcd pods/PVCs from volume backups; updated options/types and validation tests.
Core: restore plugin & tests
pkg/core/restore.go, pkg/core/restore_test.go, pkg/core/validation/restore.go, pkg/core/validation/restore_test.go
RestorePlugin reads etcd snapshot annotation and, for s3:// URLs, presigns to HTTPS using new s3presign helper and injects into spec.etcd.managed.storage.restoreSnapshotURL; added presign unit tests and removed prior AWS restore-task invocation.
Etcd backup orchestrator & tests
pkg/etcdbackup/orchestrator.go, pkg/etcdbackup/orchestrator_test.go
New exported Orchestrator type and methods (Create/Verify/Wait/Cleanup): maps Velero BSL to HCPEtcdBackup storage for AWS/Azure, copies/remaps BSL credential Secret into HO namespace, propagates encryption fields, handles idempotency and polling; extensive unit tests for mapping, secret handling and orchestration flows.
S3 presign package & tests
pkg/s3presign/presign.go, pkg/s3presign/presign_test.go, tests/integration/s3presign/presign_integration_test.go
New package implementing AWS SigV4 presigned GET URL generation, S3 URL and credentials parsing, deterministic unit tests and integration tests verifying real S3 downloads and expiry/session token behavior.
Platform cleanup
pkg/platform/aws/aws.go
Removed exported RestoreTasks function and unused imports; file left without operational code. Confirmed callers adjusted in this diff.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot requested review from celebdor and muraee April 9, 2026 13:49
@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented Apr 9, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jparrill

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 9, 2026
@openshift-ci-robot
Copy link
Copy Markdown

openshift-ci-robot commented Apr 9, 2026

@jparrill: This pull request references CNTRLPLANE-2685 which is a valid jira issue.

Details

In response to this:

Summary

  • During restore, reads the etcd snapshot S3 URL from the hypershift.openshift.io/etcd-snapshot-url annotation (persisted by the backup plugin, since Velero strips status fields during restore)
  • Converts s3:// URLs to pre-signed HTTPS GET URLs using AWS SigV4 (pure Go stdlib, no AWS SDK dependency) so the etcd-init container can download the snapshot via curl
  • Injects the pre-signed URL into spec.etcd.managed.storage.restoreSnapshotURL on the HostedCluster
  • Supports virtual-hosted/path-style S3 addressing, custom endpoints (MinIO, RHOCS), and STS session tokens

Dependencies

This PR depends on the following PRs being merged first:

Note for reviewers

Only the last commit (b168ebd8) needs review. The previous commits belong to PR #238 and will be rebased out once that PR is merged.

New files

File Purpose
pkg/s3presign/presign.go AWS SigV4 pre-signed URL generation (stdlib only)
pkg/s3presign/presign_test.go Unit tests (21 tests)
pkg/core/restore_test.go Restore plugin unit tests (4 tests)
tests/integration/s3presign/presign_integration_test.go Integration tests with real S3 (ephemeral buckets)

Test plan

  • go test ./pkg/s3presign/ — 21 unit tests pass
  • go test ./pkg/core/ -run TestPresign — 4 unit tests pass
  • go test -tags=integration ./tests/integration/s3presign/ — 4 integration tests pass against real S3
  • End-to-end: OADP backup with etcdSnapshot method → delete HC → OADP restore → etcd-init downloads snapshot via pre-signed URL → etcd restores successfully

Known issues

  • OCPBUGS-82166etcd-init.sh fails on OCP 4.21+ because etcd 3.6 removed etcdctl snapshot restore (must use etcdutl). This is a HyperShift issue, not in this plugin.

🤖 Generated with Claude Code

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/etcdbackup/orchestrator.go`:
- Around line 58-107: The HCPEtcdBackup name is made unstable by the random
suffix (crName) so retries create new CRs; in CreateEtcdBackup remove the
utilrand.String(4) suffix and derive a stable CR name from the Velero backup
(e.g., "oadp-"+backup.Name) so subsequent calls hit the
apierrors.IsAlreadyExists reuse path; update the crName usage in the
HCPEtcdBackup ObjectMeta, logging, and the assignments to
o.BackupName/o.BackupNamespace accordingly to ensure idempotent create behavior.
- Around line 207-240: mapAWSBSLToStorage currently returns empty Bucket/Region
which produces invalid HCPEtcdBackup specs; update the validation so we reject
AWS BSLs that lack required fields. In mapBSLToStorage (and/or change
mapAWSBSLToStorage signature to return (*hyperv1.HCPEtcdBackupStorage, error)),
check that bsl.Spec.ObjectStorage.Bucket and bsl.Spec.Config["region"] are
non-empty for AWS providers and return a descriptive error (e.g. "invalid AWS
BSL: missing bucket" or "missing region") instead of constructing a storage
object; ensure callers handle the error path so we never create/copy credentials
for an invalid HCPEtcdBackup.
- Around line 115-128: The switch handling backup condition reasons must treat
controller terminal states as errors: in both VerifyInProgress and
WaitForCompletion add a case for common.BackupAlreadyInProgressReason that
returns false with a descriptive error (e.g., "backup already in progress:
<cond.Message>") so it doesn't fall through and keep polling; additionally, add
hyperv1.EtcdUnhealthyReason to the WaitForCompletion switch to return false with
an error ("etcd unhealthy: <cond.Message>") instead of continuing to poll.
Ensure you reference the existing switch on cond.Reason in both functions and
use the same logging/error pattern as the other terminal cases.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b4188f4e-0fd5-4272-a534-316dd9138bd5

📥 Commits

Reviewing files that changed from the base of the PR and between 3c34d79 and b168ebd.

⛔ Files ignored due to path filters (25)
  • go.sum is excluded by !**/*.sum
  • vendor/github.com/openshift/api/config/v1/types_apiserver.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/api/config/v1/types_authentication.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/api/config/v1/types_infrastructure.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/api/config/v1/types_insights.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/api/config/v1/types_network.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/api/config/v1/types_tlssecurityprofile.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/api/config/v1/zz_generated.featuregated-crd-manifests.yaml is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/api/config/v1/zz_generated.swagger_doc_generated.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/api/operator/v1/types_network.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/api/operator/v1/zz_generated.featuregated-crd-manifests.yaml is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/azure.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/azureprivatelinkservice_types.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/controlplaneversion_types.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/etcdbackup_types.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/gcp.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/gcpprivateserviceconnect_types.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/hosted_controlplane.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/hostedcluster_conditions.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/hostedcluster_types.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/nodepool_types.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/operator.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/zz_generated.deepcopy.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests.yaml is excluded by !vendor/**, !**/vendor/**
  • vendor/modules.txt is excluded by !vendor/**, !**/vendor/**
📒 Files selected for processing (19)
  • docs/references/HCPEtcdBackup/HCPEtcdBackup-implementation.md
  • go.mod
  • pkg/common/scheme.go
  • pkg/common/types.go
  • pkg/common/utils.go
  • pkg/common/utils_test.go
  • pkg/core/backup.go
  • pkg/core/restore.go
  • pkg/core/restore_test.go
  • pkg/core/types/types.go
  • pkg/core/validation/backup.go
  • pkg/core/validation/backup_test.go
  • pkg/core/validation/restore.go
  • pkg/etcdbackup/orchestrator.go
  • pkg/etcdbackup/orchestrator_test.go
  • pkg/platform/aws/aws.go
  • pkg/s3presign/presign.go
  • pkg/s3presign/presign_test.go
  • tests/integration/s3presign/presign_integration_test.go
💤 Files with no reviewable changes (1)
  • pkg/platform/aws/aws.go

Comment thread pkg/etcdbackup/orchestrator.go
Comment thread pkg/etcdbackup/orchestrator.go
Comment thread pkg/etcdbackup/orchestrator.go
@jparrill jparrill force-pushed the CNTRLPLANE-2685-restore branch from b168ebd to 5c53288 Compare April 9, 2026 17:56
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
pkg/etcdbackup/orchestrator.go (1)

155-165: ⚠️ Potential issue | 🟡 Minor

Add EtcdUnhealthyReason to WaitForCompletion terminal failures.

EtcdUnhealthyReason is handled in VerifyInProgress (line 126-127) but missing from WaitForCompletion. If etcd becomes unhealthy during the backup, WaitForCompletion will continue polling until timeout instead of failing fast.

Suggested fix
 		// Terminal failures
 		switch cond.Reason {
 		case hyperv1.BackupFailedReason:
 			return false, fmt.Errorf("HCPEtcdBackup failed: %s", cond.Message)
 		case common.BackupRejectedReason:
 			return false, fmt.Errorf("HCPEtcdBackup rejected: %s", cond.Message)
+		case hyperv1.EtcdUnhealthyReason:
+			return false, fmt.Errorf("etcd unhealthy: %s", cond.Message)
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/etcdbackup/orchestrator.go` around lines 155 - 165, The WaitForCompletion
polling switch on cond.Reason currently handles hyperv1.BackupFailedReason and
common.BackupRejectedReason but omits EtcdUnhealthyReason; add a case for the
EtcdUnhealthyReason (the same constant used/handled in VerifyInProgress) in the
switch inside WaitForCompletion so that when cond.Reason == EtcdUnhealthyReason
the function returns false and an error (e.g., fmt.Errorf("HCPEtcdBackup etcd
unhealthy: %s", cond.Message)) to fail fast instead of continuing to poll.
🧹 Nitpick comments (2)
docs/references/HCPEtcdBackup/HCPEtcdBackup-implementation.md (1)

470-473: Add language specifier to fenced code block.

The example output block lacks a language specifier.

Suggested fix
 The output should be an array containing a **pre-signed HTTPS URL**, e.g.:
-```
+```text
 ["https://my-bucket.s3.us-east-1.amazonaws.com/path/to/snapshot.db?X-Amz-Algorithm=AWS4-HMAC-SHA256&..."]
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @docs/references/HCPEtcdBackup/HCPEtcdBackup-implementation.md around lines
470 - 473, The fenced code block that shows the example output (the array with
the pre-signed HTTPS URL) is missing a language specifier; update that fenced
block to include a language tag (e.g., "text") so the snippet is rendered
correctly — locate the block containing ["https://my-bucket..."] and change totext.


</details>

</blockquote></details>
<details>
<summary>pkg/core/restore.go (1)</summary><blockquote>

`231-287`: **Pre-signed URL expiry may be insufficient for slow restores.**

The 1-hour expiry (`DefaultPresignExpiry`) should suffice for typical restores, but large etcd snapshots or slow networks could exceed this. The `etcd-init` container will fail if the URL expires mid-download.

Consider documenting this constraint or making expiry configurable for edge cases.

<details>
<summary>🤖 Prompt for AI Agents</summary>

```
Verify each finding against the current code and only fix it if needed.

In `@pkg/core/restore.go` around lines 231 - 287, presignS3URL currently always
uses s3presign.DefaultPresignExpiry which can expire during slow restores; make
the expiry configurable and use that value when calling
s3presign.GeneratePresignedGetURL. Update RestorePlugin.presignS3URL to read a
configurable expiry (e.g., from the Backup object annotation or a RestorePlugin
field), parse it as a time.Duration with a sensible fallback to
s3presign.DefaultPresignExpiry, and pass that duration into the PresignOptions.
Adjust any related types (RestorePlugin config or annotation key) and ensure
GeneratePresignedGetURL is called with PresignOptions.Expiry set to the parsed
value (not always DefaultPresignExpiry).
```

</details>

</blockquote></details>

</blockquote></details>

<details>
<summary>🤖 Prompt for all review comments with AI agents</summary>

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @pkg/common/utils.go:

  • Around line 31-34: SetK8sSAFilePath currently mutates the package-global
    k8sSAFilePath without synchronization which races with reads in
    GetCurrentNamespace; protect access by introducing a package-level sync.RWMutex
    (e.g., k8sSAFileMu) and use k8sSAFileMu.Lock() / Unlock() in SetK8sSAFilePath
    and k8sSAFileMu.RLock() / RUnlock() around reads of k8sSAFilePath inside
    GetCurrentNamespace (and any other readers) so concurrent tests/usage are safe.

In @pkg/s3presign/presign.go:

  • Around line 249-257: The uriEncode function currently uses url.PathEscape
    which leaves AWS SigV4 reserved characters unencoded; replace its usage in
    uriEncode with a custom percent-encoder that encodes every byte except the SigV4
    unreserved characters (A-Z, a-z, 0-9, '-', '_', '.', '~'), preserving segment
    boundaries by continuing to split on '/' (as in segments := strings.Split(path,
    "/")). For each segment implement byte-wise encoding: if a rune/byte is in the
    unreserved set, append as-is; otherwise append percent-encoded (%HH) uppercase
    hex for each UTF-8 byte; then join segments with "/" and return the result so
    signatures match AWS SigV4 rules.

Duplicate comments:
In @pkg/etcdbackup/orchestrator.go:

  • Around line 155-165: The WaitForCompletion polling switch on cond.Reason
    currently handles hyperv1.BackupFailedReason and common.BackupRejectedReason but
    omits EtcdUnhealthyReason; add a case for the EtcdUnhealthyReason (the same
    constant used/handled in VerifyInProgress) in the switch inside
    WaitForCompletion so that when cond.Reason == EtcdUnhealthyReason the function
    returns false and an error (e.g., fmt.Errorf("HCPEtcdBackup etcd unhealthy: %s",
    cond.Message)) to fail fast instead of continuing to poll.

Nitpick comments:
In @docs/references/HCPEtcdBackup/HCPEtcdBackup-implementation.md:

  • Around line 470-473: The fenced code block that shows the example output (the
    array with the pre-signed HTTPS URL) is missing a language specifier; update
    that fenced block to include a language tag (e.g., "text") so the snippet is
    rendered correctly — locate the block containing ["https://my-bucket..."] and
    change totext.

In @pkg/core/restore.go:

  • Around line 231-287: presignS3URL currently always uses
    s3presign.DefaultPresignExpiry which can expire during slow restores; make the
    expiry configurable and use that value when calling
    s3presign.GeneratePresignedGetURL. Update RestorePlugin.presignS3URL to read a
    configurable expiry (e.g., from the Backup object annotation or a RestorePlugin
    field), parse it as a time.Duration with a sensible fallback to
    s3presign.DefaultPresignExpiry, and pass that duration into the PresignOptions.
    Adjust any related types (RestorePlugin config or annotation key) and ensure
    GeneratePresignedGetURL is called with PresignOptions.Expiry set to the parsed
    value (not always DefaultPresignExpiry).

</details>

<details>
<summary>🪄 Autofix (Beta)</summary>

Fix all unresolved CodeRabbit comments on this PR:

- [ ] <!-- {"checkboxId": "4b0d0e0a-96d7-4f10-b296-3a18ea78f0b9"} --> Push a commit to this branch (recommended)
- [ ] <!-- {"checkboxId": "ff5b1114-7d8c-49e6-8ac1-43f82af23a33"} --> Create a new PR with the fixes

</details>

---

<details>
<summary>ℹ️ Review info</summary>

<details>
<summary>⚙️ Run configuration</summary>

**Configuration used**: Organization UI

**Review profile**: CHILL

**Plan**: Pro

**Run ID**: `ba6b6f6f-7044-4f04-9b63-1049542a4b99`

</details>

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between b168ebd83adc4ca29d331d83a64fc8b27881a744 and 5c532880c1b8359ba870a7da9cf684927879be72.

</details>

<details>
<summary>📒 Files selected for processing (18)</summary>

* `docs/references/HCPEtcdBackup/HCPEtcdBackup-implementation.md`
* `pkg/common/scheme.go`
* `pkg/common/types.go`
* `pkg/common/utils.go`
* `pkg/common/utils_test.go`
* `pkg/core/backup.go`
* `pkg/core/restore.go`
* `pkg/core/restore_test.go`
* `pkg/core/types/types.go`
* `pkg/core/validation/backup.go`
* `pkg/core/validation/backup_test.go`
* `pkg/core/validation/restore.go`
* `pkg/etcdbackup/orchestrator.go`
* `pkg/etcdbackup/orchestrator_test.go`
* `pkg/platform/aws/aws.go`
* `pkg/s3presign/presign.go`
* `pkg/s3presign/presign_test.go`
* `tests/integration/s3presign/presign_integration_test.go`

</details>

<details>
<summary>💤 Files with no reviewable changes (1)</summary>

* pkg/platform/aws/aws.go

</details>

<details>
<summary>✅ Files skipped from review due to trivial changes (3)</summary>

* pkg/common/scheme.go
* pkg/core/restore_test.go
* pkg/common/utils_test.go

</details>

<details>
<summary>🚧 Files skipped from review as they are similar to previous changes (5)</summary>

* pkg/core/validation/backup_test.go
* pkg/core/validation/restore.go
* pkg/core/types/types.go
* pkg/etcdbackup/orchestrator_test.go
* tests/integration/s3presign/presign_integration_test.go

</details>

</details>

<!-- This is an auto-generated comment by CodeRabbit for review status -->

Comment thread pkg/common/utils.go
Comment thread pkg/s3presign/presign.go Outdated
@jparrill jparrill force-pushed the CNTRLPLANE-2685-restore branch from 5c53288 to a970ce6 Compare April 9, 2026 22:18
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/references/HCPEtcdBackup/HCPEtcdBackup-implementation.md`:
- Around line 423-428: The prose instructs operators to verify
HostedCluster.status.lastSuccessfulEtcdBackupURL but the example command uses
the annotation path
metadata.annotations.hypershift.openshift.io/etcd-snapshot-url; update the
example so the jsonpath targets the status field instead (or change the prose to
match the annotation), e.g., reference
HostedCluster.status.lastSuccessfulEtcdBackupURL in the text and adjust the oc
get jsonpath to query that status field rather than the
metadata.annotations.hypershift.openshift.io/etcd-snapshot-url key so both prose
and command are consistent.

In `@pkg/core/restore.go`:
- Around line 246-249: The code blindly uses backup.Spec.StorageLocation when
fetching the BackupStorageLocation (see bslName, p.client.Get and
types.NamespacedName{Name: bslName, Namespace: oadpNS}, bsl) which fails if the
field is empty; change the logic to detect an empty backup.Spec.StorageLocation
and then lookup the BSL by the Velero label "velero.io/storage-location" in the
oadp namespace (e.g., list BackupStorageLocation objects filtered by that label
and pick the one matching the backup’s label), then fall back to using its name
for the Get; apply the same fallback in the CreateEtcdBackup path that calls
fetchBSL so both codepaths handle Velero’s default storage location correctly.

In `@pkg/s3presign/presign.go`:
- Around line 179-213: The presigned URL is constructed with the raw urlPath
while the canonicalRequest uses uriEncode(urlPath), causing mismatch for keys
with reserved chars; update the presigned URL construction (where presignedURL
is built) to use the same encoded path used in canonicalRequest (e.g., call
uriEncode(urlPath) or store it as encodedPath) so the request target matches
what was signed (ensure sortedQueryString(params) remains unchanged and params
includes X-Amz-Signature).

In `@tests/integration/s3presign/presign_integration_test.go`:
- Around line 103-115: The createBucket helper currently calls aws s3 mb with
--create-bucket-configuration which fails for non-us-east-1 regions; update
createBucket (function name) to call e.awsCmd(t, ...) with "s3api",
"create-bucket", "--bucket", e.bucket and, for non-us-east-1, include
"--create-bucket-configuration", fmt.Sprintf("LocationConstraint=%s", e.region)
(keep the original us-east-1 branch using "s3", "mb" for compatibility), and
preserve the existing error handling around the e.awsCmd call so failures still
call t.Fatalf.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ce04944a-3fc4-4a81-87ef-e046a664a66f

📥 Commits

Reviewing files that changed from the base of the PR and between 5c53288 and a970ce6.

📒 Files selected for processing (7)
  • docs/references/HCPEtcdBackup/HCPEtcdBackup-implementation.md
  • pkg/common/utils.go
  • pkg/core/restore.go
  • pkg/core/restore_test.go
  • pkg/s3presign/presign.go
  • pkg/s3presign/presign_test.go
  • tests/integration/s3presign/presign_integration_test.go
✅ Files skipped from review due to trivial changes (1)
  • pkg/core/restore_test.go

Comment thread docs/references/HCPEtcdBackup/HCPEtcdBackup-implementation.md
Comment thread pkg/core/restore.go
Comment thread pkg/s3presign/presign.go
Comment thread tests/integration/s3presign/presign_integration_test.go
@jparrill jparrill changed the title CNTRLPLANE-2685: inject pre-signed etcd snapshot URL during OADP restore CNTRLPLANE-2685, CNTRLPLANE-2707: inject pre-signed etcd snapshot URL during OADP restore Apr 10, 2026
@openshift-ci-robot
Copy link
Copy Markdown

openshift-ci-robot commented Apr 10, 2026

@jparrill: This pull request references CNTRLPLANE-2685 which is a valid jira issue.

This pull request references CNTRLPLANE-2707 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Summary

  • During restore, reads the etcd snapshot S3 URL from the hypershift.openshift.io/etcd-snapshot-url annotation (persisted by the backup plugin, since Velero strips status fields during restore)
  • Converts s3:// URLs to pre-signed HTTPS GET URLs using AWS SigV4 (pure Go stdlib, no AWS SDK dependency) so the etcd-init container can download the snapshot via curl
  • Injects the pre-signed URL into spec.etcd.managed.storage.restoreSnapshotURL on the HostedCluster
  • Supports virtual-hosted/path-style S3 addressing, custom endpoints (MinIO, RHOCS), and STS session tokens

Dependencies

This PR depends on the following PRs being merged first:

Note for reviewers

Only the last commit (b168ebd8) needs review. The previous commits belong to PR #238 and will be rebased out once that PR is merged.

New files

File Purpose
pkg/s3presign/presign.go AWS SigV4 pre-signed URL generation (stdlib only)
pkg/s3presign/presign_test.go Unit tests (21 tests)
pkg/core/restore_test.go Restore plugin unit tests (4 tests)
tests/integration/s3presign/presign_integration_test.go Integration tests with real S3 (ephemeral buckets)

Test plan

  • go test ./pkg/s3presign/ — 21 unit tests pass
  • go test ./pkg/core/ -run TestPresign — 4 unit tests pass
  • go test -tags=integration ./tests/integration/s3presign/ — 4 integration tests pass against real S3
  • End-to-end: OADP backup with etcdSnapshot method → delete HC → OADP restore → etcd-init downloads snapshot via pre-signed URL → etcd restores successfully

Known issues

  • OCPBUGS-82166etcd-init.sh fails on OCP 4.21+ because etcd 3.6 removed etcdctl snapshot restore (must use etcdutl). This is a HyperShift issue, not in this plugin.

🤖 Generated with Claude Code

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Copy link
Copy Markdown

@mgencur mgencur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the last commit. Posted a few questions.

```

> **Important**: The `restoreSnapshotURL` field only takes effect during HostedCluster bootstrap (initial etcd creation). Patching it on an already running cluster will not trigger an etcd restore. To test a full restore, the HostedCluster must be deleted and recreated via the OADP restore flow.
> **Important**: The `restoreSnapshotURL` field only takes effect during HostedCluster bootstrap (initial etcd creation). The `etcd-init` container skips the restore if `/var/lib/data` is not empty. To test a full restore, the HostedCluster must be deleted and recreated via the OADP restore flow.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. The control-plane might be lost and the actual hosted cluster still be running (AFAIK). Then restoring the control plane will ignore the restoreSnapshotURL? What happens in that case? We currently don't delete the data plane during backup/restore tests.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct — if the etcd PV already has data, the etcd-init container skips the restore entirely (line 3 of etcd-init.sh):

[ "$(ls -A /var/lib/data)" ] && echo "/var/lib/data not empty, not restoring snapshot" && exit 0

This is by design per the RestoreSnapshotURL API godoc: "This snapshot will be restored on initial startup, only when the etcd PV is empty."

In the OADP restore flow, the PVC is deleted and recreated fresh, so the PV will be empty and the restore will proceed. If the data plane is still running and the control plane is being restored, the etcd PV is new (empty) so the snapshot restore will execute correctly.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Makes sense.

Comment thread pkg/core/restore.go
return nil, fmt.Errorf("error converting item to HostedCluster: %v", err)
}
if hc.Spec.Etcd.Managed != nil {
hc.Spec.Etcd.Managed.Storage.RestoreSnapshotURL = []string{snapshotURL}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it a concern that anybody who can access this RestoreSnapshotURL can download the snapshot without furthe credentials? probably not

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pre-signed URL is time-limited (1 hour TTL by default) and is only stored in spec.etcd.managed.storage.restoreSnapshotURL on the HostedCluster object, which requires RBAC to read. Once the etcd-init container uses it, the URL expires and becomes useless. Additionally, the etcd snapshot itself is already stored in the S3 bucket which the BSL credentials have access to, so the pre-signed URL doesn't grant any additional access beyond what those credentials already provide. We're also working on CEL immutability for this field (hypershift#8186) to prevent day-2 tampering.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! good to know.

@jparrill jparrill force-pushed the CNTRLPLANE-2685-restore branch from a970ce6 to 9fe273c Compare April 13, 2026 08:23
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/core/backup.go (1)

134-144: ⚠️ Potential issue | 🟠 Major

Reset cached state when the backup changes and ensure credential cleanup on wait failure.

These short-circuits cache p.hcp, p.etcdOrchestrator, and p.etcdSnapshotURL on the plugin instance without checking which Backup is being processed. Since Velero reuses one item-action instance across backups, a later backup can skip HCPEtcdBackup creation and reuse the previous snapshot URL/HCP. Additionally, if WaitForCompletion() fails, the credential secret is never cleaned up (cleanup only happens after successful wait on line 323–327).

Suggested fix
 type BackupPlugin struct {
 	log logrus.FieldLogger
 	ctx context.Context
@@
 	etcdOrchestrator  *etcdbackup.Orchestrator
 	hoNamespace       string
 	etcdBackupMethod  string
 	etcdSnapshotURL   string // populated after HCPEtcdBackup completes
+	currentBackupKey  types.NamespacedName
 }
@@
 func (p *BackupPlugin) Execute(item runtime.Unstructured, backup *velerov1.Backup) (runtime.Unstructured, []velero.ResourceIdentifier, error) {
 	p.log.Debug("Entering Hypershift backup plugin")
 	ctx := context.Context(p.ctx)
+
+	backupKey := types.NamespacedName{Name: backup.Name, Namespace: backup.Namespace}
+	if p.currentBackupKey != backupKey {
+		p.currentBackupKey = backupKey
+		p.hcp = nil
+		p.etcdOrchestrator = nil
+		p.etcdSnapshotURL = ""
+	}
 
 	if returnEarly, err := common.ShouldEndPluginExecution(ctx, backup, p.client, p.log); returnEarly {

For cleanup on failure, call CleanupCredentialSecret() before returning the error from WaitForCompletion().

Also applies to: 262–264, 314–316

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/core/backup.go` around lines 134 - 144, Reset the cached plugin state
when processing a different Backup and ensure credentials are cleaned up on wait
failure: clear p.hcp, p.etcdOrchestrator, and p.etcdSnapshotURL at the start of
handling a new backup (e.g., when entering the code that sets p.hcp from
common.GetHCP or when backup.Spec changes) so cached values from previous
backups aren't reused; and if WaitForCompletion() returns an error, call
CleanupCredentialSecret() before returning the error so the credential secret is
always removed on failure. Reference p.hcp, p.etcdOrchestrator,
p.etcdSnapshotURL for the caches and
WaitForCompletion()/CleanupCredentialSecret() for the cleanup path.
♻️ Duplicate comments (2)
docs/references/HCPEtcdBackup/HCPEtcdBackup-implementation.md (1)

423-428: ⚠️ Potential issue | 🟡 Minor

Make the command match the field being described.

The text says to verify status.lastSuccessfulEtcdBackupURL, but the jsonpath reads the annotation instead. That sends operators to the wrong field.

Suggested fix
 oc get hostedcluster <name> -n <hosted-cluster-namespace> \
-  -o jsonpath='{.metadata.annotations.hypershift\.openshift\.io/etcd-snapshot-url}'
+  -o jsonpath='{.status.lastSuccessfulEtcdBackupURL}'
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/references/HCPEtcdBackup/HCPEtcdBackup-implementation.md` around lines
423 - 428, The verification command references the annotation but the text
expects the status field; change the jsonpath to read the HostedCluster
status.lastSuccessfulEtcdBackupURL (replace the current jsonpath
'{.metadata.annotations.hypershift\.openshift\.io/etcd-snapshot-url}' with one
that targets '{.status.lastSuccessfulEtcdBackupURL}') so the command actually
checks the lastSuccessfulEtcdBackupURL status field mentioned in the text.
pkg/core/restore.go (1)

246-249: ⚠️ Potential issue | 🟠 Major

Handle backups that rely on Velero’s default BSL.

backup.Spec.StorageLocation is optional in Velero. When it is empty, this does a Get with an empty BSL name and valid restores using the default storage location fail before presigning. Fall back to the recorded velero.io/storage-location label, or another default-BSL resolution path, before the lookup.

Suggested fix
 	bslName := backup.Spec.StorageLocation
+	if bslName == "" && backup.Labels != nil {
+		bslName = backup.Labels["velero.io/storage-location"]
+	}
+	if bslName == "" {
+		return "", fmt.Errorf("backup %q does not record a BackupStorageLocation", backup.Name)
+	}
 	if err := p.client.Get(ctx, types.NamespacedName{Name: bslName, Namespace: oadpNS}, bsl); err != nil {
 		return "", fmt.Errorf("error getting BackupStorageLocation %q: %w", bslName, err)
 	}

Run this to verify the concern against the vendored Velero API and the current lookup path:

#!/bin/bash
set -euo pipefail

echo "Velero BackupSpec.StorageLocation definition:"
fd -p 'backup_types.go' vendor/github.com/vmware-tanzu/velero | xargs -r rg -n -C2 'StorageLocation'

echo
echo "Current restore lookup:"
sed -n '239,249p' pkg/core/restore.go

echo
echo "Repository references to the storage-location label:"
rg -n -C2 'velero\.io/storage-location' .
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/core/restore.go` around lines 246 - 249, When resolving the
BackupStorageLocation in pkg/core/restore.go, guard against an empty
backup.Spec.StorageLocation (bslName) by first checking
backup.ObjectMeta.Labels["velero.io/storage-location"] and using that as bslName
if StorageLocation is empty before calling p.client.Get; update the lookup in
the restore flow (the code that sets bslName and calls p.client.Get) to use the
label fallback and only call p.client.Get with a non-empty name, and if still
empty consider the existing default-BSL resolution path or return a clear error
mentioning both StorageLocation and velero.io/storage-location.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/references/HCPEtcdBackup/HCPEtcdBackup-implementation.md`:
- Around line 470-473: The fenced code block showing the example output is
missing a language tag (triggering markdownlint MD040); update that fenced block
in HCPEtcdBackup-implementation.md to include the JSON language tag (```json) so
the snippet is treated as JSON and properly linted—ensure the block starts with
```json and keep the array example content unchanged.

In `@pkg/core/backup.go`:
- Around line 318-327: The WaitForCompletion error path currently returns before
removing the copied BSL credentials; call
p.etcdOrchestrator.CleanupCredentialSecret(ctx) when WaitForCompletion returns
an error (and log any cleanup warning) before returning the formatted error.
Concretely, in the error branch after p.etcdOrchestrator.WaitForCompletion(ctx)
where you have "if err != nil { ... }", invoke CleanupCredentialSecret(ctx), log
any cleanupErr with p.log.Warnf("Failed to cleanup etcd backup credential
Secret: %v", cleanupErr), then return the original fmt.Errorf("HCPEtcdBackup
failed: %v", err); keep the existing successful-path code that sets
p.etcdSnapshotURL and logs the success unchanged.

In `@pkg/core/restore.go`:
- Around line 196-208: The code currently round-trips the resource through
runtime.DefaultUnstructuredConverter.FromUnstructured/ToUnstructured into a
HostedCluster (hc) which can drop unknown fields; instead mutate the nested
unstructured map directly: get the map via input.Item.UnstructuredContent(),
traverse/create the nested maps for "spec" -> "etcd" -> "managed" -> "storage",
and set "restoreSnapshotURL" to a string slice containing snapshotURL (i.e.
[]string{snapshotURL}) in that storage map, then call
input.Item.SetUnstructuredContent if you modified the root map and keep the
existing log statement; remove the FromUnstructured/ToUnstructured/HostedCluster
conversions to preserve unknown fields.

---

Outside diff comments:
In `@pkg/core/backup.go`:
- Around line 134-144: Reset the cached plugin state when processing a different
Backup and ensure credentials are cleaned up on wait failure: clear p.hcp,
p.etcdOrchestrator, and p.etcdSnapshotURL at the start of handling a new backup
(e.g., when entering the code that sets p.hcp from common.GetHCP or when
backup.Spec changes) so cached values from previous backups aren't reused; and
if WaitForCompletion() returns an error, call CleanupCredentialSecret() before
returning the error so the credential secret is always removed on failure.
Reference p.hcp, p.etcdOrchestrator, p.etcdSnapshotURL for the caches and
WaitForCompletion()/CleanupCredentialSecret() for the cleanup path.

---

Duplicate comments:
In `@docs/references/HCPEtcdBackup/HCPEtcdBackup-implementation.md`:
- Around line 423-428: The verification command references the annotation but
the text expects the status field; change the jsonpath to read the HostedCluster
status.lastSuccessfulEtcdBackupURL (replace the current jsonpath
'{.metadata.annotations.hypershift\.openshift\.io/etcd-snapshot-url}' with one
that targets '{.status.lastSuccessfulEtcdBackupURL}') so the command actually
checks the lastSuccessfulEtcdBackupURL status field mentioned in the text.

In `@pkg/core/restore.go`:
- Around line 246-249: When resolving the BackupStorageLocation in
pkg/core/restore.go, guard against an empty backup.Spec.StorageLocation
(bslName) by first checking
backup.ObjectMeta.Labels["velero.io/storage-location"] and using that as bslName
if StorageLocation is empty before calling p.client.Get; update the lookup in
the restore flow (the code that sets bslName and calls p.client.Get) to use the
label fallback and only call p.client.Get with a non-empty name, and if still
empty consider the existing default-BSL resolution path or return a clear error
mentioning both StorageLocation and velero.io/storage-location.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2d193e69-6c15-4931-a354-6fbdaaa53ac8

📥 Commits

Reviewing files that changed from the base of the PR and between a970ce6 and 9fe273c.

⛔ Files ignored due to path filters (25)
  • go.sum is excluded by !**/*.sum
  • vendor/github.com/openshift/api/config/v1/types_apiserver.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/api/config/v1/types_authentication.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/api/config/v1/types_infrastructure.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/api/config/v1/types_insights.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/api/config/v1/types_network.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/api/config/v1/types_tlssecurityprofile.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/api/config/v1/zz_generated.featuregated-crd-manifests.yaml is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/api/config/v1/zz_generated.swagger_doc_generated.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/api/operator/v1/types_network.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/api/operator/v1/zz_generated.featuregated-crd-manifests.yaml is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/azure.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/azureprivatelinkservice_types.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/controlplaneversion_types.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/etcdbackup_types.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/gcp.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/gcpprivateserviceconnect_types.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/hosted_controlplane.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/hostedcluster_conditions.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/hostedcluster_types.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/nodepool_types.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/operator.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/zz_generated.deepcopy.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests.yaml is excluded by !vendor/**, !**/vendor/**
  • vendor/modules.txt is excluded by !vendor/**, !**/vendor/**
📒 Files selected for processing (21)
  • docs/references/HCPEtcdBackup/HCPEtcdBackup-implementation.md
  • go.mod
  • pkg/common/scheme.go
  • pkg/common/types.go
  • pkg/common/utils.go
  • pkg/common/utils_test.go
  • pkg/core/backup.go
  • pkg/core/backup_test.go
  • pkg/core/restore.go
  • pkg/core/restore_test.go
  • pkg/core/types/types.go
  • pkg/core/validation/backup.go
  • pkg/core/validation/backup_test.go
  • pkg/core/validation/restore.go
  • pkg/core/validation/restore_test.go
  • pkg/etcdbackup/orchestrator.go
  • pkg/etcdbackup/orchestrator_test.go
  • pkg/platform/aws/aws.go
  • pkg/s3presign/presign.go
  • pkg/s3presign/presign_test.go
  • tests/integration/s3presign/presign_integration_test.go
💤 Files with no reviewable changes (1)
  • pkg/platform/aws/aws.go
✅ Files skipped from review due to trivial changes (5)
  • pkg/common/scheme.go
  • pkg/common/utils_test.go
  • pkg/core/restore_test.go
  • tests/integration/s3presign/presign_integration_test.go
  • pkg/etcdbackup/orchestrator.go
🚧 Files skipped from review as they are similar to previous changes (6)
  • pkg/core/validation/backup.go
  • pkg/core/validation/restore.go
  • go.mod
  • pkg/core/types/types.go
  • pkg/common/types.go
  • pkg/s3presign/presign.go

Comment thread docs/references/HCPEtcdBackup/HCPEtcdBackup-implementation.md
Comment thread pkg/core/backup.go
Comment thread pkg/core/restore.go
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
pkg/core/restore_test.go (2)

55-62: Global SA path mutation may create future test-order coupling

Both tests mutate common.SetK8sSAFilePath(...) (package-global state). It’s restored correctly, but this still becomes fragile if any tests in package start using t.Parallel(). Consider centralizing this in a helper guarded by a package test mutex, or injecting the path into the code under test to avoid global mutation.

As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

Also applies to: 278-285

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/core/restore_test.go` around lines 55 - 62, The tests mutate
package-global state via common.SetK8sSAFilePath (using
common.DefaultK8sSAFilePath) which can cause test-order coupling when tests run
in parallel; refactor by either (A) creating a test helper (e.g.,
setK8sSAPathForTest) that takes t *testing.T, uses a package-level sync.Mutex to
serialize modifications, sets the path to t.TempDir()+"/namespace", and
registers t.Cleanup to restore the original path, then replace the direct calls
in restore_test.go (and at 278-285) with this helper; or (B) better yet, change
the code under test to accept the SA file path as an injected parameter
(constructor or function arg) and update tests to pass the temp path instead of
calling common.SetK8sSAFilePath.

286-294: Avoid panic-prone type assertions in unstructured extraction

The chained .(map[string]any) / .([]any) assertions can panic and obscure test intent when object shape changes. Prefer unstructured.Nested* helpers so failures are explicit assertions instead of runtime panics.

Refactor sketch
- extractRestoreSnapshotURL := func(output *veleroapiv1.RestoreItemActionExecuteOutput) ([]any, bool) {
-   spec := output.UpdatedItem.UnstructuredContent()["spec"].(map[string]any)
-   etcd := spec["etcd"].(map[string]any)
-   managed := etcd["managed"].(map[string]any)
-   storage := managed["storage"].(map[string]any)
-   urls, ok := storage["restoreSnapshotURL"].([]any)
-   return urls, ok
- }
+ extractRestoreSnapshotURL := func(t *testing.T, output *veleroapiv1.RestoreItemActionExecuteOutput) ([]any, bool) {
+   t.Helper()
+   urls, found, err := unstructured.NestedSlice(
+     output.UpdatedItem.UnstructuredContent(),
+     "spec", "etcd", "managed", "storage", "restoreSnapshotURL",
+   )
+   if err != nil {
+     t.Fatalf("restoreSnapshotURL has unexpected type: %v", err)
+   }
+   return urls, found
+ }
// call-site adjustments outside this hunk:
urls, ok := extractRestoreSnapshotURL(t, output)
As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/core/restore_test.go` around lines 286 - 294, The helper
extractRestoreSnapshotURL currently uses chained type assertions on
output.UpdatedItem.UnstructuredContent() which can panic; replace it with the
k8s unstructured helpers (e.g., unstructured.NestedMap /
unstructured.NestedSlice or unstructured.NestedStringSlice) to safely traverse
the hierarchy from the RestoreItemActionExecuteOutput UnstructuredContent,
returning the slice and a bool or failing the test with clear assertions; update
the function signature (extractRestoreSnapshotURL) to accept the output map or
testing.T as needed and use unstructured.NestedMap(output, "spec", "etcd",
"managed", "storage") and then unstructured.NestedSlice on "restoreSnapshotURL"
to avoid runtime panics and make failures explicit.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@pkg/core/restore_test.go`:
- Around line 55-62: The tests mutate package-global state via
common.SetK8sSAFilePath (using common.DefaultK8sSAFilePath) which can cause
test-order coupling when tests run in parallel; refactor by either (A) creating
a test helper (e.g., setK8sSAPathForTest) that takes t *testing.T, uses a
package-level sync.Mutex to serialize modifications, sets the path to
t.TempDir()+"/namespace", and registers t.Cleanup to restore the original path,
then replace the direct calls in restore_test.go (and at 278-285) with this
helper; or (B) better yet, change the code under test to accept the SA file path
as an injected parameter (constructor or function arg) and update tests to pass
the temp path instead of calling common.SetK8sSAFilePath.
- Around line 286-294: The helper extractRestoreSnapshotURL currently uses
chained type assertions on output.UpdatedItem.UnstructuredContent() which can
panic; replace it with the k8s unstructured helpers (e.g.,
unstructured.NestedMap / unstructured.NestedSlice or
unstructured.NestedStringSlice) to safely traverse the hierarchy from the
RestoreItemActionExecuteOutput UnstructuredContent, returning the slice and a
bool or failing the test with clear assertions; update the function signature
(extractRestoreSnapshotURL) to accept the output map or testing.T as needed and
use unstructured.NestedMap(output, "spec", "etcd", "managed", "storage") and
then unstructured.NestedSlice on "restoreSnapshotURL" to avoid runtime panics
and make failures explicit.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9046ed8c-8764-4498-ae3a-a10bb3b04ce7

📥 Commits

Reviewing files that changed from the base of the PR and between 9fe273c and 4e6d26e.

📒 Files selected for processing (1)
  • pkg/core/restore_test.go

@jparrill jparrill force-pushed the CNTRLPLANE-2685-restore branch from 4e6d26e to d1dda3e Compare April 13, 2026 11:00
Document the full HCPEtcdBackup integration including architecture,
backup/restore flows, configuration, credential handling, storage layout,
dependency chain (PRs #8139, #8010, #8017, #8040, #8114, enhancement #1945),
and troubleshooting guide.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Juan Manuel Parrilla Madrid <jparrill@redhat.com>
@jparrill jparrill force-pushed the CNTRLPLANE-2685-restore branch from d1dda3e to 1bcf307 Compare April 13, 2026 14:14
Copy link
Copy Markdown

@sdminonne sdminonne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one comment but in general not that important and basically test related

// Delete the bucket
out, err = e.awsCmd(t, "s3", "rb", fmt.Sprintf("s3://%s", e.bucket))
if err != nil {
t.Logf("WARNING: failed to delete bucket %s: %v\n%s", e.bucket, err, out)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just FYI if this fails with a warning we're going to leak the S3 right? Guess it should fail the test with a message and s3 bucket to be deleted

jparrill and others added 2 commits April 13, 2026 20:03
During restore, read the etcd snapshot S3 URL from the
hypershift.openshift.io/etcd-snapshot-url annotation (persisted by the
backup plugin since Velero strips status fields), convert it to a
pre-signed HTTPS GET URL using AWS SigV4, and inject it into
spec.etcd.managed.storage.restoreSnapshotURL so the etcd-init container
can download the snapshot via curl.

The s3presign package implements SigV4 signing using only the Go stdlib
(no AWS SDK dependency), supporting virtual-hosted/path-style addressing,
custom S3 endpoints, and STS session tokens.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Juan Manuel Parrilla Madrid <jparrill@redhat.com>
…l Execute coverage

Add table-driven tests for presignS3URL and restore Execute flow covering
etcd snapshot URL injection, including s3://, https://, and no-annotation cases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Juan Manuel Parrilla Madrid <jparrill@redhat.com>
@jparrill jparrill force-pushed the CNTRLPLANE-2685-restore branch from 1bcf307 to 07ca889 Compare April 13, 2026 18:03
@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented Apr 13, 2026

@jparrill: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@sdminonne
Copy link
Copy Markdown

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Apr 13, 2026
@openshift-merge-bot openshift-merge-bot Bot merged commit 9e14896 into openshift:main Apr 13, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants