CNTRLPLANE-3167: Auto-detect credential type for HCPEtcdBackup#8368
CNTRLPLANE-3167: Auto-detect credential type for HCPEtcdBackup#8368jparrill wants to merge 4 commits intoopenshift:mainfrom
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
@jparrill: This pull request references CNTRLPLANE-3167 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThe pull request implements credential auto-detection and handling for HCPEtcdBackup, adding a credential-resolution module that classifies AWS (static vs STS/IRSA) and Azure (workload-identity, client-secret, managed-identity) modes from Secret contents. The reconciler now derives a resolvedCredentials value and passes it to ServiceAccount and Job creation, which conditionally set annotations, labels, volumes, mounts, env vars, and CLI flags. Unit tests validate parsing and Job/ServiceAccount behavior. Docs and navigation add a Managed Services Credentials tech-preview guide. Sequence Diagram(s)sequenceDiagram
participant User
participant Reconciler
participant APIServer as Kubernetes API
participant Cloud as Cloud Platform
User->>APIServer: Create HCPEtcdBackup CR + credentials Secret
Reconciler->>APIServer: Watch HCPEtcdBackup
Note over Reconciler: Trigger reconciliation
Reconciler->>APIServer: Read credentials Secret
APIServer-->>Reconciler: Secret data (cloud, credentials)
Note over Reconciler: Resolve credential mode
alt AWS STS detected
Reconciler->>APIServer: Create ServiceAccount
Reconciler->>APIServer: Create Job with projected token volume
Reconciler->>APIServer: Set AWS_ROLE_ARN & AWS_WEB_IDENTITY_TOKEN_FILE env
Note over Reconciler: No --credentials-file
else AWS Static detected
Reconciler->>APIServer: Create ServiceAccount
Reconciler->>APIServer: Create Job with credentials secret volume
Note over Reconciler: Include --credentials-file
else Azure Workload Identity detected
Reconciler->>APIServer: Create ServiceAccount with azure.workload.identity/client-id
Reconciler->>APIServer: Create Job with pod label azure.workload.identity/use=true
Note over Reconciler: No --credentials-file
else Azure Client Secret detected
Reconciler->>APIServer: Create ServiceAccount
Reconciler->>APIServer: Create Job with credentials secret volume
Note over Reconciler: Include --credentials-file and --azure-auth-type client-secret
else Azure Managed Identity detected
Reconciler->>APIServer: Create ServiceAccount
Reconciler->>APIServer: Create Job with credentials secret volume
Note over Reconciler: Include --credentials-file and --azure-auth-type managed-identity
end
Reconciler->>APIServer: Job pod scheduled
APIServer->>Cloud: Pod authenticates using detected credential mode
Cloud-->>APIServer: Credentials validated
Note over Cloud: Pod performs etcd snapshot upload
🚥 Pre-merge checks | ✅ 11 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (11 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jparrill The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #8368 +/- ##
==========================================
+ Coverage 37.50% 37.60% +0.09%
==========================================
Files 751 752 +1
Lines 91992 92137 +145
==========================================
+ Hits 34505 34645 +140
- Misses 54844 54848 +4
- Partials 2643 2644 +1
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@docs/content/how-to/disaster-recovery/etcd-snapshot-backup/managed-services-credentials.md`:
- Around line 275-289: The unlabeled fenced code blocks under the "AWS STS:
Access Denied" and "Azure WI: No Matching Federated Identity Record" sections
are causing markdownlint MD040; update each triple-backtick fence that contains
the error messages (the blocks shown below those headings) to include a language
tag of text (i.e., ```text) so the snippets remain unchanged visually but
satisfy the linter; locate the blocks by the section headings "AWS STS: Access
Denied" and "Azure WI: No Matching Federated Identity Record" and modify the
opening fences accordingly.
- Around line 15-25: Update the documentation to match actual copy semantics and
resolver precedence: reconcile the Secret name examples (replace or mention both
`cloud-credentials` and `etcd-backup-creds` where CR examples use the latter)
and clarify that the HyperShift-copied Secret may contain either the remapped
`credentials` key, the original `cloud` key, or both depending on copy logic;
explicitly document the resolver behavior that for Azure the resolver checks the
`cloud` key first and will classify any Secret with a `cloud` key as
workload-identity (so client-secret mode must not include `cloud` and must
supply the `credentials` JSON with the client secret), and update the narrative
in the affected sections (including the blocks referencing `cloud-credentials`,
`etcd-backup-creds`, keys `cloud` and `credentials`, and the Azure mode
description) to reflect this precedence and required Secret shapes.
- Around line 88-117: In the "IAM Role Permissions" section update the JSON
policy snippet to remove the invalid IAM actions (`s3:CreateMultipartUpload`,
`s3:UploadPart`, `s3:CompleteMultipartUpload`) and only include the valid
actions `s3:PutObject` and `s3:AbortMultipartUpload` in the "Action" array for
the shown resource ARNs; also adjust the explanatory note below the snippet to
state that multipart upload is authorized by `s3:PutObject` and that
`s3:AbortMultipartUpload` is included for cleanup, removing the claim that the
removed actions are required for the AWS SDK v2 Transfer Manager.
In `@hypershift-operator/controllers/etcdbackup/reconciler.go`:
- Around line 560-575: The current ensureServiceAccount uses a single global
jobServiceAccountName and races when multiple HCP namespaces run backups; change
ensureServiceAccount to create a per-job/per-credential ServiceAccount (e.g.,
include the HCP namespace and a short credential or job identifier derived from
resolvedCredentials and the HCP being reconciled) instead of reusing
jobServiceAccountName, update any RoleBinding/ensureRoleBinding logic to
reference that per-job ServiceAccount subject (adjust symbols
ensureServiceAccount and any ensureRoleBinding or RoleBinding creation), and
ensure cleanup of the per-job ServiceAccount/RoleBinding after job completion or
failure.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 3f26774b-d142-433f-a5fe-0126039ec61b
⛔ Files ignored due to path filters (1)
docs/content/reference/aggregated-docs.mdis excluded by!docs/content/reference/aggregated-docs.md
📒 Files selected for processing (7)
docs/content/how-to/disaster-recovery/etcd-snapshot-backup/index.mddocs/content/how-to/disaster-recovery/etcd-snapshot-backup/managed-services-credentials.mddocs/mkdocs.ymlhypershift-operator/controllers/etcdbackup/credentials.gohypershift-operator/controllers/etcdbackup/credentials_test.gohypershift-operator/controllers/etcdbackup/reconciler.gohypershift-operator/controllers/etcdbackup/reconciler_test.go
c92c851 to
b337ffc
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
hypershift-operator/controllers/etcdbackup/reconciler_test.go (1)
1076-1092: Minor: Backup name generation may produce unexpected characters for larger test sets.The expression
string(rune('a'+i))works correctly for i < 26, but if the test set were expanded beyond 26 items, it would produce non-letter characters. This is fine for the current test with only 3 items, but consider usingfmt.Sprintf("backup-%d", i)for consistency with other similar tests in this file (e.g., line 1025).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hypershift-operator/controllers/etcdbackup/reconciler_test.go` around lines 1076 - 1092, The backup name generation using string(rune('a'+i)) inside the loop that creates HCPEtcdBackup objects can produce unexpected characters for larger i; update the name to use a numeric format (e.g., fmt.Sprintf("backup-%d", i)) when constructing each HCPEtcdBackup in the for i := range 3 loop so names are predictable and consistent with other tests, and add an import for fmt if not already present.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@hypershift-operator/controllers/etcdbackup/credentials.go`:
- Around line 64-79: When resolving the "cloud" secret in
resolveAzureCredentials, do not return credentialModeAzureWorkloadIdentity with
an empty ClientID; instead check if clientID == "" and if so log a warning
(e.g., klog.Warningf or processLogger) referencing secret.Name and the missing
AZURE_CLIENT_ID, then return a non-workload mode such as credentialModeUnknown
(or the existing fallback mode) so the ServiceAccount is not annotated with
azure.workload.identity/client-id:""; only set Mode to
credentialModeAzureWorkloadIdentity and populate ClientID when a non-empty
AZURE_CLIENT_ID is found.
---
Nitpick comments:
In `@hypershift-operator/controllers/etcdbackup/reconciler_test.go`:
- Around line 1076-1092: The backup name generation using string(rune('a'+i))
inside the loop that creates HCPEtcdBackup objects can produce unexpected
characters for larger i; update the name to use a numeric format (e.g.,
fmt.Sprintf("backup-%d", i)) when constructing each HCPEtcdBackup in the for i
:= range 3 loop so names are predictable and consistent with other tests, and
add an import for fmt if not already present.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 3868b814-bbdb-466d-bad3-fbb6c8ba9d04
⛔ Files ignored due to path filters (1)
docs/content/reference/aggregated-docs.mdis excluded by!docs/content/reference/aggregated-docs.md
📒 Files selected for processing (7)
docs/content/how-to/disaster-recovery/etcd-snapshot-backup/index.mddocs/content/how-to/disaster-recovery/etcd-snapshot-backup/managed-services-credentials.mddocs/mkdocs.ymlhypershift-operator/controllers/etcdbackup/credentials.gohypershift-operator/controllers/etcdbackup/credentials_test.gohypershift-operator/controllers/etcdbackup/reconciler.gohypershift-operator/controllers/etcdbackup/reconciler_test.go
✅ Files skipped from review due to trivial changes (3)
- docs/mkdocs.yml
- docs/content/how-to/disaster-recovery/etcd-snapshot-backup/index.md
- hypershift-operator/controllers/etcdbackup/credentials_test.go
b337ffc to
f26d30c
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (2)
docs/content/how-to/disaster-recovery/etcd-snapshot-backup/managed-services-credentials.md (2)
277-289:⚠️ Potential issue | 🟡 MinorAdd language tags to the troubleshooting code fences.
This issue was flagged in a previous review but remains unaddressed. The fenced code blocks at Lines 277-279 and 287-289 lack language tags, triggering markdownlint MD040. Adding
textas the language identifier will resolve the warning without changing rendering.📝 Proposed fix
### AWS STS: Access Denied -``` +```text An error occurred (AccessDenied) when calling the <S3_OPERATION> operation ``` ... ### Azure WI: No Matching Federated Identity Record -``` +```text AADSTS700213: No matching federated identity record found for presented assertion subject ```🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/content/how-to/disaster-recovery/etcd-snapshot-backup/managed-services-credentials.md` around lines 277 - 289, Two fenced code blocks containing the literal strings "An error occurred (AccessDenied) when calling the <S3_OPERATION> operation" and "AADSTS700213: No matching federated identity record found for presented assertion subject" are missing language tags causing markdownlint MD040; fix by adding the language identifier `text` immediately after the opening triple backticks for both code fences (i.e., change the opening ``` to ```text for the block containing the S3 AccessDenied message and for the block containing the AADSTS700213 message) and keep the existing closing ``` unchanged.
88-117:⚠️ Potential issue | 🟠 MajorRemove invalid IAM action names from the policy snippet; only
s3:PutObjectands3:AbortMultipartUploadare needed.This issue was flagged in a previous review but remains unaddressed. According to AWS documentation,
s3:CreateMultipartUpload,s3:UploadPart, ands3:CompleteMultipartUploadare API operation names, not IAM policy action names. Multipart upload operations are authorized entirely bys3:PutObjecton the target object. The policy should include only:
s3:PutObject(covers CreateMultipartUpload, UploadPart, and CompleteMultipartUpload operations)s3:AbortMultipartUpload(valid IAM action for cleanup)Update the policy snippet to remove Lines 100-102 and revise the explanatory note (Lines 114-116) to state that multipart upload is authorized by
s3:PutObjectand thats3:AbortMultipartUploadis included for cleanup.🛠️ Proposed fix for the IAM policy
"Action": [ "s3:PutObject", - "s3:CreateMultipartUpload", - "s3:UploadPart", - "s3:CompleteMultipartUpload", "s3:AbortMultipartUpload" ],And update the note:
!!! note - The multipart permissions (`CreateMultipartUpload`, `UploadPart`, `CompleteMultipartUpload`, `AbortMultipartUpload`) are required because the upload uses the AWS SDK v2 Transfer Manager, which automatically splits files larger than 5 MB into multipart uploads. Etcd snapshots typically range from 30-100 MB, so multipart upload is used in practice. `AbortMultipartUpload` is needed to clean up incomplete uploads on failure. + The `s3:PutObject` permission authorizes all multipart upload operations (CreateMultipartUpload, UploadPart, CompleteMultipartUpload) used by the AWS SDK v2 Transfer Manager, which automatically splits files larger than 5 MB into multipart uploads. Etcd snapshots typically range from 30-100 MB, so multipart upload is used in practice. `s3:AbortMultipartUpload` is included to clean up incomplete uploads on failure.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/content/how-to/disaster-recovery/etcd-snapshot-backup/managed-services-credentials.md` around lines 88 - 117, Update the IAM policy JSON and explanatory note: remove the invalid IAM actions "s3:CreateMultipartUpload", "s3:UploadPart", and "s3:CompleteMultipartUpload" from the Action array and keep only "s3:PutObject" and "s3:AbortMultipartUpload" (so the statement lists s3:PutObject and s3:AbortMultipartUpload for the bucket and objects), and revise the accompanying note to state that multipart uploads are authorized by s3:PutObject (which covers CreateMultipartUpload/UploadPart/CompleteMultipartUpload operations) and that s3:AbortMultipartUpload is included solely for cleanup of incomplete uploads.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@docs/content/how-to/disaster-recovery/etcd-snapshot-backup/managed-services-credentials.md`:
- Around line 15-27: There is a name mismatch between the destination Secret
shown in the explanation table ("cloud-credentials") and the HCPEtcdBackup CR
examples that reference "etcd-backup-creds"; pick the correct canonical Secret
name used by the OADP HyperShift plugin and make the docs consistent by either
(a) updating the table to show "etcd-backup-creds" if the plugin actually
creates that name, or (b) updating both HCPEtcdBackup CR examples to reference
"cloud-credentials" if that is the source of truth; additionally, add one short
clarifying sentence stating whether the plugin auto-creates and copies the
Secret (so no manual creation is required) or whether users must
create/reference a different Secret name, and apply the same change to both
HCPEtcdBackup examples currently showing the alternate name.
- Around line 275-284: Update the permissions listed in the "AWS STS: Access
Denied" troubleshooting bullet so it matches the corrected IAM policy: replace
the full multipart upload action list with only the two actions allowed by the
policy—s3:PutObject and s3:AbortMultipartUpload—so the bullet under "AWS STS:
Access Denied" contains only `s3:PutObject` and `s3:AbortMultipartUpload` and
remove `s3:CreateMultipartUpload`, `s3:UploadPart`, and
`s3:CompleteMultipartUpload`.
---
Duplicate comments:
In
`@docs/content/how-to/disaster-recovery/etcd-snapshot-backup/managed-services-credentials.md`:
- Around line 277-289: Two fenced code blocks containing the literal strings "An
error occurred (AccessDenied) when calling the <S3_OPERATION> operation" and
"AADSTS700213: No matching federated identity record found for presented
assertion subject" are missing language tags causing markdownlint MD040; fix by
adding the language identifier `text` immediately after the opening triple
backticks for both code fences (i.e., change the opening ``` to ```text for the
block containing the S3 AccessDenied message and for the block containing the
AADSTS700213 message) and keep the existing closing ``` unchanged.
- Around line 88-117: Update the IAM policy JSON and explanatory note: remove
the invalid IAM actions "s3:CreateMultipartUpload", "s3:UploadPart", and
"s3:CompleteMultipartUpload" from the Action array and keep only "s3:PutObject"
and "s3:AbortMultipartUpload" (so the statement lists s3:PutObject and
s3:AbortMultipartUpload for the bucket and objects), and revise the accompanying
note to state that multipart uploads are authorized by s3:PutObject (which
covers CreateMultipartUpload/UploadPart/CompleteMultipartUpload operations) and
that s3:AbortMultipartUpload is included solely for cleanup of incomplete
uploads.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 4351cd03-c94a-4676-b1ee-0ef34059bcec
⛔ Files ignored due to path filters (1)
docs/content/reference/aggregated-docs.mdis excluded by!docs/content/reference/aggregated-docs.md
📒 Files selected for processing (3)
docs/content/how-to/disaster-recovery/etcd-snapshot-backup/index.mddocs/content/how-to/disaster-recovery/etcd-snapshot-backup/managed-services-credentials.mddocs/mkdocs.yml
✅ Files skipped from review due to trivial changes (2)
- docs/content/how-to/disaster-recovery/etcd-snapshot-backup/index.md
- docs/mkdocs.yml
f26d30c to
c3fc037
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (2)
hypershift-operator/controllers/etcdbackup/reconciler.go (1)
61-62:⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy liftShared ServiceAccount still creates a cross-namespace Azure WI race.
jobServiceAccountNameis global in the operator namespace, but annotation state is mutated per reconcile. Because active-job gating is per HCP namespace, concurrent backups in different HCP namespaces can overwrite/removeazure.workload.identity/client-idon the same SA before pod admission.Also applies to: 560-575, 625-629, 806-807
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hypershift-operator/controllers/etcdbackup/reconciler.go` around lines 61 - 62, The global jobServiceAccountName leads to cross-namespace races because the controller mutates the azure.workload.identity/client-id annotation per HCP during Reconcile; change to use a namespace-scoped ServiceAccount name (e.g., derive SA name from jobServiceAccountName plus the HCP namespace or HCP UID) so each HCP gets its own SA, update all places that reference jobServiceAccountName (including the blocks around the earlier mentions) to use the derived per-HCP name, and ensure all annotation reads/writes target that per-HCP ServiceAccount rather than a single shared SA to avoid concurrent annotation overwrites.docs/content/how-to/disaster-recovery/etcd-snapshot-backup/managed-services-credentials.md (1)
25-26:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAzure detection description is internally inconsistent with resolver precedence.
Line 25 says Azure falls back from
cloudtocredentials, but Line 230 says findingcloudtriggers WI. Inhypershift-operator/controllers/etcdbackup/credentials.go(Lines 65-79), a presentcloudkey with missingAZURE_CLIENT_IDreturns managed-identity immediately (nocredentialsfallback). Please align this section to the actual precedence to avoid misconfiguration.Also applies to: 230-230
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/content/how-to/disaster-recovery/etcd-snapshot-backup/managed-services-credentials.md` around lines 25 - 26, Documentation text in managed-services-credentials.md incorrectly states Azure falls back from `cloud` to `credentials` but the actual resolver in hypershift-operator/controllers/etcdbackup/credentials.go (function handling detection around lines 65-79) treats a present `cloud` key with missing AZURE_CLIENT_ID as immediate managed-identity (no `credentials` fallback). Update both occurrences (around lines 25 and 230) to reflect the real precedence: explain that presence of the `cloud` key triggers the managed-identity flow unless explicit AZURE client credentials (AZURE_CLIENT_ID/CLIENT_SECRET/TENANT_ID) are present, and clarify there is no automatic fallback to `credentials` when `cloud` exists but AZURE_* values are absent; reference the resolver behavior in hypershift-operator/controllers/etcdbackup/credentials.go to ensure wording matches code.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In
`@docs/content/how-to/disaster-recovery/etcd-snapshot-backup/managed-services-credentials.md`:
- Around line 25-26: Documentation text in managed-services-credentials.md
incorrectly states Azure falls back from `cloud` to `credentials` but the actual
resolver in hypershift-operator/controllers/etcdbackup/credentials.go (function
handling detection around lines 65-79) treats a present `cloud` key with missing
AZURE_CLIENT_ID as immediate managed-identity (no `credentials` fallback).
Update both occurrences (around lines 25 and 230) to reflect the real
precedence: explain that presence of the `cloud` key triggers the
managed-identity flow unless explicit AZURE client credentials
(AZURE_CLIENT_ID/CLIENT_SECRET/TENANT_ID) are present, and clarify there is no
automatic fallback to `credentials` when `cloud` exists but AZURE_* values are
absent; reference the resolver behavior in
hypershift-operator/controllers/etcdbackup/credentials.go to ensure wording
matches code.
In `@hypershift-operator/controllers/etcdbackup/reconciler.go`:
- Around line 61-62: The global jobServiceAccountName leads to cross-namespace
races because the controller mutates the azure.workload.identity/client-id
annotation per HCP during Reconcile; change to use a namespace-scoped
ServiceAccount name (e.g., derive SA name from jobServiceAccountName plus the
HCP namespace or HCP UID) so each HCP gets its own SA, update all places that
reference jobServiceAccountName (including the blocks around the earlier
mentions) to use the derived per-HCP name, and ensure all annotation
reads/writes target that per-HCP ServiceAccount rather than a single shared SA
to avoid concurrent annotation overwrites.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: e0f9440f-58d1-4e2a-96e1-eb34dd0d9147
⛔ Files ignored due to path filters (1)
docs/content/reference/aggregated-docs.mdis excluded by!docs/content/reference/aggregated-docs.md
📒 Files selected for processing (7)
docs/content/how-to/disaster-recovery/etcd-snapshot-backup/index.mddocs/content/how-to/disaster-recovery/etcd-snapshot-backup/managed-services-credentials.mddocs/mkdocs.ymlhypershift-operator/controllers/etcdbackup/credentials.gohypershift-operator/controllers/etcdbackup/credentials_test.gohypershift-operator/controllers/etcdbackup/reconciler.gohypershift-operator/controllers/etcdbackup/reconciler_test.go
✅ Files skipped from review due to trivial changes (2)
- docs/content/how-to/disaster-recovery/etcd-snapshot-backup/index.md
- docs/mkdocs.yml
c3fc037 to
50a1270
Compare
cblecker
left a comment
There was a problem hiding this comment.
Nice separation of the credential resolution into its own file — the detection logic and test coverage look solid. Two small items inline.
|
|
||
| Step-by-step description of the restore process: how the OADP plugin injects the snapshot URL, how the Control Plane Operator restores etcd, and how the cluster recovers. | ||
|
|
||
| ### [Managed Services Credentials](managed-services-credentials.md) |
There was a problem hiding this comment.
This new page also needs an entry in docs/mkdocs.yml under the "Etcd Snapshot Backup (Tech Preview)" nav section, otherwise it won't show up in the sidebar. Something like:
- how-to/disaster-recovery/etcd-snapshot-backup/managed-services-credentials.mdafter the restore-flow.md entry.
| case hyperv1.AzureBlobBackupStorage: | ||
| return resolveAzureCredentials(secret) | ||
| default: | ||
| return resolvedCredentials{Mode: credentialModeAWSStatic, SecretName: secret.Name} |
There was a problem hiding this comment.
nit: this default case silently returns credentialModeAWSStatic for any unknown storage type. The callers (getCredentialSecretName, buildUploadArgs) already error on unsupported types so it's not a bug today, but if a new storage type gets added later this could mask issues. Worth considering returning a distinct unknown mode or logging a warning here.
The HCPEtcdBackup controller now inspects the referenced credential Secret to determine the authentication mode instead of always passing --credentials-file. This enables IRSA/STS on AWS (ROSA HCP) and Workload Identity on Azure (ARO HCP) without API changes. Detection logic: - AWS STS: Secret key "credentials" contains an ARN prefix - AWS static: Secret key "credentials" contains INI-format creds - Azure WI: Secret key "cloud" with AZURE_CLIENT_ID env format - Azure client-secret: Secret key "credentials" with clientSecret JSON - Azure managed-identity: fallback for other credential formats Job PodSpec is built per mode: - AWS STS: projected SA token volume + AWS_ROLE_ARN env var - Azure WI: pod label azure.workload.identity/use=true, SA annotated - Static/client-secret: credentials Secret mounted as volume - Azure modes: --azure-auth-type arg set accordingly Ref: CNTRLPLANE-3167 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Juan Manuel Parrilla Madrid <jparrill@redhat.com>
…ckup Document credential auto-detection for ROSA HCP (AWS STS/IRSA) and ARO HCP (Azure Workload Identity) in the etcd snapshot backup flow. Covers OADP plugin Secret flow with key remapping, IAM/federated credential prerequisites, and troubleshooting. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Juan Manuel Parrilla Madrid <jparrill@redhat.com>
…dBackup For Azure Workload Identity mode, if the etcd-backup-job ServiceAccount already has an azure.workload.identity/client-id annotation (e.g., set by infrastructure or Helm), it is preserved. Otherwise, the annotation is set from the credential secret, matching the original behavior for self-managed HyperShift deployments. This allows managed platforms (ARO-HCP) to use a dedicated managed identity for the backup job while self-managed deployments can continue reusing the Velero identity. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ckup Replace the findActiveJob guard in cleanupResources with hasNonTerminalBackup, which checks for any non-terminal HCPEtcdBackup CR in the same namespace. This closes the race window where a new backup CR has been created but its Job has not been spawned yet, preventing premature deletion of shared RBAC and NetworkPolicy resources. Signed-off-by: Juan Manuel Parrilla Madrid <jparrill@redhat.com> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Juan Manuel Parrilla Madrid <jparrill@redhat.com>
|
Now I have the complete picture. Here is the report: Test Failure Analysis CompleteJob Information
Test Failure AnalysisErrorSummaryThe Root CauseCodecov uses a flag-based coverage system where different test suites (e.g.,
The difference (187 missing files, 35,887 missing lines) corresponds exactly to the coverage flags that had not yet uploaded. This is a known Codecov behavior: when The failing check run Recommendations
Evidence
|
|
@jparrill: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
cblecker
left a comment
There was a problem hiding this comment.
A few more things beyond the inline comments:
The mode-specific checks (creds.Mode == credentialModeAWSSTS, creds.Mode == credentialModeAzureWorkloadIdentity) are scattered across reconciler.go in four places. Moving these to behavioral methods on resolvedCredentials (like needsProjectedToken(), needsWorkloadIdentityLabel(), azureAuthType()) would keep all mode knowledge co-located in credentials.go and make the reconciler a pure consumer of queries. Not blocking, but would clean up nicely.
Also, there's no test that verifies the precedence order when both cloud and credentials keys are present in the same secret. The resolver checks cloud first, which is the right behavior, but a test documenting that priority would prevent future regressions.
| for line := range strings.SplitSeq(string(cloudData), "\n") { | ||
| line = strings.TrimSpace(line) | ||
| if v, ok := strings.CutPrefix(line, "AZURE_CLIENT_ID="); ok { | ||
| clientID = v |
There was a problem hiding this comment.
resolveAWSCredentials correctly trims whitespace from the credentials value with strings.TrimSpace, but this path doesn't trim the extracted clientID. If the secret data has Windows-style line endings (\r\n) or trailing whitespace, you'd end up with clientID = "client-789\r" which would be set as the SA annotation and silently fail against Azure AD.
Worth adding clientID = strings.TrimSpace(v) here for consistency with the AWS path.
| var creds struct { | ||
| ClientSecret string `json:"clientSecret"` | ||
| } | ||
| if err := json.Unmarshal(credData, &creds); err == nil && creds.ClientSecret != "" { |
There was a problem hiding this comment.
If the credentials key contains malformed JSON (truncated secret, wrong encoding, a typo), this silently falls through to managed-identity mode with no logging or feedback. A user who provides a client-secret JSON file with a typo would get managed-identity auth instead, and the root cause would be hard to track down.
Even without changing the function signature, a log line here when err != nil would help operators diagnose misconfigured secrets.
| if sa.Annotations == nil { | ||
| sa.Annotations = map[string]string{} | ||
| } | ||
| if sa.Annotations["azure.workload.identity/client-id"] == "" { |
There was a problem hiding this comment.
This preserves an existing annotation, which makes sense for infrastructure-managed SAs. But it also means if the credential secret is rotated to a different Azure client ID, the SA annotation won't be updated — the backup job would continue using the old client ID.
Might be worth at least logging when the existing annotation value differs from the resolved creds.ClientID, so operators can tell why the "wrong" identity is being used.
Summary
credentials.gowith dedicated unit testsDescription
ROSA HCP and ARO HCP use short-lived federated credentials instead of long-lived static keys. This PR enables the HCPEtcdBackup controller to auto-detect the credential type from the Secret format:
credentialskey contains a bare IAM role ARN (arn:...), the controller configures a projected SA token volume withsts.amazonaws.comaudience and setsAWS_ROLE_ARN/AWS_WEB_IDENTITY_TOKEN_FILEenv varscloudkey with a non-emptyAZURE_CLIENT_ID=, the controller adds theazure.workload.identity/use=truepod label and annotates the SA withazure.workload.identity/client-idNo API changes — detection is purely based on Secret content inspection.
Related PRs
cloudkey for Azure WI detectionJira
Test plan
resolveAWSCredentials,resolveAzureCredentials,resolveCredentials,needsCredentialsFilecreateBackupJobwith STS, Workload Identity, and managed-identity modesensureServiceAccountannotation management🤖 Generated with Claude Code