CNTRLPLANE-3167: support STS/IRSA credentials and standalone Velero#247
CNTRLPLANE-3167: support STS/IRSA credentials and standalone Velero#247jparrill wants to merge 1 commit intoopenshift:mainfrom
Conversation
…or etcd backup/restore Add support for AWS STS/IRSA credential detection and pre-signing, and handle standalone Velero deployments (no DPA) where the BSL has no explicit credential reference. - Add ParseAWSCredentialData to auto-detect credential type (static, bare ARN, INI with role_arn) and STSClient for AssumeRoleWithWebIdentity without AWS SDK dependency - Detect OADP+DPA vs standalone Velero at plugin init via DPA CRD check - Fall back to well-known cloud-credentials secret when BSL has no .spec.credential (ARO with WI, future ROSA without DPA) - Preserve original secret key alongside "credentials" in copied secret so the HCPEtcdBackup controller can auto-detect credential type - Refactor copyCredentialSecret to accept resolved SecretKeySelector Ref: openshift/hypershift#8368 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Juan Manuel Parrilla Madrid <jparrill@redhat.com>
|
Skipping CI for Draft Pull Request. |
|
@jparrill: This pull request references CNTRLPLANE-3167 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
WalkthroughThis PR introduces credential-handling improvements for standalone Velero deployments and OADP+DPA detection. It adds new credential type detection logic (STS vs static AWS keys), implements an STS AssumeRoleWithWebIdentity client, updates backup/restore plugins to track DPA presence and use fallback credential defaults, and refactors credential passing throughout the codebase. Changes
Sequence Diagram(s)sequenceDiagram
participant RestorePlugin
participant BSL as BackupStorageLocation
participant CredSecret as Credential Secret
participant PresignFlow as Presign Flow
participant STSClient
participant STSEndpoint as STS Endpoint
participant S3 as S3 Presigner
RestorePlugin->>PresignFlow: presignS3URL(bsl, hcName)
activate PresignFlow
PresignFlow->>BSL: Check spec.credential
alt Credential Reference Present
PresignFlow->>CredSecret: Fetch credential data
CredSecret-->>PresignFlow: credential bytes
else No Credential Reference
PresignFlow->>CredSecret: Fetch default "cloud-credentials" secret
CredSecret-->>PresignFlow: fallback credential bytes
end
PresignFlow->>PresignFlow: ParseAWSCredentialData(data)
alt Static Credentials Detected
PresignFlow->>S3: Use AWSCredentials directly
S3-->>PresignFlow: presigned URL
else STS Role Detected
PresignFlow->>STSClient: AssumeRoleWithWebIdentity(roleARN, tokenFile, hcName)
activate STSClient
STSClient->>STSClient: Read web identity token from file
STSClient->>STSEndpoint: POST (roleARN, sessionName, token)
STSEndpoint-->>STSClient: HTTP 200 + STS XML response
STSClient->>STSClient: Parse XML & extract credentials
STSClient-->>PresignFlow: AWSCredentials (temp keys + session token)
deactivate STSClient
PresignFlow->>S3: Use temporary credentials
S3-->>PresignFlow: presigned URL
end
deactivate PresignFlow
PresignFlow-->>RestorePlugin: presigned URL
Estimated code review effort🎯 4 (Complex) | ⏱️ ~55 minutes 🚥 Pre-merge checks | ✅ 11 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (11 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jparrill The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/core/restore.go`:
- Around line 325-330: The STS session name built as sessionName :=
fmt.Sprintf("oadp-restore-%s", hcName) can exceed AWS's 64-char RoleSessionName
limit; update the code that sets sessionName (used in the call to
s3presign.NewSTSClient().AssumeRoleWithWebIdentity) to produce a string capped
at 64 characters while preserving the "oadp-restore-" prefix (e.g., build a
helper like buildSTSSessionName(prefix, name) that concatenates prefix+name and
truncates to 64 bytes/characters before passing it into
AssumeRoleWithWebIdentity). Ensure the truncated sessionName is used in the
existing AssumeRoleWithWebIdentity call so AWS validation errors are avoided.
In `@pkg/s3presign/presign_test.go`:
- Around line 549-553: Replace the os.Unsetenv call with t.Setenv to avoid
unchecked errors: in the test block that currently checks tt.envTokenFile and
sets AWS_WEB_IDENTITY_TOKEN_FILE, always call
t.Setenv("AWS_WEB_IDENTITY_TOKEN_FILE", tt.envTokenFile) (use empty string when
tt.envTokenFile == "") instead of calling os.Unsetenv; update the code around
the tt.envTokenFile branch in presign_test.go (the block referencing
tt.envTokenFile and AWS_WEB_IDENTITY_TOKEN_FILE) so the test harness t.Setenv is
used for both branches.
In `@pkg/s3presign/sts_test.go`:
- Around line 37-45: Tests in pkg/s3presign/sts_test.go are ignoring errors
returned by w.Write in several HTTP test handlers (e.g., the handler that writes
the AssumeRoleWithWebIdentityResponse and similar handlers around the other
occurrences), which trips errcheck; modify each handler to capture the error
returned by w.Write and handle it (fail the test) instead of discarding it —
e.g., assign the result of w.Write(...) to a variable and call t.Fatalf/t.Errorf
or use require.NoError(t, err) when err != nil so the test stops on write/parse
failures; apply this change to every handler that currently calls w.Write
without checking the returned error.
In `@pkg/s3presign/sts.go`:
- Around line 111-119: The STS response handling in the function that builds
AWSCredentials (using resp.Result.Credentials from AssumeRoleWithWebIdentity)
currently only checks AccessKeyID and SecretAccessKey; update that check to also
require a non-empty SessionToken and return an error if it's missing so we never
return AWSCredentials without SessionToken (which would cause presigned S3 URLs
to omit X-Amz-Security-Token and fail). Locate the creds :=
resp.Result.Credentials block and change the conditional to validate
creds.SessionToken as well, returning a descriptive fmt.Errorf when the token is
empty before constructing the AWSCredentials.
- Around line 37-75: Modify AssumeRoleWithWebIdentity to accept a
context.Context as its first parameter and use it for the HTTP call: change the
signature of AssumeRoleWithWebIdentity(roleARN, tokenFile, sessionName string)
to AssumeRoleWithWebIdentity(ctx context.Context, roleARN, tokenFile,
sessionName string), build the url.Values as before but replace
c.HTTPClient.PostForm with creating an http.NewRequestWithContext(ctx, "POST",
endpoint, strings.NewReader(params.Encode())), set the Content-Type header to
"application/x-www-form-urlencoded" and call c.HTTPClient.Do(req); keep the
existing error wrapping and resp.Body.Close() handling so
cancellations/deadlines propagate correctly.
- Around line 26-31: NewSTSClient currently hardcodes Endpoint to
defaultSTSEndpoint which breaks non-commercial ARNs (arn:aws-us-gov,
arn:aws-cn); change NewSTSClient/STSClient so the STS endpoint is derived from
the ARN partition rather than always using defaultSTSEndpoint: detect partition
(aws, aws-cn, aws-us-gov) from the parsed ARN and map to the appropriate STS
endpoint (e.g., aws -> https://sts.amazonaws.com or regional variant, aws-cn ->
https://sts.cn-north-1.amazonaws.com.cn, aws-us-gov ->
https://sts.us-gov-west-1.amazonaws.com), and set STSClient.Endpoint accordingly
(or accept the partition/region as an argument) instead of using
defaultSTSEndpoint.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 0f862226-a971-46ed-88fc-b01b1fd13e40
📒 Files selected for processing (10)
pkg/common/types.gopkg/core/backup.gopkg/core/restore.gopkg/core/restore_test.gopkg/etcdbackup/orchestrator.gopkg/etcdbackup/orchestrator_test.gopkg/s3presign/presign.gopkg/s3presign/presign_test.gopkg/s3presign/sts.gopkg/s3presign/sts_test.go
| stsClient := s3presign.NewSTSClient() | ||
| sessionName := fmt.Sprintf("oadp-restore-%s", hcName) | ||
| creds, err = stsClient.AssumeRoleWithWebIdentity( | ||
| parsed.STSRole.RoleARN, | ||
| parsed.STSRole.WebIdentityTokenFile, | ||
| sessionName, |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cat -n pkg/core/restore.go | sed -n '320,335p'Repository: openshift/hypershift-oadp-plugin
Length of output: 693
🌐 Web query:
AWS STS RoleSessionName maximum length constraint
💡 Result:
The maximum length of the RoleSessionName parameter in AWS STS AssumeRole (and related operations like AssumeRoleWithSAML, AssumeRoleWithWebIdentity) is 64 characters. It has a minimum length of 2 characters and must match the pattern [\w+=,.@-]* (upper/lowercase alphanumeric, plus _, =, +, ,, ., @, -; no spaces).
Citations:
- 1: https://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRole.html
- 2: https://docs.aws.amazon.com/cli/latest/reference/sts/assume-role.html
- 3: https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_iam-quotas.html
- 4: https://docs.aws.amazon.com/STS/latest/APIReference/API%5FAssumeRoleWithWebIdentity.html
🏁 Script executed:
# Search for hcName definition and how it's derived
rg "hcName\s*:=" pkg/core/restore.go -B 5 -A 2Repository: openshift/hypershift-oadp-plugin
Length of output: 409
🏁 Script executed:
# Check if there are any existing length validations in the STS or session code
rg "sessionName|RoleSessionName" --type go -A 3 -B 1Repository: openshift/hypershift-oadp-plugin
Length of output: 2218
🏁 Script executed:
# Look for test files that might show typical hcName values
fd "test\.go|_test\.go" pkg/core/ -x cat {} \; | head -100Repository: openshift/hypershift-oadp-plugin
Length of output: 3055
Bound the generated STS session name to 64 characters.
Line 326 builds oadp-restore-<hcName> without truncation. AWS STS RoleSessionName has a 64-character maximum; the prefix consumes 13 characters, leaving only 51 characters for hcName. Since HostedCluster names are Kubernetes DNS subdomains (up to 253 characters), typical cluster names easily exceed this budget, causing AssumeRoleWithWebIdentity to fail with a validation error from AWS.
Suggested direction
- sessionName := fmt.Sprintf("oadp-restore-%s", hcName)
+ sessionName := buildSTSSessionName("oadp-restore-", hcName)func buildSTSSessionName(prefix, name string) string {
session := prefix + name
if len(session) > 64 {
session = session[:64]
}
return session
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/core/restore.go` around lines 325 - 330, The STS session name built as
sessionName := fmt.Sprintf("oadp-restore-%s", hcName) can exceed AWS's 64-char
RoleSessionName limit; update the code that sets sessionName (used in the call
to s3presign.NewSTSClient().AssumeRoleWithWebIdentity) to produce a string
capped at 64 characters while preserving the "oadp-restore-" prefix (e.g., build
a helper like buildSTSSessionName(prefix, name) that concatenates prefix+name
and truncates to 64 bytes/characters before passing it into
AssumeRoleWithWebIdentity). Ensure the truncated sessionName is used in the
existing AssumeRoleWithWebIdentity call so AWS validation errors are avoided.
| if tt.envTokenFile != "" { | ||
| t.Setenv("AWS_WEB_IDENTITY_TOKEN_FILE", tt.envTokenFile) | ||
| } else { | ||
| os.Unsetenv("AWS_WEB_IDENTITY_TOKEN_FILE") | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, examine the file at the specified lines
cat -n pkg/s3presign/presign_test.go | sed -n '545,560p'Repository: openshift/hypershift-oadp-plugin
Length of output: 558
🏁 Script executed:
# Check if there's a parser that handles the AWS_WEB_IDENTITY_TOKEN_FILE env var
rg -A 10 -B 5 "AWS_WEB_IDENTITY_TOKEN_FILE" pkg/s3presign/ --type goRepository: openshift/hypershift-oadp-plugin
Length of output: 5223
🏁 Script executed:
# Look for the test function containing these lines to understand context
rg -B 30 "os.Unsetenv.*AWS_WEB_IDENTITY_TOKEN_FILE" pkg/s3presign/presign_test.go --type goRepository: openshift/hypershift-oadp-plugin
Length of output: 1007
Use t.Setenv for both branches to avoid the errcheck error.
Line 552's os.Unsetenv error is not being handled, so errcheck will flag this. Since the parser treats an empty AWS_WEB_IDENTITY_TOKEN_FILE value the same as an unset variable (both return empty string from os.Getenv), you can simplify this to use t.Setenv in both branches.
Suggested change
- if tt.envTokenFile != "" {
- t.Setenv("AWS_WEB_IDENTITY_TOKEN_FILE", tt.envTokenFile)
- } else {
- os.Unsetenv("AWS_WEB_IDENTITY_TOKEN_FILE")
- }
+ t.Setenv("AWS_WEB_IDENTITY_TOKEN_FILE", tt.envTokenFile)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if tt.envTokenFile != "" { | |
| t.Setenv("AWS_WEB_IDENTITY_TOKEN_FILE", tt.envTokenFile) | |
| } else { | |
| os.Unsetenv("AWS_WEB_IDENTITY_TOKEN_FILE") | |
| } | |
| t.Setenv("AWS_WEB_IDENTITY_TOKEN_FILE", tt.envTokenFile) |
🧰 Tools
🪛 golangci-lint (2.11.4)
[error] 552-552: Error return value of os.Unsetenv is not checked
(errcheck)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/s3presign/presign_test.go` around lines 549 - 553, Replace the
os.Unsetenv call with t.Setenv to avoid unchecked errors: in the test block that
currently checks tt.envTokenFile and sets AWS_WEB_IDENTITY_TOKEN_FILE, always
call t.Setenv("AWS_WEB_IDENTITY_TOKEN_FILE", tt.envTokenFile) (use empty string
when tt.envTokenFile == "") instead of calling os.Unsetenv; update the code
around the tt.envTokenFile branch in presign_test.go (the block referencing
tt.envTokenFile and AWS_WEB_IDENTITY_TOKEN_FILE) so the test harness t.Setenv is
used for both branches.
| w.Write([]byte(`<AssumeRoleWithWebIdentityResponse> | ||
| <AssumeRoleWithWebIdentityResult> | ||
| <Credentials> | ||
| <AccessKeyId>ASIATESTACCESSKEY</AccessKeyId> | ||
| <SecretAccessKey>testSecretKey123</SecretAccessKey> | ||
| <SessionToken>testSessionToken456</SessionToken> | ||
| </Credentials> | ||
| </AssumeRoleWithWebIdentityResult> | ||
| </AssumeRoleWithWebIdentityResponse>`)) |
There was a problem hiding this comment.
Handle write/parse errors in test handlers to avoid errcheck failures.
Line 37, Line 79, Line 106, and Line 204 currently ignore returned errors (same pattern also appears at Line 130 and Line 207). With errcheck enabled, this can break CI.
✅ Suggested patch
@@
- w.Write([]byte(`<AssumeRoleWithWebIdentityResponse>
+ if _, err := w.Write([]byte(`<AssumeRoleWithWebIdentityResponse>
<AssumeRoleWithWebIdentityResult>
<Credentials>
<AccessKeyId>ASIATESTACCESSKEY</AccessKeyId>
<SecretAccessKey>testSecretKey123</SecretAccessKey>
<SessionToken>testSessionToken456</SessionToken>
</Credentials>
</AssumeRoleWithWebIdentityResult>
-</AssumeRoleWithWebIdentityResponse>`))
+</AssumeRoleWithWebIdentityResponse>`)); err != nil {
+ t.Fatalf("failed to write response: %v", err)
+ }
@@
- w.Write([]byte(`<ErrorResponse><Error><Code>AccessDenied</Code><Message>Not authorized</Message></Error></ErrorResponse>`))
+ if _, err := w.Write([]byte(`<ErrorResponse><Error><Code>AccessDenied</Code><Message>Not authorized</Message></Error></ErrorResponse>`)); err != nil {
+ t.Fatalf("failed to write response: %v", err)
+ }
@@
- w.Write([]byte(`not xml at all`))
+ if _, err := w.Write([]byte(`not xml at all`)); err != nil {
+ t.Fatalf("failed to write response: %v", err)
+ }
@@
- w.Write([]byte(`<AssumeRoleWithWebIdentityResponse>
+ if _, err := w.Write([]byte(`<AssumeRoleWithWebIdentityResponse>
<AssumeRoleWithWebIdentityResult>
<Credentials>
<AccessKeyId></AccessKeyId>
<SecretAccessKey></SecretAccessKey>
<SessionToken></SessionToken>
</Credentials>
</AssumeRoleWithWebIdentityResult>
-</AssumeRoleWithWebIdentityResponse>`))
+</AssumeRoleWithWebIdentityResponse>`)); err != nil {
+ t.Fatalf("failed to write response: %v", err)
+ }
@@
- r.ParseForm()
+ if err := r.ParseForm(); err != nil {
+ t.Fatalf("failed to parse form: %v", err)
+ }
@@
- w.Write([]byte(`<AssumeRoleWithWebIdentityResponse>
+ if _, err := w.Write([]byte(`<AssumeRoleWithWebIdentityResponse>
<AssumeRoleWithWebIdentityResult>
<Credentials>
<AccessKeyId>ASIA123</AccessKeyId>
<SecretAccessKey>secret</SecretAccessKey>
<SessionToken>token</SessionToken>
</Credentials>
</AssumeRoleWithWebIdentityResult>
-</AssumeRoleWithWebIdentityResponse>`))
+</AssumeRoleWithWebIdentityResponse>`)); err != nil {
+ t.Fatalf("failed to write response: %v", err)
+ }Also applies to: 79-79, 106-106, 130-138, 204-215
🧰 Tools
🪛 golangci-lint (2.11.4)
[error] 37-37: Error return value of w.Write is not checked
(errcheck)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/s3presign/sts_test.go` around lines 37 - 45, Tests in
pkg/s3presign/sts_test.go are ignoring errors returned by w.Write in several
HTTP test handlers (e.g., the handler that writes the
AssumeRoleWithWebIdentityResponse and similar handlers around the other
occurrences), which trips errcheck; modify each handler to capture the error
returned by w.Write and handle it (fail the test) instead of discarding it —
e.g., assign the result of w.Write(...) to a variable and call t.Fatalf/t.Errorf
or use require.NoError(t, err) when err != nil so the test stops on write/parse
failures; apply this change to every handler that currently calls w.Write
without checking the returned error.
| // NewSTSClient creates a default STSClient pointing at the global STS endpoint. | ||
| func NewSTSClient() *STSClient { | ||
| return &STSClient{ | ||
| HTTPClient: &http.Client{Timeout: 30 * time.Second}, | ||
| Endpoint: defaultSTSEndpoint, | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cd /repo && find . -name "sts.go" -o -name "*presign*test*" | head -20Repository: openshift/hypershift-oadp-plugin
Length of output: 132
🏁 Script executed:
cd /repo && git ls-files | grep -E "(sts\.go|presign.*test)" | head -20Repository: openshift/hypershift-oadp-plugin
Length of output: 132
🏁 Script executed:
cd /repo && rg "arn:aws-us-gov|arn:aws-cn" --max-count 20Repository: openshift/hypershift-oadp-plugin
Length of output: 132
🏁 Script executed:
git ls-files | grep -E "(sts\.go|presign)" | head -30Repository: openshift/hypershift-oadp-plugin
Length of output: 234
🏁 Script executed:
find . -type f -name "*.go" | xargs grep -l "sts\|STS" | head -20Repository: openshift/hypershift-oadp-plugin
Length of output: 791
🏁 Script executed:
rg "arn:aws-us-gov|arn:aws-cn" -A 2 -B 2Repository: openshift/hypershift-oadp-plugin
Length of output: 1579
🏁 Script executed:
cat -n pkg/s3presign/sts.goRepository: openshift/hypershift-oadp-plugin
Length of output: 4152
🏁 Script executed:
cat -n pkg/s3presign/presign.go | head -100Repository: openshift/hypershift-oadp-plugin
Length of output: 3406
🏁 Script executed:
cat -n pkg/s3presign/presign.go | grep -A 50 "ParseAWSCredentials\|STSRole"Repository: openshift/hypershift-oadp-plugin
Length of output: 6349
🏁 Script executed:
rg "NewSTSClient\|STSClient\{" -A 3 -B 1Repository: openshift/hypershift-oadp-plugin
Length of output: 58
🏁 Script executed:
cat -n pkg/s3presign/presign_test.go | head -150Repository: openshift/hypershift-oadp-plugin
Length of output: 4471
🏁 Script executed:
cat -n pkg/s3presign/presign_test.go | grep -A 40 "GovCloud ARN\|China region ARN"Repository: openshift/hypershift-oadp-plugin
Length of output: 4522
🏁 Script executed:
rg "NewSTSClient\|NewSTSClientWithEndpoint\|\.Endpoint\s*=" pkg/s3presign/Repository: openshift/hypershift-oadp-plugin
Length of output: 58
🏁 Script executed:
cat -n pkg/s3presign/sts_test.goRepository: openshift/hypershift-oadp-plugin
Length of output: 9610
🏁 Script executed:
rg "NewSTSClient\(\)" -A 5 -B 2Repository: openshift/hypershift-oadp-plugin
Length of output: 3159
🏁 Script executed:
rg "AssumeRoleWithWebIdentity" -A 3 -B 2 pkg/Repository: openshift/hypershift-oadp-plugin
Length of output: 7914
🏁 Script executed:
cat -n pkg/s3presign/presign.go | grep -A 30 "func GetPresignedURL\|STSClient"Repository: openshift/hypershift-oadp-plugin
Length of output: 58
🏁 Script executed:
rg "arn:aws" pkg/s3presign/presign.go -B 3 -A 3 | head -50Repository: openshift/hypershift-oadp-plugin
Length of output: 730
🏁 Script executed:
rg "partition\|Partition\|aws-us-gov\|aws-cn" pkg/s3presign/ -iRepository: openshift/hypershift-oadp-plugin
Length of output: 58
The STS endpoint must be derived from the ARN partition, not hardcoded.
NewSTSClient() always targets https://sts.amazonaws.com, but the parser also accepts arn:aws-us-gov and arn:aws-cn ARNs (verified in presign_test.go). When non-commercial ARNs are provided, they parse successfully but the STS call will fail because the endpoint is wrong. For example, GovCloud requires https://sts.us-gov-west-1.amazonaws.com and China requires https://sts.cn-north-1.amazonaws.com.cn.
Extract the partition from the ARN and route to the correct regional STS endpoint accordingly.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/s3presign/sts.go` around lines 26 - 31, NewSTSClient currently hardcodes
Endpoint to defaultSTSEndpoint which breaks non-commercial ARNs (arn:aws-us-gov,
arn:aws-cn); change NewSTSClient/STSClient so the STS endpoint is derived from
the ARN partition rather than always using defaultSTSEndpoint: detect partition
(aws, aws-cn, aws-us-gov) from the parsed ARN and map to the appropriate STS
endpoint (e.g., aws -> https://sts.amazonaws.com or regional variant, aws-cn ->
https://sts.cn-north-1.amazonaws.com.cn, aws-us-gov ->
https://sts.us-gov-west-1.amazonaws.com), and set STSClient.Endpoint accordingly
(or accept the partition/region as an argument) instead of using
defaultSTSEndpoint.
| func (c *STSClient) AssumeRoleWithWebIdentity(roleARN, tokenFile, sessionName string) (*AWSCredentials, error) { | ||
| if roleARN == "" { | ||
| return nil, fmt.Errorf("roleARN is required") | ||
| } | ||
| if tokenFile == "" { | ||
| return nil, fmt.Errorf("web identity token file path is required") | ||
| } | ||
|
|
||
| token, err := os.ReadFile(tokenFile) | ||
| if err != nil { | ||
| return nil, fmt.Errorf("failed to read web identity token file %q: %w", tokenFile, err) | ||
| } | ||
| if len(token) == 0 { | ||
| return nil, fmt.Errorf("web identity token file %q is empty", tokenFile) | ||
| } | ||
|
|
||
| if sessionName == "" { | ||
| sessionName = defaultSessionName | ||
| } | ||
|
|
||
| params := url.Values{ | ||
| "Action": {"AssumeRoleWithWebIdentity"}, | ||
| "Version": {stsAPIVersion}, | ||
| "RoleArn": {roleARN}, | ||
| "RoleSessionName": {sessionName}, | ||
| "WebIdentityToken": {string(token)}, | ||
| "DurationSeconds": {fmt.Sprintf("%d", defaultDurationSecs)}, | ||
| } | ||
|
|
||
| endpoint := c.Endpoint | ||
| if endpoint == "" { | ||
| endpoint = defaultSTSEndpoint | ||
| } | ||
|
|
||
| resp, err := c.HTTPClient.PostForm(endpoint, params) | ||
| if err != nil { | ||
| return nil, fmt.Errorf("STS request failed: %w", err) | ||
| } | ||
| defer resp.Body.Close() |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
git ls-files | grep -E "(sts|s3presign)" | head -20Repository: openshift/hypershift-oadp-plugin
Length of output: 1258
🏁 Script executed:
cat pkg/s3presign/sts.goRepository: openshift/hypershift-oadp-plugin
Length of output: 3305
🏁 Script executed:
# Search for usages of AssumeRoleWithWebIdentity in the codebase
rg "AssumeRoleWithWebIdentity" --type goRepository: openshift/hypershift-oadp-plugin
Length of output: 2830
🏁 Script executed:
# Check for lint configuration files
fd -E 'node_modules' -E '.git' -E 'dist' '(golangci|lint|.golintrc)'Repository: openshift/hypershift-oadp-plugin
Length of output: 130
🏁 Script executed:
cat -n pkg/core/restore.go | grep -A 10 -B 5 "AssumeRoleWithWebIdentity"Repository: openshift/hypershift-oadp-plugin
Length of output: 751
🏁 Script executed:
# Check for golangci-lint or other lint configurations
fd -name ".golangci*" -o -name "Makefile" -o -name "*.yaml" | xargs grep -l "lint" 2>/dev/null | head -10Repository: openshift/hypershift-oadp-plugin
Length of output: 307
🏁 Script executed:
# Look for any lint output or configuration
find . -maxdepth 2 -name ".golangci*" -type fRepository: openshift/hypershift-oadp-plugin
Length of output: 58
🏁 Script executed:
# Check the sts_test.go to understand the test structure
head -100 pkg/s3presign/sts_test.goRepository: openshift/hypershift-oadp-plugin
Length of output: 3079
🏁 Script executed:
sed -n '300,340p' pkg/core/restore.goRepository: openshift/hypershift-oadp-plugin
Length of output: 1581
🏁 Script executed:
# Get more context about the calling function
sed -n '250,335p' pkg/core/restore.goRepository: openshift/hypershift-oadp-plugin
Length of output: 3251
🏁 Script executed:
# Check the lint issue - look for any golangci or error handling checks
rg "resp.Body.Close" --type goRepository: openshift/hypershift-oadp-plugin
Length of output: 930
🏁 Script executed:
# Check Go version and lint setup
cat go.mod | head -20Repository: openshift/hypershift-oadp-plugin
Length of output: 637
🏁 Script executed:
# Check all call sites of AssumeRoleWithWebIdentity (exclude vendor and test files)
rg "AssumeRoleWithWebIdentity\(" --type go --glob "!vendor/*" --glob "!*_test.go"Repository: openshift/hypershift-oadp-plugin
Length of output: 282
🏁 Script executed:
# Check for golangci-lint config to see what rules are enabled
find . -maxdepth 2 -type f \( -name ".golangci.yml" -o -name ".golangci.yaml" -o -name "golangci.yml" \)Repository: openshift/hypershift-oadp-plugin
Length of output: 58
🏁 Script executed:
# Check Makefile or CI config for lint rules
find . -maxdepth 2 -name "Makefile" -o -name "*.yaml" -o -name "*.yml" | xargs grep -l "lint" 2>/dev/null | head -5Repository: openshift/hypershift-oadp-plugin
Length of output: 58
Add context awareness to AssumeRoleWithWebIdentity.
The method currently uses PostForm which ignores context cancellation/deadlines. Accept a context.Context parameter and switch to http.NewRequestWithContext + Do() to properly propagate context through the STS API call.
Suggested implementation
-func (c *STSClient) AssumeRoleWithWebIdentity(roleARN, tokenFile, sessionName string) (*AWSCredentials, error) {
+func (c *STSClient) AssumeRoleWithWebIdentity(ctx context.Context, roleARN, tokenFile, sessionName string) (*AWSCredentials, error) {
// ...
- resp, err := c.HTTPClient.PostForm(endpoint, params)
+ req, err := http.NewRequestWithContext(ctx, http.MethodPost, endpoint, strings.NewReader(params.Encode()))
+ if err != nil {
+ return nil, fmt.Errorf("failed to build STS request: %w", err)
+ }
+ req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
+
+ resp, err := c.HTTPClient.Do(req)
if err != nil {
return nil, fmt.Errorf("STS request failed: %w", err)
}
- defer resp.Body.Close()
+ defer func() { _ = resp.Body.Close() }()📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| func (c *STSClient) AssumeRoleWithWebIdentity(roleARN, tokenFile, sessionName string) (*AWSCredentials, error) { | |
| if roleARN == "" { | |
| return nil, fmt.Errorf("roleARN is required") | |
| } | |
| if tokenFile == "" { | |
| return nil, fmt.Errorf("web identity token file path is required") | |
| } | |
| token, err := os.ReadFile(tokenFile) | |
| if err != nil { | |
| return nil, fmt.Errorf("failed to read web identity token file %q: %w", tokenFile, err) | |
| } | |
| if len(token) == 0 { | |
| return nil, fmt.Errorf("web identity token file %q is empty", tokenFile) | |
| } | |
| if sessionName == "" { | |
| sessionName = defaultSessionName | |
| } | |
| params := url.Values{ | |
| "Action": {"AssumeRoleWithWebIdentity"}, | |
| "Version": {stsAPIVersion}, | |
| "RoleArn": {roleARN}, | |
| "RoleSessionName": {sessionName}, | |
| "WebIdentityToken": {string(token)}, | |
| "DurationSeconds": {fmt.Sprintf("%d", defaultDurationSecs)}, | |
| } | |
| endpoint := c.Endpoint | |
| if endpoint == "" { | |
| endpoint = defaultSTSEndpoint | |
| } | |
| resp, err := c.HTTPClient.PostForm(endpoint, params) | |
| if err != nil { | |
| return nil, fmt.Errorf("STS request failed: %w", err) | |
| } | |
| defer resp.Body.Close() | |
| func (c *STSClient) AssumeRoleWithWebIdentity(ctx context.Context, roleARN, tokenFile, sessionName string) (*AWSCredentials, error) { | |
| if roleARN == "" { | |
| return nil, fmt.Errorf("roleARN is required") | |
| } | |
| if tokenFile == "" { | |
| return nil, fmt.Errorf("web identity token file path is required") | |
| } | |
| token, err := os.ReadFile(tokenFile) | |
| if err != nil { | |
| return nil, fmt.Errorf("failed to read web identity token file %q: %w", tokenFile, err) | |
| } | |
| if len(token) == 0 { | |
| return nil, fmt.Errorf("web identity token file %q is empty", tokenFile) | |
| } | |
| if sessionName == "" { | |
| sessionName = defaultSessionName | |
| } | |
| params := url.Values{ | |
| "Action": {"AssumeRoleWithWebIdentity"}, | |
| "Version": {stsAPIVersion}, | |
| "RoleArn": {roleARN}, | |
| "RoleSessionName": {sessionName}, | |
| "WebIdentityToken": {string(token)}, | |
| "DurationSeconds": {fmt.Sprintf("%d", defaultDurationSecs)}, | |
| } | |
| endpoint := c.Endpoint | |
| if endpoint == "" { | |
| endpoint = defaultSTSEndpoint | |
| } | |
| req, err := http.NewRequestWithContext(ctx, http.MethodPost, endpoint, strings.NewReader(params.Encode())) | |
| if err != nil { | |
| return nil, fmt.Errorf("failed to build STS request: %w", err) | |
| } | |
| req.Header.Set("Content-Type", "application/x-www-form-urlencoded") | |
| resp, err := c.HTTPClient.Do(req) | |
| if err != nil { | |
| return nil, fmt.Errorf("STS request failed: %w", err) | |
| } | |
| defer func() { _ = resp.Body.Close() }() |
🧰 Tools
🪛 golangci-lint (2.11.4)
[error] 75-75: Error return value of resp.Body.Close is not checked
(errcheck)
[error] 71-71: (*net/http.Client).PostForm must not be called. use (*net/http.Client).Do(*http.Request)
(noctx)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/s3presign/sts.go` around lines 37 - 75, Modify AssumeRoleWithWebIdentity
to accept a context.Context as its first parameter and use it for the HTTP call:
change the signature of AssumeRoleWithWebIdentity(roleARN, tokenFile,
sessionName string) to AssumeRoleWithWebIdentity(ctx context.Context, roleARN,
tokenFile, sessionName string), build the url.Values as before but replace
c.HTTPClient.PostForm with creating an http.NewRequestWithContext(ctx, "POST",
endpoint, strings.NewReader(params.Encode())), set the Content-Type header to
"application/x-www-form-urlencoded" and call c.HTTPClient.Do(req); keep the
existing error wrapping and resp.Body.Close() handling so
cancellations/deadlines propagate correctly.
| creds := resp.Result.Credentials | ||
| if creds.AccessKeyID == "" || creds.SecretAccessKey == "" { | ||
| return nil, fmt.Errorf("STS response missing credentials") | ||
| } | ||
|
|
||
| return &AWSCredentials{ | ||
| AccessKeyID: creds.AccessKeyID, | ||
| SecretAccessKey: creds.SecretAccessKey, | ||
| SessionToken: creds.SessionToken, |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cat -n pkg/s3presign/sts.go | sed -n '100,125p'Repository: openshift/hypershift-oadp-plugin
Length of output: 892
🏁 Script executed:
# Get broader context around the function
cat -n pkg/s3presign/sts.go | head -150Repository: openshift/hypershift-oadp-plugin
Length of output: 4152
🏁 Script executed:
# Search for where SessionToken is used in the codebase
rg "SessionToken" --type goRepository: openshift/hypershift-oadp-plugin
Length of output: 1998
🏁 Script executed:
# Find presigned URL generation code
rg "presign|Presign" --type go -A 3 -B 3Repository: openshift/hypershift-oadp-plugin
Length of output: 33840
Reject STS responses that omit the session token.
The credentials returned by AssumeRoleWithWebIdentity are temporary. If SessionToken is empty here, later presigned URLs will be generated without X-Amz-Security-Token and fail against S3 even though this function reported success.
Suggested change
- if creds.AccessKeyID == "" || creds.SecretAccessKey == "" {
+ if creds.AccessKeyID == "" || creds.SecretAccessKey == "" || creds.SessionToken == "" {
return nil, fmt.Errorf("STS response missing credentials")
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| creds := resp.Result.Credentials | |
| if creds.AccessKeyID == "" || creds.SecretAccessKey == "" { | |
| return nil, fmt.Errorf("STS response missing credentials") | |
| } | |
| return &AWSCredentials{ | |
| AccessKeyID: creds.AccessKeyID, | |
| SecretAccessKey: creds.SecretAccessKey, | |
| SessionToken: creds.SessionToken, | |
| creds := resp.Result.Credentials | |
| if creds.AccessKeyID == "" || creds.SecretAccessKey == "" || creds.SessionToken == "" { | |
| return nil, fmt.Errorf("STS response missing credentials") | |
| } | |
| return &AWSCredentials{ | |
| AccessKeyID: creds.AccessKeyID, | |
| SecretAccessKey: creds.SecretAccessKey, | |
| SessionToken: creds.SessionToken, |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/s3presign/sts.go` around lines 111 - 119, The STS response handling in
the function that builds AWSCredentials (using resp.Result.Credentials from
AssumeRoleWithWebIdentity) currently only checks AccessKeyID and
SecretAccessKey; update that check to also require a non-empty SessionToken and
return an error if it's missing so we never return AWSCredentials without
SessionToken (which would cause presigned S3 URLs to omit X-Amz-Security-Token
and fail). Locate the creds := resp.Result.Credentials block and change the
conditional to validate creds.SessionToken as well, returning a descriptive
fmt.Errorf when the token is empty before constructing the AWSCredentials.
Summary
cloud-credentialssecret (key:cloud) when BSL has no.spec.credentialreference (ARO with Workload Identity, future ROSA without DPA)credentialsin copied secret so the HCPEtcdBackup controller can auto-detect credential typeDescription
ROSA HCP and ARO HCP use short-lived federated credentials instead of long-lived static keys. This PR enables the OADP plugin to handle these environments:
ParseAWSCredentialDataauto-detects 3 credential formats (bare ARN, INI withrole_arn, static keys). When IRSA credentials are detected during restore, the plugin assumes the role viaAssumeRoleWithWebIdentitybefore pre-signing the S3 URLdataprotectionapplications.oadp.openshift.io). When absent (ARO, future ROSA without DPA), falls back to the well-knowncloud-credentialssecret in the Velero namespacecopyCredentialSecretnow copies the credential data with bothcredentials(for S3 controller) and the original key (e.g.,cloudfor Azure WI detection)Scenarios covered
.spec.credential(existing flow)cloud-credentialsin velero NScloud-credentialsin velero NSRelated PRs
Jira
Test plan
go build ./...passespkg/s3presign,pkg/core,pkg/etcdbackup).spec.credentialpresent pass (no regression)ParseAWSCredentialData— bare ARN, INI STS, static keys, invalid inputSTSClient.AssumeRoleWithWebIdentity— success, error, XML parsing🤖 Generated with Claude Code