OCPBUGS-77557: propagate additionalTrustBundle to AWS control plane components#7907
OCPBUGS-77557: propagate additionalTrustBundle to AWS control plane components#7907sdminonne wants to merge 1 commit intoopenshift:mainfrom
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
📝 WalkthroughWalkthroughAdds support for wiring an AWS CA bundle into deployments when a HostedControlPlane has Spec.AdditionalTrustBundle set and the platform is AWS. A new utility, DeploymentAddAWSCABundleVolume, constructs user/system CA volumes, an init container to produce a combined bundle, mounts it into main containers, and sets AWS_CA_BUNDLE. Multiple hosted control plane components now call this utility during their deployment adaptation; tests and e2e checks were added to validate presence/absence of volumes, mounts, init containers, and the AWS_CA_BUNDLE env var. Sequence Diagram(s)mermaid Changes
🚥 Pre-merge checks | ✅ 2✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
/assign @enxebre |
|
how about karpenter-aws and aws-node-termination-handler? |
That statement seems to contradict the docs https://docs.aws.amazon.com/sdk-for-go/api/aws/session/ "Path to a custom Credentials Authority (CA) bundle PEM file that the SDK will use instead of the default system's root CA bundle. Use this only if you want to replace the CA bundle the SDK uses for TLS requests." |
|
/jira refresh |
|
@sdminonne: No Jira issue is referenced in the title of this pull request. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@sdminonne: This pull request references Jira Issue OCPBUGS-77557, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/jira refresh |
|
@sdminonne: This pull request references Jira Issue OCPBUGS-77557, which is invalid:
Comment DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/jira refresh |
|
@sdminonne: This pull request references Jira Issue OCPBUGS-77557, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira (yli2@redhat.com), skipping review request. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
c9c61d8 to
3aaf940
Compare
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: sdminonne The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
/test e2e-aws |
|
/test e2e-aks |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
🧹 Nitpick comments (3)
support/util/volumes.go (1)
93-103: Avoid hard-codingContainers[0]for CA mount/env wiringLine 93 and Line 100 assume the AWS-using container is always first. If container order changes,
AWS_CA_BUNDLEand the mount can land on the wrong container.Proposed refactor
- // Mount the combined CA bundle in the main container. - deployment.Spec.Template.Spec.Containers[0].VolumeMounts = append(deployment.Spec.Template.Spec.Containers[0].VolumeMounts, corev1.VolumeMount{ - Name: combinedCAVolumeName, - MountPath: combinedCAMountPath, - ReadOnly: true, - }) - - // Point AWS_CA_BUNDLE to the combined CA file so the AWS SDK trusts both system and user CAs. - deployment.Spec.Template.Spec.Containers[0].Env = append(deployment.Spec.Template.Spec.Containers[0].Env, corev1.EnvVar{ - Name: "AWS_CA_BUNDLE", - Value: combinedCAMountPath + "/" + combinedCAFileName, - }) + // Mount and env wiring on all app containers to avoid index-coupling. + for i := range deployment.Spec.Template.Spec.Containers { + c := &deployment.Spec.Template.Spec.Containers[i] + c.VolumeMounts = append(c.VolumeMounts, corev1.VolumeMount{ + Name: combinedCAVolumeName, + MountPath: combinedCAMountPath, + ReadOnly: true, + }) + + updated := false + for j := range c.Env { + if c.Env[j].Name == "AWS_CA_BUNDLE" { + c.Env[j].Value = combinedCAMountPath + "/" + combinedCAFileName + updated = true + break + } + } + if !updated { + c.Env = append(c.Env, corev1.EnvVar{ + Name: "AWS_CA_BUNDLE", + Value: combinedCAMountPath + "/" + combinedCAFileName, + }) + } + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@support/util/volumes.go` around lines 93 - 103, Do not assume the AWS-using container is Containers[0]; instead locate the correct container in deployment.Spec.Template.Spec.Containers before mutating it. Change the code that appends to deployment.Spec.Template.Spec.Containers[0].VolumeMounts and .Env to first iterate the slice and pick the container by a reliable signal (prefer container.Name == <your AWS container name variable> if available, otherwise detect a container whose Image contains "amazonaws" or which already has AWS-related env vars like "AWS_REGION" or "AWS_ACCESS_KEY_ID"); then append the corev1.VolumeMount (using combinedCAVolumeName/combinedCAMountPath) and the corev1.EnvVar {Name:"AWS_CA_BUNDLE",Value: combinedCAMountPath + "/" + combinedCAFileName} to that container's VolumeMounts and Env fields (fall back to index 0 only if no match found) so the mount and env are applied to the correct container.test/e2e/nodepool_additionalTrustBundlePropagation_test.go (1)
152-153: Use container name lookup instead of positional index in predicates.Accessing
Containers[0]makes this e2e check fragile to manifest ordering changes. Resolve the target container by name (e.g.,aws-cloud-controller-manager) before validating env vars.As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."
Also applies to: 248-249
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/nodepool_additionalTrustBundlePropagation_test.go` around lines 152 - 153, The predicate currently assumes the target container is at Containers[0], which is brittle; update the check to locate the container by name (e.g., "aws-cloud-controller-manager") by iterating over obj.Spec.Template.Spec.Containers and selecting the one whose Name matches, then validate its Env slice for the AWS_CA_BUNDLE entry; apply the same container-name lookup change to the other similar occurrence referenced (around lines 248-249) so both predicates no longer rely on positional indexing.control-plane-operator/controllers/hostedcontrolplane/v2/capi_provider/deployment_test.go (1)
16-25: Consolidate duplicatedfakeReleaseProvidertest double.This same stub appears in multiple new test files in this PR. Moving it to a shared test helper (per package testutil) will reduce drift and make future interface updates cheaper.
As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@control-plane-operator/controllers/hostedcontrolplane/v2/capi_provider/deployment_test.go` around lines 16 - 25, Duplicate fakeReleaseProvider test double should be moved to a shared test helper to avoid drift; create a single exported stub type (e.g., FakeReleaseProvider) in a new test helper package (testutil) and implement the same methods: GetImage(string) string, ImageExist(string) (string,bool), Version() string, ComponentVersions() (map[string]string,error), ComponentImages() map[string]string so it satisfies the release provider interface. Replace the local fakeReleaseProvider declarations in deployment_test.go and other tests with the shared testutil.FakeReleaseProvider and update imports accordingly, ensuring method signatures match exactly so all tests compile.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In
`@control-plane-operator/controllers/hostedcontrolplane/v2/capi_provider/deployment_test.go`:
- Around line 16-25: Duplicate fakeReleaseProvider test double should be moved
to a shared test helper to avoid drift; create a single exported stub type
(e.g., FakeReleaseProvider) in a new test helper package (testutil) and
implement the same methods: GetImage(string) string, ImageExist(string)
(string,bool), Version() string, ComponentVersions() (map[string]string,error),
ComponentImages() map[string]string so it satisfies the release provider
interface. Replace the local fakeReleaseProvider declarations in
deployment_test.go and other tests with the shared testutil.FakeReleaseProvider
and update imports accordingly, ensuring method signatures match exactly so all
tests compile.
In `@support/util/volumes.go`:
- Around line 93-103: Do not assume the AWS-using container is Containers[0];
instead locate the correct container in deployment.Spec.Template.Spec.Containers
before mutating it. Change the code that appends to
deployment.Spec.Template.Spec.Containers[0].VolumeMounts and .Env to first
iterate the slice and pick the container by a reliable signal (prefer
container.Name == <your AWS container name variable> if available, otherwise
detect a container whose Image contains "amazonaws" or which already has
AWS-related env vars like "AWS_REGION" or "AWS_ACCESS_KEY_ID"); then append the
corev1.VolumeMount (using combinedCAVolumeName/combinedCAMountPath) and the
corev1.EnvVar {Name:"AWS_CA_BUNDLE",Value: combinedCAMountPath + "/" +
combinedCAFileName} to that container's VolumeMounts and Env fields (fall back
to index 0 only if no match found) so the mount and env are applied to the
correct container.
In `@test/e2e/nodepool_additionalTrustBundlePropagation_test.go`:
- Around line 152-153: The predicate currently assumes the target container is
at Containers[0], which is brittle; update the check to locate the container by
name (e.g., "aws-cloud-controller-manager") by iterating over
obj.Spec.Template.Spec.Containers and selecting the one whose Name matches, then
validate its Env slice for the AWS_CA_BUNDLE entry; apply the same
container-name lookup change to the other similar occurrence referenced (around
lines 248-249) so both predicates no longer rely on positional indexing.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 30dc9019-41aa-4d55-ad6a-5b803e87e3f4
📒 Files selected for processing (15)
control-plane-operator/controllers/hostedcontrolplane/v2/awsnodeterminationhandler/deployment.gocontrol-plane-operator/controllers/hostedcontrolplane/v2/awsnodeterminationhandler/deployment_test.gocontrol-plane-operator/controllers/hostedcontrolplane/v2/capi_provider/deployment.gocontrol-plane-operator/controllers/hostedcontrolplane/v2/capi_provider/deployment_test.gocontrol-plane-operator/controllers/hostedcontrolplane/v2/cloud_controller_manager/aws/component.gocontrol-plane-operator/controllers/hostedcontrolplane/v2/cloud_controller_manager/aws/deployment.gocontrol-plane-operator/controllers/hostedcontrolplane/v2/cloud_controller_manager/aws/deployment_test.gocontrol-plane-operator/controllers/hostedcontrolplane/v2/ingressoperator/deployment.gocontrol-plane-operator/controllers/hostedcontrolplane/v2/ingressoperator/deployment_test.gocontrol-plane-operator/controllers/hostedcontrolplane/v2/karpenter/deployment.gocontrol-plane-operator/controllers/hostedcontrolplane/v2/karpenter/deployment_test.gocontrol-plane-operator/controllers/hostedcontrolplane/v2/karpenteroperator/deployment.gocontrol-plane-operator/controllers/hostedcontrolplane/v2/karpenteroperator/deployment_test.gosupport/util/volumes.gotest/e2e/nodepool_additionalTrustBundlePropagation_test.go
|
PR needs rebase. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
3aaf940 to
0a3ed02
Compare
|
@sdminonne: This pull request references Jira Issue OCPBUGS-77557, which is valid. 3 validation(s) were run on this bug
No GitHub users were found matching the public email listed for the QA contact in Jira (wewang@redhat.com), skipping review request. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #7907 +/- ##
==========================================
+ Coverage 37.39% 37.48% +0.08%
==========================================
Files 751 752 +1
Lines 91806 91939 +133
==========================================
+ Hits 34333 34461 +128
- Misses 54838 54841 +3
- Partials 2635 2637 +2
... and 1 file with indirect coverage changes
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
1c08219 to
1a4ffcc
Compare
|
@enxebre PTAL |
1a4ffcc to
b62e6a8
Compare
|
@enxebre PTAL |
|
I now have all the evidence needed. Here is the complete analysis: Test Failure Analysis CompleteJob Information
Test Failure AnalysisErrorSummaryThis is not a test failure — it is a merge conflict preventing tide from merging the PR. The PR was created on March 10, 2026 and has not been rebased since. In the ~7 weeks since, three major PRs merged into Root CauseThe PR branch
The Recommendations
Evidence
|
…ents AWS control plane components fail TLS verification when calling AWS endpoints in isolated environments (e.g. US-ISO regions) because they do not honor the additionalTrustBundle from the HostedCluster spec. The AWS SDK replaces the system CA bundle when AWS_CA_BUNDLE is set, rather than appending to it (both v1 and v2 create a new empty x509.CertPool). To handle this, add a DeploymentAddAWSCABundleVolume helper in support/podspec that uses an init container (CPO image) to concatenate the system CA bundle with the user CAs from the additionalTrustBundle ConfigMap into a combined PEM file. AWS_CA_BUNDLE points to this combined file, ensuring the AWS SDK trusts both system and user CAs. The init container runs with a restricted security context (AllowPrivilegeEscalation=false, drop ALL capabilities) and minimal resource requests (cpu: 10m, memory: 10Mi), consistent with other lightweight init containers in the codebase. Wire the helper into all affected AWS components: - aws-cloud-controller-manager - capi-provider - ingress-operator - karpenter - karpenter-operator - aws-node-termination-handler Signed-off-by: Salvatore Dario Minonne <sminonne@redhat.com> Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
@sdminonne: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
Now I have the complete picture. Both failures share the identical root cause. Here is the final report: Test Failure Analysis CompleteJob Information
Test Failure AnalysisErrorSummaryBoth CI jobs fail due to a compilation error in Root CauseThe PR adds two new blocks of test code that verify AWS_CA_BUNDLE wiring on the
The identifier
Neither of these is aliased to Fix: Replace Recommendations
Evidence
|
Summary
DeploymentAddAWSCABundleVolumehelper that creates a combined CA bundle (system + user CAs) via an init container and setsAWS_CA_BUNDLEon the main containerAdditionalTrustBundleis set on the HostedControlPlane spec:aws-cloud-controller-managercapi-provideringress-operatorkarpenterkarpenter-operatoraws-node-termination-handleraws-cloud-controller-managerProblem
In isolated AWS environments (e.g., US-ISO regions), custom CA bundles specified via
HostedCluster.Spec.AdditionalTrustBundleare not propagated to AWS control plane components. This causes TLS verification failures when these components call AWS API endpoints:Why not reuse
DeploymentAddTrustBundleVolume?The existing helper mounts a ConfigMap as a directory at
/etc/pki/tls/certs, which replaces the entire system CA directory. This works for in-house components (CPO, ignition-server, OAPI) whose TLS needs are tightly controlled. However, the affected components are binaries that make HTTPS calls to standard AWS service endpoints (EC2, ELB, STS, SQS). The AWS SDK's default HTTP client loads the system CA store from/etc/pki/tls/certsto verify TLS certificates. Replacing that directory with a ConfigMap containing only the custom CA would cause the binary to lose the public root CAs (e.g., Amazon Trust Services), breaking connectivity to standard AWS API endpoints.Why
AWS_CA_BUNDLEwith a combined bundle?The AWS SDK (both v1 and v2) reads
AWS_CA_BUNDLEand uses it instead of the system CA bundle — it creates a new emptyx509.CertPooland loads only the specified file. To avoid losing trust in standard AWS endpoints, an init container concatenates the system CAs (/etc/pki/tls/certs/ca-bundle.crt) with the user-provided CAs fromadditionalTrustBundleinto a single combined PEM file.AWS_CA_BUNDLEpoints to this combined file, ensuring the AWS SDK trusts both system and custom CAs.Test plan
AdditionalTrustBundleis setAdditionalTrustBundleis nilAWS_CA_BUNDLEwiring onaws-cloud-controller-managermake testpassesmake verifypassesFixes: https://issues.redhat.com/browse/OCPBUGS-77557
🤖 Generated with Claude Code