OCPBUGS-77557: propagate additionalTrustBundle to AWS control plane components #7907

Open
sdminonne wants to merge 1 commit into openshift:main from sdminonne:OCPBUGS-77557

Conversation

@sdminonne
Contributor

@sdminonne sdminonne commented Mar 10, 2026

Summary

  • Add DeploymentAddAWSCABundleVolume helper that creates a combined CA bundle (system + user CAs) via an init container and sets AWS_CA_BUNDLE on the main container
  • Wire trust bundle propagation into all six AWS control plane components when AdditionalTrustBundle is set on the HostedControlPlane spec:
    • aws-cloud-controller-manager
    • capi-provider
    • ingress-operator
    • karpenter
    • karpenter-operator
    • aws-node-termination-handler
  • Add unit tests for all six components and an e2e test for aws-cloud-controller-manager

Problem

In isolated AWS environments (e.g., US-ISO regions), custom CA bundles specified via HostedCluster.Spec.AdditionalTrustBundle are not propagated to AWS control plane components. This causes TLS verification failures when these components call AWS API endpoints:

Post https://sts.us-iso-east-1.c2s.ic.gov: tls: failed to verify certificate:
x509: certificate signed by unknown authority

Why not reuse DeploymentAddTrustBundleVolume?

The existing helper mounts a ConfigMap as a directory at /etc/pki/tls/certs, which replaces the entire system CA directory. This works for in-house components (CPO, ignition-server, OAPI) whose TLS needs are tightly controlled. However, the affected components are binaries that make HTTPS calls to standard AWS service endpoints (EC2, ELB, STS, SQS). The AWS SDK's default HTTP client loads the system CA store from /etc/pki/tls/certs to verify TLS certificates. Replacing that directory with a ConfigMap containing only the custom CA would cause the binary to lose the public root CAs (e.g., Amazon Trust Services), breaking connectivity to standard AWS API endpoints.

Why AWS_CA_BUNDLE with a combined bundle?

The AWS SDK (both v1 and v2) reads AWS_CA_BUNDLE and uses it instead of the system CA bundle — it creates a new empty x509.CertPool and loads only the specified file. To avoid losing trust in standard AWS endpoints, an init container concatenates the system CAs (/etc/pki/tls/certs/ca-bundle.crt) with the user-provided CAs from additionalTrustBundle into a single combined PEM file. AWS_CA_BUNDLE points to this combined file, ensuring the AWS SDK trusts both system and custom CAs.
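
The concatenation step and the "replace, not append" behavior described above can be sketched in plain Go. This is an illustrative sketch only: the PR performs the concatenation in an init container, and `combineBundles` and `poolFromBundle` are hypothetical names that do not appear in the PR. The placeholder PEM blocks are not real certificates.

```go
package main

import (
	"bytes"
	"crypto/x509"
	"fmt"
)

// combineBundles concatenates PEM bundles and guarantees a newline after each
// one, so an END line is never glued to the next BEGIN line.
func combineBundles(bundles ...[]byte) []byte {
	var out bytes.Buffer
	for _, b := range bundles {
		b = bytes.TrimSpace(b)
		if len(b) == 0 {
			continue // skip empty bundles
		}
		out.Write(b)
		out.WriteByte('\n')
	}
	return out.Bytes()
}

// poolFromBundle starts from an empty pool and loads only the given PEM data,
// which is why a bundle holding only the custom CA would drop the system roots.
func poolFromBundle(pemData []byte) (*x509.CertPool, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(pemData) {
		return nil, fmt.Errorf("no certificates could be parsed from bundle")
	}
	return pool, nil
}

func main() {
	// Placeholder PEM blocks (not valid certificates), used only to show the layout.
	system := []byte("-----BEGIN CERTIFICATE-----\nc3lzdGVt\n-----END CERTIFICATE-----")
	user := []byte("-----BEGIN CERTIFICATE-----\ndXNlcg==\n-----END CERTIFICATE-----\n")

	combined := combineBundles(system, user)
	fmt.Print(string(combined))

	// The placeholders do not parse, so this reports an error here.
	if _, err := poolFromBundle(combined); err != nil {
		fmt.Println("note:", err)
	}
}
```

With real PEM data, the pool returned by `poolFromBundle` trusts exactly the certificates in the combined file, which is the property the combined bundle relies on.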

Test plan

  • Unit tests verify volume, init container, mount, and env var presence when AdditionalTrustBundle is set
  • Unit tests verify no volume/env var when AdditionalTrustBundle is nil
  • Unit tests verify non-AWS platforms are unaffected (capi-provider, ingress-operator)
  • E2E test verifies AWS_CA_BUNDLE wiring on aws-cloud-controller-manager
  • make test passes
  • make verify passes

Fixes: https://issues.redhat.com/browse/OCPBUGS-77557

🤖 Generated with Claude Code

@openshift-ci-robot

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after the lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will use /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To manually trigger all second-stage jobs, use the /pipeline required command.

This repository is configured in: LGTM mode

@coderabbitai
Contributor

coderabbitai Bot commented Mar 10, 2026

Walkthrough

Adds support for wiring an AWS CA bundle into deployments when a HostedControlPlane has Spec.AdditionalTrustBundle set and the platform is AWS. A new utility, DeploymentAddAWSCABundleVolume, constructs user/system CA volumes, an init container to produce a combined bundle, mounts it into main containers, and sets AWS_CA_BUNDLE. Multiple hosted control plane components now call this utility during their deployment adaptation; tests and e2e checks were added to validate presence/absence of volumes, mounts, init containers, and the AWS_CA_BUNDLE env var.

Sequence Diagram(s)

mermaid
sequenceDiagram
participant HCP as HostedControlPlane
participant Controller as Control Plane Operator
participant Util as support/util
participant Deployment as Kubernetes Deployment
participant KubeAPI as Kubernetes API
HCP->>Controller: Reconcile / adaptDeployment invoked
Controller->>Controller: check Platform == AWS && AdditionalTrustBundle != nil
Controller->>Util: DeploymentAddAWSCABundleVolume(trustBundleConfigMap, deployment, initImage)
Util->>Deployment: add user-ca ConfigMap volume
Util->>Deployment: add aws-ca-bundle EmptyDir + init container (concat CAs)
Util->>Deployment: add volumeMount to main container + set AWS_CA_BUNDLE env
Controller->>KubeAPI: apply updated Deployment
KubeAPI-->>Deployment: Deployment updated/applied
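
The steps in the diagram can be sketched as follows. This is a minimal, dependency-free sketch, not the PR's implementation: the struct types are stand-ins for the Kubernetes API types, `addAWSCABundle` is a hypothetical name, and the mount paths, the user CA volume name, and the shell command are assumptions. Only `aws-ca-bundle`, `setup-aws-ca-bundle`, `AWS_CA_BUNDLE`, and the system bundle path come from this PR.

```go
package main

import "fmt"

// Minimal stand-ins for the Kubernetes API types. The real helper operates on
// appsv1.Deployment and corev1 types; these only keep the sketch self-contained.
type EnvVar struct{ Name, Value string }

type VolumeMount struct {
	Name      string
	MountPath string
	ReadOnly  bool
}

type Container struct {
	Name         string
	Command      []string
	Env          []EnvVar
	VolumeMounts []VolumeMount
}

type PodSpec struct {
	Volumes        []string // volume names only, for brevity
	InitContainers []Container
	Containers     []Container
}

const (
	systemCABundle    = "/etc/pki/tls/certs/ca-bundle.crt" // from the PR description
	combinedCAVolume  = "aws-ca-bundle"                    // EmptyDir name from the walkthrough
	initContainerName = "setup-aws-ca-bundle"              // init container name from the e2e summary
	userCAVolume      = "user-ca-bundle"                   // assumed volume name
	userCAMountPath   = "/user-ca"                         // assumed path
	combinedCADir     = "/etc/aws-ca"                      // assumed path
	combinedCAFile    = "ca-bundle.crt"                    // assumed file name
)

// addAWSCABundle applies the wiring only on AWS with a trust bundle configured.
func addAWSCABundle(spec *PodSpec, platform string, hasTrustBundle bool) {
	if platform != "AWS" || !hasTrustBundle || len(spec.Containers) == 0 {
		return
	}
	combined := combinedCADir + "/" + combinedCAFile
	spec.Volumes = append(spec.Volumes, userCAVolume, combinedCAVolume)

	// Init container: concatenate system CAs and user CAs into one PEM file.
	script := fmt.Sprintf("cat %s %s/* > %s", systemCABundle, userCAMountPath, combined)
	spec.InitContainers = append(spec.InitContainers, Container{
		Name:    initContainerName,
		Command: []string{"/bin/sh", "-c", script},
		VolumeMounts: []VolumeMount{
			{Name: userCAVolume, MountPath: userCAMountPath, ReadOnly: true},
			{Name: combinedCAVolume, MountPath: combinedCADir},
		},
	})

	// Main container: read-only mount plus AWS_CA_BUNDLE pointing at the file.
	c := &spec.Containers[0]
	c.VolumeMounts = append(c.VolumeMounts, VolumeMount{Name: combinedCAVolume, MountPath: combinedCADir, ReadOnly: true})
	c.Env = append(c.Env, EnvVar{Name: "AWS_CA_BUNDLE", Value: combined})
}

func main() {
	spec := PodSpec{Containers: []Container{{Name: "aws-cloud-controller-manager"}}}
	addAWSCABundle(&spec, "AWS", true)
	fmt.Printf("%+v\n", spec)
}
```

The sketch targets the first container, matching the "main container" wording in the PR description.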

Changes

| Cohort / File(s) | Summary |
|---|---|
| Utility: support/util/volumes.go | Adds DeploymentAddAWSCABundleVolume(...) to add user CA ConfigMap volume, aws-ca-bundle EmptyDir, a setup init-container to combine CAs, mount the combined bundle into containers, and set AWS_CA_BUNDLE. |
| CAPI Provider: control-plane-operator/controllers/hostedcontrolplane/v2/capi_provider/deployment.go and test | Calls DeploymentAddAWSCABundleVolume when platform is AWS and AdditionalTrustBundle is present; adds tests validating volumes, mounts, init container, and AWS_CA_BUNDLE. |
| AWS Cloud Controller Manager: control-plane-operator/controllers/hostedcontrolplane/v2/cloud_controller_manager/aws/component.go, deployment.go, tests | Registers and implements adaptDeployment to invoke DeploymentAddAWSCABundleVolume for AWS+AdditionalTrustBundle; adds tests asserting expected wiring. |
| Ingress Operator: control-plane-operator/controllers/hostedcontrolplane/v2/ingressoperator/deployment.go and test | Adds conditional call to DeploymentAddAWSCABundleVolume in adaptDeployment for AWS with AdditionalTrustBundle; tests added. |
| AWS Node Termination Handler: control-plane-operator/controllers/hostedcontrolplane/v2/awsnodeterminationhandler/deployment.go and test | Injects AWS CA bundle wiring when AdditionalTrustBundle exists; tests added to verify volumes, init container, mounts, and env var. |
| Karpenter & Karpenter Operator: control-plane-operator/controllers/hostedcontrolplane/v2/karpenter/deployment.go, karpenteroperator/deployment.go and tests | Adds conditional AWS CA bundle wiring in adaptDeployment for Karpenter and its operator; tests validate presence/absence of CA resources and AWS_CA_BUNDLE. |
| E2E test: test/e2e/nodepool_additionalTrustBundlePropagation_test.go | Adds runtime checks (AWS-only) verifying aws-cloud-controller-manager deployment has aws-ca-bundle EmptyDir, setup-aws-ca-bundle init container, and AWS_CA_BUNDLE env var present/absent across bundle add/remove scenarios. |
🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely describes the main change: propagating additionalTrustBundle to AWS control plane components, which is the central theme of all modifications across multiple deployment files. |


@openshift-ci openshift-ci Bot requested review from bryan-cox and enxebre March 10, 2026 15:48
@openshift-ci openshift-ci Bot added the area/control-plane-operator, area/hypershift-operator, and area/platform/aws labels and removed the do-not-merge/needs-area label Mar 10, 2026
@sdminonne
Contributor Author

/assign @enxebre

@enxebre
Member

enxebre commented Mar 10, 2026

how about karpenter-aws and aws-node-termination-handler?

@enxebre
Member

enxebre commented Mar 10, 2026

> The AWS SDK (both v1 and v2) reads AWS_CA_BUNDLE and appends those CAs to the system cert pool.

That statement seems to contradict the docs at https://docs.aws.amazon.com/sdk-for-go/api/aws/session/ ("Custom CA Bundle" section):

"Path to a custom Credentials Authority (CA) bundle PEM file that the SDK will use instead of the default system's root CA bundle. Use this only if you want to replace the CA bundle the SDK uses for TLS requests."

@sdminonne sdminonne changed the title fix(cpo): propagate additionalTrustBundle to AWS control plane components fix(OCPBUGS-77557): propagate additionalTrustBundle to AWS control plane components Mar 11, 2026
@sdminonne
Contributor Author

/jira refresh

@openshift-ci-robot

@sdminonne: No Jira issue is referenced in the title of this pull request.
To reference a jira issue, add 'XYZ-NNN:' to the title of this pull request and request another refresh with /jira refresh.

Details

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@sdminonne sdminonne changed the title fix(OCPBUGS-77557): propagate additionalTrustBundle to AWS control plane components OCPBUGS-77557: propagate additionalTrustBundle to AWS control plane components Mar 11, 2026
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference and jira/invalid-bug labels Mar 11, 2026
@openshift-ci-robot

@sdminonne: This pull request references Jira Issue OCPBUGS-77557, which is invalid:

  • expected the bug to target either version "4.22." or "openshift-4.22.", but it targets "4.21.z" instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

Summary

  • Add DeploymentAddAWSCABundleVolume helper that mounts the user-ca-bundle ConfigMap at a non-conflicting path (/etc/pki/ca-trust/extracted/hypershift/) and sets the AWS_CA_BUNDLE environment variable
  • Wire trust bundle propagation into aws-cloud-controller-manager, capi-provider, and ingress-operator deployments when AdditionalTrustBundle is set on the HostedControlPlane spec
  • Add unit tests for all three components

Problem

In isolated AWS environments (e.g., US-ISO regions), custom CA bundles specified via HostedCluster.Spec.AdditionalTrustBundle are not propagated to three control plane components: aws-cloud-controller-manager, ingress-operator, and capi-provider. This causes TLS verification failures when these components call AWS STS endpoints:

Post https://sts.us-iso-east-1.c2s.ic.gov: tls: failed to verify certificate:
x509: certificate signed by unknown authority

Why not reuse DeploymentAddTrustBundleVolume?

The existing helper mounts a ConfigMap as a directory at /etc/pki/tls/certs, which replaces the entire system CA directory. This works for in-house components (CPO, ignition-server, OAPI) whose TLS needs are tightly controlled. However, the three affected components are third-party binaries that make HTTPS calls to standard AWS service endpoints (EC2, ELB, STS). The AWS SDK's default HTTP client loads the system CA store from /etc/pki/tls/certs to verify TLS certificates on those connections. Replacing that directory with a ConfigMap containing only the custom CA would cause the binary to lose the public root CAs (e.g., Amazon Trust Services), breaking connectivity to standard AWS API endpoints.

Why AWS_CA_BUNDLE?

The AWS SDK (both v1 and v2) reads AWS_CA_BUNDLE and appends those CAs to the system cert pool. This means standard AWS endpoints continue to work via system CAs while also trusting custom CAs needed in isolated regions.

Test plan

  • Unit tests verify volume, mount, and env var presence when AdditionalTrustBundle is set
  • Unit tests verify no volume/env var when AdditionalTrustBundle is nil
  • Unit tests verify non-AWS platforms are unaffected (capi-provider, ingress-operator)
  • make test passes
  • make verify passes

Fixes: https://issues.redhat.com/browse/OCPBUGS-77557

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

  • Added AWS CA bundle support across control plane deployments. When an additional trust bundle is configured on AWS platforms, it is now properly mounted and integrated into deployments, enabling components to use custom CA certificates.

  • Tests

  • Added test coverage for AWS CA bundle deployment configuration across multiple components.


@sdminonne
Contributor Author

/jira refresh

@openshift-ci-robot

@sdminonne: This pull request references Jira Issue OCPBUGS-77557, which is invalid:

  • expected the bug to target either version "4.22." or "openshift-4.22.", but it targets "4.21.z" instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

Details

In response to this:

/jira refresh


@sdminonne
Contributor Author

/jira refresh

@openshift-ci-robot openshift-ci-robot added the jira/valid-bug label and removed the jira/invalid-bug label Mar 11, 2026
@openshift-ci-robot

@sdminonne: This pull request references Jira Issue OCPBUGS-77557, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.22.0) matches configured target version for branch (4.22.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (yli2@redhat.com), skipping review request.

Details

In response to this:

/jira refresh


@openshift-ci openshift-ci Bot added the area/testing label Mar 16, 2026
@openshift-ci
Contributor

openshift-ci Bot commented Mar 16, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sdminonne
Once this PR has been reviewed and has the lgtm label, please ask for approval from enxebre. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sdminonne
Contributor Author

/test e2e-aws

@sdminonne
Contributor Author

/test e2e-aks

@sdminonne
Contributor Author

@coderabbitai review

@coderabbitai
Contributor

coderabbitai Bot commented Mar 16, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Contributor

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (3)
support/util/volumes.go (1)

93-103: Avoid hard-coding Containers[0] for CA mount/env wiring

Line 93 and Line 100 assume the AWS-using container is always first. If container order changes, AWS_CA_BUNDLE and the mount can land on the wrong container.

Proposed refactor
-	// Mount the combined CA bundle in the main container.
-	deployment.Spec.Template.Spec.Containers[0].VolumeMounts = append(deployment.Spec.Template.Spec.Containers[0].VolumeMounts, corev1.VolumeMount{
-		Name:      combinedCAVolumeName,
-		MountPath: combinedCAMountPath,
-		ReadOnly:  true,
-	})
-
-	// Point AWS_CA_BUNDLE to the combined CA file so the AWS SDK trusts both system and user CAs.
-	deployment.Spec.Template.Spec.Containers[0].Env = append(deployment.Spec.Template.Spec.Containers[0].Env, corev1.EnvVar{
-		Name:  "AWS_CA_BUNDLE",
-		Value: combinedCAMountPath + "/" + combinedCAFileName,
-	})
+	// Mount and env wiring on all app containers to avoid index-coupling.
+	for i := range deployment.Spec.Template.Spec.Containers {
+		c := &deployment.Spec.Template.Spec.Containers[i]
+		c.VolumeMounts = append(c.VolumeMounts, corev1.VolumeMount{
+			Name:      combinedCAVolumeName,
+			MountPath: combinedCAMountPath,
+			ReadOnly:  true,
+		})
+
+		updated := false
+		for j := range c.Env {
+			if c.Env[j].Name == "AWS_CA_BUNDLE" {
+				c.Env[j].Value = combinedCAMountPath + "/" + combinedCAFileName
+				updated = true
+				break
+			}
+		}
+		if !updated {
+			c.Env = append(c.Env, corev1.EnvVar{
+				Name:  "AWS_CA_BUNDLE",
+				Value: combinedCAMountPath + "/" + combinedCAFileName,
+			})
+		}
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@support/util/volumes.go` around lines 93 - 103, Do not assume the AWS-using
container is Containers[0]; instead locate the correct container in
deployment.Spec.Template.Spec.Containers before mutating it. Change the code
that appends to deployment.Spec.Template.Spec.Containers[0].VolumeMounts and
.Env to first iterate the slice and pick the container by a reliable signal
(prefer container.Name == <your AWS container name variable> if available,
otherwise detect a container whose Image contains "amazonaws" or which already
has AWS-related env vars like "AWS_REGION" or "AWS_ACCESS_KEY_ID"); then append
the corev1.VolumeMount (using combinedCAVolumeName/combinedCAMountPath) and the
corev1.EnvVar {Name:"AWS_CA_BUNDLE",Value: combinedCAMountPath + "/" +
combinedCAFileName} to that container's VolumeMounts and Env fields (fall back
to index 0 only if no match found) so the mount and env are applied to the
correct container.
test/e2e/nodepool_additionalTrustBundlePropagation_test.go (1)

152-153: Use container name lookup instead of positional index in predicates.

Accessing Containers[0] makes this e2e check fragile to manifest ordering changes. Resolve the target container by name (e.g., aws-cloud-controller-manager) before validating env vars.

As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

Also applies to: 248-249

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/nodepool_additionalTrustBundlePropagation_test.go` around lines 152
- 153, The predicate currently assumes the target container is at Containers[0],
which is brittle; update the check to locate the container by name (e.g.,
"aws-cloud-controller-manager") by iterating over
obj.Spec.Template.Spec.Containers and selecting the one whose Name matches, then
validate its Env slice for the AWS_CA_BUNDLE entry; apply the same
container-name lookup change to the other similar occurrence referenced (around
lines 248-249) so both predicates no longer rely on positional indexing.
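
Both index-based findings above come down to resolving the container by name before reading or mutating it. A minimal sketch, assuming stand-in types instead of corev1 and the hypothetical helper names `findContainer` and `upsertEnv` (the mount path shown is also an assumption):

```go
package main

import "fmt"

// Stand-ins for corev1.EnvVar and corev1.Container, reduced to the fields used here.
type EnvVar struct{ Name, Value string }

type Container struct {
	Name string
	Env  []EnvVar
}

// findContainer returns a pointer to the container with the given name,
// or nil when no container matches.
func findContainer(containers []Container, name string) *Container {
	for i := range containers {
		if containers[i].Name == name {
			return &containers[i]
		}
	}
	return nil
}

// upsertEnv sets the variable, overwriting an existing entry instead of
// appending a duplicate.
func upsertEnv(c *Container, name, value string) {
	for i := range c.Env {
		if c.Env[i].Name == name {
			c.Env[i].Value = value
			return
		}
	}
	c.Env = append(c.Env, EnvVar{Name: name, Value: value})
}

func main() {
	containers := []Container{
		{Name: "sidecar"},
		{Name: "aws-cloud-controller-manager"},
	}
	if c := findContainer(containers, "aws-cloud-controller-manager"); c != nil {
		upsertEnv(c, "AWS_CA_BUNDLE", "/etc/aws-ca/ca-bundle.crt")
		// A second call updates the existing entry rather than adding another one.
		upsertEnv(c, "AWS_CA_BUNDLE", "/etc/aws-ca/ca-bundle.crt")
	}
	fmt.Printf("%+v\n", containers)
}
```

Returning nil on a miss lets the caller decide whether to fall back to index 0 or to fail, which is the choice the review leaves open.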
control-plane-operator/controllers/hostedcontrolplane/v2/capi_provider/deployment_test.go (1)

16-25: Consolidate duplicated fakeReleaseProvider test double.

This same stub appears in multiple new test files in this PR. Moving it to a shared test helper (per package testutil) will reduce drift and make future interface updates cheaper.

As per coding guidelines, "Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@control-plane-operator/controllers/hostedcontrolplane/v2/capi_provider/deployment_test.go`
around lines 16 - 25, Duplicate fakeReleaseProvider test double should be moved
to a shared test helper to avoid drift; create a single exported stub type
(e.g., FakeReleaseProvider) in a new test helper package (testutil) and
implement the same methods: GetImage(string) string, ImageExist(string)
(string,bool), Version() string, ComponentVersions() (map[string]string,error),
ComponentImages() map[string]string so it satisfies the release provider
interface. Replace the local fakeReleaseProvider declarations in
deployment_test.go and other tests with the shared testutil.FakeReleaseProvider
and update imports accordingly, ensuring method signatures match exactly so all
tests compile.
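
A shared test double along these lines might look like the sketch below. The method set is taken from the review comment; the `ReleaseProvider` interface declared here is a local stand-in (the real interface and its package are not shown in this PR page), and the struct fields are assumptions.

```go
package main

import "fmt"

// ReleaseProvider is a local stand-in listing the methods named in the review
// comment; the real interface lives elsewhere in the repository.
type ReleaseProvider interface {
	GetImage(key string) string
	ImageExist(key string) (string, bool)
	Version() string
	ComponentVersions() (map[string]string, error)
	ComponentImages() map[string]string
}

// FakeReleaseProvider is an exported stub that several test files could share.
type FakeReleaseProvider struct {
	Images         map[string]string // assumed field: image name to pull spec
	ReleaseVersion string            // assumed field: value returned by Version
}

// Compile-time check that the stub keeps satisfying the interface.
var _ ReleaseProvider = (*FakeReleaseProvider)(nil)

func (f *FakeReleaseProvider) GetImage(key string) string { return f.Images[key] }

func (f *FakeReleaseProvider) ImageExist(key string) (string, bool) {
	img, ok := f.Images[key]
	return img, ok
}

func (f *FakeReleaseProvider) Version() string { return f.ReleaseVersion }

func (f *FakeReleaseProvider) ComponentVersions() (map[string]string, error) {
	return map[string]string{}, nil
}

func (f *FakeReleaseProvider) ComponentImages() map[string]string { return f.Images }

func main() {
	p := &FakeReleaseProvider{
		Images:         map[string]string{"cli": "example.invalid/cli:latest"},
		ReleaseVersion: "4.22.0",
	}
	img, ok := p.ImageExist("cli")
	fmt.Println(img, ok, p.Version())
}
```

The compile-time assertion is what makes future interface changes cheap: a signature drift breaks the build in one place instead of in every test file.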

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 30dc9019-41aa-4d55-ad6a-5b803e87e3f4

📥 Commits

Reviewing files that changed from the base of the PR and between c9c61d8 and 3aaf940.

📒 Files selected for processing (15)
  • control-plane-operator/controllers/hostedcontrolplane/v2/awsnodeterminationhandler/deployment.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/awsnodeterminationhandler/deployment_test.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/capi_provider/deployment.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/capi_provider/deployment_test.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/cloud_controller_manager/aws/component.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/cloud_controller_manager/aws/deployment.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/cloud_controller_manager/aws/deployment_test.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/ingressoperator/deployment.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/ingressoperator/deployment_test.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/karpenter/deployment.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/karpenter/deployment_test.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/karpenteroperator/deployment.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/karpenteroperator/deployment_test.go
  • support/util/volumes.go
  • test/e2e/nodepool_additionalTrustBundlePropagation_test.go

@openshift-merge-robot openshift-merge-robot added the needs-rebase label Mar 16, 2026
@openshift-merge-robot
Contributor

PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci openshift-ci Bot removed the needs-rebase label Apr 14, 2026
@openshift-ci-robot

@sdminonne: This pull request references Jira Issue OCPBUGS-77557, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.22.0) matches configured target version for branch (4.22.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira (wewang@redhat.com), skipping review request.

@codecov

codecov Bot commented Apr 14, 2026

Codecov Report

❌ Patch coverage is 98.87640% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 37.48%. Comparing base (e09cc2d) to head (1de0787).
⚠️ Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...plane/v2/cloud_controller_manager/aws/component.go | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7907      +/-   ##
==========================================
+ Coverage   37.39%   37.48%   +0.08%     
==========================================
  Files         751      752       +1     
  Lines       91806    91939     +133     
==========================================
+ Hits        34333    34461     +128     
- Misses      54838    54841       +3     
- Partials     2635     2637       +2     
| Files with missing lines | Coverage Δ |
|---|---|
| ...olplane/v2/awsnodeterminationhandler/deployment.go | 92.30% <100.00%> (+0.64%) ⬆️ |
| .../hostedcontrolplane/v2/capi_provider/deployment.go | 100.00% <100.00%> (ø) |
| ...lane/v2/cloud_controller_manager/aws/deployment.go | 100.00% <100.00%> (ø) |
| ...ostedcontrolplane/v2/ingressoperator/deployment.go | 65.62% <100.00%> (+3.55%) ⬆️ |
| ...lers/hostedcontrolplane/v2/karpenter/deployment.go | 100.00% <100.00%> (ø) |
| ...tedcontrolplane/v2/karpenteroperator/deployment.go | 100.00% <100.00%> (ø) |
| support/podspec/volumes.go | 72.34% <100.00%> (+72.34%) ⬆️ |
| ...plane/v2/cloud_controller_manager/aws/component.go | 0.00% <0.00%> (ø) |

... and 1 file with indirect coverage changes

| Flag | Coverage Δ |
| --- | --- |
| cmd-support | 32.71% <100.00%> (+0.15%) ⬆️ |
| cpo-hostedcontrolplane | 36.56% <95.23%> (+0.07%) ⬆️ |
| cpo-other | 37.73% <ø> (ø) |
| hypershift-operator | 47.93% <ø> (+0.07%) ⬆️ |
| other | 27.77% <ø> (ø) |

Flags with carried forward coverage won't be shown.

@sdminonne
Contributor Author

@enxebre PTAL

@openshift-ci openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 17, 2026
@openshift-ci openshift-ci Bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 20, 2026
@sdminonne
Contributor Author

@enxebre PTAL

@openshift-ci openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 1, 2026
@hypershift-jira-solve-ci

hypershift-jira-solve-ci Bot commented May 1, 2026

I now have all the evidence needed. Here is the complete analysis:

Test Failure Analysis Complete

Job Information

  • Prow Job: tide (merge automation)
  • Build ID: N/A (tide is a merge controller, not a test job)
  • PR: #7907 — OCPBUGS-77557: propagate additionalTrustBundle to AWS control plane components
  • Author: sdminonne
  • Branch: OCPBUGS-77557 → main
  • State: ERROR — Not mergeable. PR has a merge conflict.

Test Failure Analysis

Error

tide: Not mergeable. PR has a merge conflict.
GitHub mergeable_state: dirty | mergeable: false | rebaseable: false
Label applied: needs-rebase

Summary

This is not a test failure — it is a merge conflict preventing tide from merging the PR. The PR was created on March 10, 2026 and has not been rebased since. In the ~7 weeks since, three major PRs merged into main that modified the same files this PR touches, making it unmergeable. All 8 pending Prow CI jobs (e2e-aws, e2e-aks, etc.) are blocked with "Waiting for pipeline condition to trigger this job" because tide will not attempt to merge a conflicting PR, and the required presubmit jobs cannot be triggered until the conflict is resolved.

Root Cause

The PR branch OCPBUGS-77557 has git merge conflicts with the current main branch (mergeable: false, rebaseable: false). Three PRs merged into main after this PR was created that modified overlapping files:

  1. PR #8354 (merged Apr 28): "CNTRLPLANE-3340: Extract support/podspec package from support/util"

    • This is the primary conflict source. It performed a large-scale refactoring that directly conflicts with 5 files that PR #7907 also modifies:
      • control-plane-operator/.../awsnodeterminationhandler/deployment.go
      • control-plane-operator/.../ingressoperator/deployment.go
      • control-plane-operator/.../karpenter/deployment.go
      • control-plane-operator/.../karpenteroperator/deployment.go
      • control-plane-operator/.../karpenteroperator/deployment_test.go
  2. PR #8166 (merged Apr 30): "CNTRLPLANE-3160: Drop AutoNodeKarpenter feature gate and promote EC2NodeClass to v1"

  3. PR #8375 (merged Apr 30): "OCPBUGS-84551: fix(ingress): set FIPS_ENABLED env var on ingress operator"

The needs-rebase label was applied automatically by Prow once GitHub reported that the PR could not be merged cleanly.

Recommendations
  1. Rebase the PR onto current main — this is the only action needed:
    git fetch upstream main
    git rebase upstream/main
  2. Resolve merge conflicts during rebase, paying special attention to:
    • support/util/volumes.go: This file was moved to support/podspec/volumes.go. The changes from PR #7907 need to be applied to the new file path instead.
    • All deployment.go files: Import statements changed from support/util to support/podspec. Ensure the additionalTrustBundle volume/mount changes use the new import path.
    • karpenter/deployment.go: The Karpenter feature gate was dropped and EC2NodeClass was promoted to v1 — ensure the trust bundle propagation is compatible with these API changes.
    • ingressoperator/deployment.go: A new FIPS_ENABLED env var was added — merge alongside the trust bundle changes.
  3. Re-run support/util/volumes_test.go — if this test file was also moved to support/podspec/, update the test file path accordingly.
  4. Force-push the rebased branch to trigger fresh CI runs. The 8 pending Prow presubmit jobs will trigger automatically once the needs-rebase label is removed.

Evidence

| Evidence | Detail |
| --- | --- |
| Tide status | Not mergeable. PR has a merge conflict. |
| GitHub mergeable | false (mergeable_state: dirty, rebaseable: false) |
| Label | needs-rebase, applied automatically by Prow |
| PR age | Created Mar 10, 2026; 52 days without rebase |
| Primary conflict | PR #8354 (merged Apr 28) moved support/util/volumes.go → support/podspec/volumes.go and updated imports in 5 overlapping files |
| Secondary conflict | PR #8166 (merged Apr 30) modified karpenter/deployment.go (Karpenter feature gate removal) |
| Tertiary conflict | PR #8375 (merged Apr 30) modified ingressoperator/deployment.go (FIPS env var) |
| Blocked CI jobs | 8 pending presubmit jobs (e2e-aws, e2e-aks, e2e-v2-aws, etc.), all "Waiting for pipeline condition" |
| Passed checks | verify-deps, images, okd-scos-images, security, unit tests, lint, codespell, gitlint; all passed on the stale base |

@openshift-ci openshift-ci Bot added area/karpenter-operator Indicates the PR includes changes related to the Karpenter operator area/platform/azure PR/issue for Azure (AzurePlatform) platform area/platform/gcp PR/issue for GCP (GCPPlatform) platform area/platform/kubevirt PR/issue for KubeVirt (KubevirtPlatform) platform area/platform/powervs PR/issue for PowerVS (PowerVSPlatform) platform labels May 5, 2026
…ents

AWS control plane components fail TLS verification when calling AWS
endpoints in isolated environments (e.g. US-ISO regions) because they
do not honor the additionalTrustBundle from the HostedCluster spec.

The AWS SDK replaces the system CA bundle when AWS_CA_BUNDLE is set,
rather than appending to it (both v1 and v2 create a new empty
x509.CertPool). To handle this, add a DeploymentAddAWSCABundleVolume
helper in support/podspec that uses an init container (CPO image) to
concatenate the system CA bundle with the user CAs from the
additionalTrustBundle ConfigMap into a combined PEM file. AWS_CA_BUNDLE
points to this combined file, ensuring the AWS SDK trusts both system
and user CAs.
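
The pool semantics described in the paragraph above can be illustrated with a small, self-contained Go sketch. It does not call the AWS SDK; it only uses the standard library to show that a pool created with x509.NewCertPool trusts nothing beyond the PEM data appended to it, which is the behavior the commit message attributes to AWS_CA_BUNDLE handling. The helper names (newSelfSignedCA, trustedBy) and the throwaway CAs are assumptions made for illustration, not code from this PR.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"fmt"
	"math/big"
	"time"
)

// newSelfSignedCA creates a throwaway self-signed CA (hypothetical helper)
// and returns the parsed certificate together with its PEM encoding.
func newSelfSignedCA(commonName string) (*x509.Certificate, []byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: commonName},
		NotBefore:             time.Now().Add(-time.Hour),
		NotAfter:              time.Now().Add(time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	if err != nil {
		return nil, nil, err
	}
	return cert, pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der}), nil
}

// trustedBy reports whether cert verifies against a pool that starts empty
// and contains only the certificates found in bundle.
func trustedBy(bundle []byte, cert *x509.Certificate) bool {
	pool := x509.NewCertPool() // empty: no system roots are included
	pool.AppendCertsFromPEM(bundle)
	_, err := cert.Verify(x509.VerifyOptions{
		Roots:     pool,
		KeyUsages: []x509.ExtKeyUsage{x509.ExtKeyUsageAny},
	})
	return err == nil
}

func main() {
	// "system" stands in for a public root, "user" for the additionalTrustBundle CA.
	systemCA, systemPEM, err := newSelfSignedCA("example-system-root")
	if err != nil {
		panic(err)
	}
	_, userPEM, err := newSelfSignedCA("example-user-root")
	if err != nil {
		panic(err)
	}
	combined := append(append([]byte{}, systemPEM...), userPEM...)

	fmt.Println("user-only bundle trusts system CA:", trustedBy(userPEM, systemCA))
	fmt.Println("combined bundle trusts system CA:", trustedBy(combined, systemCA))
}
```

Under these assumptions, a bundle holding only the user CA loses trust in the stand-in system root, while the combined bundle keeps both.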

The init container runs with a restricted security context
(AllowPrivilegeEscalation=false, drop ALL capabilities) and minimal
resource requests (cpu: 10m, memory: 10Mi), consistent with other
lightweight init containers in the codebase.

Wire the helper into all affected AWS components:
- aws-cloud-controller-manager
- capi-provider
- ingress-operator
- karpenter
- karpenter-operator
- aws-node-termination-handler

Signed-off-by: Salvatore Dario Minonne <sminonne@redhat.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
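
The concatenation step described in the commit message can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not the PR's init container: combineCABundles and countCertificates are hypothetical names, and the placeholder PEM blocks carry dummy base64 payloads rather than real certificates. The point it shows is that each bundle should end with a newline before the next one is appended, so an END CERTIFICATE line never fuses with the following BEGIN CERTIFICATE line.

```go
package main

import (
	"bytes"
	"encoding/pem"
	"fmt"
)

// combineCABundles joins PEM bundles in the given order. Each non-empty
// bundle is trimmed and terminated with exactly one newline.
func combineCABundles(bundles ...[]byte) []byte {
	var out bytes.Buffer
	for _, b := range bundles {
		b = bytes.TrimSpace(b)
		if len(b) == 0 {
			continue // skip empty inputs, e.g. an unset user bundle
		}
		out.Write(b)
		out.WriteByte('\n')
	}
	return out.Bytes()
}

// countCertificates counts PEM blocks of type CERTIFICATE in bundle.
// It only inspects the PEM framing and does not parse the certificates.
func countCertificates(bundle []byte) int {
	n := 0
	for {
		var block *pem.Block
		block, bundle = pem.Decode(bundle)
		if block == nil {
			return n
		}
		if block.Type == "CERTIFICATE" {
			n++
		}
	}
}

func main() {
	// Placeholder blocks; the first one deliberately lacks a trailing newline.
	system := []byte("-----BEGIN CERTIFICATE-----\ncGxhY2Vob2xkZXI=\n-----END CERTIFICATE-----")
	user := []byte("-----BEGIN CERTIFICATE-----\ncGxhY2Vob2xkZXI=\n-----END CERTIFICATE-----\n")

	combined := combineCABundles(system, user)
	fmt.Println("certificate blocks in combined bundle:", countCertificates(combined))
}
```

A consumer that honors AWS_CA_BUNDLE would then be pointed at the file holding the combined output instead of the user bundle alone.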
@openshift-ci openshift-ci Bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 5, 2026
@openshift-ci
Contributor

openshift-ci Bot commented May 5, 2026

@sdminonne: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-aws | 3aaf940 | link | true | /test e2e-aws |
| ci/prow/e2e-v2-gke | 13667a3 | link | false | /test e2e-v2-gke |
| ci/prow/e2e-gke | 13667a3 | link | false | /test e2e-gke |
| ci/prow/images | 1de0787 | link | true | /test images |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@hypershift-jira-solve-ci

Now I have the complete picture. Both failures share the identical root cause. Here is the final report:

Test Failure Analysis Complete

Job Information

Test Failure Analysis

Error

# github.com/openshift/hypershift/test/e2e [github.com/openshift/hypershift/test/e2e.test]
test/e2e/nodepool_additionalTrustBundlePropagation_test.go:186:19: undefined: util
test/e2e/nodepool_additionalTrustBundlePropagation_test.go:261:19: undefined: util

Summary

Both CI jobs fail due to a compilation error in test/e2e/nodepool_additionalTrustBundlePropagation_test.go. The PR adds two new code blocks that call util.IsDeploymentReady(k.ctx, obj), but util is not an imported identifier in this file. The function IsDeploymentReady lives in the support/podspec package, which is already imported as podspec. The existing code in the same file correctly calls podspec.IsDeploymentReady(k.ctx, obj). The Prow images job hits this as a go build compilation error when building the e2e test binary, and the GitHub Actions Verify job hits it as a go vet error during make vet.

Root Cause

The PR adds two new blocks of test code that verify AWS_CA_BUNDLE wiring on the aws-cloud-controller-manager deployment. Both blocks call util.IsDeploymentReady(k.ctx, obj) to check deployment readiness:

  1. Line ~186 (inside the "verify AWS_CA_BUNDLE wiring is present" block)
  2. Line ~261 (inside the "verify AWS_CA_BUNDLE wiring is removed" block)

The identifier util does not resolve to any import in this file. The file's import block contains:

  • "github.com/openshift/hypershift/support/podspec" → imported as podspec
  • e2eutil "github.com/openshift/hypershift/test/e2e/util" → imported as e2eutil

Neither of these is aliased to util. The IsDeploymentReady function is defined in support/podspec/deployment.go, so the correct call is podspec.IsDeploymentReady(k.ctx, obj) — which is exactly how the pre-existing code in this same file already calls it.

Fix: Replace util.IsDeploymentReady with podspec.IsDeploymentReady at both call sites (lines ~186 and ~261 in the modified file). No import changes are needed.

Recommendations
  1. Replace util.IsDeploymentReady → podspec.IsDeploymentReady at both locations in test/e2e/nodepool_additionalTrustBundlePropagation_test.go:

    • In the "verify AWS_CA_BUNDLE wiring is present" block (~line 186)
    • In the "verify AWS_CA_BUNDLE wiring is removed" block (~line 261)
  2. Run make vet locally before pushing to catch undefined symbol errors early: make vet runs go vet -tags integration,e2e,reqserving,e2ev2,backuprestore ./...

  3. No other changes required — the podspec package is already imported, and IsDeploymentReady has the correct signature func IsDeploymentReady(ctx context.Context, deployment *appsv1.Deployment) bool.

Evidence

| Evidence | Detail |
| --- | --- |
| Prow error | go build fails: test/e2e/nodepool_additionalTrustBundlePropagation_test.go:186:19: undefined: util |
| GH Actions error | make vet fails: vet: test/e2e/nodepool_additionalTrustBundlePropagation_test.go:186:19: undefined: util |
| Failed image | hypershift-tests (built from Dockerfile.e2e); all other images (hypershift, hypershift-operator, hypershift-cli) succeeded |
| Incorrect call (line ~186) | util.IsDeploymentReady(k.ctx, obj) in "verify wiring present" block |
| Incorrect call (line ~261) | util.IsDeploymentReady(k.ctx, obj) in "verify wiring removed" block |
| Correct usage (same file) | podspec.IsDeploymentReady(k.ctx, obj), used in pre-existing code |
| Function location | support/podspec/deployment.go → imported as podspec |
| Import mismatch | e2eutil "github.com/openshift/hypershift/test/e2e/util" is imported as e2eutil, not util |

Labels

area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release area/karpenter-operator Indicates the PR includes changes related to the Karpenter operator area/platform/aws PR/issue for AWS (AWSPlatform) platform area/platform/azure PR/issue for Azure (AzurePlatform) platform area/platform/gcp PR/issue for GCP (GCPPlatform) platform area/platform/kubevirt PR/issue for KubeVirt (KubevirtPlatform) platform area/platform/powervs PR/issue for PowerVS (PowerVSPlatform) platform area/testing Indicates the PR includes changes for e2e testing jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

4 participants