OCPCLOUD-2950: Adopt projected-service-account-tokens-for-credential-providers#1987
Conversation
|
Skipping CI for Draft Pull Request. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
@theobarberbany: This pull request references OCPCLOUD-2950 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the spike to target the "5.0.0" version, but no target version was set. DetailsIn response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
ed89bf1 to
2262ea2
Compare
damdo
left a comment
There was a problem hiding this comment.
Thanks @theobarberbany
Not an expert but on a first pass, the proposal looks sensible to me. Interested in hearing more from the other parties involved
2262ea2 to
a31c15b
Compare
ca4bf4e to
baf3f25
Compare
a239a43 to
7f9b92f
Compare
7f9b92f to
63e3a61
Compare
|
@theobarberbany: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
|
||
| * As a cluster administrator, I want the credential provider configuration to ship with sensible defaults but be customizable, so that I can adapt it to my organization's cloud identity setup without maintaining my own MCO overrides from scratch. | ||
|
|
||
| * As a security-conscious customer, I want the installer to not provision a default node identity for image pulls, so that I can reduce the blast radius of node compromise and use day-2 provisioned credentials instead of install-time baked-in identity. |
There was a problem hiding this comment.
This seems unnecessary especially since it's listed as a non-goal/out of scope below.
| #### Out of scope but enabled by this work | ||
|
|
||
| - **Removing install-time node identity from the installer.** KEP-4412 adoption is a prerequisite for this work. See [Removing node identity from install](#removing-node-identity-from-install) for the proposed approach. | ||
| - **Removing the ROSA ECR token refresh shim.** ROSA currently works around the lack of credential provider support by injecting a Python script via MachineConfig as a systemd unit on a 4h timer to fetch ECR tokens. KEP-4412 replaces this with the kubelet's native credential provider framework. See [ARO-24037](https://redhat.atlassian.net/browse/ARO-24037). |
There was a problem hiding this comment.
Reference to ARO card but discusses ROSA. Assuming that's because the ARO card provides context is that RHCOS doesn't ship by default with the xcr-credentials-provider binaries and as a result ROSA is using this "hack" to do it.
| #### Out of scope but enabled by this work | ||
|
|
||
| - **Removing install-time node identity from the installer.** KEP-4412 adoption is a prerequisite for this work. See [Removing node identity from install](#removing-node-identity-from-install) for the proposed approach. | ||
| - **Removing the ROSA ECR token refresh shim.** ROSA currently works around the lack of credential provider support by injecting a Python script via MachineConfig as a systemd unit on a 4h timer to fetch ECR tokens. KEP-4412 replaces this with the kubelet's native credential provider framework. See [ARO-24037](https://redhat.atlassian.net/browse/ARO-24037). |
There was a problem hiding this comment.
How does this KEP support mirroring? I thought the issue was primarily the order in which these things were called? The mirroring config is after the credential provider today right?
|
|
||
| ### API Extensions | ||
|
|
||
| None. This enhancement does not introduce new Custom Resource Definitions or modify the OpenShift API surface. The `tokenAttributes` field is part of the upstream kubelet `CredentialProviderConfig` API. |
There was a problem hiding this comment.
Users opt-in to this feature using annotations on the service level? Any dynamic config we need to add to allow them to do this or does it basically just work today if we configure the MCO correctly (which we don't today right?)
is there any reason why a cluster admin might not want this ability/therefore would want a knob to control if it's on/off?
There was a problem hiding this comment.
Based on the What KEP-4412 adoption looks like section, it seems that the "API surface" is "Admin wrote a SA with specific annotations regarding cloud provider"? The MCO enablement seems straightforward so the admin control would just be "I don't write a very specific annotation and the feature is off"
|
|
||
| #### OpenShift Kubernetes Engine | ||
|
|
||
| TBD — needs input from OKE team. |
There was a problem hiding this comment.
I don't think there's any difference between OCP and OKE in regards to this area
| 1. **ECR (cloud-provider-aws)**: Make the annotation key configurable via args or environment variable. Currently hardcoded to `eks.amazonaws.com/ecr-role-arn` — EKS-specific, not usable for OpenShift without a carry patch. | ||
| 2. **ACR (cloud-provider-azure)**: KEP-4412 support is already merged ([PR #9907](https://github.com/kubernetes-sigs/cloud-provider-azure/pull/9907)). Annotation keys (`kubernetes.azure.com/acr-client-id`, `kubernetes.azure.com/acr-tenant-id`) are hardcoded in `pkg/credentialprovider/consts.go` — need the same configurability fix as ECR. | ||
| 3. **GCR (cloud-provider-gcp)**: Implement KEP-4412 support from scratch. Opportunity to establish the configurable-annotation-key pattern from the start. |
There was a problem hiding this comment.
Why do we need something configurable vs documenting the upstream annotations for each cloud?
There was a problem hiding this comment.
I was thinking to avoid mentioning cloud provider managed Kubernetes offerings in our docs, e.g EKS, thinking that could cause confusion? Or just be unexpected and therefore odd UX.
There was a problem hiding this comment.
Does the option work beyond eks? I'd probably just look for a way to add a new version of that name upstream that isn't eks specific
| The upstream ask is making annotation keys configurable — not proposing specific keys upstream. Once configurability lands, OpenShift chooses its own defaults independently. | ||
|
|
||
| TBD — need to decide on OpenShift annotation keys for each provider. This is user-facing UX and we want reviewer input. Options: | ||
| - Per-provider annotations: `openshift.io/ecr-role-arn`, `openshift.io/acr-client-id`, `openshift.io/gcr-service-account` |
There was a problem hiding this comment.
What benefit would this provide over just using the upstream defaults?
| TBD — need to decide on OpenShift annotation keys for each provider. This is user-facing UX and we want reviewer input. Options: | ||
| - Per-provider annotations: `openshift.io/ecr-role-arn`, `openshift.io/acr-client-id`, `openshift.io/gcr-service-account` | ||
| - A single generic OpenShift annotation that each plugin interprets | ||
| - Accept the upstream vendor-specific keys as-is (requires no carry, but ties OpenShift UX to EKS/AKS conventions) |
There was a problem hiding this comment.
The azure ones seem fine. The AWS one is less good as it calls out EKS in the name.
Aligning to upstream has benefits in terms of this clearly being an upstream feature and therefore upstream docs making sense
yuqi-zhang
left a comment
There was a problem hiding this comment.
Seems straightfoward enough from the MCO side, although I think I don't fully appreciate what Removing install-time node identity from the installer entails yet
| 4. Calls AWS Security Token Service `AssumeRoleWithWebIdentity` with the service account token as the web identity token | ||
| 5. Returns temporary IAM credentials used for the ECR `GetAuthorizationToken` call | ||
|
|
||
| Kubelet config: |
There was a problem hiding this comment.
Just to double check, the existing CredentialProviderConfig template would look something like:
apiVersion: kubelet.config.k8s.io/v1
kind: CredentialProviderConfig
providers:
- name: ecr-credential-provider
...
{{- if isWorkloadIdentity . }}
tokenAttributes:
serviceAccountTokenAudience: "sts.amazonaws.com"
...
{{- end }}
?
How does the service account get managed? Is it purely user created?
| The preferred direction is upstream work to allow credential provider plugins to accept a static credential (e.g. a GCP service account key, AWS access key pair) via config file or environment variable. This would allow: | ||
|
|
||
| - CCO (mint mode) provisions a scoped credential for image pulls | ||
| - MCO ships a CredentialsRequest declaring the credential requirement, following the same pattern used by other OpenShift operators (e.g. the image-registry operator, the ingress operator) |
There was a problem hiding this comment.
Is this something we plan to do relatively soon, or is it still up for discussion whether we will go forward with this approach?
|
|
||
| ### API Extensions | ||
|
|
||
| None. This enhancement does not introduce new Custom Resource Definitions or modify the OpenShift API surface. The `tokenAttributes` field is part of the upstream kubelet `CredentialProviderConfig` API. |
There was a problem hiding this comment.
Based on the What KEP-4412 adoption looks like section, it seems that the "API surface" is "Admin wrote a SA with specific annotations regarding cloud provider"? The MCO enablement seems straightforward so the admin control would just be "I don't write a very specific annotation and the feature is off"
No description provided.