OCPSTRAT-2527, OCPSTRAT-2540: Enhancement: etcd data re-encryption for key rotation in HyperShift#1969

Open
muraee wants to merge 17 commits into openshift:master from muraee:hypershift-etcd-reencryption

muraee commented Apr 9, 2026

Summary

  • Add enhancement proposal for etcd data re-encryption after encryption key rotation in HyperShift
  • Introduces a new HCCO controller that leverages KubeStorageVersionMigrator from library-go to create StorageVersionMigration CRs in the guest cluster, transparently re-encrypting all encrypted resources with the active key
  • Adds EtcdDataEncryptionUpToDate condition on HCP/HostedCluster for progress tracking
  • Guards against premature backup key removal
  • Supports all encryption types (Azure KMS, AWS KMS, IBM Cloud KMS, AESCBC)

Tracks: OCPSTRAT-2527, OCPSTRAT-2540
Related: ARO-21568, ARO-21456

Test plan

  • Unit tests for key fingerprint computation and controller reconciliation logic
  • Integration tests for StorageVersionMigration CR lifecycle
  • E2E tests for Azure KMS and AESCBC key rotation with re-encryption

🤖 Generated with Claude Code

@openshift-ci openshift-ci Bot requested review from csrwng and sjenning April 9, 2026 16:17
@muraee muraee force-pushed the hypershift-etcd-reencryption branch from ae1dfec to eabd02a Compare April 10, 2026 09:33
Comment thread enhancements/hypershift/etcd-data-reencryption-on-key-rotation.md
@muraee muraee force-pushed the hypershift-etcd-reencryption branch from eabd02a to 7b4c875 Compare April 10, 2026 14:39
Member

@ardaguclu ardaguclu left a comment

I focused specifically on the "Why KubeStorageVersionMigrator Instead of MigrationController" section. It looks good to me. I left one comment, which is more an agreement than an objection.

Comment thread enhancements/hypershift/etcd-data-reencryption-on-key-rotation.md
muraee added a commit to muraee/hypershift that referenced this pull request Apr 13, 2026
Add a re-encryption controller in the HCCO that triggers
StorageVersionMigration after an encryption key rotation, ensuring
all existing etcd data is re-encrypted with the new active key.

Components:
- API: EtcdDataEncryptionUpToDate condition type and reasons
- CPO: key fingerprint computation and rekey-needed annotation
  on kas-secret-encryption-config secret
- HCCO: new reencryption controller using library-go's
  KubeStorageVersionMigrator to drive StorageVersionMigration CRs
- HyperShift Operator: condition bubble-up from HCP to HostedCluster

Ref: OCPSTRAT-2527, OCPSTRAT-2540
Enhancement: openshift/enhancements#1969

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
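
The commit message above mentions key fingerprint computation without saying how the fingerprint is built. The following is a minimal sketch of the idea, assuming a SHA-256 digest over a canonical JSON encoding of the key's identifying fields. The names `key_fingerprint` and `rekey_needed`, and the field names in the usage note, are hypothetical; the actual CPO code is written in Go and may hash different inputs.

```python
import hashlib
import json


def key_fingerprint(provider: str, key_fields: dict) -> str:
    """Return a stable hex digest identifying an encryption key spec.

    Hypothetical helper, not the CPO implementation.
    """
    # sort_keys yields a canonical encoding, so field order does not matter.
    canonical = json.dumps(
        {"provider": provider, "key": key_fields},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def rekey_needed(recorded_fingerprint: str, provider: str, key_fields: dict) -> bool:
    """True when the key in the spec no longer matches the recorded fingerprint."""
    return recorded_fingerprint != key_fingerprint(provider, key_fields)
```

For example, recording `key_fingerprint("AESCBC", {"secretName": "key-a"})` and later calling `rekey_needed` with `{"secretName": "key-b"}` would report that re-encryption is needed.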
@muraee muraee force-pushed the hypershift-etcd-reencryption branch from 7b4c875 to 82ddb23 Compare April 14, 2026 10:12
@muraee muraee changed the title from "Enhancement: etcd data re-encryption for key rotation in HyperShift" to "OCPSTRAT-2527, OCPSTRAT-2540: Enhancement: etcd data re-encryption for key rotation in HyperShift" Apr 16, 2026
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 16, 2026
@openshift-ci-robot

openshift-ci-robot commented Apr 16, 2026

@muraee: This pull request references OCPSTRAT-2527 which is a valid jira issue.

This pull request references OCPSTRAT-2540 which is a valid jira issue.


@csrwng
Contributor

csrwng commented Apr 16, 2026

Just a couple of thoughts for kms in general:

  • We currently run the kube-storage-version-migrator in the data plane. I see no good reason to do so, since we don't need workers to migrate storage, and leaving it there means a cluster with no workers cannot migrate at all.
  • (Orthogonal but relevant) We're not encrypting resources consistently. The latest code seems to enable encryption only for secrets when doing aescbc encryption, but for secrets, configmaps, routes, oauthaccesstokens, and oauthauthorizetokens when doing kms encryption. Furthermore, routes, oauthaccesstokens, and oauthauthorizetokens are not encrypted in either case, because that would require the kms sidecar on the openshift apiservers that serve those resources.
  • We should fix the API. Currently we only allow a main and a backup key of the same type (all aescbc, or all kms, etc.). It should be possible to have multiple keys of different types, given that upstream Kubernetes allows it.
  • We're not doing key rotation properly. When we introduce a new main key, it should first be added as a backup/read key to all instances of kube-apiserver. Only once that has rolled out should we make the new key the write key. Otherwise, as soon as the first kube-apiserver instance receives the new write key it can start encrypting with it, and the instances that have not yet been updated may start crashlooping because they cannot decode secrets encrypted with the new key.
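
The last point can be illustrated with a small model. This is only a sketch: each kube-apiserver replica is reduced to an ordered list of key names, where, as in a Kubernetes EncryptionConfiguration, the first provider is used for writes and every listed provider can be used for reads. The function name and the key names `old` and `new` are made up for the example.

```python
def readable_by_all(replica_configs: list[list[str]]) -> bool:
    """Check that every replica can decrypt what any replica may write.

    Each config is an ordered provider list: the first entry is the write
    key, all entries are usable for reads.
    """
    write_keys = {config[0] for config in replica_configs}
    return all(write_keys <= set(config) for config in replica_configs)


# One-step rollout: the first updated replica already writes with "new"
# while the other two replicas only know "old".
one_step = [["new", "old"], ["old"], ["old"]]

# Two-step rollout, step 1 in progress: "new" is added for reads only.
step_1 = [["old", "new"], ["old"], ["old"]]

# Two-step rollout, step 2 in progress (started after step 1 fully rolled out).
step_2 = [["new", "old"], ["old", "new"], ["old", "new"]]
```

In this model `readable_by_all(one_step)` is false, while both intermediate states of the two-step rollout stay readable by every replica.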

@csrwng
Contributor

csrwng commented Apr 16, 2026

Could we reflect in HostedCluster status which keys are actively being used and reject spec changes that could potentially result in data loss?

Comment thread enhancements/hypershift/etcd-data-reencryption-on-key-rotation.md Outdated
…ackupKey

- Deploy kube-storage-version-migrator in HCP namespace (control plane)
  instead of data plane, enabling re-encryption with zero worker nodes
- Disable data-plane operator via annotation removal in
  cluster-kube-storage-version-migrator-operator repo
- Add status.secretEncryption.activeKey field to HC/HCP with full key
  spec for rotation detection and EncryptionConfiguration resilience
- Deprecate backupKey spec fields in favor of status-based tracking
- Update workflow, architecture, risks, and support procedures

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@muraee muraee force-pushed the hypershift-etcd-reencryption branch from 5c2acd4 to 4401ea1 Compare April 21, 2026 14:09
Document per-provider behavior for cloud-initiated key rotation:
Azure Key Vault requires explicit keyVersion (always detectable),
AWS KMS automatic rotation keeps same ARN (not detectable but safe),
IBM Cloud has explicit fields, AESCBC is user-managed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
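
The per-provider behavior in the commit message above comes down to whether a rotation changes any field that the controller can observe in the spec. A rough sketch follows, with hypothetical dataclasses standing in for the real HyperShift API types (the field names are assumptions, not copied from the API):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AzureKey:
    key_vault_name: str
    key_name: str
    key_version: str  # explicit version, so a rotation changes the spec


@dataclass(frozen=True)
class AWSKey:
    arn: str  # automatic rotation keeps the same ARN


def rotation_detected(recorded, desired) -> bool:
    """A rotation is visible only if an identifying field changed."""
    return recorded != desired


azure_rotated = rotation_detected(
    AzureKey("vault", "etcd-key", "v1"),
    AzureKey("vault", "etcd-key", "v2"),
)
aws_rotated = rotation_detected(
    AWSKey("arn:aws:kms:us-east-1:111122223333:key/example"),
    AWSKey("arn:aws:kms:us-east-1:111122223333:key/example"),
)
```

Here `azure_rotated` is true and `aws_rotated` is false. The AWS case is described as safe in the commit message, presumably because AWS KMS keeps earlier key material available for decryption after an automatic rotation.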
@csrwng
Contributor

csrwng commented May 1, 2026

Thank you for the latest changes @muraee !

Some general comments:

  • We shouldn't rely on VAP to enforce the behavior we want. I think we should build the feature assuming there is no VAP, and only add the VAP as a nice API UX improvement. To this end, we need a way to ignore a spec change while storage migration is in progress, or to accept the new spec immediately but keep the previous, not yet fully migrated provider config so we can still decrypt with it and safely migrate to the new one.
  • We're not yet addressing the issue of breaking existing kube-apiserver instances while adding a new encryption provider. If we immediately start encrypting with whatever the user put in the spec and roll that change out to the kube-apiserver, only the first instance where the change is applied will be available, while the other 2 instances may crashloop until their config changes. Rolling out a new key/provider should be done in 2 stages.
  • While we're deprecating the backup key configuration, we cannot ignore it completely. If there is a backup key in the spec, we need to include it in the KAS configuration so we can decrypt with it.
  • For the status, it seems that we need a list of encryption configurations rather than a single active one. That would let us keep track of backup keys, incomplete migrations, etc., and also track state while rolling out a new write configuration.

Redesign the re-encryption workflow to use a safe two-stage KAS
rollout: Stage 1 adds the new key as a read-only provider (all
replicas can decrypt the new key before any writes with it),
Stage 2 promotes it to write provider. This prevents decryption
failures during rolling updates.

Introduce a rolloutPhase-driven state machine (ReadOnlyDeploy →
WritePromote → Migrating) in status.secretEncryption, with a
snapshotted targetKey that the controller uses for the duration
of a rotation. Mid-rotation spec changes are safely queued at
the controller level (VAP-independent), completing the current
rotation before starting a new one.

Reframe the VAP as a UX improvement (clear admission-time error)
rather than a safety requirement. Clarify that spec.backupKey
can only be honored as a fallback when status.activeKey is nil,
since during a rotation both KMS sidecar slots are occupied by
activeKey and targetKey.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@muraee
Author

muraee commented May 4, 2026

Thanks for the thorough review @csrwng! I've updated the enhancement to address all four points. Here's a summary of the changes:

Point 1 (VAP-independent safety): The system is now designed to be safe without the VAP. When a rotation is in progress (rolloutPhase is not empty), the HCCO controller continues with the snapshotted targetKey and ignores subsequent spec changes. Once the current rotation completes, the controller detects the new spec key mismatch and starts a fresh rotation. The VAP is reframed as a UX improvement — it gives users a clear admission-time error ("wait for re-encryption to complete") rather than silently queuing the change.

Point 2 (Two-stage KAS rollout): The workflow now uses a three-phase state machine driven by status.secretEncryption.rolloutPhase:

  • Stage 1 (ReadOnlyDeploy): The new key is added as a read-only provider. The old key remains the write provider. During this rolling update, no pod writes with the new key, so all reads succeed across all replicas.
  • Stage 2 (WritePromote): The new key is promoted to write provider, old key becomes read-only. Pods still on the previous config already have the new key as a read-only provider (from Stage 1), so they can decrypt data written by new-config pods.
  • Stage 3 (Migrating): All KAS replicas have converged — StorageVersionMigration CRs are created to re-encrypt existing data.

The HCCO waits for full KAS convergence (updatedReplicas == replicas == readyReplicas) between each phase transition.
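
The three phases and the convergence gate can be sketched as a lookup table. This is an illustration only: it assumes that the Migrating phase keeps the same provider order as WritePromote, and it reduces each key to a string. The function names are hypothetical and the real logic lives in Go.

```python
def providers_for_phase(phase: str, active: str, target: str) -> list[str]:
    """Ordered provider list for a phase; the first entry is the write key."""
    table = {
        "ReadOnlyDeploy": [active, target],  # old key writes, new key read-only
        "WritePromote": [target, active],    # new key writes, old key read-only
        "Migrating": [target, active],       # assumed unchanged during migration
    }
    return table[phase]


def kas_converged(replicas: int, updated: int, ready: int) -> bool:
    """Mirror of the updatedReplicas == replicas == readyReplicas check."""
    return updated == replicas == ready
```

A controller following this sketch would only move from one phase to the next once `kas_converged` returns true for the KAS Deployment.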

Point 3 (backupKey honored): spec.backupKey is used as a fallback when status.secretEncryption.activeKey is not yet set (upgrade transition — status not yet populated). However, once the status is initialized, backupKey cannot be honored alongside the status-driven keys because during a rotation both KMS sidecar slots are occupied by status.activeKey and status.targetKey — there is no third sidecar to service a spec.backupKey. For AESCBC (keys are inline, no sidecar needed) this constraint doesn't strictly apply, but for consistency and since backupKey is deprecated, we treat it uniformly as a transition-period fallback.

Point 4 (list vs single key status): Instead of a full list, the status now uses two named fields, activeKey (the key that existing data is confirmed to be encrypted with) and targetKey (the key being rolled out), plus a rolloutPhase enum. This maps directly to the two-sidecar constraint for KMS providers while providing the state tracking needed for the rollout and mid-rotation safety. The rolloutPhase field drives the CPO's EncryptionConfiguration generation via a simple lookup table.

During ReadOnlyDeploy (no data encrypted with target key yet),
spec changes update targetKey in-place and restart the phase,
allowing immediate correction of wrong-key mistakes. During
WritePromote or Migrating, spec changes are queued until the
current rotation completes (3 simultaneous keys would violate
the two-sidecar constraint).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
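
The rule in the commit message above (replace the target during ReadOnlyDeploy, queue the change otherwise) can be sketched as follows. The `Rotation` type and `on_spec_change` function are hypothetical and only model the decision, not the controller's actual data structures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rotation:
    phase: str                        # ReadOnlyDeploy, WritePromote or Migrating
    target_key: str                   # key snapshotted when the rotation began
    queued_key: Optional[str] = None  # spec change waiting for completion


def on_spec_change(rotation: Rotation, new_key: str) -> Rotation:
    """Apply a spec key change that arrives while a rotation is running."""
    if new_key == rotation.target_key:
        return rotation
    if rotation.phase == "ReadOnlyDeploy":
        # No data is encrypted with the target yet: swap it and restart the phase.
        return Rotation(phase="ReadOnlyDeploy", target_key=new_key)
    # WritePromote or Migrating: a third key would exceed the two sidecar
    # slots, so remember it and start a new rotation after this one completes.
    return Rotation(rotation.phase, rotation.target_key, queued_key=new_key)
```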
Contributor

@csrwng csrwng left a comment

Review of the etcd data re-encryption enhancement. Please see inline comments for detailed feedback.

Comment thread enhancements/hypershift/etcd-data-reencryption-on-key-rotation.md
This enhancement adds etcd data re-encryption support to HyperShift
by reusing library-go's `KubeStorageVersionMigrator` struct (which
creates and monitors `StorageVersionMigration` CRs via the
`migration.k8s.io/v1alpha1` API) within a new HCCO controller,
Member

@enxebre enxebre May 6, 2026

Did you mean storagemigration.k8s.io/v1beta1? https://kubernetes.io/docs/tasks/manage-kubernetes-objects/storage-version-migration/
We wouldn't want to rely on an alpha API for this.

Author

@enxebre These are actually two different API groups:

  • migration.k8s.io/v1alpha1 — out-of-tree CRD from kube-storage-version-migrator, deployed by OpenShift's cluster-kube-storage-version-migrator-operator. This is what library-go's KubeStorageVersionMigrator uses (still current — no PRs to migrate it).
  • storagemigration.k8s.io/v1beta1 — built-in in-tree API (KEP-4192), promoted to beta in K8s 1.35 = OCP 4.22. Requires the StorageVersionMigrator feature gate.

We're using migration.k8s.io/v1alpha1 because we vendor library-go's KubeStorageVersionMigrator, which handles CR creation, stale detection, annotation tracking, status monitoring, and pruning (~130 lines of production-tested code).

Switching to storagemigration.k8s.io/v1beta1 would mean either waiting for library-go to migrate (no signs of that happening) or reimplementing KubeStorageVersionMigrator from scratch against the new API. Is that something we want to take on, or is using the existing library-go implementation with migration.k8s.io/v1alpha1 acceptable for now?
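
For readers unfamiliar with the out-of-tree API, the object that gets created per resource looks roughly like the manifest built below. This is a sketch written from the kube-storage-version-migrator CRD as I understand it (`spec.resource` with `group`, `version` and `resource`); the object name is invented for the example and is not the naming scheme used by library-go.

```python
def storage_version_migration(name: str, group: str, version: str, resource: str) -> dict:
    """Build a StorageVersionMigration manifest for the out-of-tree API group."""
    return {
        "apiVersion": "migration.k8s.io/v1alpha1",
        "kind": "StorageVersionMigration",
        "metadata": {"name": name},
        "spec": {
            "resource": {
                "group": group,        # "" is the core API group
                "version": version,
                "resource": resource,
            }
        },
    }


# Hypothetical name; rewriting every Secret makes the kube-apiserver store it
# again, encrypted with the current write key.
secrets_migration = storage_version_migration("reencrypt-secrets", "", "v1", "secrets")
```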

Comment thread enhancements/hypershift/etcd-data-reencryption-on-key-rotation.md Outdated
muraee and others added 11 commits May 6, 2026 13:12
Move key change detection and two-stage KAS rollout (ReadOnlyDeploy,
WritePromote) to the CPO main reconciler, which naturally owns KAS
Deployment lifecycle. The HCCO re-encryption controller now only
handles the Migrating phase — creating StorageVersionMigration CRs
in the guest cluster and monitoring completion.

The CPO main reconciler drives the state machine and writes
status.secretEncryption fields via Status().Patch(). The adapt
function reads rolloutPhase to generate the appropriate
EncryptionConfiguration. The handoff to HCCO occurs when the CPO
sets rolloutPhase=Migrating after Stage 2 convergence.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
KMS sidecars are only configured on KAS today — routes,
oauthaccesstokens, and oauthauthorizetokens are listed in
KMSEncryptedObjects() but aren't actually encrypted because the
OpenShift API servers lack KMS sidecars. AESCBC only encrypting
secrets (not configmaps) is also a known gap.

The re-encryption controller creates SVMs for whatever
KMSEncryptedObjects() returns, so it will automatically cover
new resources when the upstream encryption scope is expanded.
Expanding the encryption scope and multi-API-server convergence
tracking are explicitly listed as non-goals.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove top-level RolloutPhase field — the current rotation phase
is now history[0].state, following the ControlPlaneVersionStatus
pattern where history[0] represents the current operation.

History entries are created when a rotation starts (state=
ReadOnlyDeploy), updated through phases (WritePromote, Migrating),
and finalized (Completed or Interrupted). The CPO's adapt function
reads history[0].state to generate the EncryptionConfiguration.

API conventions applied: omitzero, +listType=atomic,
+kubebuilder:validation markers, lowercase godoc, +k8s:deepcopy-gen.
EncryptionMigrationState now covers both phases and outcomes:
ReadOnlyDeploy, WritePromote, Migrating, Completed, Interrupted.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
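
The history convention described in the commit message above (newest entry first, `history[0]` is the current operation) can be sketched with plain lists. The helper names and the `targetKey` field are assumptions; only the state names come from the commit message.

```python
TERMINAL_STATES = {"Completed", "Interrupted"}


def current_state(history: list[dict]):
    """State of the current rotation, or None if no rotation was ever recorded."""
    return history[0]["state"] if history else None


def start_rotation(history: list[dict], target_key: str) -> list[dict]:
    """Prepend a new entry; refuse if the current entry is not finalized."""
    if history and history[0]["state"] not in TERMINAL_STATES:
        raise ValueError("a rotation is already in progress")
    return [{"targetKey": target_key, "state": "ReadOnlyDeploy"}] + history


def set_state(history: list[dict], state: str) -> list[dict]:
    """Return a copy of the history with the current entry moved to state."""
    return [dict(history[0], state=state)] + history[1:]
```

Older entries are never modified, so the list doubles as the rotation history mentioned in the goals.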
Add CEL union discriminator validation to SecretEncryptionKeyStatus
(provider field gates which sub-struct is set). Apply lowercase
godoc, +required/+optional markers, omitzero, and +k8s:deepcopy-gen
to all status types.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fix Non-Goal 4: backupKey is deprecated but still functional as a
fallback when status.activeKey is not yet initialized.

Add explicit upgrade strategy for existing clusters:
- No backupKey set (never rotated): initialize status.activeKey
  directly without re-encryption
- backupKey set (rotation was in progress): trigger full
  re-encryption to ensure all data uses the active key, then
  transition to the status-driven mechanism

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
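
The upgrade strategy above is a two-way decision on whether `backupKey` is set while `status.activeKey` is still empty. A minimal sketch, with made-up return values and parameter names:

```python
from typing import Optional


def upgrade_action(status_active_key: Optional[str], spec_backup_key: Optional[str]) -> str:
    """Decide what an existing cluster needs when it first sees the new controller."""
    if status_active_key is not None:
        # Status is already populated: the status-driven mechanism applies.
        return "none"
    if spec_backup_key is None:
        # Never rotated: record the active key, no re-encryption required.
        return "initialize-status"
    # A rotation was in progress: re-encrypt everything with the active key,
    # then switch to the status-driven mechanism.
    return "re-encrypt-then-initialize"
```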
Add Goal 6 (metrics for alerting) and Goal 7 (rotation history).
Add Prometheus metrics section with three metrics emitted by the
HCCO re-encryption controller: migration state gauge, duration
histogram, and failure counter. Include suggested alert rules for
stuck migrations and persistent failures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add SVM CR naming and conflict handling note (Group 11)
- Add EncryptionConfiguration YAML examples per phase (Group 13)
- Add CVO garbage-collection note for data-plane operator
  cleanup on upgrade (Group 15)
- Add forward compatibility notes for cross-type migration and
  etcd sharding (Groups 16, 17)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add +union on the struct, +unionDiscriminator on the provider field,
and +unionMember on each variant field. Update CEL XValidation rules
to match the HyperShift convention (required when matching provider,
forbidden otherwise).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
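
The union rule stated above (a variant is required when it matches the provider and forbidden otherwise) can be expressed outside CEL as a small validator. The provider values and member field names below are placeholders chosen for the example, not the actual `SecretEncryptionKeyStatus` fields.

```python
# Hypothetical mapping from discriminator value to union member field.
UNION_MEMBERS = {"AESCBC": "aescbc", "Azure": "azure", "AWS": "aws", "IBMCloud": "ibmcloud"}


def validate_union(obj: dict) -> list[str]:
    """Return validation errors for a discriminated union; empty means valid."""
    provider = obj.get("provider")
    if provider not in UNION_MEMBERS:
        return [f"provider must be one of {sorted(UNION_MEMBERS)}"]
    errors = []
    for name, field in UNION_MEMBERS.items():
        present = obj.get(field) is not None
        if name == provider and not present:
            errors.append(f"{field} is required when provider is {provider}")
        elif name != provider and present:
            errors.append(f"{field} is forbidden when provider is {provider}")
    return errors
```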
Move back to a single HCCO controller for the entire re-encryption
lifecycle. The phase is now derived from observable state (spec vs
status keys, EncryptionConfiguration contents, KAS convergence,
SVM completion) on each reconciliation — history[0].state is a
recorded reflection, not the source of truth.

The CPO's adaptSecretEncryptionConfig() independently derives the
correct EncryptionConfiguration from the same observable state,
implementing the two-stage rollout without needing a stored phase.

This eliminates the CPO main reconciler role added in the previous
commit, avoiding split-state conflicts between CPO and HCCO writing
to the same status fields.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
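
Deriving the phase from observable state, rather than reading a stored field, might look like the sketch below. It is an assumption-laden model: keys are plain strings, `deployed` is the ordered provider list currently in the EncryptionConfiguration (write key first), and the exact conditions used by the HCCO are not given in the commit message.

```python
def derive_phase(spec_key: str, active_key: str, deployed: list[str],
                 converged: bool, migrated: bool) -> str:
    """Derive the rotation phase from what can be observed on each reconcile."""
    if spec_key == active_key:
        return "Completed"          # nothing to rotate
    if spec_key not in deployed:
        return "ReadOnlyDeploy"     # target not yet known to the KAS
    target_is_write_key = deployed[0] == spec_key
    if not target_is_write_key:
        # Target is read-only: promote it only after every replica has it.
        return "WritePromote" if converged else "ReadOnlyDeploy"
    if not converged:
        return "WritePromote"       # promotion still rolling out
    return "Completed" if migrated else "Migrating"
```

Because the function is pure, the controller and the config generator can both call it and arrive at the same answer without sharing a stored phase, which is the property the commit message is after.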
CVO does not garbage-collect resources when manifests are removed
from a profile. The data-plane operator Deployment is added to
ResourcesToRemove() and the HCCO explicitly deletes it from the
hosted cluster on upgrade.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
IBM Cloud does not want additional control-plane deployments.
The control-plane migrator is only deployed on non-IBM platforms
via a CPO component platform predicate. On IBM Cloud, the
data-plane operator continues to run.

For non-IBM platforms, the data-plane operator Deployment is
added to resourcesToRemove() which generates a cleanup manifest
with release.openshift.io/delete annotation — the CVO in the
hosted cluster processes it to delete the data-plane operator.

Removes the separate cluster-kube-storage-version-migrator-operator
repo change — all changes are now in the HyperShift repo.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@csrwng
Contributor

csrwng commented May 7, 2026

/approve

@openshift-ci
Contributor

openshift-ci Bot commented May 7, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: csrwng


@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 7, 2026
@openshift-ci
Contributor

openshift-ci Bot commented May 7, 2026

@muraee: all tests passed!


Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.


6 participants