Skip to content

feat: add etcd data re-encryption after encryption key rotation#8219

Open
muraee wants to merge 2 commits intoopenshift:mainfrom
muraee:etcd-reencryption-on-key-rotation
Open

feat: add etcd data re-encryption after encryption key rotation#8219
muraee wants to merge 2 commits intoopenshift:mainfrom
muraee:etcd-reencryption-on-key-rotation

Conversation

@muraee
Copy link
Copy Markdown
Contributor

@muraee muraee commented Apr 13, 2026

Summary

  • Add a re-encryption controller in the HCCO that detects encryption key rotations and triggers StorageVersionMigration CRs to re-encrypt all existing etcd data with the new active key
  • Track progress through a new EtcdDataEncryptionUpToDate condition on HCP/HostedCluster
  • Vendor library-go's KubeStorageVersionMigrator and kube-storage-version-migrator informer/lister packages

Details

Problem

HyperShift supports encryption key rotation (setting a new active key + backup key), but has no mechanism to re-encrypt existing etcd data with the new key. Old data remains encrypted with the previous key indefinitely, violating compliance requirements (OCPSTRAT-2527, OCPSTRAT-2540).

Solution

Key change detection is performed entirely within the HCCO re-encryption controller — no CPO changes needed. The controller computes a SHA-256 fingerprint of the active key identity from the HCP spec on each reconcile and compares it against a hypershift.openshift.io/encryption-migrated-key-hash annotation on the kas-secret-encryption-config secret. When the hashes differ, re-encryption is triggered.

API (api/hypershift/v1beta1/hostedcluster_conditions.go):

  • EtcdDataEncryptionUpToDate condition type
  • Reasons: ReEncryptionInProgress, ReEncryptionCompleted, ReEncryptionFailed, WaitingForKASConvergence

HCCO (control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/):

  • New reencryption controller using library-go's KubeStorageVersionMigrator
  • computeActiveKeyFingerprint() — SHA-256 hash of active key identity (supports AWS KMS, Azure KMS, IBM Cloud KMS, AESCBC)
  • On first deployment, initializes migrated hash to active hash (assumes data is already encrypted with current key)
  • Watches HCP, encryption config secret, and KAS deployment on management cluster
  • Waits for KAS deployment convergence before starting migrations
  • Creates StorageVersionMigration CRs in the guest cluster for all encrypted resources (secrets, configmaps, routes, oauth tokens)
  • Retries failed migrations after 5-minute interval via PruneMigration
  • Sets EtcdDataEncryptionUpToDate condition on HCP with detailed progress messages

HyperShift Operator (hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go):

  • Bubbles up EtcdDataEncryptionUpToDate condition from HCP to HostedCluster (only when encryption is configured and condition exists)

Vendoring

  • library-go/pkg/operator/encryption/controllers/migrators/KubeStorageVersionMigrator, Migrator interface
  • kube-storage-version-migrator/pkg/clients/informer/ and lister/ — SVM informer/lister

Ref: OCPSTRAT-2527, OCPSTRAT-2540
Enhancement: openshift/enhancements#1969

Test plan

  • Unit tests for computeActiveKeyFingerprint() — all provider types, determinism, different keys produce different hashes, unsupported type returns error
  • Unit tests for re-encryption controller reconcile — no encryption, first deploy init, hashes match, KAS not converged, migrations in progress, all succeeded, failed with retry/prune, cache not synced
  • go build ./... passes
  • go test -race passes for all modified packages
  • E2E: AWS KMS key rotation with re-encryption verification
  • E2E: Azure KMS key rotation with re-encryption verification
  • E2E: AESCBC key rotation with re-encryption verification

🤖 Generated with Claude Code

@openshift-merge-bot
Copy link
Copy Markdown
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 13, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This pull request introduces a new reencryption controller for managing etcd encryption key rotations in hosted control planes. It adds a new condition type EtcdDataEncryptionUpToDate with associated reason constants to track re-encryption status. The controller integrates the kube-storage-version-migrator to orchestrate data re-encryption when encryption keys rotate, monitors KAS deployment readiness, tracks migration progress, and updates the HostedCluster status with re-encryption state. The changes span API definitions, controller registration, the core reconciler logic, comprehensive test coverage, and status synchronization between HCP and HostedCluster objects.

Sequence Diagram(s)

sequenceDiagram
    participant HC as HostedCluster
    participant HCP as HostedControlPlane
    participant Rec as Reencryption<br/>Reconciler
    participant Secret as Encryption<br/>Config Secret
    participant KAS as KAS<br/>Deployment
    participant SVM as StorageVersion<br/>Migrator

    HC->>Rec: Trigger (HCP/Secret/KAS event)
    Rec->>HCP: Fetch HCP spec
    
    alt No SecretEncryption configured
        Rec->>HCP: Remove EtcdDataEncryptionUpToDate condition
        Rec->>Rec: Exit
    else SecretEncryption configured
        Rec->>Rec: Compute active key fingerprint
        
        alt Fingerprint cannot be computed
            Rec->>Rec: Exit without changes
        else Fingerprint computed
            Rec->>Secret: Fetch encryption config Secret
            
            alt Secret missing
                Rec->>Rec: Requeue
            else Secret exists
                Rec->>Secret: Read EncryptionMigratedKeyHashAnnotation
                
                alt Annotation absent
                    Rec->>Secret: Set annotation to active fingerprint
                    Rec->>Rec: Exit
                else Annotation matches active fingerprint
                    Rec->>Rec: Exit without changes
                else Annotation differs (key rotation detected)
                    Rec->>KAS: Check Deployment ready
                    
                    alt KAS not ready
                        Rec->>HCP: Set EtcdDataEncryptionUpToDate=False<br/>(ReEncryptionWaitingForKAS)
                        Rec->>Rec: Requeue
                    else KAS ready
                        Rec->>SVM: Check migrator synced
                        
                        alt Migrator not synced
                            Rec->>Rec: Requeue
                        else Migrator synced
                            Rec->>SVM: EnsureMigration for each<br/>encrypted resource
                            SVM->>SVM: Perform re-encryption
                            SVM-->>Rec: Return migration status
                            
                            alt Migrations failed
                                Rec->>SVM: PruneMigration (for failed resources)
                                Rec->>HCP: Set EtcdDataEncryptionUpToDate=False<br/>(ReEncryptionFailedReason)
                                Rec->>Rec: Requeue
                            else Migrations in progress
                                Rec->>HCP: Set EtcdDataEncryptionUpToDate=False<br/>(ReEncryptionInProgressReason)
                                Rec->>Rec: Requeue
                            else All migrations succeeded
                                Rec->>HCP: Set EtcdDataEncryptionUpToDate=True<br/>(ReEncryptionCompletedReason)
                                Rec->>Secret: Update annotation to active fingerprint
                            end
                        end
                    end
                end
            end
        end
    end
Loading
🚥 Pre-merge checks | ✅ 9 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 14.29% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (9 passed)
Check name Status Explanation
Stable And Deterministic Test Names ✅ Passed All test names in reencryption_test.go are stable and deterministic with no dynamic information.
Test Structure And Quality ✅ Passed The re-encryption tests are implemented as standard Go table-driven Test* functions with Gomega assertions rather than Ginkgo It blocks. Since no Ginkgo constructs are used, the specified Ginkgo-style quality criteria do not apply to this file.
Microshift Test Compatibility ✅ Passed The PR adds only standard Go unit tests without any Ginkgo e2e tests, so the MicroShift compatibility check is not applicable.
Single Node Openshift (Sno) Test Compatibility ✅ Passed This PR adds only standard Go unit tests with fake clients and mocks, not Ginkgo e2e tests. Unit tests are topology-agnostic and have no SNO compatibility concerns.
Topology-Aware Scheduling Compatibility ✅ Passed PR introduces re-encryption controller with no deployment manifests, scheduling constraints, node affinity, or topology-dependent assumptions incompatible with non-HA topologies.
Ote Binary Stdout Contract ✅ Passed The PR does not introduce any process-level stdout writes that would violate the OTE Binary Stdout Contract. Logging is properly configured via ctrl.SetLogger(zap.New(...)) in cmd.go's Run() function, which writes to stderr by default.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed No new Ginkgo e2e tests are being added; only standard Go unit tests with mocked dependencies are included.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately and concisely describes the main feature: automated etcd data re-encryption triggered after encryption key rotation.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot added area/api Indicates the PR includes changes for the API area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release and removed do-not-merge/needs-area labels Apr 13, 2026
@openshift-ci openshift-ci Bot requested review from cblecker and enxebre April 13, 2026 13:43
…ion-migrator

Vendor the KubeStorageVersionMigrator from library-go and the
generated informer/lister packages from kube-storage-version-migrator.
These are needed by the new HCCO re-encryption controller.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@muraee muraee force-pushed the etcd-reencryption-on-key-rotation branch from d5e5097 to 0d54a40 Compare April 13, 2026 13:46
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption.go`:
- Around line 98-111: The code is writing the active key fingerprint and
unconditionally setting rekey-needed when the annotation was previously absent;
change the logic in the block that uses computeActiveKeyFingerprint,
EncryptionActiveKeyHashAnnotation and EncryptionRekeyNeededAnnotation so that
you always persist the fingerprint into secret.Annotations when fingerprint !=
"", but only set secret.Annotations[EncryptionRekeyNeededAnnotation] = "true" if
an existingHash was present (existingHash != "") and different from the new
fingerprint; do not mark rekey-needed when writing the hash for the first time.

In
`@control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go`:
- Around line 245-252: Reorder the finalization so the HCP status is updated to
show re-encryption completed before removing the rekey-needed annotation: call
setReEncryptionCondition(hcp, metav1.ConditionTrue,
hyperv1.ReEncryptionCompletedReason, ...) and persist that status update (as
done in the parent Reconcile flow) first, and only after the status update
succeeds call r.removeRekeyAnnotation(ctx, encryptionSecret); this ensures that
if the status update fails the reconcile will retry (via Reconcile's rekeyNeeded
logic) and that a later annotation-removal failure won't leave the condition
stuck in InProgress.

In `@hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go`:
- Around line 897-904: The current logic only sets EtcdDataEncryptionUpToDate on
hcluster when the HCP condition exists, which leaves a stale condition when
encryption is disabled, hcp is nil, or the HCP removes the condition; update the
block around hcp and hcluster.Spec.SecretEncryption to handle the
missing/disabled case by removing the bubbled condition instead of doing
nothing: check if hcp is nil or hcluster.Spec.SecretEncryption is nil or
meta.FindStatusCondition(hcp.Status.Conditions,
string(hyperv1.EtcdDataEncryptionUpToDate)) returns nil, and in those cases call
meta.RemoveStatusCondition(&hcluster.Status.Conditions,
string(hyperv1.EtcdDataEncryptionUpToDate)); otherwise (when cond exists)
continue to set cond.ObservedGeneration = hcluster.Generation and
meta.SetStatusCondition as currently done.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: 3f62b72e-be23-40c7-8620-e6a59a833dde

📥 Commits

Reviewing files that changed from the base of the PR and between 72647a4 and d5e5097.

⛔ Files ignored due to path filters (18)
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/hostedcluster_conditions.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/library-go/pkg/operator/encryption/controllers/migrators/errors.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/library-go/pkg/operator/encryption/controllers/migrators/inprocess.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/library-go/pkg/operator/encryption/controllers/migrators/inprocess_processor.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/library-go/pkg/operator/encryption/controllers/migrators/kubestorageversionmigrator.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/library-go/pkg/operator/encryption/controllers/migrators/metrics.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/library-go/pkg/operator/encryption/controllers/migrators/types.go is excluded by !vendor/**, !**/vendor/**
  • vendor/modules.txt is excluded by !vendor/**, !**/vendor/**
  • vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/informer/factory.go is excluded by !vendor/**, !**/vendor/**
  • vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/informer/generic.go is excluded by !vendor/**, !**/vendor/**
  • vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/informer/internalinterfaces/factory_interfaces.go is excluded by !vendor/**, !**/vendor/**
  • vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/informer/migration/interface.go is excluded by !vendor/**, !**/vendor/**
  • vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/informer/migration/v1alpha1/interface.go is excluded by !vendor/**, !**/vendor/**
  • vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/informer/migration/v1alpha1/storagestate.go is excluded by !vendor/**, !**/vendor/**
  • vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/informer/migration/v1alpha1/storageversionmigration.go is excluded by !vendor/**, !**/vendor/**
  • vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/lister/migration/v1alpha1/expansion_generated.go is excluded by !vendor/**, !**/vendor/**
  • vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/lister/migration/v1alpha1/storagestate.go is excluded by !vendor/**, !**/vendor/**
  • vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/lister/migration/v1alpha1/storageversionmigration.go is excluded by !vendor/**, !**/vendor/**
📒 Files selected for processing (8)
  • api/hypershift/v1beta1/hostedcluster_conditions.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption_test.go
  • control-plane-operator/hostedclusterconfigoperator/api/scheme.go
  • control-plane-operator/hostedclusterconfigoperator/cmd.go
  • control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go
  • control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption_test.go
  • hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go

Comment thread control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption.go Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption.go (1)

103-111: ⚠️ Potential issue | 🟠 Major

Don't mark re-encryption needed on the first hash write.

When EncryptionActiveKeyHashAnnotation is absent (empty string), the condition existingHash != fingerprint evaluates to true, which sets encryption-rekey-needed=true. This will spuriously trigger re-encryption migrations for new clusters and for existing clusters upgrading to this version.

The hash should always be persisted, but rekey-needed should only be set when a previous hash existed and differs from the new one.

Suggested fix
 	if fingerprint != "" {
 		if secret.Annotations == nil {
 			secret.Annotations = map[string]string{}
 		}
 		existingHash := secret.Annotations[EncryptionActiveKeyHashAnnotation]
-		if existingHash != fingerprint {
-			secret.Annotations[EncryptionActiveKeyHashAnnotation] = fingerprint
-			secret.Annotations[EncryptionRekeyNeededAnnotation] = "true"
+		secret.Annotations[EncryptionActiveKeyHashAnnotation] = fingerprint
+		if existingHash != "" && existingHash != fingerprint {
+			secret.Annotations[EncryptionRekeyNeededAnnotation] = "true"
 		}
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption.go`
around lines 103 - 111, The current logic in secrtencryption.go sets
EncryptionRekeyNeededAnnotation when existingHash != fingerprint even if there
was no previous hash, causing spurious re-encryption; update the block around
secret.Annotations and EncryptionActiveKeyHashAnnotation so you always persist
the new fingerprint but only set EncryptionRekeyNeededAnnotation="true" if an
existing non-empty hash existed and is different (i.e., existingHash != "" &&
existingHash != fingerprint). Keep the nil-check for secret.Annotations and
ensure the new fingerprint is written to EncryptionActiveKeyHashAnnotation in
all cases.
control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go (1)

245-252: ⚠️ Potential issue | 🟡 Minor

Consider ordering: update status before removing annotation.

If the annotation removal at line 247 succeeds but the subsequent HCP status update in the parent Reconcile (line 138) fails, the next reconcile will find rekeyNeeded != "true" (line 167-168) and return early. This leaves the condition potentially stuck in its previous state (e.g., InProgress) rather than Completed.

To ensure consistency, consider splitting this into two reconcile cycles or restructuring to ensure status is persisted before annotation removal is committed.

Alternative approach: set condition before annotation removal
 	// All migrations succeeded — remove the rekey-needed annotation and set condition True.
 	log.Info("All re-encryption migrations completed successfully")
+	setReEncryptionCondition(hcp, metav1.ConditionTrue, hyperv1.ReEncryptionCompletedReason,
+		"All etcd data has been re-encrypted with the active encryption key")
+
+	// Note: If annotation removal fails after status update succeeds in parent Reconcile,
+	// the next reconcile will re-process and retry annotation removal (migrations already done).
 	if err := r.removeRekeyAnnotation(ctx, encryptionSecret); err != nil {
 		return reconcile.Result{}, fmt.Errorf("failed to remove rekey-needed annotation: %w", err)
 	}
-	setReEncryptionCondition(hcp, metav1.ConditionTrue, hyperv1.ReEncryptionCompletedReason,
-		"All etcd data has been re-encrypted with the active encryption key")
 	return reconcile.Result{}, nil
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go`
around lines 245 - 252, Reorder the success path so the HCP status is updated to
ReEncryptionCompleted before removing the rekey-needed annotation: call
setReEncryptionCondition(hcp, metav1.ConditionTrue,
hyperv1.ReEncryptionCompletedReason, ...) and ensure the status update is
persisted (via the same status update code used in Reconcile) prior to calling
removeRekeyAnnotation(ctx, encryptionSecret); if persisting status can fail,
return the error and only remove the annotation after status update succeeds,
ensuring the Reconcile loop sees the completed condition even if annotation
removal later fails; references: setReEncryptionCondition,
removeRekeyAnnotation, Reconcile, rekeyNeeded.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In
`@control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption.go`:
- Around line 103-111: The current logic in secrtencryption.go sets
EncryptionRekeyNeededAnnotation when existingHash != fingerprint even if there
was no previous hash, causing spurious re-encryption; update the block around
secret.Annotations and EncryptionActiveKeyHashAnnotation so you always persist
the new fingerprint but only set EncryptionRekeyNeededAnnotation="true" if an
existing non-empty hash existed and is different (i.e., existingHash != "" &&
existingHash != fingerprint). Keep the nil-check for secret.Annotations and
ensure the new fingerprint is written to EncryptionActiveKeyHashAnnotation in
all cases.

In
`@control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go`:
- Around line 245-252: Reorder the success path so the HCP status is updated to
ReEncryptionCompleted before removing the rekey-needed annotation: call
setReEncryptionCondition(hcp, metav1.ConditionTrue,
hyperv1.ReEncryptionCompletedReason, ...) and ensure the status update is
persisted (via the same status update code used in Reconcile) prior to calling
removeRekeyAnnotation(ctx, encryptionSecret); if persisting status can fail,
return the error and only remove the annotation after status update succeeds,
ensuring the Reconcile loop sees the completed condition even if annotation
removal later fails; references: setReEncryptionCondition,
removeRekeyAnnotation, Reconcile, rekeyNeeded.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: 969834bc-33b1-4893-9f05-17b72afb37c2

📥 Commits

Reviewing files that changed from the base of the PR and between d5e5097 and 0d54a40.

⛔ Files ignored due to path filters (18)
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/hostedcluster_conditions.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/library-go/pkg/operator/encryption/controllers/migrators/errors.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/library-go/pkg/operator/encryption/controllers/migrators/inprocess.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/library-go/pkg/operator/encryption/controllers/migrators/inprocess_processor.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/library-go/pkg/operator/encryption/controllers/migrators/kubestorageversionmigrator.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/library-go/pkg/operator/encryption/controllers/migrators/metrics.go is excluded by !vendor/**, !**/vendor/**
  • vendor/github.com/openshift/library-go/pkg/operator/encryption/controllers/migrators/types.go is excluded by !vendor/**, !**/vendor/**
  • vendor/modules.txt is excluded by !vendor/**, !**/vendor/**
  • vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/informer/factory.go is excluded by !vendor/**, !**/vendor/**
  • vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/informer/generic.go is excluded by !vendor/**, !**/vendor/**
  • vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/informer/internalinterfaces/factory_interfaces.go is excluded by !vendor/**, !**/vendor/**
  • vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/informer/migration/interface.go is excluded by !vendor/**, !**/vendor/**
  • vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/informer/migration/v1alpha1/interface.go is excluded by !vendor/**, !**/vendor/**
  • vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/informer/migration/v1alpha1/storagestate.go is excluded by !vendor/**, !**/vendor/**
  • vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/informer/migration/v1alpha1/storageversionmigration.go is excluded by !vendor/**, !**/vendor/**
  • vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/lister/migration/v1alpha1/expansion_generated.go is excluded by !vendor/**, !**/vendor/**
  • vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/lister/migration/v1alpha1/storagestate.go is excluded by !vendor/**, !**/vendor/**
  • vendor/sigs.k8s.io/kube-storage-version-migrator/pkg/clients/lister/migration/v1alpha1/storageversionmigration.go is excluded by !vendor/**, !**/vendor/**
📒 Files selected for processing (8)
  • api/hypershift/v1beta1/hostedcluster_conditions.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption_test.go
  • control-plane-operator/hostedclusterconfigoperator/api/scheme.go
  • control-plane-operator/hostedclusterconfigoperator/cmd.go
  • control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go
  • control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption_test.go
  • hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
✅ Files skipped from review due to trivial changes (1)
  • control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption_test.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • control-plane-operator/hostedclusterconfigoperator/cmd.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption_test.go
  • hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
  • api/hypershift/v1beta1/hostedcluster_conditions.go

@muraee muraee force-pushed the etcd-reencryption-on-key-rotation branch from 0d54a40 to e29501b Compare April 13, 2026 16:35
@openshift-ci openshift-ci Bot added the area/documentation Indicates the PR includes changes for documentation label Apr 13, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go (1)

245-251: ⚠️ Potential issue | 🟠 Major

Persist ReEncryptionCompleted before clearing encryption-rekey-needed.

removeRekeyAnnotation() runs before the outer Reconcile() call writes hcp.Status. If the secret update succeeds but that later status update fails, the next pass sees rekey-needed absent and exits early, leaving EtcdDataEncryptionUpToDate stuck at its previous value. Move annotation removal to a second phase after the completed condition has been successfully persisted.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go`
around lines 245 - 251, The code currently removes the encryption-rekey-needed
annotation (removeRekeyAnnotation) before the controller persists the updated
HCP status (setReEncryptionCondition), which can cause the requeue logic to skip
remaining work if the status update later fails; change the order so you first
set and persist the ReEncryptionCompleted condition on the HostedControlPlane
(call setReEncryptionCondition and ensure the status update is saved via the
status writer used in Reconcile), verify that the status update succeeded, and
only then call removeRekeyAnnotation to delete the encryption-rekey-needed
annotation; use the existing functions removeRekeyAnnotation,
setReEncryptionCondition and the same status update flow in Reconcile to
implement this two-phase commit.
hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go (1)

897-904: ⚠️ Potential issue | 🟠 Major

Remove the bubbled condition when HCP no longer reports it.

This only copies EtcdDataEncryptionUpToDate when the HCP currently has it. If encryption is disabled, the HCP is gone, or the reencryption controller removes the HCP condition, the old HostedCluster value sticks around and becomes stale.

Suggested fix
-	// Bubble up the EtcdDataEncryptionUpToDate condition from HCP when encryption is configured.
-	if hcp != nil && hcluster.Spec.SecretEncryption != nil {
-		cond := meta.FindStatusCondition(hcp.Status.Conditions, string(hyperv1.EtcdDataEncryptionUpToDate))
-		if cond != nil {
-			cond.ObservedGeneration = hcluster.Generation
-			meta.SetStatusCondition(&hcluster.Status.Conditions, *cond)
-		}
-	}
+	// Bubble up the EtcdDataEncryptionUpToDate condition from HCP when encryption is configured.
+	if hcluster.Spec.SecretEncryption == nil || hcp == nil {
+		meta.RemoveStatusCondition(&hcluster.Status.Conditions, string(hyperv1.EtcdDataEncryptionUpToDate))
+	} else if cond := meta.FindStatusCondition(hcp.Status.Conditions, string(hyperv1.EtcdDataEncryptionUpToDate)); cond != nil {
+		cond.ObservedGeneration = hcluster.Generation
+		meta.SetStatusCondition(&hcluster.Status.Conditions, *cond)
+	} else {
+		meta.RemoveStatusCondition(&hcluster.Status.Conditions, string(hyperv1.EtcdDataEncryptionUpToDate))
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go`
around lines 897 - 904, The current bubble-up only sets
EtcdDataEncryptionUpToDate when HCP has the condition, but never removes it when
HCP no longer reports it; update the block in hostedcluster_controller.go that
references hcp, hcluster, meta.FindStatusCondition and meta.SetStatusCondition
to also handle the nil case: if hcp is nil, hcluster.Spec.SecretEncryption is
nil, or meta.FindStatusCondition(...) returns nil, call
meta.RemoveStatusCondition(&hcluster.Status.Conditions,
string(hyperv1.EtcdDataEncryptionUpToDate)) to delete the stale condition;
otherwise keep the existing behavior of updating ObservedGeneration and calling
meta.SetStatusCondition.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go`:
- Around line 127-130: The Get calls that fetch the HostedControlPlane (hcp :=
resourcemanifests.HostedControlPlane(...) followed by r.cpClient.Get(...)) and
the similar Gets for the encryption config Secret and KAS Deployment should
treat apierrors.IsNotFound(err) as a transient condition instead of returning a
hard error; update the error handling in the r.cpClient.Get(...) blocks
(including the HCP fetch and the other Get calls around 158-176) to check
apierrors.IsNotFound(err) and return a non-error reconcile result (e.g.,
reconcile.Result{} or a short requeue with nil error) so the controller
idles/publishes waiting state instead of entering error backoff.
- Around line 165-166: The code uses
encryptionSecret.Annotations[kas.EncryptionActiveKeyHashAnnotation] (and the
same pattern at the other occurrence around lines ~200–201) as the writeKey for
EnsureMigration which is a deterministic key fingerprint and can collide when
rotating back to a previously used key; change this to a monotonic rotation
token annotation (e.g., kas.EncryptionRotationTokenAnnotation) that is
incremented (or replaced with a monotonic nonce) each time a rotation is
initiated and persisted on encryptionSecret, and update the code that reads the
annotation (the places referencing
encryptionSecret.Annotations[kas.EncryptionActiveKeyHashAnnotation]) to read the
rotation token instead and pass that token as the writeKey into EnsureMigration
so each rotation event is unique even if the underlying key value repeats.

---

Duplicate comments:
In
`@control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go`:
- Around line 245-251: The code currently removes the encryption-rekey-needed
annotation (removeRekeyAnnotation) before the controller persists the updated
HCP status (setReEncryptionCondition), which can cause the requeue logic to skip
remaining work if the status update later fails; change the order so you first
set and persist the ReEncryptionCompleted condition on the HostedControlPlane
(call setReEncryptionCondition and ensure the status update is saved via the
status writer used in Reconcile), verify that the status update succeeded, and
only then call removeRekeyAnnotation to delete the encryption-rekey-needed
annotation; use the existing functions removeRekeyAnnotation,
setReEncryptionCondition and the same status update flow in Reconcile to
implement this two-phase commit.

In `@hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go`:
- Around line 897-904: The current bubble-up only sets
EtcdDataEncryptionUpToDate when HCP has the condition, but never removes it when
HCP no longer reports it; update the block in hostedcluster_controller.go that
references hcp, hcluster, meta.FindStatusCondition and meta.SetStatusCondition
to also handle the nil case: if hcp is nil, hcluster.Spec.SecretEncryption is
nil, or meta.FindStatusCondition(...) returns nil, call
meta.RemoveStatusCondition(&hcluster.Status.Conditions,
string(hyperv1.EtcdDataEncryptionUpToDate)) to delete the stale condition;
otherwise keep the existing behavior of updating ObservedGeneration and calling
meta.SetStatusCondition.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: fe17a0ce-bc5e-4d0c-87ad-e9027725f712

📥 Commits

Reviewing files that changed from the base of the PR and between 0d54a40 and e29501b.

⛔ Files ignored due to path filters (4)
  • control-plane-operator/controllers/hostedcontrolplane/testdata/kube-apiserver/AROSwift/zz_fixture_TestControlPlaneComponents_kas_secret_encryption_config_secret.yaml is excluded by !**/testdata/**
  • docs/content/reference/aggregated-docs.md is excluded by !docs/content/reference/aggregated-docs.md
  • docs/content/reference/api.md is excluded by !docs/content/reference/api.md
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/hostedcluster_conditions.go is excluded by !vendor/**, !**/vendor/**
📒 Files selected for processing (9)
  • api/hypershift/v1beta1/hostedcluster_conditions.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption_test.go
  • control-plane-operator/hostedclusterconfigoperator/api/scheme.go
  • control-plane-operator/hostedclusterconfigoperator/cmd.go
  • control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go
  • control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption_test.go
  • go.mod
  • hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
✅ Files skipped from review due to trivial changes (1)
  • control-plane-operator/hostedclusterconfigoperator/api/scheme.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • control-plane-operator/hostedclusterconfigoperator/cmd.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption_test.go
  • api/hypershift/v1beta1/hostedcluster_conditions.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption.go

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 13, 2026

Codecov Report

❌ Patch coverage is 54.87365% with 125 lines in your changes missing coverage. Please review.
✅ Project coverage is 34.71%. Comparing base (4d904c8) to head (851c998).
⚠️ Report is 222 commits behind head on main.

Files with missing lines Patch % Lines
...goperator/controllers/reencryption/reencryption.go 55.80% 100 Missing and 18 partials ⚠️
...trollers/hostedcluster/hostedcluster_controller.go 25.00% 5 Missing and 1 partial ⚠️
...operator/hostedclusterconfigoperator/api/scheme.go 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8219      +/-   ##
==========================================
+ Coverage   34.63%   34.71%   +0.07%     
==========================================
  Files         767      768       +1     
  Lines       93186    93540     +354     
==========================================
+ Hits        32277    32470     +193     
- Misses      58236    58372     +136     
- Partials     2673     2698      +25     
Files with missing lines Coverage Δ
...lers/hostedcontrolplane/v2/kas/secretencryption.go 28.73% <100.00%> (+0.82%) ⬆️
...-plane-operator/hostedclusterconfigoperator/cmd.go 0.00% <ø> (ø)
...operator/hostedclusterconfigoperator/api/scheme.go 0.00% <0.00%> (ø)
...trollers/hostedcluster/hostedcluster_controller.go 43.23% <25.00%> (-0.04%) ⬇️
...goperator/controllers/reencryption/reencryption.go 55.80% <55.80%> (ø)

... and 5 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@muraee muraee force-pushed the etcd-reencryption-on-key-rotation branch from e29501b to e1b396f Compare April 13, 2026 19:07
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go (2)

254-263: ⚠️ Potential issue | 🟡 Minor

Completed status can still be lost on the failure path.

Line 261 persists annotation removal before the caller persists the updated HCP status. If removeRekeyAnnotation() succeeds and the later Status().Update() fails, the next reconcile hits rekeyNeeded != "true" and exits early, leaving EtcdDataEncryptionUpToDate stale instead of ReEncryptionCompleted.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go`
around lines 254 - 263, The code removes the rekey-needed annotation before the
updated HCP status is persisted, risking loss of the Completed condition if the
subsequent status update fails; change the order so you persist the updated HCP
status with the ReEncryptionCompleted condition (call the HCP status update via
r.Client.Status().Update or the existing status update helper) first, check that
it succeeded, and only then call removeRekeyAnnotation(encryptionSecret);
reference setReEncryptionCondition, the HCP Status().Update call, and
removeRekeyAnnotation to locate and reorder the operations.

169-171: ⚠️ Potential issue | 🟠 Major

Use a rotation token here, not the active-key fingerprint.

activeKeyHash identifies the current key, not the rotation event. If a cluster rotates A → B → A, this passes the same writeKey back into EnsureMigration, so an older completed migration for key A can satisfy the check even though the latest rotation still needs objects rewritten from B to A.

Also applies to: 208-210

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go`
around lines 169 - 171, The code is using the active-key fingerprint
(encryptionSecret.Annotations[kas.EncryptionActiveKeyHashAnnotation]) to drive
EnsureMigration, which breaks for A→B→A rotations; instead read and use the
rotation token annotation (e.g.,
encryptionSecret.Annotations[kas.EncryptionRotationTokenAnnotation]) as the
writeKey passed to EnsureMigration so each rotation event is unique; update the
usage in the block that reads rekeyNeeded and activeKeyHash and the duplicate
occurrence around the EnsureMigration call (the lines referencing
kas.EncryptionActiveKeyHashAnnotation and the EnsureMigration/writeKey variable)
to use the rotation-token annotation and pass that token through.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption.go`:
- Around line 188-199: The fingerprinting currently includes the Secret name
(secretEncryption.AESCBC.ActiveKey.Name) which causes renames/copies to change
the fingerprint; change the input construction in the AESCBC case to only
fingerprint the key bytes: compute the SHA256 of
activeKeySecret.Data[hyperv1.AESCBCKeySecretKey] and set input to include only
the algorithm and the hash (e.g. "aescbc/<sha256>"), removing any use of
ActiveKey.Name; ensure you reference activeKeySecret and
hyperv1.AESCBCKeySecretKey when locating the key bytes and preserve existing
error handling around cpContext.Client.Get.

---

Duplicate comments:
In
`@control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go`:
- Around line 254-263: The code removes the rekey-needed annotation before the
updated HCP status is persisted, risking loss of the Completed condition if the
subsequent status update fails; change the order so you persist the updated HCP
status with the ReEncryptionCompleted condition (call the HCP status update via
r.Client.Status().Update or the existing status update helper) first, check that
it succeeded, and only then call removeRekeyAnnotation(encryptionSecret);
reference setReEncryptionCondition, the HCP Status().Update call, and
removeRekeyAnnotation to locate and reorder the operations.
- Around line 169-171: The code is using the active-key fingerprint
(encryptionSecret.Annotations[kas.EncryptionActiveKeyHashAnnotation]) to drive
EnsureMigration, which breaks for A→B→A rotations; instead read and use the
rotation token annotation (e.g.,
encryptionSecret.Annotations[kas.EncryptionRotationTokenAnnotation]) as the
writeKey passed to EnsureMigration so each rotation event is unique; update the
usage in the block that reads rekeyNeeded and activeKeyHash and the duplicate
occurrence around the EnsureMigration call (the lines referencing
kas.EncryptionActiveKeyHashAnnotation and the EnsureMigration/writeKey variable)
to use the rotation-token annotation and pass that token through.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: b08bffff-7931-4567-990b-d445f0cabf23

📥 Commits

Reviewing files that changed from the base of the PR and between e29501b and e1b396f.

⛔ Files ignored due to path filters (4)
  • control-plane-operator/controllers/hostedcontrolplane/testdata/kube-apiserver/AROSwift/zz_fixture_TestControlPlaneComponents_kas_secret_encryption_config_secret.yaml is excluded by !**/testdata/**
  • docs/content/reference/aggregated-docs.md is excluded by !docs/content/reference/aggregated-docs.md
  • docs/content/reference/api.md is excluded by !docs/content/reference/api.md
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/hostedcluster_conditions.go is excluded by !vendor/**, !**/vendor/**
📒 Files selected for processing (9)
  • api/hypershift/v1beta1/hostedcluster_conditions.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption_test.go
  • control-plane-operator/hostedclusterconfigoperator/api/scheme.go
  • control-plane-operator/hostedclusterconfigoperator/cmd.go
  • control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go
  • control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption_test.go
  • go.mod
  • hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
✅ Files skipped from review due to trivial changes (3)
  • control-plane-operator/hostedclusterconfigoperator/api/scheme.go
  • go.mod
  • control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption_test.go
  • api/hypershift/v1beta1/hostedcluster_conditions.go

Comment thread control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption.go Outdated
@muraee muraee force-pushed the etcd-reencryption-on-key-rotation branch from e1b396f to d9f1f6a Compare April 14, 2026 10:04
@devguyio
Copy link
Copy Markdown
Contributor

devguyio commented Apr 14, 2026

I now have all the information needed for a complete root cause analysis. Here is the report:


Test Failure Analysis Complete

Job Information

Test Failure Analysis

Error

Commit fab20778d3:
1: CT1 Title does not start with one of fix, feat, chore, docs, style, refactor, perf, test, revert, ci, build:
   "vendor: add library-go migrators and kube-storage-version-migrator informer/lister"

Root Cause

The first commit in the PR (`fab20778d3`) has a title that **violates the Conventional Commits specification** enforced by the repository's `.gitlint` configuration.

The commit title is:

vendor: add library-go migrators and kube-storage-version-migrator informer/lister

The .gitlint configuration at the repository root requires titles to follow Conventional Commits format, with the allowed type prefixes being:

types = fix,feat,chore,docs,style,refactor,perf,test,revert,ci,build

The prefix vendor is not in the allowed list. The gitlint contrib-title-conventional-commits rule (CT1) rejects any commit title that does not begin with one of these types followed by a colon.

The second commit (d5e509743d) passes validation — its title feat: add etcd data re-encryption after encryption key rotation correctly uses the feat prefix.

Recommendations

These are some recommendations based on the failure analysis 1. **Reword the first commit's title** to use an allowed Conventional Commits prefix. The most appropriate prefix for a vendoring commit is `chore` or `build`: ``` chore: vendor library-go migrators and kube-storage-version-migrator informer/lister ``` or: ``` build: vendor library-go migrators and kube-storage-version-migrator informer/lister ```
  1. Interactive rebase to amend the commit message:
    git rebase -i HEAD~2
    # Change 'pick' to 'reword' for the first commit
    # Update the title to use chore: or build: prefix
    git push --force-with-lease

Evidence

This is some collected evidence that helped reach the mentioned root cause
Evidence Detail
Failing commit fab20778d318dbac81809b5f18434bbfacd3716e
Commit title vendor: add library-go migrators and kube-storage-version-migrator informer/lister
Gitlint rule violated CT1 (contrib-title-conventional-commits)
Invalid prefix used vendor
Allowed prefixes fix, feat, chore, docs, style, refactor, perf, test, revert, ci, build
Passing commit d5e509743d — title correctly uses feat: prefix

@muraee muraee force-pushed the etcd-reencryption-on-key-rotation branch from d9f1f6a to 968dac4 Compare April 14, 2026 10:14
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go (1)

135-136: ⚠️ Potential issue | 🟡 Minor

Handle deleted HostedControlPlane as a no-op.

If the watched HCP is already gone, this returns an error and the controller backs off during normal teardown. Treat apierrors.IsNotFound(err) here the same way as the Secret/Deployment lookups and just exit cleanly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go`
around lines 135 - 136, When fetching the HostedControlPlane (hcp) with
r.cpClient.Get in the Reconcile handler, treat a not-found error as a benign
delete and exit cleanly instead of returning an error; update the error handling
around r.cpClient.Get(ctx, client.ObjectKeyFromObject(hcp), hcp) to check
apierrors.IsNotFound(err) and return reconcile.Result{}, nil for that case,
while preserving the existing fmt.Errorf path for other errors.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go`:
- Around line 185-200: The code treats a missing
EncryptionMigratedKeyHashAnnotation as "first deployment" and initializes it to
activeKeyHash (via setMigratedKeyHash), which incorrectly skips re-encryption
when kas-secret-encryption-config is recreated during a pending rotation; modify
the logic in the reconcile path that reads EncryptionMigratedKeyHashAnnotation
and compares it to activeKeyHash so that both the "initialize migrated key hash"
branch and the migratedKeyHash == activeKeyHash fast-path are only taken when
the rekey-needed signal (hypershift.openshift.io/encryption-rekey-needed) is not
present/true; i.e., check the hypershift.openshift.io/encryption-rekey-needed
annotation before calling setMigratedKeyHash or returning early on equal hashes
so pending rekey flows continue to perform re-encryption.
- Around line 363-367: The code uses
activeKeySecret.Data[hyperv1.AESCBCKeySecretKey] without validating it and then
hashes potentially empty bytes; update the logic around the c.Get(...) result in
reencryption.go to check that activeKeySecret.Data contains a non-empty entry
for hyperv1.AESCBCKeySecretKey before computing input (the fingerprint), and if
the key is missing or empty return a clear error (or handle it upstream) instead
of generating a fingerprint for empty bytes; adjust any callers that rely on
input/EncryptionMigratedKeyHashAnnotation to expect this error path.

---

Duplicate comments:
In
`@control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go`:
- Around line 135-136: When fetching the HostedControlPlane (hcp) with
r.cpClient.Get in the Reconcile handler, treat a not-found error as a benign
delete and exit cleanly instead of returning an error; update the error handling
around r.cpClient.Get(ctx, client.ObjectKeyFromObject(hcp), hcp) to check
apierrors.IsNotFound(err) and return reconcile.Result{}, nil for that case,
while preserving the existing fmt.Errorf path for other errors.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Pro Plus

Run ID: 68ffe0d4-c58d-4278-b70b-957f7c8b917b

📥 Commits

Reviewing files that changed from the base of the PR and between e1b396f and d9f1f6a.

⛔ Files ignored due to path filters (4)
  • control-plane-operator/controllers/hostedcontrolplane/testdata/kube-apiserver/AROSwift/zz_fixture_TestControlPlaneComponents_kas_secret_encryption_config_secret.yaml is excluded by !**/testdata/**
  • docs/content/reference/aggregated-docs.md is excluded by !docs/content/reference/aggregated-docs.md
  • docs/content/reference/api.md is excluded by !docs/content/reference/api.md
  • vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/hostedcluster_conditions.go is excluded by !vendor/**, !**/vendor/**
📒 Files selected for processing (8)
  • api/hypershift/v1beta1/hostedcluster_conditions.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption.go
  • control-plane-operator/hostedclusterconfigoperator/api/scheme.go
  • control-plane-operator/hostedclusterconfigoperator/cmd.go
  • control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption.go
  • control-plane-operator/hostedclusterconfigoperator/controllers/reencryption/reencryption_test.go
  • go.mod
  • hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
✅ Files skipped from review due to trivial changes (3)
  • control-plane-operator/hostedclusterconfigoperator/api/scheme.go
  • hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go
  • control-plane-operator/controllers/hostedcontrolplane/v2/kas/secretencryption.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • go.mod

@muraee
Copy link
Copy Markdown
Contributor Author

muraee commented Apr 14, 2026

Re: @coderabbitai's comment on AESCBC key validation:

Done. Added validation to return empty fingerprint when key data is missing or empty.


AI-assisted response via Claude Code

@muraee
Copy link
Copy Markdown
Contributor Author

muraee commented Apr 14, 2026

Re: @coderabbitai's comment on missing migrated-hash during rotation:

The encryption-rekey-needed annotation was intentionally removed in this revision. The controller now detects re-encryption need by comparing the active key fingerprint (computed from HCP spec) against encryption-migrated-key-hash on the secret — no boolean signal involved.

The scenario described (secret recreated during pending rotation) requires manual deletion of an operator-managed secret, which would also disrupt KAS itself. In that edge case, the controller initializes the migrated hash to the current active key, which is correct — the CPO will recreate the secret with the current encryption config, and KAS will use the current key going forward. If re-encryption for the previous key was incomplete, the next key rotation will trigger a full re-encryption.


AI-assisted response via Claude Code

@muraee muraee force-pushed the etcd-reencryption-on-key-rotation branch from 968dac4 to 2b0d02c Compare April 14, 2026 10:42
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 14, 2026

Tip

For best results, initiate chat on the files or code changes.

@muraee: Thanks for addressing this! Adding the guard to return an empty fingerprint when the AESCBC key data is missing or empty is the right approach — it prevents computing a potentially incorrect fingerprint from incomplete data and avoids triggering unnecessary re-encryption cycles.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 14, 2026

Tip

For best results, initiate chat on the files or code changes.

@muraee, thanks for the detailed explanation — the design rationale makes sense.

The fingerprint comparison approach (active key fingerprint vs. encryption-migrated-key-hash) is indeed cleaner and self-contained without needing a separate boolean signal. And the edge case analysis is convincing: manual deletion of an operator-managed secret is a disruptive operation that would already break KAS, so initializing the migrated hash to the current active key in that scenario is the correct and safe fallback. The "next rotation triggers full re-encryption" safety net also covers the gap if a prior incomplete re-encryption was interrupted.


📝 Learnings were identified and not saved (knowledge base disabled). Enable

Copy link
Copy Markdown
Member

@cblecker cblecker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

This is a solid PR overall — clean controller design, good test coverage, and the security-conscious approach to fingerprinting (SHA-256 of key identity, no raw material in annotations) is well done. A few things worth addressing before merge.

Things to fix

The most important issue: computeActiveKeyFingerprint silently returns ("", nil) for any unrecognized SecretEncryption.Type (reencryption.go:371-372). If someone adds a new encryption type to the API and forgets to update this function, re-encryption will just never trigger — no error, no log, nothing. This is especially surprising given that line 351 already correctly returns an error for unsupported KMS providers. The outer default case should do the same.

The PR description needs updating — it describes the CPO computing fingerprints and setting EncryptionActiveKeyHashAnnotation / EncryptionRekeyNeededAnnotation annotations, but the actual implementation has the HCCO computing fingerprints directly from the HCP spec using EncryptionMigratedKeyHashAnnotation. The description should match what was built.

ReEncryptionWaitingForKAS at hostedcluster_conditions.go:336 is missing the Reason suffix that the other three constants have (ReEncryptionInProgressReason, ReEncryptionCompletedReason, ReEncryptionFailedReason). Should be ReEncryptionWaitingForKASReason.

In reencryption.go:140-146, if r.reconcile() fails and the subsequent Status().Update() also fails, the original reconcile error gets silently dropped. Only the status update error bubbles up, which will make debugging painful.

The comment on EncryptionMigratedKeyHashAnnotation (reencryption.go:44-48) says it's "Written by this controller after all StorageVersionMigrations complete," but it's also written at line 192 during first-deployment initialization with no migrations involved. Worth fixing so it doesn't mislead someone later.

Worth considering

The no-op AddEventHandler at reencryption.go:113-116 looks like dead code, but it actually initializes the cacheSynced function pointer inside KubeStorageVersionMigrator — without it HasSynced() would panic. The comment should explain why it's there so nobody helpfully removes it.

If the EncryptionMigratedKeyHashAnnotation gets accidentally deleted after a key rotation (reencryption.go:187-194), the controller treats it as a fresh first deployment and just stamps the current hash without re-encrypting. The log at line 191 says "first deployment" which would be confusing in that scenario. A warning-level log with a more descriptive message would help.

On the testing side: the EtcdDataEncryptionUpToDate condition bubble-up logic in hostedcluster_controller.go:897-906 has three branches and is the user-facing surface of this feature, but it's untested. Similarly, EnsureMigration returning an infrastructure error is never exercised in tests since the mock always returns nil for the error. And there are no tests for NotFound scenarios (encryption config secret absent, KAS deployment absent, AESCBC key secret absent) — these are handled gracefully in the code but would benefit from coverage.

Minor: the condition documentation at hostedcluster_conditions.go:250-251 is inconsistent — the **False** description omits the WaitingForKASConvergence reason, and Absent: uses different formatting than **True**/**False** above it.

What's good

The controller design is clean — separating outer Reconcile for status orchestration from inner reconcile for business logic is a nice pattern. The test suite is thorough with 9 table-driven cases plus dedicated fingerprint and prune tests, all following project conventions (Gherkin naming, gomega, WithStatusSubresource). The condition bubble-up is actually more robust than existing patterns like ValidAWSKMSConfig/ValidAzureKMSConfig since it properly removes the condition when the HCP is nil or the condition is absent. Good reuse of existing utilities throughout.

Comment on lines +371 to +372
default:
return "", nil
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical: Silent empty return for unknown encryption types

The outer default case silently returns ("", nil) for any unrecognized SecretEncryption.Type. If a new encryption type is added to the API without updating this function, re-encryption will silently never trigger — no error, no log, no condition. The user gets no indication that their data isn't being re-encrypted.

Contrast with line 351 where an unsupported KMS provider correctly returns an error:

return "", fmt.Errorf("unsupported KMS provider: %s", secretEncryption.KMS.Provider)

The outer default should follow the same pattern:

default:
    return "", fmt.Errorf("unsupported encryption type: %s", secretEncryption.Type)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Changed the outer default to return an error for unsupported encryption types. Added a corresponding test case in TestComputeActiveKeyFingerprint.


AI-assisted response via Claude Code

ReEncryptionInProgressReason = "ReEncryptionInProgress"
ReEncryptionCompletedReason = "ReEncryptionCompleted"
ReEncryptionFailedReason = "ReEncryptionFailed"
ReEncryptionWaitingForKAS = "WaitingForKASConvergence"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

important: Missing Reason suffix

The other three re-encryption reason constants all use the *Reason suffix (ReEncryptionInProgressReason, ReEncryptionCompletedReason, ReEncryptionFailedReason), but this one omits it. The overwhelming majority of reason constants in this file follow this convention.

Suggested change
ReEncryptionWaitingForKAS = "WaitingForKASConvergence"
ReEncryptionWaitingForKASReason = "WaitingForKASConvergence"

(All references would need updating too.)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Renamed ReEncryptionWaitingForKAS to ReEncryptionWaitingForKASReason for consistency with the other three reason constants. Updated all references.


AI-assisted response via Claude Code

Comment on lines +140 to +146
result, err := r.reconcile(ctx, hcp)

// Update HCP status if changed.
if !reflect.DeepEqual(hcp.Status, originalHCP.Status) {
log.Info("Updating HCP status with re-encryption condition")
if updateErr := r.cpClient.Status().Update(ctx, hcp); updateErr != nil {
return reconcile.Result{}, fmt.Errorf("failed to update HCP status: %w", updateErr)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

important: Original reconcile error lost when status update also fails

If r.reconcile() returns an error AND has mutated hcp.Status, and then Status().Update() also fails (e.g., conflict), the original reconcile error is silently dropped — only the status update error is returned. This makes debugging harder because the root cause error is lost.

Consider combining both errors:

if updateErr := r.cpClient.Status().Update(ctx, hcp); updateErr != nil {
    if err != nil {
        return reconcile.Result{}, fmt.Errorf("failed to update HCP status: %w (original error: %v)", updateErr, err)
    }
    return reconcile.Result{}, fmt.Errorf("failed to update HCP status: %w", updateErr)
}

Or check the reconcile error first and return it immediately (the hcpstatus controller follows a similar pattern where it checks reconcile error before status update). In practice, the double-failure case is rare, but when it happens the lost error makes diagnosis very difficult.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Now preserves the original reconcile error in the status update failure message: fmt.Errorf("failed to update HCP status: %w (original error: %v)", updateErr, err).


AI-assisted response via Claude Code

Comment on lines +44 to +48
// EncryptionMigratedKeyHashAnnotation stores the fingerprint of the encryption
// key that etcd data was last re-encrypted with. Written by this controller
// after all StorageVersionMigrations complete. Compared against the active key
// fingerprint (computed from HCP spec) to decide whether re-encryption is needed.
EncryptionMigratedKeyHashAnnotation = "hypershift.openshift.io/encryption-migrated-key-hash"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

important: Comment is factually incomplete

The comment says this annotation is "Written by this controller after all StorageVersionMigrations complete," but the annotation is also written at line 192 during first-deployment initialization without any migrations running. A future maintainer reading only this comment would incorrectly conclude that the annotation's presence always implies a full re-encryption cycle completed.

Consider something like:

Stores the fingerprint of the encryption key that etcd data is known to be encrypted with. Initialized on first deployment (assuming data is already encrypted with the current key), and updated after all StorageVersionMigrations complete following a key rotation.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Updated the annotation comment to describe the full lifecycle: initialized on first deployment assuming data is already encrypted, and updated after all StorageVersionMigrations complete following a key rotation. Also noted the comparison against the active key fingerprint.


AI-assisted response via Claude Code

Comment on lines +113 to +116
// Register an event handler on the SVM informer so the migrator's cache stays
// in sync. The controller reconciles on a requeue interval while migrations are
// in progress, so we just need a no-op handler for cache sync.
if _, err := migratorInstance.AddEventHandler(cache.ResourceEventHandlerFuncs{}); err != nil {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Comment should explain the nil-deref risk if this is removed

The no-op cache.ResourceEventHandlerFuncs{} looks removable, but AddEventHandler actually initializes the cacheSynced function pointer inside KubeStorageVersionMigrator (see kubestorageversionmigrator.go:47). Without this call, HasSynced() at line 224 would panic with a nil pointer dereference.

Consider updating the comment:

// Register an event handler on the SVM informer. This call is required
// because AddEventHandler initializes the migrator's internal cacheSynced
// function; without it, HasSynced() will panic. The handler itself is a
// no-op because the controller reconciles on a requeue interval while
// migrations are in progress.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Updated the comment to explain that AddEventHandler initializes the migrator's internal cacheSynced function pointer, and without it HasSynced() will panic with a nil dereference.


AI-assisted response via Claude Code

Comment on lines +187 to +194
if migratedKeyHash == "" {
// First deployment or upgrade — no prior migration record.
// Initialize the migrated hash to the active hash without re-encrypting,
// since the data is already encrypted with the current key.
log.Info("Initializing migrated key hash (first deployment)", "activeKeyHash", activeKeyHash)
if err := r.setMigratedKeyHash(ctx, encryptionSecret, activeKeyHash); err != nil {
return reconcile.Result{}, fmt.Errorf("failed to initialize migrated key hash: %w", err)
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Silent re-initialization risk if annotation is deleted

If the EncryptionMigratedKeyHashAnnotation is accidentally deleted (by a user, a bug, or a controller recreating the secret) after a key rotation has occurred but before re-encryption completed, the controller will treat this as "first deployment" and initialize the migrated hash to the active hash — silently skipping re-encryption of data still encrypted with the old key.

Consider logging at Warning level instead of Info here, so accidental annotation deletion is at least observable. Or adding a brief comment noting the assumption (that the annotation absence genuinely means first deployment, not data loss).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Changed to log.V(1).Info(...) so it appears in debug output, and added a caveat comment noting that if the annotation is accidentally deleted after a key rotation, this will skip re-encryption for that rotation but the next rotation will trigger a full re-encryption.


AI-assisted response via Claude Code

Comment on lines +250 to +251
// **False** means re-encryption is in progress or has failed.
// Absent: encryption is not configured or no key rotation has occurred.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Condition doc could be slightly more precise

Two minor points:

  1. **False** can also mean WaitingForKASConvergence (a distinct pre-condition state), not just "in progress or failed." Consider: **False** means the KAS deployment has not yet converged, re-encryption is in progress, or re-encryption has failed. Check the Reason field for specifics.
  2. Absent: uses different formatting than **True** and **False** above it. Consider **Absent** for consistency.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Added **Absent** with bold formatting for consistency, and updated the **False** description to mention the WaitingForKASConvergence reason alongside in-progress and failed states.


AI-assisted response via Claude Code

Copy link
Copy Markdown
Contributor

@everettraven everettraven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @cblecker 's comments related to GoDoc verbosity on the new condition type for clarifying what each state+reason pairing represents, but other than that this LGTM from an API perspective.

Add a re-encryption controller in the HCCO that triggers
StorageVersionMigration after an encryption key rotation, ensuring
all existing etcd data is re-encrypted with the new active key.

Components:
- API: EtcdDataEncryptionUpToDate condition type and reasons
- CPO: key fingerprint computation and rekey-needed annotation
  on kas-secret-encryption-config secret
- HCCO: new reencryption controller using library-go's
  KubeStorageVersionMigrator to drive StorageVersionMigration CRs
- HyperShift Operator: condition bubble-up from HCP to HostedCluster

Ref: OCPSTRAT-2527, OCPSTRAT-2540
Enhancement: openshift/enhancements#1969

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@muraee muraee force-pushed the etcd-reencryption-on-key-rotation branch from 2b0d02c to 851c998 Compare April 15, 2026 16:21
@muraee muraee changed the title feat: etcd data re-encryption after encryption key rotation feat: add etcd data re-encryption after encryption key rotation Apr 15, 2026
@everettraven
Copy link
Copy Markdown
Contributor

Approved from API perspective.

/approve

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Apr 16, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: everettraven, muraee

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 16, 2026
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Apr 17, 2026

@muraee: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-ci openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 1, 2026
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 1, 2026

PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@hypershift-jira-solve-ci
Copy link
Copy Markdown

Now I have all the information needed. Let me compile the final report.

Test Failure Analysis Complete

Job Information

Test Failure Analysis

Error

1) Tide:   "Not mergeable. PR has a merge conflict."
2) Lint:   "hypershift/v1beta1/hostedcluster_conditions.go:334:1: File is not properly formatted (gci)"
3) Verify: "docs/content/reference/aggregated-docs.md" and "docs/content/reference/api.md" are stale after regeneration

Summary

Three independent issues block this PR. Tide reports a merge conflict with the target branch — this is the "error" state, not a reflection of the other failures. Lint fails because a single gci (Go import/declaration ordering) violation exists in the newly-added constant ReEncryptionInProgressReason at line 334 of hypershift/v1beta1/hostedcluster_conditions.go. Verify fails because the PR introduced new API types/conditions but did not regenerate the reference documentation (api.md and aggregated-docs.md). All 8 pending Prow e2e jobs were never triggered because they are gated behind these lighter checks passing first.

Root Cause

1. Merge Conflict (Tide — error state)
The PR branch etcd-reencryption-on-key-rotation has diverged from the target branch and now has conflicts. Tide independently checks mergeability and reports ERROR when a conflict exists, regardless of other check statuses.

2. Lint Failure — gci formatting violation
The gci linter (Go import grouping/ordering) flagged hypershift/v1beta1/hostedcluster_conditions.go at line 334. The newly added constant ReEncryptionInProgressReason = "ReEncryptionInProgress" (and potentially surrounding declarations) does not conform to the project's configured gci formatting rules. The api-lint make target caught this as the sole remaining issue out of 1,126 scanned. The full-repo lint pass succeeded — the problem is isolated to the API types.

3. Verify Failure — stale generated docs
The PR adds new API conditions/reasons (e.g., ReEncryptionInProgress) but did not regenerate the API reference documentation. The verify step runs controller-gen, codegen, gen-crd-api-reference-docs, and hack/tools/docs-aggregator, then performs a git diff-files --quiet check. Two files differ from what was committed:

  • docs/content/reference/api.md
  • docs/content/reference/aggregated-docs.md
Recommendations
  1. Rebase onto latest target branch to resolve the merge conflict (fixes Tide error):

    git fetch upstream && git rebase upstream/main
  2. Fix the gci lint violation by running the formatter on the affected file:

    gci write hypershift/v1beta1/hostedcluster_conditions.go

    Or run the project's lint-fix target: make api-lint to identify, then fix ordering.

  3. Regenerate API reference docs by running the project's generation targets:

    make generate   # or make verify locally to confirm clean tree

    Then commit the updated docs/content/reference/api.md and docs/content/reference/aggregated-docs.md.

  4. Push the fixed branch — once Lint and Verify pass, the 8 gated Prow e2e jobs will automatically trigger.

Evidence
Evidence Detail
Tide error "Not mergeable. PR has a merge conflict." — mergeStateStatus: UNKNOWN
Lint linter gci (Go import/declaration ordering) — api-lint target
Lint file hypershift/v1beta1/hostedcluster_conditions.go:334:1
Lint message File is not properly formatted (gci) on ReEncryptionInProgressReason = "ReEncryptionInProgress"
Lint scope 1 issue remaining after filtering 1,126 scanned; full-repo lint passed clean
Verify failed step git update-index --refresh + git diff-files --quiet exited code 1
Verify stale file 1 docs/content/reference/aggregated-docs.md (276 doc files aggregated, output differs)
Verify stale file 2 docs/content/reference/api.md (regenerated by gen-crd-api-reference-docs)
Pending e2e jobs 8 Prow e2e jobs in PENDING state — gated behind Lint/Verify, never triggered

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/api Indicates the PR includes changes for the API area/control-plane-operator Indicates the PR includes changes for the control plane operator - in an OCP release area/documentation Indicates the PR includes changes for documentation area/hypershift-operator Indicates the PR includes changes for the hypershift operator and API - outside an OCP release needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants