feat(etcd): implement multi-shard etcd support #7375

jhjaggars wants to merge 11 commits into openshift:main from
Conversation
Skipping CI for Draft Pull Request.
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings. Use the following commands to manage reviews.
📝 Walkthrough

Adds etcd sharding: new API types/CRD schema, helper methods to derive effective shards, CLI flags/parsing, shard-aware manifests and reconcilers (StatefulSets, Services, ServiceMonitors, PDBs), orphaned-shard cleanup, PKI updates, kube-apiserver/other component adaptations, init/restore script, tests, and user documentation/examples.
```mermaid
sequenceDiagram
    participant HCP as HostedControlPlane Controller
    participant REC as Etcd Shard Reconciler
    participant MAN as Manifest Provider
    participant K8S as Kubernetes API
    HCP->>REC: reconcileEtcdShards(hcp, params)
    REC->>REC: compute EffectiveShards(hcp)
    loop Per shard
        REC->>MAN: EtcdStatefulSetForShard(ns, shard)
        MAN-->>REC: StatefulSet manifest
        REC->>K8S: CreateOrUpdate(StatefulSet)
        REC->>MAN: EtcdClientServiceForShard(ns, shard)
        MAN-->>REC: Client Service manifest
        REC->>K8S: CreateOrUpdate(Service)
        REC->>MAN: EtcdDiscoveryServiceForShard(ns, shard)
        MAN-->>REC: Discovery Service manifest
        REC->>K8S: CreateOrUpdate(Service)
        REC->>MAN: EtcdServiceMonitorForShard(ns, shard)
        MAN-->>REC: ServiceMonitor manifest
        REC->>K8S: CreateOrUpdate(ServiceMonitor)
        alt HighlyAvailable
            REC->>MAN: EtcdPodDisruptionBudgetForShard(ns, shard)
            MAN-->>REC: PDB manifest
            REC->>K8S: CreateOrUpdate(PodDisruptionBudget)
        end
    end
    REC->>K8S: CleanupOrphanedShards(activeShards)
    K8S-->>REC: orphaned resources removed
    REC-->>HCP: Update aggregated etcd status / ControlPlaneComponent
```
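A condensed Go sketch of the loop above (all names and signatures here are illustrative stand-ins, not the PR's actual types):

```go
package main

import "fmt"

// shard is a stand-in for the PR's ManagedEtcdShardSpec; only the name matters here.
type shard struct {
    Name string
}

// applyFn stands in for a CreateOrUpdate call against the Kubernetes API.
type applyFn func(kind, shardName string) error

// reconcileEtcdShards mirrors the sequence diagram: for each shard it applies the
// StatefulSet, client/discovery Services, ServiceMonitor, and (when highly
// available) a PodDisruptionBudget, then prunes resources for removed shards.
func reconcileEtcdShards(shards []shard, highlyAvailable bool, apply applyFn, cleanup func(active map[string]bool) error) error {
    active := map[string]bool{}
    for _, s := range shards {
        active[s.Name] = true
        kinds := []string{"StatefulSet", "Service(client)", "Service(discovery)", "ServiceMonitor"}
        if highlyAvailable {
            kinds = append(kinds, "PodDisruptionBudget")
        }
        for _, kind := range kinds {
            if err := apply(kind, s.Name); err != nil {
                return fmt.Errorf("shard %s: applying %s: %w", s.Name, kind, err)
            }
        }
    }
    return cleanup(active)
}

func main() {
    _ = reconcileEtcdShards(
        []shard{{Name: "default"}, {Name: "events"}},
        true,
        func(kind, name string) error { fmt.Println("apply", kind, "for shard", name); return nil },
        func(active map[string]bool) error { fmt.Println("cleanup, keeping", len(active), "shards"); return nil },
    )
}
```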
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
/test ?
@jtaleric: The following commands are available to trigger required jobs and optional jobs. In response to this: `/test ?`

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
```go
import (
    "k8s.io/utils/ptr"
)
```
Bad practice to include dependencies like this in your API types; it impacts users building projects against your APIs. Prefer to have utils like this somewhere separate, and not as methods on the types themselves, please.
```go
// When shards are specified, this serves as the default for all shards
// unless overridden per-shard.
```
Could also make it so that this field cannot be set, forcing a choice in each shard to be explicit
```go
// When specified, exactly one shard must have "/" in its resourcePrefixes.
// +optional
// +kubebuilder:validation:MinItems=1
// +kubebuilder:validation:MaxItems=10
```
Any particular decision process to get to 10 as a maximum?
Should document in the godoc that this list accepts at most 10 items
```go
// +kubebuilder:validation:MaxItems=10
// +listType=map
// +listMapKey=name
// +kubebuilder:validation:XValidation:rule="self.exists(s, '/' in s.resourcePrefixes)",message="exactly one shard must have '/' prefix"
```
The rule itself checks that at least one item has `/` in the resourcePrefixes, but the message says exactly one; perhaps you want:
```diff
-// +kubebuilder:validation:XValidation:rule="self.exists(s, '/' in s.resourcePrefixes)",message="exactly one shard must have '/' prefix"
+// +kubebuilder:validation:XValidation:rule="self.exists_one(s, '/' in s.resourcePrefixes)",message="exactly one shard must have '/' prefix"
```
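To make the `exists` vs `exists_one` gap concrete, here is the same check mirrored in Go (an illustration of the CEL semantics, not code from the PR): a shard list where two entries both claim "/" satisfies `exists` but should fail `exists_one`.

```go
package main

import (
    "fmt"
    "slices"
)

func main() {
    // resourcePrefixes per shard; both shards claim the catch-all "/".
    shards := [][]string{
        {"/"},
        {"/", "/events#"},
    }
    count := 0
    for _, prefixes := range shards {
        if slices.Contains(prefixes, "/") {
            count++
        }
    }
    fmt.Println(count >= 1) // true:  CEL exists() accepts this list
    fmt.Println(count == 1) // false: CEL exists_one() would reject it
}
```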
```go
// +listType=map
// +listMapKey=name
// +kubebuilder:validation:XValidation:rule="self.exists(s, '/' in s.resourcePrefixes)",message="exactly one shard must have '/' prefix"
// +kubebuilder:validation:XValidation:rule="self.all(s, s.resourcePrefixes.all(p, p == '/' || p.endsWith('#')))",message="non-default prefixes must end with '#'"
```
Why is this rule here vs on the resourcePrefixes field itself?
```go
// All CRITICAL shards must have quorum
if criticalTotal > 0 && criticalReady < criticalTotal/2+1 {
    return &metav1.Condition{
        Type: string(hyperv1.EtcdAvailable),
```
This gets thrown away in the parent scope where this function is called; why are we creating other conditions based on this one instead of this being the condition we set?
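Separately from the condition-propagation question, CodeRabbit's status.go prompt later in this review notes that the aggregated counters can mask a single Critical shard losing quorum. A per-shard variant, as a sketch (the status struct here is hypothetical, not the PR's type):

```go
package etcd

import "fmt"

// shardStatus is a hypothetical summary of one etcd shard's members.
type shardStatus struct {
    Name     string
    Priority string // "Critical", "Medium", "Low"
    Ready    int32
    Replicas int32
}

// criticalShardsHaveQuorum requires quorum (n/2 + 1) per Critical shard, so a
// degraded shard cannot hide behind another shard's surplus of ready members.
func criticalShardsHaveQuorum(shards []shardStatus) (bool, []string) {
    var failures []string
    for _, s := range shards {
        if s.Priority != "Critical" {
            continue
        }
        if quorum := s.Replicas/2 + 1; s.Ready < quorum {
            failures = append(failures, fmt.Sprintf("shard %s: %d/%d ready, need %d", s.Name, s.Ready, s.Replicas, quorum))
        }
    }
    return len(failures) == 0, failures
}
```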
```go
// resourceNameForShard generates resource names for etcd shards
// For backward compatibility, the default shard uses the base name without suffix
// Named shards use the pattern: baseName-shardName
func resourceNameForShard(baseName, shardName string) string {
```
This is duplicated in at least one other location in this PR; is there a way to avoid the duplication and the possibility of drift?
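For reference, the duplicated helper is small enough that a single shared copy could be imported from both call sites; a sketch matching the docstring above and the "default" sentinel used elsewhere in this PR:

```go
// resourceNameForShard returns the resource name for an etcd shard. The
// default shard keeps the bare base name for backward compatibility; named
// shards get a "-<shardName>" suffix, e.g. "etcd-client" vs "etcd-client-events".
func resourceNameForShard(baseName, shardName string) string {
    if shardName == "default" {
        return baseName
    }
    return baseName + "-" + shardName
}
```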
```go
if len(shards) == 0 {
    return fmt.Errorf("no etcd shards configured")
}
defaultShard := shards[0]
```
Is this a good assumption to make?
```go
args.Set("etcd-servers", p.EtcdURL)

// Build etcd-servers-overrides if shards are configured
// Format per Kubernetes docs: /group/resource#url1;url2,/group2/resource2#url3
```
Why do we require the # in the API vs just constructing it here? Seems to be passing complexity to end users vs something that would be relatively straightforward for us to handle?
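To illustrate the point: the controller already knows each shard's prefixes and client URLs, so the `#` separator could be appended when the flag is built rather than required in the API. A hedged sketch (the helper and the shardOverride type are made up for illustration; prefixes are assumed to be stored without the trailing `#`):

```go
package main

import (
    "fmt"
    "strings"
)

// shardOverride is a hypothetical view of one shard's routing data.
type shardOverride struct {
    Prefixes []string // stored without the trailing '#'
    URLs     []string
}

// buildEtcdServersOverrides renders the kube-apiserver flag value in the
// documented format: /group/resource#url1;url2,/group2/resource2#url3.
func buildEtcdServersOverrides(shards []shardOverride) string {
    var parts []string
    for _, s := range shards {
        for _, p := range s.Prefixes {
            if p == "/" {
                continue // the catch-all shard goes in --etcd-servers, not the overrides
            }
            parts = append(parts, p+"#"+strings.Join(s.URLs, ";"))
        }
    }
    return strings.Join(parts, ",")
}

func main() {
    fmt.Println(buildEtcdServersOverrides([]shardOverride{
        {Prefixes: []string{"/events"}, URLs: []string{"https://etcd-client-events:2379"}},
    }))
    // Output: /events#https://etcd-client-events:2379
}
```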
```yaml
- "/events#"
priority: Low
replicas: 1
# No backupSchedule = no backups for this shard
```
The docs on the field say it falls back to priority-based backup?
PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
The changes to control-plane-operator/controllers/hostedcontrolplane/v2/etcd/component.go in this PR don't actually achieve anything, since the functions there are not currently called. Importantly, this means that the defrag bits are not deployed if you are trying to use an HA control plane for testing. This will need to be fixed; for now I will work around it by deploying the components manually.
Stale PRs are closed after 21d of inactivity. If this PR is still relevant, comment to refresh it or remove the stale label. If this PR is safe to close now, please do so with /close.
Moved etcd management from the v2 component framework to direct reconciliation to enable dynamic multi-shard deployments. The v2 framework's single-workload limitation prevented deploying multiple etcd shards.

Implementation:
- New etcd package with multi-shard reconciliation (shards.go, statefulset.go, services.go, monitoring.go, status.go, cleanup.go, pdb.go)
- Shard-aware manifest constructors loading from v2 assets
- ControlPlaneComponent resource for dependency tracking
- Priority-based status aggregation (Critical/Medium/Low)
- Automatic orphan cleanup when shards are removed
- Asset loading for complete base structures

Integration updates:
- kube-apiserver: --etcd-servers-overrides with correct format
- openshift-apiserver: etcd URL configuration for default shard
- KAS wait-for-etcd init container: waits for all shard services
- PKI certificates: SANs for all shard-specific services
- NO_PROXY: includes all shard service FQDNs

Signed-off-by: Jesse Jaggars <jjaggars@redhat.com>
Commit-Message-Assisted-by: Claude (via Claude Code)
Add comprehensive documentation for the newly implemented CLI flags for etcd sharding configuration (--etcd-sharding-config and --etcd-shard).

Changes:
- Add new "CLI Usage" section with file-based and inline examples
- Update "Limitations" section to remove outdated "No CLI support"
- Update "Future Enhancements" to mark CLI integration as complete
- Add example-inline-usage.sh demonstrating inline flag usage

The documentation now covers both configuration methods with examples for AWS, Azure, and other platforms, including global storage defaults, manifest rendering, and validation rules.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Implement CLI flags for configuring etcd sharding during cluster creation. Users can now specify etcd shard configuration via command-line flags or configuration files.

New flags:
- --etcd-sharding-config: Path to YAML/JSON configuration file
- --etcd-shard: Inline shard definition (repeatable)

Features:
- File-based configuration for complex multi-shard setups
- Inline flags for simple 2-3 shard configurations
- Validation of shard configuration (catch-all requirement, DNS-1035 names, etc.)
- Global storage defaults via --etcd-storage-class and --etcd-storage-size
- Support for all shard properties: name, prefixes, priority, replicas, storage, backup schedule
- Mutual exclusivity between file and inline configuration methods
- Full render support for previewing configurations

Implementation includes:
- RawCreateOptions struct fields for both configuration methods
- Flag bindings in bindCoreOptions()
- Validation in Validate() method
- Shard building in prototypeResources()
- Helper functions: buildEtcdShards, parseShardingConfigFile, parseInlineShards, parseInlineShard, applyGlobalDefaults, validateEtcdSharding

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
The OAuth APIServer was failing to connect to sharded etcd services because NO_PROXY only contained the short hostname (e.g., etcd-client-main) while the actual connection URL used the FQDN (e.g., etcd-client-main.namespace.svc). This caused connections to incorrectly route through the konnectivity proxy, resulting in connection failures.

Changes:
- Extract etcd FQDN from the etcd URL for sharded configurations
- Add both short hostname and FQDN to NO_PROXY list
- Support for managed etcd sharding with default shard detection

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
This field was accidentally removed during the etcd sharding implementation. Restores the field that enables --service-account-max-token-expiration flag support for graceful service account key rotation.

Fixes regression from commits 4ce7874 and cbb3666.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
@jhjaggars: The following tests failed; say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests.
Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jhjaggars

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Actionable comments posted: 19
♻️ Duplicate comments (3)
control-plane-operator/controllers/hostedcontrolplane/etcd/pdb.go (1)
18-21: ⚠️ Potential issue | 🟠 Major

Comment and code disagree on quorum requirements.

The comment states "For a 3-node etcd cluster, we need at least 2 available (quorum = n/2 + 1)" but `MinAvailable` is set to 1. With `MinAvailable: 1`, up to 2 nodes could be unavailable simultaneously, breaking quorum.

Either update `MinAvailable` to 2, or use `MaxUnavailable: 1` to ensure only one node can be disrupted at a time, or correct the comment if the current value is intentional.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@control-plane-operator/controllers/hostedcontrolplane/etcd/pdb.go` around lines 18-21: the comment and code disagree; the comment says a 3-node etcd cluster requires quorum of 2 but pdb.Spec.MinAvailable is set to 1 (minAvailable variable). Update the PDB so the code matches the comment by setting minAvailable to 2 (i.e., set pdb.Spec.MinAvailable via minAvailable := intstr.FromInt(2)), or alternatively replace the MinAvailable usage with MaxUnavailable := intstr.FromInt(1) and set pdb.Spec.MaxUnavailable to enforce only one disruption; ensure the chosen change is reflected in the comment and remove the inconsistency between pdb.Spec.MinAvailable/minAvailable and the comment.
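For context, deriving the budget from the replica count avoids this class of comment/code drift entirely; a minimal sketch, assuming the shard's replica count is at hand:

```go
package etcd

import "k8s.io/apimachinery/pkg/util/intstr"

// quorumMinAvailable returns the etcd quorum size for n replicas (n/2 + 1),
// e.g. 2 for a 3-member shard, suitable for pdb.Spec.MinAvailable.
func quorumMinAvailable(replicas int32) intstr.IntOrString {
    return intstr.FromInt32(replicas/2 + 1)
}
```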
cmd/cluster/core/create.go (1)

1268-1280: ⚠️ Potential issue | 🟠 Major

`--etcd-sharding-config` rejects the checked-in example shape.

This parser only accepts a top-level `shards:` document. The example in docs/content/how-to/etcd-sharding/example-3-shard.yaml is a full HostedCluster with shards under spec.etcd.managed.shards, so passing that file to `--etcd-sharding-config` will always fail with "must contain at least one shard".

Suggested fix

```diff
-	var config struct {
-		Shards []hyperv1.ManagedEtcdShardSpec `json:"shards"`
-	}
+	var config struct {
+		Shards []hyperv1.ManagedEtcdShardSpec `json:"shards"`
+		Spec   struct {
+			Etcd struct {
+				Managed struct {
+					Shards []hyperv1.ManagedEtcdShardSpec `json:"shards"`
+				} `json:"managed"`
+			} `json:"etcd"`
+		} `json:"spec"`
+	}
@@
-	if len(config.Shards) == 0 {
+	shards := config.Shards
+	if len(shards) == 0 {
+		shards = config.Spec.Etcd.Managed.Shards
+	}
+	if len(shards) == 0 {
 		return nil, fmt.Errorf("etcd sharding config file must contain at least one shard")
 	}
-	return config.Shards, nil
+	return shards, nil
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/cluster/core/create.go` around lines 1268-1280: the YAML parsing currently only accepts a top-level "shards" document and rejects full HostedCluster manifests; update the parsing to accept both shapes by first unmarshalling into the existing config (with Shards []hyperv1.ManagedEtcdShardSpec) and, if that yields zero shards, unmarshal (or decode) into a HostedCluster-shaped struct that contains Spec.Etcd.Managed.Shards and extract those into config.Shards; ensure you continue to return config.Shards and keep the same error messages when no shards are found. Use the existing variables/data and the config.Shards identifier and the HostedCluster nested path (Spec -> Etcd -> Managed -> Shards) to locate and extract the shards.

control-plane-operator/controllers/hostedcontrolplane/v2/etcd/statefulset.go (1)
23-29: ⚠️ Potential issue | 🟠 Major

Don't assume `shards[0]` is the catch-all shard.

`EffectiveShards()` preserves the user-defined order; it does not move the "/" shard to index 0. If `events` or `leases` is listed first, this StatefulSet will reconcile the wrong shard.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@control-plane-operator/controllers/hostedcontrolplane/v2/etcd/statefulset.go` around lines 23-29: the code incorrectly assumes the first element of the slice returned by managedEtcdSpec.EffectiveShards(hcp) is the catch-all shard; update the selection logic so you iterate the shards slice and pick the shard whose Name (or equivalent identifier) equals "/" as the defaultShard, and only if no "/" shard exists fall back to shards[0]; adjust any error handling around the shards slice accordingly (look for managedEtcdSpec.EffectiveShards, the shards variable and the defaultShard usage).
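The selection the prompt describes is small; a sketch, assuming the PR's `ManagedEtcdShardSpec` exposes a `ResourcePrefixes` field matching the API's `resourcePrefixes`, with the catch-all identified by "/" per the CRD validation:

```go
package etcd

import (
    hyperv1 "github.com/openshift/hypershift/api/hypershift/v1beta1"
)

// pickDefaultShard returns the shard whose resourcePrefixes contain the
// catch-all "/", falling back to the first shard only if none does.
func pickDefaultShard(shards []hyperv1.ManagedEtcdShardSpec) hyperv1.ManagedEtcdShardSpec {
    for _, s := range shards {
        for _, p := range s.ResourcePrefixes {
            if p == "/" {
                return s
            }
        }
    }
    return shards[0]
}
```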
🧹 Nitpick comments (4)
docs/content/how-to/etcd-sharding.md (2)
194-194: Fix spaces inside emphasis markers.

The line has spaces inside emphasis markers (e.g., `* * * * *` being interpreted as emphasis). Escape the asterisks or use backticks to prevent markdown parsing issues.

Suggested fix

```diff
-- `backup-schedule`: Cron format (e.g., "*/30 * * * *")
+- `backup-schedule`: Cron format (e.g., `*/30 * * * *`)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/content/how-to/etcd-sharding.md` at line 194: the markdown line documenting `backup-schedule` contains a cron example with unescaped asterisks ("*/30 * * * *") that can be parsed as emphasis; update the text for the `backup-schedule` entry so the cron expression is protected by inline code or escaped asterisks (e.g., wrap "*/30 * * * *" in backticks or escape each *), ensuring the `backup-schedule` label and its cron example render correctly.
181-183: Add language identifier to fenced code block.

The code block showing the inline flag format is missing a language identifier, which triggers markdownlint MD040. Since this is showing command-line syntax, consider using `text` or `bash`.

Suggested fix

````diff
 **Inline Flag Format:**
-```
+```text
 --etcd-shard name=<name>,prefixes=<prefix1>|<prefix2>,priority=<priority>,replicas=<1|3>,storage-size=<size>,storage-class=<class>,backup-schedule=<cron>
````

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/content/how-to/etcd-sharding.md` around lines 181-183: the fenced code block that shows the inline flag format is missing a language identifier (triggers MD040); update the triple-backtick fence that contains the line starting with "--etcd-shard name=..." to include a language identifier such as text or bash (e.g., ```text) so the code block is properly annotated.

control-plane-operator/controllers/hostedcontrolplane/etcd/statefulset.go (2)

91-101: Fragile string replacement for script modification.

The approach of using `strings.ReplaceAll` to modify shell script content (lines 94-98) is fragile. If the original script in the YAML asset changes format (e.g., extra whitespace, different quoting), these replacements will silently fail to match. Consider either:

1. Regenerating the script entirely with the correct service names
2. Using environment variables in the script that can be set via the container spec

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@control-plane-operator/controllers/hostedcontrolplane/etcd/statefulset.go` around lines 91-101: the current use of strings.ReplaceAll on c.Args[1] (variable script) to substitute service names is fragile and can silently fail; instead either (A) regenerate the entire script string explicitly using fmt.Sprintf and assign it to c.Args (replace the strings.ReplaceAll block with a constructed script string that interpolates clientService and discoveryService into the exact script you need), or (B) avoid editing the script text and set container environment variables (e.g., ETCDCTL_ENDPOINTS and a DISCOVERY_SERVICE or similar) in the pod spec and update the asset script to reference those env vars; update the code paths that touch c.Args, script, clientService and discoveryService accordingly.

192-198: Defrag controller container added but not removed when no longer needed.

The defrag controller is added when `params.NeedsDefragController` is true, but there's no corresponding logic to remove it if the availability policy changes from HighlyAvailable to SingleReplica. This could leave orphaned containers. However, since the StatefulSet selector is immutable and changing availability policy likely requires cluster recreation, this may be acceptable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@control-plane-operator/controllers/hostedcontrolplane/etcd/statefulset.go` around lines 192-198: the code adds the etcd-defrag controller container when params.NeedsDefragController is true but never removes it when that flag becomes false; update the logic around params.NeedsDefragController so it also removes the container and resets ServiceAccountName when no longer needed: detect presence with hasContainer(sts.Spec.Template.Spec.Containers, "etcd-defrag"), remove the container entry from sts.Spec.Template.Spec.Containers when params.NeedsDefragController is false, and clear or restore sts.Spec.Template.Spec.ServiceAccountName (manifests.EtcdDefragControllerServiceAccount("")) as appropriate; use buildEtcdDefragControllerContainer only when adding and ensure removal is safe for immutable selector scenarios.

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @api/hypershift/v1beta1/hostedcluster_types.go:
- Around line 2058-2085: Add XValidation rules on the UnmanagedEtcdSpec type to
enforce that either Shards is provided or both Endpoint and TLS are provided,
and to prevent supplying Endpoint/TLS when Shards is set; specifically add a
rule like "has(self.shards) || (self.endpoint != '' && has(self.tls))" with an
appropriate message, and a second rule that forbids endpoint/tls when shards
exists (e.g. "not(has(self.shards)) || (self.endpoint == '' &&
not(has(self.tls)))") so UnmanagedEtcdSpec (type name) cannot be left without
configuration and cannot mix legacy fields (Endpoint, TLS) with Shards.
In @cmd/cluster/core/create.go:
- Around line 1302-1310: parseInlineShard currently naively splits def on every
comma which breaks cron expressions; change the parsing in parseInlineShard so
commas that occur inside a value (e.g. a cron schedule) are not treated as
separators: when iterating pairs from strings.Split(def, ",") if
strings.SplitN(pair, "=", 2) yields len != 2 then treat that token as part of
the previous value (append ","+pair to the last accumulated token) instead of
returning an error; update the loop that builds pairs and the kv handling to
only error if you encounter a token without "=" when there is no previous token
to attach to, keeping references to def, pairs, and the kv split logic for
location.In
@control-plane-operator/controllers/hostedcontrolplane/etcd/etcd-init.sh:
- Line 12: The curl invocation using the RESTORE_URL variable is unquoted which
can cause word-splitting or unexpected behavior; update the command that writes
the snapshot (the curl call that currently references ${RESTORE_URL}) to quote
the variable (use "${RESTORE_URL}") so the entire URL is passed as a single
argument to curl.
- Around line 1-3: Add a shebang as the first line of etcd-init.sh (before the
existing mkdir -p /var/lib/data and the ls check) so the script runs with a
defined shell interpreter (e.g., use #!/bin/bash or #!/bin/sh); ensure the
shebang is the very first line to guarantee correct interpreter selection when
executing the etcd-init.sh script that contains the mkdir -p /var/lib/data and
the [ "$(ls -A /var/lib/data)" ] check.
In @control-plane-operator/controllers/hostedcontrolplane/etcd/params.go:
- Around line 66-70: The code accesses hcp.Spec.Networking.ClusterNetwork[0]
without checking the slice length; add a defensive bounds check before calling
util.IsIPv4CIDR in the function that returns (*ShardParams, error) so that if
len(hcp.Spec.Networking.ClusterNetwork) == 0 you return a clear error (e.g.,
fmt.Errorf("empty ClusterNetwork in HCP spec")) instead of panicking; then
proceed to call
util.IsIPv4CIDR(hcp.Spec.Networking.ClusterNetwork[0].CIDR.String()) as before.
In @control-plane-operator/controllers/hostedcontrolplane/etcd/services.go:
- Around line 17-22: The Service's shard label is only being set on
svc.Spec.Selector but must also be added to Service.metadata.labels so
ServiceMonitor discovery can match it; update the code in the service
reconciliation (the block that sets
svc.Spec.Selector["hypershift.openshift.io/etcd-shard"] = shard.Name and the
similar block at the later occurrence) to ensure svc.Labels is initialized (if
nil) and then set svc.Labels["hypershift.openshift.io/etcd-shard"] = shard.Name
(and keep the existing selector assignment) so ReconcileServiceMonitor can
discover shard-specific etcd Services.
In @control-plane-operator/controllers/hostedcontrolplane/etcd/shards_test.go:
- Around line 26-31: The test's scheme variable used in the ReconcileEtcdShards
test is missing the prometheusoperatorv1 registration, so the fake client cannot
recognize ServiceMonitor GVKs created by manifests.EtcdServiceMonitorForShard;
add prometheusoperatorv1.AddToScheme(scheme) alongside the existing AddToScheme
calls (where scheme is created) so ServiceMonitor objects are properly
registered before invoking ReconcileEtcdShards.
In @control-plane-operator/controllers/hostedcontrolplane/etcd/shards.go:
- Around line 26-30: ReconcileEtcdShards currently calls
hcp.Spec.Etcd.Managed.EffectiveShards(hcp) without ensuring Managed is non-nil;
add a defensive nil check for hcp.Spec.Etcd.Managed at the top of
ReconcileEtcdShards and return a clear error (or nil/short-circuit as
appropriate) if Managed is nil before calling EffectiveShards, referencing the
ReconcileEtcdShards function and the hcp.Spec.Etcd.Managed.EffectiveShards(hcp)
call so the fix is applied where the dereference occurs.
In @control-plane-operator/controllers/hostedcontrolplane/etcd/statefulset.go:
- Around line 215-226: The code indexes sts.Spec.VolumeClaimTemplates[0] without
ensuring the slice is non-empty; modify the block handling storage.Type ==
hyperv1.PersistentVolumeEtcdStorage (and when storage.PersistentVolume pv !=
nil) to first check len(sts.Spec.VolumeClaimTemplates) > 0 before accessing [0],
and if it is zero either append a new corev1.PersistentVolumeClaim template
(constructed to match the expected fields) or skip the mutation depending on
intended behavior; ensure you update the same sts.Spec.VolumeClaimTemplates
slice used elsewhere so subsequent code (and functions referencing sts) see the
new template if you choose to append.
- Around line 44-48: The v2 adapter's service name construction for etcd shards is using the incorrect pattern ("etcd-<shard>-discovery") and must match the main implementation's "etcd-discovery" or "etcd-discovery-<shard>" pattern; update the code in the v2 statefulset creation (the block that sets sts.Spec.ServiceName in control-plane-operator/controllers/hostedcontrolplane/v2/etcd/statefulset.go) to use the same logic as the main implementation (if shard.Name == "default" then "etcd-discovery" else fmt.Sprintf("etcd-discovery-%s", shard.Name)) so it aligns with manifests.EtcdDiscoveryServiceForShard/resourceNameForShard.
In @control-plane-operator/controllers/hostedcontrolplane/etcd/status.go:
- Around line 143-150: The current quorum check uses aggregated counters
(criticalTotal/criticalReady) which can hide a single shard lacking quorum;
update the logic in the Etcd availability calculation (the block that returns a
metav1.Condition with Type=string(hyperv1.EtcdAvailable) and
Reason=hyperv1.EtcdWaitingForQuorumReason) to iterate over each critical shard
instead of using criticalTotal/criticalReady, compute quorum per shard as
(replicas/2 + 1) and ensure each shard's ready count >= its quorum, collect
per-shard failure messages into messages and only return the Condition when any
shard fails its own quorum check.
- Line 122: The code dereferences sts.Spec.Replicas into requiredReplicas
without checking for nil; update the logic around the assignment of
requiredReplicas to first check if sts.Spec != nil and sts.Spec.Replicas != nil
and handle the nil case (e.g., set requiredReplicas to a safe default like 1 or
return an error/early exit), so replace the direct dereference of
sts.Spec.Replicas with a guarded read that uses the chosen fallback and avoid a
possible nil pointer panic when evaluating requiredReplicas.
In @control-plane-operator/controllers/hostedcontrolplane/v2/etcd/service.go:
- Around line 23-29: Service names for etcd shards are using the wrong pattern ("etcd-<shard>-client"/"etcd-<shard>-discovery") instead of the canonical "etcd-client-<shard>"/"etcd-discovery-<shard>", causing hostname mismatches;
update the assignments where svc.Name is set (the block that checks
defaultShard.Name == "default" and the similar block at the later 53-59 lines)
so that the default shard uses "etcd-client" / "etcd-discovery" and non-default
shards use fmt.Sprintf("etcd-client-%s", defaultShard.Name) /
fmt.Sprintf("etcd-discovery-%s", defaultShard.Name) respectively, leaving the
default-case naming unchanged.
In @control-plane-operator/controllers/hostedcontrolplane/v2/etcd/servicemonitor.go:
- Around line 43-54: The code accesses sm.Spec.Endpoints[0] (setting
MetricRelabelConfigs and calling util.ApplyClusterIDLabel) before confirming
there is at least one endpoint, which can panic; change the logic to first check
len(sm.Spec.Endpoints) > 0 and only then set
sm.Spec.Endpoints[0].MetricRelabelConfigs, call
util.ApplyClusterIDLabel(&sm.Spec.Endpoints[0], cpContext.HCP.Spec.ClusterID),
and append the shard RelabelConfig (using defaultShard) to
sm.Spec.Endpoints[0].RelabelConfigs; ensure all references to
cpContext.MetricsSet and cpContext.HCP.Spec.ClusterID are used inside that
guarded block so no access occurs when there are no endpoints.
In @control-plane-operator/controllers/hostedcontrolplane/v2/etcd/statefulset.go:
- Around line 45-50: The Service name for named shards is using the wrong
convention; when defaultShard.Name != "default" update sts.Spec.ServiceName (and
any code building ETCD_INITIAL_CLUSTER entries) to use the canonical pattern
"etcd-discovery-" instead of "etcd--discovery"; leave sts.Name as
"etcd-" if that is desired but ensure every reference that computes the
discovery Service (including where ETCD_INITIAL_CLUSTER or related env vars are
constructed) uses fmt.Sprintf("etcd-discovery-%s", defaultShard.Name) for named
shards.In
@control-plane-operator/controllers/hostedcontrolplane/v2/kas/deployment.go:
- Around line 98-124: The wait-for-etcd init container builds DNS names using
the wrong shard service naming convention; in UpdateContainer("wait-for-etcd",
...) where serviceName is constructed for non-default shards, change the
fmt.Sprintf pattern from "etcd-client-%s" to "etcd-%s-client" so it matches the
service names produced by hcp.Spec.Etcd.Managed.EffectiveShards and the service
creation in v2/etcd/service.go; then run a quick grep (including kas/params.go
and other kas-related components) to ensure all consumers use the same
"etcd-{shard}-client" format and update any mismatches.In
@control-plane-operator/controllers/hostedcontrolplane/v2/kas/params.go:
- Around line 255-271: buildUnmanagedEtcdConfig can return an empty defaultURL
when no shard has the "/" prefix; update the function (buildUnmanagedEtcdConfig)
to guarantee a non-empty default by either (A) adding a fallback: if defaultURL
== "" and len(shards) > 0 then set defaultURL = shards[0].Endpoint (or pick the
first shard you deem appropriate) and optionally add a warning log, or (B) more
robustly change the signature to return an error (string, map[string]string,
error), validate that a catch‑all "/" shard exists and return an error to
callers if it does not—apply the chosen fix and update all callers to handle the
fallback or the new error return.
In @control-plane-operator/controllers/hostedcontrolplane/v2/oauth_apiserver/deployment.go:
- Around line 39-48: The current loop that constructs the etcd client service
name using shard.Name (in the code inspecting
cpContext.HCP.Spec.Etcd.Managed.Shards and comparing ResourcePrefixes to "/")
will produce "etcd-client-default" when the shard.Name is "default", but the
canonical service is "etcd-client"; update the logic inside that loop (where
serviceName is built and etcdURL/etcdHostname are set) to special-case
shard.Name == "default" and use the unsuffixed "etcd-client" serviceName in that
case, otherwise keep the existing fmt.Sprintf("etcd-client-%s", shard.Name)
behavior so --etcd-servers and NO_PROXY point to the real Service.
Duplicate comments:
In @cmd/cluster/core/create.go:
- Around line 1268-1280: The YAML parsing currently only accepts a top-level
"shards" document and rejects full HostedCluster manifests; update the parsing
to accept both shapes by first unmarshalling into the existing config (with
Shards []hyperv1.ManagedEtcdShardSpec) and if that yields zero shards, unmarshal
(or decode) into a HostedCluster-shaped struct that contains
Spec.Etcd.Managed.Shards and extract those into config.Shards; ensure you
continue to return config.Shards and keep the same error messages when no shards
are found. Use the existing variables/data and the config.Shards identifier and
the HostedCluster nested path (Spec -> Etcd -> Managed -> Shards) to locate and
extract the shards.
In @control-plane-operator/controllers/hostedcontrolplane/etcd/pdb.go:
- Around line 18-21: The comment and code disagree: the comment says a 3-node
etcd cluster requires quorum of 2 but the pdb.Spec.MinAvailable is set to 1
(minAvailable variable). Update the PDB so the code matches the comment by
setting minAvailable to 2 (i.e., set pdb.Spec.MinAvailable via minAvailable :=
intstr.FromInt(2)), or alternatively replace the MinAvailable usage with
MaxUnavailable := intstr.FromInt(1) and set pdb.Spec.MaxUnavailable to enforce
only one disruption; ensure the chosen change is reflected in the comment and
remove the inconsistency between pdb.Spec.MinAvailable/minAvailable and the
comment.
In @control-plane-operator/controllers/hostedcontrolplane/v2/etcd/statefulset.go:
- Around line 23-29: The code incorrectly assumes the first element of the slice
returned by managedEtcdSpec.EffectiveShards(hcp) is the catch-all shard; update
the selection logic so you iterate the shards slice and pick the shard whose
Name (or equivalent identifier) equals "/" as the defaultShard, and only if no
"/" shard exists fall back to shards[0]; adjust any error handling around the
shards slice accordingly (look for managedEtcdSpec.EffectiveShards, the shards
variable and the defaultShard usage).
Nitpick comments:
In @control-plane-operator/controllers/hostedcontrolplane/etcd/statefulset.go:
- Around line 91-101: The current use of strings.ReplaceAll on c.Args[1]
(variable script) to substitute service names is fragile and can silently fail;
instead either (A) regenerate the entire script string explicitly using
fmt.Sprintf and assign it to c.Args (replace the strings.ReplaceAll block with a
constructed script string that interpolates clientService and discoveryService
into the exact script you need), or (B) avoid editing the script text and set
container environment variables (e.g., ETCDCTL_ENDPOINTS and a DISCOVERY_SERVICE
or similar) in the pod spec and update the asset script to reference those env
vars; update the code paths that touch c.Args, script, clientService and
discoveryService accordingly.
- Around line 192-198: The code adds the etcd-defrag controller container when
params.NeedsDefragController is true but never removes it when that flag becomes
false; update the logic around params.NeedsDefragController so it also removes
the container and resets ServiceAccountName when no longer needed: detect
presence with hasContainer(sts.Spec.Template.Spec.Containers, "etcd-defrag"),
remove the container entry from sts.Spec.Template.Spec.Containers when
params.NeedsDefragController is false, and clear or restore
sts.Spec.Template.Spec.ServiceAccountName
(manifests.EtcdDefragControllerServiceAccount("")) as appropriate; use
buildEtcdDefragControllerContainer only when adding and ensure removal is safe
for immutable selector scenarios.
In @docs/content/how-to/etcd-sharding.md:
- Line 194: The markdown line documenting `backup-schedule` contains a cron example with unescaped asterisks ("*/30 * * * *") that can be parsed as emphasis; update the text for the `backup-schedule` entry so the cron expression is protected by inline code or escaped asterisks (e.g., wrap "*/30 * * * *" in backticks or escape each *), ensuring the `backup-schedule` label and its cron example render correctly.
- Around line 181-183: The fenced code block that shows the inline flag format is missing a language identifier (triggers MD040); update the triple-backtick fence that contains the line starting with "--etcd-shard name=..." to include a language identifier such as text or bash (e.g., ```text) so the code block is
properly annotated.

---

ℹ️ Review info

⚙️ Run configuration

**Configuration used**: Repository YAML (base), Central YAML (inherited)
**Review profile**: CHILL
**Plan**: Pro Plus
**Run ID**: `e1f21541-991d-4a6f-b3d9-789587f4c738`

📥 Commits

Reviewing files that changed from the base of the PR and between 1180bcaf557d7df5da61fc97197c69c68c32c4e2 and a83140a3e23eed929f4b312d37b0500c5d86a477.

⛔ Files ignored due to path filters (34)

* `api/hypershift/v1beta1/zz_generated.deepcopy.go` is excluded by `!**/zz_generated*.go`, `!**/zz_generated*`
* `api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/AAA_ungated.yaml` is excluded by `!**/zz_generated.featuregated-crd-manifests/**`
* `api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/AutoNodeKarpenter.yaml` is excluded by `!**/zz_generated.featuregated-crd-manifests/**`
* `api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ClusterUpdateAcceptRisks.yaml` is excluded by `!**/zz_generated.featuregated-crd-manifests/**`
* `api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ClusterVersionOperatorConfiguration.yaml` is excluded by `!**/zz_generated.featuregated-crd-manifests/**`
* `api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ExternalOIDC.yaml` is excluded by `!**/zz_generated.featuregated-crd-manifests/**`
* `api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ExternalOIDCWithUIDAndExtraClaimMappings.yaml` is excluded by `!**/zz_generated.featuregated-crd-manifests/**`
* `api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/GCPPlatform.yaml` is excluded by `!**/zz_generated.featuregated-crd-manifests/**`
* `api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/HyperShiftOnlyDynamicResourceAllocation.yaml` is excluded by `!**/zz_generated.featuregated-crd-manifests/**`
* `api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ImageStreamImportMode.yaml` is excluded by `!**/zz_generated.featuregated-crd-manifests/**`
* `api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/KMSEncryptionProvider.yaml` is excluded by `!**/zz_generated.featuregated-crd-manifests/**`
* `api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/OpenStack.yaml` is excluded by `!**/zz_generated.featuregated-crd-manifests/**`
* `api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/AAA_ungated.yaml` is excluded by `!**/zz_generated.featuregated-crd-manifests/**`
* `api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/AutoNodeKarpenter.yaml` is excluded by `!**/zz_generated.featuregated-crd-manifests/**`
* `api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/ClusterUpdateAcceptRisks.yaml` is excluded by `!**/zz_generated.featuregated-crd-manifests/**`
* `api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/ClusterVersionOperatorConfiguration.yaml` is excluded by `!**/zz_generated.featuregated-crd-manifests/**`
* `api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/ExternalOIDC.yaml` is excluded by `!**/zz_generated.featuregated-crd-manifests/**`
* `api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/ExternalOIDCWithUIDAndExtraClaimMappings.yaml` is excluded by `!**/zz_generated.featuregated-crd-manifests/**`
* `api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/GCPPlatform.yaml` is excluded by `!**/zz_generated.featuregated-crd-manifests/**`
* `api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/HyperShiftOnlyDynamicResourceAllocation.yaml` is excluded by `!**/zz_generated.featuregated-crd-manifests/**`
* `api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/ImageStreamImportMode.yaml` is excluded by `!**/zz_generated.featuregated-crd-manifests/**`
* `api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/KMSEncryptionProvider.yaml` is excluded by `!**/zz_generated.featuregated-crd-manifests/**`
* `api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/OpenStack.yaml` is excluded by `!**/zz_generated.featuregated-crd-manifests/**`
* `client/applyconfiguration/hypershift/v1beta1/managedetcdshardspec.go` is excluded by `!client/**`
* `client/applyconfiguration/hypershift/v1beta1/managedetcdspec.go` is excluded by `!client/**`
* `client/applyconfiguration/hypershift/v1beta1/unmanagedetcdshardspec.go` is excluded by `!client/**`
* `client/applyconfiguration/hypershift/v1beta1/unmanagedetcdspec.go` is excluded by `!client/**`
* `client/applyconfiguration/utils.go` is excluded by `!client/**`
* `cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/hostedclusters-Hypershift-Default.crd.yaml` is excluded by `!**/zz_generated.crd-manifests/**`, `!cmd/install/assets/**/*.yaml`
* `cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/hostedcontrolplanes-Hypershift-Default.crd.yaml` is excluded by `!**/zz_generated.crd-manifests/**`, `!cmd/install/assets/**/*.yaml`
* `docs/content/reference/api.md` is excluded by `!docs/content/reference/api.md`
* `vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/hostedcluster_helpers.go` is excluded by `!vendor/**`, `!**/vendor/**`
* `vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/hostedcluster_types.go` is excluded by `!vendor/**`, `!**/vendor/**`
* `vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/zz_generated.deepcopy.go` is excluded by `!vendor/**`, `!**/vendor/**`, `!**/zz_generated*.go`, `!**/zz_generated*`

📒 Files selected for processing (30)

* `api/hypershift/v1beta1/hostedcluster_helpers.go`
* `api/hypershift/v1beta1/hostedcluster_types.go`
* `cmd/cluster/core/create.go`
* `control-plane-operator/controllers/hostedcontrolplane/etcd/cleanup.go`
* `control-plane-operator/controllers/hostedcontrolplane/etcd/etcd-init.sh`
* `control-plane-operator/controllers/hostedcontrolplane/etcd/monitoring.go`
* `control-plane-operator/controllers/hostedcontrolplane/etcd/params.go`
* `control-plane-operator/controllers/hostedcontrolplane/etcd/pdb.go`
* `control-plane-operator/controllers/hostedcontrolplane/etcd/services.go`
* `control-plane-operator/controllers/hostedcontrolplane/etcd/shards.go`
* `control-plane-operator/controllers/hostedcontrolplane/etcd/shards_test.go`
* `control-plane-operator/controllers/hostedcontrolplane/etcd/statefulset.go`
* `control-plane-operator/controllers/hostedcontrolplane/etcd/statefulset_test.go`
* `control-plane-operator/controllers/hostedcontrolplane/etcd/status.go`
* `control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go`
* `control-plane-operator/controllers/hostedcontrolplane/manifests/etcd.go`
* `control-plane-operator/controllers/hostedcontrolplane/pki/etcd.go`
* `control-plane-operator/controllers/hostedcontrolplane/v2/etcd/component.go`
* `control-plane-operator/controllers/hostedcontrolplane/v2/etcd/service.go`
* `control-plane-operator/controllers/hostedcontrolplane/v2/etcd/servicemonitor.go`
* `control-plane-operator/controllers/hostedcontrolplane/v2/etcd/statefulset.go`
* `control-plane-operator/controllers/hostedcontrolplane/v2/kas/config.go`
* `control-plane-operator/controllers/hostedcontrolplane/v2/kas/deployment.go`
* `control-plane-operator/controllers/hostedcontrolplane/v2/kas/params.go`
* `control-plane-operator/controllers/hostedcontrolplane/v2/oapi/config.go`
* `control-plane-operator/controllers/hostedcontrolplane/v2/oapi/deployment.go`
* `control-plane-operator/controllers/hostedcontrolplane/v2/oauth_apiserver/deployment.go`
* `docs/content/how-to/etcd-sharding.md`
* `docs/content/how-to/etcd-sharding/example-3-shard.yaml`
* `docs/content/how-to/etcd-sharding/example-inline-usage.sh`
```go
// integrate with an externally managed etcd cluster.
type UnmanagedEtcdSpec struct {
    // endpoint is the full etcd cluster client endpoint URL. For example:
    //
    //     https://etcd-client:2379
    //
    // If the URL uses an HTTPS scheme, the TLS field is required.
    //
    // endpoint is the full etcd cluster client endpoint URL.
    // Used only when shards is not specified (legacy single-etcd mode).
    // When shards are specified, this field is ignored.
    // +optional
    // +kubebuilder:validation:Pattern=`^https://`
    // +kubebuilder:validation:MaxLength=255
    Endpoint string `json:"endpoint,omitempty"`

    // tls specifies TLS configuration for HTTPS etcd client endpoints.
    // Used only when shards is not specified (legacy single-etcd mode).
    // When shards are specified, this field is ignored.
    // +optional
    TLS *EtcdTLSConfig `json:"tls,omitempty"`

    // shards configures etcd sharding by Kubernetes resource kind.
    // When not specified, uses endpoint and tls fields (legacy single-etcd mode).
    // When specified, exactly one shard must have "/" in its resourcePrefixes.
    // +optional
    // +kubebuilder:validation:MinItems=1
    // +kubebuilder:validation:MaxItems=10
    // +listType=map
    // +listMapKey=name
    // +kubebuilder:validation:XValidation:rule="self.exists(s, '/' in s.resourcePrefixes)",message="exactly one shard must have '/' prefix"
    // +kubebuilder:validation:XValidation:rule="self.all(s, s.resourcePrefixes.all(p, p == '/' || p.endsWith('#')))",message="non-default prefixes must end with '#'"
    Shards []UnmanagedEtcdShardSpec `json:"shards,omitempty"`
}
```
Validation gap allows invalid UnmanagedEtcdSpec state.

The endpoint and tls fields are marked optional, and the documentation states they're "ignored when shards is specified." However, there's no validation ensuring that when shards is NOT specified, endpoint and tls ARE provided. This allows an invalid state where neither the legacy fields nor shards are configured.

Consider adding an XValidation rule:

```go
// +kubebuilder:validation:XValidation:rule="has(self.shards) || (self.endpoint != '' && has(self.tls))",message="either shards must be specified, or both endpoint and tls are required"
```

The mutual exclusivity validation (preventing both shards and endpoint/tls from being set simultaneously) was also flagged in a previous review.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@api/hypershift/v1beta1/hostedcluster_types.go` around lines 2058 - 2085, Add
XValidation rules on the UnmanagedEtcdSpec type to enforce that either Shards is
provided or both Endpoint and TLS are provided, and to prevent supplying
Endpoint/TLS when Shards is set; specifically add a rule like "has(self.shards)
|| (self.endpoint != '' && has(self.tls))" with an appropriate message, and a
second rule that forbids endpoint/tls when shards exists (e.g.
"not(has(self.shards)) || (self.endpoint == '' && not(has(self.tls)))") so
UnmanagedEtcdSpec (type name) cannot be left without configuration and cannot
mix legacy fields (Endpoint, TLS) with Shards.
```go
pairs := strings.Split(def, ",")

var hasName, hasPrefixes bool

for _, pair := range pairs {
    kv := strings.SplitN(pair, "=", 2)
    if len(kv) != 2 {
        return shard, fmt.Errorf("invalid key=value pair: %s", pair)
    }
```
Comma-delimited parsing conflicts with cron syntax.

`parseInlineShard` splits on every `,`, but valid cron expressions can contain commas (for example `0,30 * * * *`). That makes a documented backup-schedule value impossible to express with `--etcd-shard`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cmd/cluster/core/create.go` around lines 1302 - 1310, parseInlineShard
currently naively splits def on every comma which breaks cron expressions;
change the parsing in parseInlineShard so commas that occur inside a value (e.g.
a cron schedule) are not treated as separators: when iterating pairs from
strings.Split(def, ",") if strings.SplitN(pair, "=", 2) yields len != 2 then
treat that token as part of the previous value (append ","+pair to the last
accumulated token) instead of returning an error; update the loop that builds
pairs and the kv handling to only error if you encounter a token without "="
when there is no previous token to attach to, keeping references to def, pairs,
and the kv split logic for location.
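The re-attach strategy the prompt describes, sketched as a standalone helper (illustrative, not the PR's code):

```go
package main

import (
    "fmt"
    "strings"
)

// splitShardPairs splits an --etcd-shard definition on commas, but re-attaches
// tokens that lack '=' to the previous value, so cron schedules such as
// "backup-schedule=0,30 * * * *" survive parsing intact.
func splitShardPairs(def string) ([]string, error) {
    var pairs []string
    for _, tok := range strings.Split(def, ",") {
        switch {
        case strings.Contains(tok, "="):
            pairs = append(pairs, tok)
        case len(pairs) > 0:
            pairs[len(pairs)-1] += "," + tok
        default:
            return nil, fmt.Errorf("invalid key=value pair: %s", tok)
        }
    }
    return pairs, nil
}

func main() {
    pairs, _ := splitShardPairs("name=events,backup-schedule=0,30 * * * *")
    fmt.Println(pairs) // [name=events backup-schedule=0,30 * * * *]
}
```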
```sh
# If /var/lib/data is not empty we exit early, since this means an etcd database already exists
mkdir -p /var/lib/data
[ "$(ls -A /var/lib/data)" ] && echo "/var/lib/data not empty, not restoring snapshot" && exit 0
```
Missing shebang line.
The script lacks a shebang (#!/bin/bash or #!/bin/sh) at the start. This could cause issues depending on how the script is executed, as the shell interpreter won't be explicitly specified.
🔧 Proposed fix

```diff
+#!/bin/sh
 # If /var/lib/data is not empty we exit early, since this means an etcd database already exists
 mkdir -p /var/lib/data
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```sh
#!/bin/sh
# If /var/lib/data is not empty we exit early, since this means an etcd database already exists
mkdir -p /var/lib/data
[ "$(ls -A /var/lib/data)" ] && echo "/var/lib/data not empty, not restoring snapshot" && exit 0
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@control-plane-operator/controllers/hostedcontrolplane/etcd/etcd-init.sh`
around lines 1 - 3, Add a shebang as the first line of etcd-init.sh (before the
existing mkdir -p /var/lib/data and the ls check) so the script runs with a
defined shell interpreter (e.g., use #!/bin/bash or #!/bin/sh); ensure the
shebang is the very first line to guarantee correct interpreter selection when
executing the etcd-init.sh script that contains the mkdir -p /var/lib/data and
the [ "$(ls -A /var/lib/data)" ] check.
```sh
# and also when a pre-signed URL is expired.
# In this case we get an XML file which can be detected with `file` so we show
# the contents via the logs then exit with an error status
curl -o /tmp/snapshot ${RESTORE_URL}
```
Quote the URL variable to prevent word splitting.
The ${RESTORE_URL} variable should be quoted to prevent unexpected behavior if the URL contains spaces or special characters.
🔧 Proposed fix

```diff
-curl -o /tmp/snapshot ${RESTORE_URL}
+curl -o /tmp/snapshot "${RESTORE_URL}"
```

🧰 Tools
🪛 Shellcheck (0.11.0)
[info] 12-12: Double quote to prevent globbing and word splitting.
(SC2086)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@control-plane-operator/controllers/hostedcontrolplane/etcd/etcd-init.sh` at
line 12, The curl invocation using the RESTORE_URL variable is unquoted which
can cause word-splitting or unexpected behavior; update the command that writes
the snapshot (the curl call that currently references ${RESTORE_URL}) to quote
the variable (use "${RESTORE_URL}") so the entire URL is passed as a single
argument to curl.
```go
) (*ShardParams, error) {
    ipv4, err := util.IsIPv4CIDR(hcp.Spec.Networking.ClusterNetwork[0].CIDR.String())
    if err != nil {
        return nil, fmt.Errorf("error checking the ClusterNetworkCIDR: %w", err)
    }
```
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
# Check if ClusterNetwork validation is performed elsewhere before this function is called
rg -n "ClusterNetwork" --type go -A3 -B3 | head -100
```

Repository: openshift/hypershift
Length of output: 10172

🏁 Script executed:

```bash
#!/bin/bash
# Find the params.go file in control-plane-operator
find . -name "params.go" -path "*/hostedcontrolplane/etcd/*" | head -5
```

Repository: openshift/hypershift
Length of output: 134

🏁 Script executed:

```bash
#!/bin/bash
# Look for HostedControlPlane definition and ClusterNetwork field
rg -n "type HostedControlPlane struct" --type go -A 20 | head -50
```

Repository: openshift/hypershift
Length of output: 5643

🏁 Script executed:

```bash
#!/bin/bash
# Search for Networking struct definition
rg -n "type.*Networking struct" --type go -A 10 | head -50
```

Repository: openshift/hypershift
Length of output: 6178

🏁 Script executed:

```bash
#!/bin/bash
# Read the params.go file to see the actual code
cat -n ./control-plane-operator/controllers/hostedcontrolplane/etcd/params.go | sed -n '50,85p'
```

Repository: openshift/hypershift
Length of output: 1494

🏁 Script executed:

```bash
#!/bin/bash
# Find the Networking field in HostedControlPlaneSpec
rg -n "Networking.*struct" --type go -A 10 | head -40
```

Repository: openshift/hypershift
Length of output: 4227

🏁 Script executed:

```bash
#!/bin/bash
# Find HostedControlPlaneSpec definition and Networking field
rg -n "type HostedControlPlaneSpec struct" --type go -A 150 | grep -A 150 "type HostedControlPlaneSpec" | grep -E "(Networking|ClusterNetwork)" | head -20
```

Repository: openshift/hypershift
Length of output: 320

🏁 Script executed:

```bash
#!/bin/bash
# Search for ClusterNetwork field type definition in api directory
rg -n "ClusterNetwork\s*\[\]" --type go | head -10
```

Repository: openshift/hypershift
Length of output: 1707

🏁 Script executed:

```bash
#!/bin/bash
# Find the Networking field definition
rg -n "Networking\s*ClusterNetworking" --type go | head -10
```

Repository: openshift/hypershift
Length of output: 559

🏁 Script executed:

```bash
#!/bin/bash
# Read ClusterNetworking struct definition to see validation constraints
rg -n "type ClusterNetworking struct" --type go -A 30 | head -80
```

Repository: openshift/hypershift
Length of output: 7547

🏁 Script executed:

```bash
#!/bin/bash
# Check if there's MinItems validation on ClusterNetwork field
rg -n "ClusterNetwork.*kubebuilder" --type go -B 2 -A 2 | head -30
```

Repository: openshift/hypershift
Length of output: 46

🏁 Script executed:

```bash
#!/bin/bash
# Check if there are other places accessing ClusterNetwork[0]
rg -n "ClusterNetwork\[0\]" --type go
```

Repository: openshift/hypershift
Length of output: 1887

🏁 Script executed:

```bash
#!/bin/bash
# Check if validation is enforced before this function is called
rg -n "NewShardParams" --type go -B 3 -A 3 | head -40
```

Repository: openshift/hypershift
Length of output: 1740
Add bounds check on ClusterNetwork slice access for defensive programming.
Line 67 accesses hcp.Spec.Networking.ClusterNetwork[0] without verifying the slice is non-empty. While Kubernetes schema validation with MinItems=1 guarantees this field is populated for valid objects, an explicit bounds check improves code robustness and clarity.
Proposed fix

```diff
 func NewShardParams(
     hcp *hyperv1.HostedControlPlane,
     etcdImage string,
     controlPlaneOperatorImage string,
     clusterEtcdOperatorImage string,
     snapshotRestored bool,
     securityContext *corev1.PodSecurityContext,
 ) (*ShardParams, error) {
+    if len(hcp.Spec.Networking.ClusterNetwork) == 0 {
+        return nil, fmt.Errorf("ClusterNetwork must have at least one entry")
+    }
     ipv4, err := util.IsIPv4CIDR(hcp.Spec.Networking.ClusterNetwork[0].CIDR.String())
     if err != nil {
         return nil, fmt.Errorf("error checking the ClusterNetworkCIDR: %w", err)
     }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
) (*ShardParams, error) {
    if len(hcp.Spec.Networking.ClusterNetwork) == 0 {
        return nil, fmt.Errorf("ClusterNetwork must have at least one entry")
    }
    ipv4, err := util.IsIPv4CIDR(hcp.Spec.Networking.ClusterNetwork[0].CIDR.String())
    if err != nil {
        return nil, fmt.Errorf("error checking the ClusterNetworkCIDR: %w", err)
    }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@control-plane-operator/controllers/hostedcontrolplane/etcd/params.go` around
lines 66-70, the code accesses hcp.Spec.Networking.ClusterNetwork[0] without
checking the slice length; add a defensive bounds check before calling
util.IsIPv4CIDR in the function that returns (*ShardParams, error), so that if
len(hcp.Spec.Networking.ClusterNetwork) == 0 you return a clear error (e.g.,
fmt.Errorf("empty ClusterNetwork in HCP spec")) instead of panicking; then
proceed to call
util.IsIPv4CIDR(hcp.Spec.Networking.ClusterNetwork[0].CIDR.String()) as before.
```go
if defaultShard.Name == "default" {
    sts.Name = "etcd"
    sts.Spec.ServiceName = "etcd-discovery"
} else {
    sts.Name = fmt.Sprintf("etcd-%s", defaultShard.Name)
    sts.Spec.ServiceName = fmt.Sprintf("etcd-%s-discovery", defaultShard.Name)
```
Use the canonical shard discovery Service name.
Named shards are created as etcd-discovery-<shard>, not etcd-<shard>-discovery. With the current value, both sts.Spec.ServiceName and ETCD_INITIAL_CLUSTER point at non-existent DNS names for named shards.
Suggested fix

```diff
 if defaultShard.Name == "default" {
     sts.Name = "etcd"
     sts.Spec.ServiceName = "etcd-discovery"
 } else {
     sts.Name = fmt.Sprintf("etcd-%s", defaultShard.Name)
-    sts.Spec.ServiceName = fmt.Sprintf("etcd-%s-discovery", defaultShard.Name)
+    sts.Spec.ServiceName = fmt.Sprintf("etcd-discovery-%s", defaultShard.Name)
 }
@@
 if defaultShard.Name == "default" {
     podPrefix = "etcd"
     discoveryService = "etcd-discovery"
 } else {
     podPrefix = fmt.Sprintf("etcd-%s", defaultShard.Name)
-    discoveryService = fmt.Sprintf("etcd-%s-discovery", defaultShard.Name)
+    discoveryService = fmt.Sprintf("etcd-discovery-%s", defaultShard.Name)
 }
```

Also applies to: 70-80
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@control-plane-operator/controllers/hostedcontrolplane/v2/etcd/statefulset.go`
around lines 45-50, the Service name for named shards uses the wrong
convention; when defaultShard.Name != "default", update sts.Spec.ServiceName (and
any code building ETCD_INITIAL_CLUSTER entries) to use the canonical pattern
"etcd-discovery-<shard>" instead of "etcd-<shard>-discovery"; leave sts.Name as
"etcd-<shard>" if that is desired, but ensure every reference that computes the
discovery Service (including where ETCD_INITIAL_CLUSTER or related env vars are
constructed) uses fmt.Sprintf("etcd-discovery-%s", defaultShard.Name) for named
shards.
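The convention is easy to centralize so the StatefulSet, env-var construction, and Service reconcilers cannot drift apart; a hypothetical helper sketch under that naming rule (not code from the PR):

```go
// etcdDiscoveryServiceName returns the headless discovery Service name for a
// shard: the "default" shard keeps the legacy name, named shards follow the
// canonical etcd-discovery-<shard> pattern the review calls out.
func etcdDiscoveryServiceName(shardName string) string {
    if shardName == "default" {
        return "etcd-discovery"
    }
    return "etcd-discovery-" + shardName
}
```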
```go
} else if hcp.Spec.Etcd.ManagementType == hyperv1.Managed {
    // Update wait-for-etcd init container to wait for ALL shard services
    util.UpdateContainer("wait-for-etcd", deployment.Spec.Template.Spec.InitContainers, func(c *corev1.Container) {
        shards := hcp.Spec.Etcd.Managed.EffectiveShards(hcp)

        // Build list of all client service names
        var serviceNames []string
        for _, shard := range shards {
            var serviceName string
            if shard.Name == "default" {
                serviceName = "etcd-client"
            } else {
                serviceName = fmt.Sprintf("etcd-client-%s", shard.Name)
            }
            serviceNames = append(serviceNames, serviceName)
        }

        // Build script that waits for all services
        var scriptLines []string
        scriptLines = append(scriptLines, "#!/bin/sh")
        for _, serviceName := range serviceNames {
            scriptLines = append(scriptLines, fmt.Sprintf("while ! nslookup %s.$(POD_NAMESPACE).svc; do sleep 1; done", serviceName))
        }

        c.Args = []string{"-c", strings.Join(scriptLines, "\n")}
    })
}
```
🧩 Analysis chain

🏁 Script executed:

```bash
# Description: Find all etcd client service naming patterns to verify consistency
echo "=== Searching for etcd-client service naming patterns ==="
rg -n 'etcd-client-%s|etcd-%s-client|etcd-client-' --type go -C2
```
Service naming inconsistency between etcd service creation and deployment wait logic.
In control-plane-operator/controllers/hostedcontrolplane/v2/etcd/service.go, non-default shards create services with the format etcd-{shard}-client (e.g., etcd-events-client), but this code expects etcd-client-{shard} (e.g., etcd-client-events). The wait-for-etcd init container will fail DNS resolution for non-default shards because the service names won't match those actually created.
Align the naming format in this file with v2/etcd/service.go, or update service.go to use the etcd-client-{shard} format consistently across the codebase (also verify kas/params.go and other components use the same format).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@control-plane-operator/controllers/hostedcontrolplane/v2/kas/deployment.go`
around lines 98-124, the wait-for-etcd init container builds DNS names using
the wrong shard service naming convention; in UpdateContainer("wait-for-etcd",
...) where serviceName is constructed for non-default shards, change the
fmt.Sprintf pattern from "etcd-client-%s" to "etcd-%s-client" so it matches the
service names produced by hcp.Spec.Etcd.Managed.EffectiveShards and the service
creation in v2/etcd/service.go; then run a quick grep (including kas/params.go
and other kas-related components) to ensure all consumers use the same
"etcd-{shard}-client" format and update any mismatches.
```go
// buildUnmanagedEtcdConfig constructs etcd URLs for unmanaged etcd shards
func buildUnmanagedEtcdConfig(shards []hyperv1.UnmanagedEtcdShardSpec) (string, map[string]string) {
    var defaultURL string
    overrides := make(map[string]string)

    for _, shard := range shards {
        for _, prefix := range shard.ResourcePrefixes {
            if prefix == "/" {
                defaultURL = shard.Endpoint
            } else {
                overrides[prefix] = shard.Endpoint
            }
        }
    }

    return defaultURL, overrides
}
```
Potential empty defaultURL when no shard has "/" prefix.
If all shards have specific ResourcePrefixes (e.g., "/events#", "/coordination.k8s.io/leases#") but none has "/" (catch-all), defaultURL will be empty string. This could cause the kube-apiserver to have no default etcd endpoint.
Consider validating that at least one shard has "/" or handling this case explicitly.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@control-plane-operator/controllers/hostedcontrolplane/v2/kas/params.go`
around lines 255 - 271, buildUnmanagedEtcdConfig can return an empty defaultURL
when no shard has the "/" prefix; update the function (buildUnmanagedEtcdConfig)
to guarantee a non-empty default by either (A) adding a fallback: if defaultURL
== "" and len(shards) > 0 then set defaultURL = shards[0].Endpoint (or pick the
first shard you deem appropriate) and optionally add a warning log, or (B) more
robustly change the signature to return an error (string, map[string]string,
error), validate that a catch‑all "/" shard exists and return an error to
callers if it does not—apply the chosen fix and update all callers to handle the
fallback or the new error return.
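A minimal sketch of option (B), assuming callers are updated for the extra return value; the signature change and error text are illustrative, not the PR's actual fix:

```go
// buildUnmanagedEtcdConfig returns the default etcd endpoint plus per-prefix
// overrides, and fails fast when no shard claims the "/" catch-all prefix.
func buildUnmanagedEtcdConfig(shards []hyperv1.UnmanagedEtcdShardSpec) (string, map[string]string, error) {
    var defaultURL string
    overrides := make(map[string]string)
    for _, shard := range shards {
        for _, prefix := range shard.ResourcePrefixes {
            if prefix == "/" {
                defaultURL = shard.Endpoint
            } else {
                overrides[prefix] = shard.Endpoint
            }
        }
    }
    if defaultURL == "" {
        return "", nil, fmt.Errorf("no unmanaged etcd shard declares the catch-all %q prefix", "/")
    }
    return defaultURL, overrides, nil
}
```

Erroring out keeps the kube-apiserver from starting with an empty default endpoint, which is harder to diagnose than a reconcile error.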
```go
case hyperv1.Managed:
    // For managed etcd with sharding, find the default shard (with "/" prefix)
    if hcp.Spec.Etcd.Managed != nil {
        shards := hcp.Spec.Etcd.Managed.EffectiveShards(hcp)
        for _, shard := range shards {
            for _, prefix := range shard.ResourcePrefixes {
                if prefix == "/" {
                    // Found the default shard - use its client service
                    var serviceName string
                    if shard.Name == "default" {
                        serviceName = "etcd-client"
                    } else {
                        serviceName = fmt.Sprintf("etcd-client-%s", shard.Name)
                    }
                    cfg.StorageConfig.EtcdConnectionInfo.URLs = []string{
                        fmt.Sprintf("https://%s.%s.svc:2379", serviceName, hcp.Namespace),
                    }
                    break
                }
            }
        }
    }
}
```
Service naming inconsistency with etcd/service.go.
The service name format etcd-client-%s (line 106) does not match etcd/service.go which uses etcd-%s-client. This is the same inconsistency flagged in kas/deployment.go.
Additionally, the break on line 111 only exits the inner loop over ResourcePrefixes. After finding the "/" prefix and setting the URL, the outer loop continues iterating through remaining shards unnecessarily.
🐛 Proposed fix

```diff
 case hyperv1.Managed:
     // For managed etcd with sharding, find the default shard (with "/" prefix)
     if hcp.Spec.Etcd.Managed != nil {
         shards := hcp.Spec.Etcd.Managed.EffectiveShards(hcp)
+    shardLoop:
         for _, shard := range shards {
             for _, prefix := range shard.ResourcePrefixes {
                 if prefix == "/" {
                     // Found the default shard - use its client service
                     var serviceName string
                     if shard.Name == "default" {
                         serviceName = "etcd-client"
                     } else {
-                        serviceName = fmt.Sprintf("etcd-client-%s", shard.Name)
+                        serviceName = fmt.Sprintf("etcd-%s-client", shard.Name)
                     }
                     cfg.StorageConfig.EtcdConnectionInfo.URLs = []string{
                         fmt.Sprintf("https://%s.%s.svc:2379", serviceName, hcp.Namespace),
                     }
-                    break
+                    break shardLoop
                 }
             }
         }
     }
 }
```

…des prefixes

Non-core API group resource prefixes (e.g., /coordination.k8s.io/leases#, /events.k8s.io/events#) were passed to KAS --etcd-servers-overrides with a leading slash, which KAS rejects. KAS expects:

- Core group resources: /resource#servers (leading slash preserved)
- Non-core group resources: group/resource#servers (no leading slash)

Add normalizeOverridePrefix() to detect non-core groups by the presence of a dot in the group segment and strip the leading slash for those entries. Core resources (e.g., /events#) are unaffected.

Signed-off-by: Jesse Jaggars <jjaggars@redhat.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
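A sketch of the helper this commit describes, using its dot-in-group heuristic; the name comes from the commit message, the body is illustrative rather than the PR's actual implementation:

```go
// normalizeOverridePrefix strips the leading slash from non-core API group
// prefixes so KAS accepts them in --etcd-servers-overrides. A non-core group
// is detected by a dot in the first path segment (e.g. "coordination.k8s.io").
func normalizeOverridePrefix(prefix string) string {
    trimmed := strings.TrimPrefix(prefix, "/")
    group, _, found := strings.Cut(trimmed, "/")
    if found && strings.Contains(group, ".") {
        // Non-core group resource, e.g. "coordination.k8s.io/leases#": drop the slash.
        return trimmed
    }
    // Core group resource, e.g. "/events#": keep the leading slash.
    return prefix
}
```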
The oauth-apiserver was CrashLoopBackOff because it tried to connect to
etcd-client-default instead of etcd-client for the default shard. Apply
the same naming logic used in kas/params.go: the "default" shard uses
base name etcd-client, other shards use etcd-client-{name}.
Also fix nil pointer dereference when Managed is nil but ManagementType
is Managed.
Signed-off-by: Jesse Jaggars <jjaggars@redhat.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
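A hypothetical helper capturing the naming rule this commit applies; the PR may inline the logic instead:

```go
// etcdClientServiceName returns the client Service name for an etcd shard.
// The "default" shard keeps the legacy "etcd-client" name so existing
// single-etcd clusters keep resolving; other shards get a suffix.
func etcdClientServiceName(shardName string) string {
    if shardName == "default" {
        return "etcd-client"
    }
    return "etcd-client-" + shardName
}
```

Callers still need the nil guard the commit mentions: check hcp.Spec.Etcd.Managed != nil before deriving shards, even when ManagementType is Managed.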
🧹 Nitpick comments (1)

control-plane-operator/controllers/hostedcontrolplane/v2/oauth_apiserver/deployment_test.go (1)

60-76: Add NO_PROXY assertions for sharded managed-etcd paths.

These tests validate URL selection well, but the same code path also builds shard-specific NO_PROXY entries (short host + FQDN). Adding assertions here would better guard the oauth-apiserver sharded-etcd regression fix.

✅ Suggested test additions

```diff
@@ validate: func(t *testing.T, g *GomegaWithT, hcp *hyperv1.HostedControlPlane) {
@@
   container := util.FindContainer(ComponentName, deployment.Spec.Template.Spec.Containers)
   g.Expect(container).ToNot(BeNil())
   g.Expect(container.Args).To(ContainElement("--etcd-servers=https://etcd-client.test-ns.svc:2379"))
   g.Expect(container.Args).ToNot(ContainElement(ContainSubstring("etcd-client-default")))
+  noProxyEnv := util.FindEnvVar("NO_PROXY", container.Env)
+  g.Expect(noProxyEnv).ToNot(BeNil())
+  g.Expect(noProxyEnv.Value).To(ContainSubstring("etcd-client"))
+  g.Expect(noProxyEnv.Value).To(ContainSubstring("etcd-client.test-ns.svc"))
 },
},
@@ validate: func(t *testing.T, g *GomegaWithT, hcp *hyperv1.HostedControlPlane) {
@@
   container := util.FindContainer(ComponentName, deployment.Spec.Template.Spec.Containers)
   g.Expect(container).ToNot(BeNil())
   g.Expect(container.Args).To(ContainElement("--etcd-servers=https://etcd-client-core.test-ns.svc:2379"))
+  noProxyEnv := util.FindEnvVar("NO_PROXY", container.Env)
+  g.Expect(noProxyEnv).ToNot(BeNil())
+  g.Expect(noProxyEnv.Value).To(ContainSubstring("etcd-client-core"))
+  g.Expect(noProxyEnv.Value).To(ContainSubstring("etcd-client-core.test-ns.svc"))
 },
},
```

Also applies to: 107-122
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@control-plane-operator/controllers/hostedcontrolplane/v2/oauth_apiserver/deployment_test.go` around lines 60 - 76, The test's validate closure should be extended to assert that the oauth-apiserver container's NO_PROXY entries include both the shard short host and its FQDN for sharded managed-etcd; after calling adaptDeployment and locating the container via util.FindContainer(ComponentName, ...), add Gomega assertions on container.Args to ContainElement strings like "--env=NO_PROXY=..." or the specific NO_PROXY arg entries that include the shard short host (e.g., "etcd-client-default") and the FQDN (e.g., "etcd-client-default.test-ns.svc") so both forms are present; apply the same additions to the other validate block that tests the alternate scenario.
📒 Files selected for processing (2)

- control-plane-operator/controllers/hostedcontrolplane/v2/oauth_apiserver/deployment.go
- control-plane-operator/controllers/hostedcontrolplane/v2/oauth_apiserver/deployment_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- control-plane-operator/controllers/hostedcontrolplane/v2/oauth_apiserver/deployment.go
Remove unused etcdv2 import, register ServiceMonitor/PDB types in test schemes, fix discovery service naming convention in test expectations, set Managed spec in KAS params test, and regenerate CRD manifests.

Signed-off-by: Jesse Jaggars <jjaggars@redhat.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
♻️ Duplicate comments (1)

control-plane-operator/controllers/hostedcontrolplane/etcd/shards_test.go (1)

320-323: ⚠️ Potential issue | 🟡 Minor

Assert NotFound explicitly after cleanup delete.

On line 321, the test currently passes for any error, which can hide unrelated failures. Validate apierrors.IsNotFound(err) instead.

Proposed assertion hardening diff

```diff
 import (
     "context"
     "testing"

     hyperv1 "github.com/openshift/hypershift/api/hypershift/v1beta1"
     "github.com/openshift/hypershift/api/util/ipnet"
     "github.com/openshift/hypershift/control-plane-operator/controllers/hostedcontrolplane/manifests"
     "github.com/openshift/hypershift/support/metrics"
     "github.com/openshift/hypershift/support/upsert"

     prometheusoperatorv1 "github.com/prometheus-operator/prometheus-operator/pkg/apis/monitoring/v1"

     appsv1 "k8s.io/api/apps/v1"
     corev1 "k8s.io/api/core/v1"
     policyv1 "k8s.io/api/policy/v1"
+    apierrors "k8s.io/apimachinery/pkg/api/errors"
     metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
     "k8s.io/apimachinery/pkg/runtime"
     "k8s.io/utils/ptr"
@@
     // Verify orphaned StatefulSet was deleted
     err = fakeClient.Get(ctx, client.ObjectKeyFromObject(orphanedSts), orphanedSts)
-    if err == nil {
+    if err == nil {
         t.Error("Expected orphaned StatefulSet to be deleted")
+    }
+    if !apierrors.IsNotFound(err) {
+        t.Fatalf("Expected NotFound after deletion, got: %v", err)
     }
```
Verify each finding against the current code and only fix it if needed. In `@control-plane-operator/controllers/hostedcontrolplane/etcd/shards_test.go` around lines 320 - 323, The test currently treats any non-nil error as success after calling fakeClient.Get(ctx, client.ObjectKeyFromObject(orphanedSts), orphanedSts); change this to explicitly assert that the error is a NotFound error by using apierrors.IsNotFound(err) (import "k8s.io/apimachinery/pkg/api/errors" as apierrors) and fail the test if the error is nil or not a NotFound (e.g., if err == nil -> t.Error("expected orphaned StatefulSet to be deleted") else if !apierrors.IsNotFound(err) -> t.Fatalf("unexpected error getting orphaned StatefulSet: %v", err)); keep references to orphanedSts and fakeClient.Get to locate the code.
🧹 Nitpick comments (1)

control-plane-operator/controllers/hostedcontrolplane/etcd/shards_test.go (1)

28-33: Consider extracting shared test setup helpers.

Scheme registration and HCP scaffolding are repeated across tests. A small helper pair (newTestScheme, newTestHCP) would cut duplication and make future shard-case additions easier; a sketch follows the prompt below.

Also applies to: 94-99, 185-188, 234-237, 283-289
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@control-plane-operator/controllers/hostedcontrolplane/etcd/shards_test.go` around lines 28 - 33, Tests repeat scheme registration and hosted control plane scaffolding; extract helpers to reduce duplication by adding a newTestScheme helper that calls runtime.NewScheme() and registers appsv1, corev1, policyv1, hyperv1, prometheusoperatorv1 (replacing the repeated runtime.NewScheme() + AddToScheme calls), and a newTestHCP helper that returns a pre-populated hyperv1.HostedControlPlane used by tests (replace the repeated HCP scaffolding). Update tests to call newTestScheme() and newTestHCP() where the scheme variable and HCP scaffolding currently appear (references: runtime.NewScheme(), AddToScheme calls, and the HostedControlPlane construction).
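A hypothetical shape for the two helpers the nitpick proposes; the names come from the prompt above, the HCP fields are illustrative only:

```go
// newTestScheme registers every API group the shard tests touch.
func newTestScheme(t *testing.T) *runtime.Scheme {
    t.Helper()
    scheme := runtime.NewScheme()
    for _, add := range []func(*runtime.Scheme) error{
        appsv1.AddToScheme,
        corev1.AddToScheme,
        policyv1.AddToScheme,
        hyperv1.AddToScheme,
        prometheusoperatorv1.AddToScheme,
    } {
        if err := add(scheme); err != nil {
            t.Fatalf("failed to register scheme: %v", err)
        }
    }
    return scheme
}

// newTestHCP returns a minimal HostedControlPlane shared by the shard tests.
func newTestHCP() *hyperv1.HostedControlPlane {
    return &hyperv1.HostedControlPlane{
        ObjectMeta: metav1.ObjectMeta{Name: "test", Namespace: "test-ns"},
    }
}
```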
⛔ Files ignored due to path filters (34)

Generated manifests and docs excluded by path filters: feature-gated CRD manifests under api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/ (hostedclusters and hostedcontrolplanes variants for each feature gate), installer CRDs under cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/, plus docs/content/reference/aggregated-docs.md and docs/content/reference/api.md.
📒 Files selected for processing (7)

- control-plane-operator/controllers/hostedcontrolplane/etcd/shards_test.go
- control-plane-operator/controllers/hostedcontrolplane/etcd/statefulset_test.go
- control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go
- control-plane-operator/controllers/hostedcontrolplane/v2/kas/params.go
- control-plane-operator/controllers/hostedcontrolplane/v2/kas/params_test.go
- control-plane-operator/controllers/hostedcontrolplane/v2/oauth_apiserver/deployment.go
- control-plane-operator/controllers/hostedcontrolplane/v2/oauth_apiserver/deployment_test.go
✅ Files skipped from review due to trivial changes (3)
- control-plane-operator/controllers/hostedcontrolplane/v2/oauth_apiserver/deployment.go
- control-plane-operator/controllers/hostedcontrolplane/etcd/statefulset_test.go
- control-plane-operator/controllers/hostedcontrolplane/v2/oauth_apiserver/deployment_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
- control-plane-operator/controllers/hostedcontrolplane/v2/kas/params_test.go
- control-plane-operator/controllers/hostedcontrolplane/v2/kas/params.go
- control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go
Fix kubeapilinter violations on new etcd sharding API types: add MinLength markers, use XValidation instead of Pattern, use +default instead of +kubebuilder:default, add omitempty/omitzero tags, and convert optional pointer fields to value types with omitzero. Fix gci import ordering in shards_test.go, convert if/else chain to tagged switch in kas/deployment.go, and update CLI code for non-pointer Storage/BackupSchedule fields. Regenerate CRDs and docs.

Signed-off-by: Jesse Jaggars <jjaggars@redhat.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Actionable comments posted: 1
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (18)
api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/AutoNodeKarpenter.yaml (1)
2767-2901: ⚠️ Potential issue | 🟠 Major

Require exactly one unmanaged etcd mode.

After the legacy required fields were removed, managementType: Unmanaged now accepts {} and partial legacy configs like endpoint without tls. That punts an obviously invalid spec to reconciliation time. The schema should require either shards, or both legacy endpoint and tls, and ideally forbid mixing both modes.

Suggested fix

```diff
 type: object
+x-kubernetes-validations:
+- message: either shards or both legacy endpoint and tls must be configured
+  rule: 'has(self.shards) || (has(self.endpoint) && has(self.tls))'
+- message: legacy endpoint/tls cannot be combined with shards
+  rule: '!has(self.shards) || (!has(self.endpoint) && !has(self.tls))'
```

Also applies to: 2905-2913
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/AutoNodeKarpenter.yaml` around lines 2767-2901, the schema for Unmanaged etcd mode currently allows empty or partial legacy configs; update the manifest validations so that when managementType is Unmanaged the object requires either the "shards" array OR both legacy "endpoint" and "tls" (and rejects configs mixing them). Concretely, add x-kubernetes-validations on the Unmanaged spec (the block that defines endpoint, shards, and tls) with rules: 1) ensure (self.shards != null) XOR (self.endpoint != null && self.tls != null) so exactly one mode is provided, 2) if shards is present forbid endpoint/tls, and 3) if endpoint or tls is present require both endpoint and tls; reference the existing properties "shards", "endpoint", "tls" and the Unmanaged type name (UnmanagedEtcdShardSpec / the parent Unmanaged spec) when adding these validation rules.
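Since these manifests are generated, the rule would live on the source Go type and be propagated by regeneration; a sketch of the corresponding kubebuilder markers, assuming a field layout like the one described here rather than the PR's actual type:

```go
// UnmanagedEtcdSpec configures an externally managed etcd cluster.
// +kubebuilder:validation:XValidation:rule="has(self.shards) || (has(self.endpoint) && has(self.tls))",message="either shards or both legacy endpoint and tls must be configured"
// +kubebuilder:validation:XValidation:rule="!has(self.shards) || (!has(self.endpoint) && !has(self.tls))",message="legacy endpoint/tls cannot be combined with shards"
type UnmanagedEtcdSpec struct {
    // endpoint is the etcd client endpoint URL (legacy single-etcd mode).
    // +optional
    Endpoint string `json:"endpoint,omitempty"`

    // tls holds client TLS configuration (legacy single-etcd mode).
    // +optional
    TLS *EtcdTLSConfig `json:"tls,omitempty"`

    // shards lists externally managed etcd shards.
    // +optional
    Shards []UnmanagedEtcdShardSpec `json:"shards,omitempty"`
}
```

Regenerating the CRDs would then carry the same two rules into every feature-gated manifest flagged in the comments below.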
api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/KMSEncryptionProvider.yaml (1)

2788-2922: ⚠️ Potential issue | 🟠 Major

Unmanaged mode now accepts configs with no usable etcd connection.

After removing the old required endpoint/tls pair, there is no replacement cross-field validation requiring either shards or legacy endpoint and tls. As written, spec.etcd.unmanaged: {} and partial legacy configs are valid against the CRD but cannot be reconciled.

Suggested fix

```diff
 type: object
+x-kubernetes-validations:
+- message: either shards or both endpoint and tls must be set
+  rule: 'has(self.shards) || (has(self.endpoint) && has(self.tls))'
```

If mixed legacy + shard config should also be rejected instead of ignored, this rule should be tightened further in the source API type before regenerating the manifest.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/KMSEncryptionProvider.yaml` around lines 2788-2922, the UnmanagedEtcdSpec validation is missing a cross-field rule to require either a non-empty shards array or both legacy endpoint and tls, so empty/partial configs like spec.etcd.unmanaged: {} validate but cannot be reconciled; add an x-kubernetes-validations entry on the UnmanagedEtcdSpec (the object that currently has properties endpoint, shards, tls) with a rule such as self.shards != null && self.shards.size() > 0 || (self.endpoint != null && self.tls != null) and a clear message like "must specify either shards or both endpoint and tls" to reject empty/partial configs (adjust CEL expression syntax to match the other rules in this file).

control-plane-operator/controllers/hostedcontrolplane/v2/etcd/statefulset.go (1)
125-136: ⚠️ Potential issue | 🟠 Major

Use the resolved shard storage for restore as well.

The restore init container still reads managedEtcdSpec.Storage.RestoreSnapshotURL, so a shard-level storage.restoreSnapshotURL override is ignored even though PVC sizing/class now uses defaultShard.Storage.

Suggested fix

```diff
+    storage := managedEtcdSpec.Storage
+    if defaultShard.Storage.Type != "" {
+        storage = defaultShard.Storage
+    }
+
     snapshotRestored := meta.IsStatusConditionTrue(hcp.Status.Conditions, string(hyperv1.EtcdSnapshotRestored))
-    if managedEtcdSpec != nil && len(managedEtcdSpec.Storage.RestoreSnapshotURL) > 0 && !snapshotRestored {
+    if len(storage.RestoreSnapshotURL) > 0 && !snapshotRestored {
         sts.Spec.Template.Spec.InitContainers = append(sts.Spec.Template.Spec.InitContainers,
-            buildEtcdInitContainer(managedEtcdSpec.Storage.RestoreSnapshotURL[0]), // RestoreSnapshotURL can only have 1 entry
+            buildEtcdInitContainer(storage.RestoreSnapshotURL[0]), // RestoreSnapshotURL can only have 1 entry
         )
     }
-
-    // adapt PersistentVolume using shard-specific storage or default
-    storage := managedEtcdSpec.Storage
-    if defaultShard.Storage.Type != "" {
-        storage = defaultShard.Storage
-    }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@control-plane-operator/controllers/hostedcontrolplane/v2/etcd/statefulset.go` around lines 125-136, the restore init container is still using managedEtcdSpec.Storage.RestoreSnapshotURL directly so shard-level overrides are ignored; modify the logic that appends buildEtcdInitContainer to use the resolved storage variable (the one set from the defaultShard.Storage fallback) instead of managedEtcdSpec.Storage.RestoreSnapshotURL, e.g. reference storage.RestoreSnapshotURL (and check for nil/len>0) when calling buildEtcdInitContainer and when computing snapshotRestored so shard-specific storage.restoreSnapshotURL is honored; update usages involving snapshotRestored, buildEtcdInitContainer, and sts.Spec.Template.Spec.InitContainers accordingly.

cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/hostedcontrolplanes-Hypershift-CustomNoUpgrade.crd.yaml (1)
3549-3683: ⚠️ Potential issue | 🟠 Major

Require either shard config or legacy endpoint/tls for unmanaged etcd.

After dropping the old required fields, managementType: Unmanaged now accepts unusable specs like unmanaged: {}. The CRD should reject configs that provide neither shards nor legacy endpoint + tls.

Suggested validation

```diff
 type: object
+x-kubernetes-validations:
+- message: unmanaged etcd requires shards or legacy endpoint/tls
+  rule: 'has(self.shards) || (has(self.endpoint) && has(self.tls))'
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/hostedcontrolplanes-Hypershift-CustomNoUpgrade.crd.yaml` around lines 3549-3683, the Unmanaged etcd schema currently allows an empty spec (unmanaged: {}) because there's no validation requiring either shards OR the legacy endpoint+tls pair; add an x-kubernetes-validations CEL rule on the UnmanagedEtcdSpec object (the same object that declares properties endpoint, shards, tls) that enforces "shards is present OR (endpoint and tls are present)". Implement it as a validation entry with a clear message like "must specify either shards or both endpoint and tls" and a rule such as: self.shards != null || (self.endpoint != null && self.tls != null).

api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/ImageStreamImportMode.yaml (1)
2647-2781: ⚠️ Potential issue | 🟠 Major

Restore validation for legacy unmanaged mode.

After dropping the object-level required fields, managementType: Unmanaged now accepts unmanaged: {} when shards is omitted. That pushes an invalid config past admission even though neither the legacy endpoint/tls path nor the shard list is actually configured.

Suggested validation fix

```diff
 type: object
+x-kubernetes-validations:
+- message: endpoint and tls are required when shards are not specified
+  rule: has(self.shards) ? true : has(self.endpoint) && has(self.tls)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/ImageStreamImportMode.yaml` around lines 2647-2781, when managementType == "Unmanaged" ensure legacy single-etcd mode cannot be empty by adding an object-level validation that requires either shards be present with at least one item OR the legacy endpoint+tls fields be populated; update the validation on the Unmanaged field/object (referencing managementType, unmanaged, shards, endpoint, tls) to enforce: if self.managementType == 'Unmanaged' then (self.shards != null && self.shards.size() > 0) || (self.endpoint != null && self.endpoint != '' && self.tls != null), and include a clear error message like "for Unmanaged management either shards must be configured or endpoint and tls must be set".

api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ExternalOIDCWithUIDAndExtraClaimMappings.yaml (1)
3185-3318: ⚠️ Potential issue | 🟠 Major

Reject incomplete or mixed unmanaged etcd configs at admission time.

After making endpoint and tls optional, this schema no longer enforces either valid legacy single-etcd mode or valid sharded mode. managementType: Unmanaged can now admit {}, endpoint without tls, or shards plus legacy fields that are merely documented as ignored. That turns an API contract error into a reconcile/runtime failure.

Suggested validation fix

```diff
 unmanaged:
   description: |-
     unmanaged specifies configuration which enables the control plane to
     integrate with an externally managed etcd cluster.
   properties:
     endpoint:
       description: |-
         endpoint is the full etcd cluster client endpoint URL. Used only when
         shards is not specified (legacy single-etcd mode). When shards are
         specified, this field is ignored.
       ...
     tls:
       description: |-
         tls specifies TLS configuration for HTTPS etcd client endpoints. Used
         only when shards is not specified (legacy single-etcd mode). When
         shards are specified, this field is ignored.
       properties:
         clientSecret:
           ...
       required:
       - clientSecret
       type: object
+  x-kubernetes-validations:
+  - message: endpoint and tls are required when shards are not configured
+    rule: 'has(self.shards) ? true : has(self.endpoint) && has(self.tls)'
+  - message: endpoint and tls must be omitted when shards are configured
+    rule: '!has(self.shards) || (!has(self.endpoint) && !has(self.tls))'
   type: object
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ExternalOIDCWithUIDAndExtraClaimMappings.yaml` around lines 3185-3318, the Unmanaged etcd schema now permits invalid/mixed configs because making endpoint and tls optional allows empty objects or combinations (e.g., endpoint without tls, or shards alongside legacy fields) under managementType: Unmanaged; update the CRD validations to reject incomplete or mixed unmanaged configs by adding validation rules that (1) when managementType == "Unmanaged" require either (A) legacy single-etcd mode: endpoint AND tls.clientSecret present and endpoint startsWith('https://'), or (B) sharded mode: shards present with minItems >= 1 and exactly one shard whose resourcePrefixes contains "/" and each non-default prefix endsWith('#'), and (2) forbid providing both shards and legacy endpoint/tls together; reference the top-level fields endpoint, tls, shards, the UnmanagedEtcdShardSpec (items.endpoint, items.tls, items.resourcePrefixes) and the x-kubernetes-validations block to implement these checks.

api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/GCPPlatform.yaml (1)
2629-2763: ⚠️ Potential issue | 🟠 Major

Require a complete unmanaged config when shards is omitted.

After dropping the old required: [endpoint, tls], the schema now admits spec.etcd.unmanaged: {} and half-configured legacy specs like only endpoint or only tls. This should be guarded with an object-level validation requiring either shards, or both legacy fields together.

Suggested validation

```diff
 type: object
+x-kubernetes-validations:
+- message: either shards or both endpoint and tls must be set
+  rule: has(self.shards) || (has(self.endpoint) && has(self.tls))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/GCPPlatform.yaml` around lines 2629-2763, add an object-level x-kubernetes-validations on the unmanaged etcd spec to require either shards or both legacy fields (endpoint and tls). Specifically, in the same object that contains the existing shards/endpoint/tls definitions add a validation entry with a message like "must provide either shards, or both endpoint and tls" and rule: self.shards != null || (self.endpoint != null && self.tls != null). Update the existing x-kubernetes-validations list for the unmanaged object (the one that contains fields named "shards", "endpoint", and "tls") to include this new rule.

api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/OpenStack.yaml (1)
2712-2846: ⚠️ Potential issue | 🟠 Major

Restore legacy single-etcd validation when shards is absent.

After relaxing the root-level required fields, managementType: Unmanaged now accepts {} or partial legacy configs as long as spec.etcd.unmanaged exists. That breaks admission-time validation for the non-sharded path and will surface later as reconcile failures.

🔧 Suggested fix

```diff
 unmanaged:
   description: |-
     unmanaged specifies configuration which enables the control plane to
     integrate with an externally managed etcd cluster.
   properties:
     endpoint:
       description: |-
         endpoint is the full etcd cluster client endpoint URL. Used only when
         shards is not specified (legacy single-etcd mode). When shards are
         specified, this field is ignored.
@@
     tls:
       description: |-
         tls specifies TLS configuration for HTTPS etcd client endpoints. Used
         only when shards is not specified (legacy single-etcd mode). When
         shards are specified, this field is ignored.
@@
-  type: object
+  type: object
+  x-kubernetes-validations:
+  - message: endpoint and tls are required when shards is not specified
+    rule: has(self.shards) || (has(self.endpoint) && has(self.tls))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/OpenStack.yaml` around lines 2712-2846, the Unmanaged etcd spec currently allows missing legacy single-etcd fields when shards is absent; add CEL validations on the same object that enforce the legacy path: when shards is null/absent require endpoint and tls (and ensure endpoint startsWith('https://') and tls.clientSecret exists). Concretely, add x-kubernetes-validations rules on the UnmanagedEtcdSpec object (the same object that defines endpoint, shards, tls) such as: "rule: self.shards == null implies (self.endpoint != null && self.tls != null)" and a rule to validate endpoint format and tls.clientSecret presence (e.g. "self.shards == null implies self.endpoint.startsWith('https://')" and "self.shards == null implies self.tls.clientSecret != null") so legacy single-etcd configs are rejected when required fields are missing.

api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/HyperShiftOnlyDynamicResourceAllocation.yaml (1)
2651-2785: ⚠️ Potential issue | 🟠 Major

Restore validation for legacy unmanaged configs.

After removing the old required fields, this object no longer enforces the non-sharded path. managementType: Unmanaged can now validate with neither shards nor legacy endpoint/tls, leaving reconciliation with no usable etcd target.

🔧 Suggested fix

```diff
 type: object
+x-kubernetes-validations:
+- message: either shards must be set, or both endpoint and tls are required
+  rule: has(self.shards) ? true : has(self.endpoint) && has(self.tls)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/HyperShiftOnlyDynamicResourceAllocation.yaml` around lines 2651-2785, the schema no longer enforces that Unmanaged hosts have an etcd target; add an x-kubernetes-validations rule on the same object that requires, when managementType == "Unmanaged", either shards is present (non-empty) OR the legacy single-etcd fields (endpoint and tls.clientSecret) are present; implement using a CEL expression such as: self.managementType != 'Unmanaged' || (self.shards != null && self.shards.size() > 0) || (self.endpoint != null && self.tls != null && self.tls.clientSecret != null) and provide a clear message like "Unmanaged requires either shards or endpoint+tls".

api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ExternalOIDCWithUpstreamParity.yaml (1)
3166-3300: ⚠️ Potential issue | 🟠 Major

Reject empty unmanaged etcd configurations.

Relaxing the legacy top-level requirements is fine for sharded mode, but this block now accepts unusable configs such as spec.etcd.unmanaged: {} or a legacy endpoint without tls when shards is absent. That should be rejected at admission time.

Suggested fix

```diff
 unmanaged:
   description: |-
     unmanaged specifies configuration which enables the control plane to
     integrate with an externally managed etcd cluster.
   properties:
     endpoint:
       ...
     shards:
       ...
     tls:
       ...
+  x-kubernetes-validations:
+  - message: set shards, or set both endpoint and tls for legacy single-etcd mode
+    rule: 'has(self.shards) || (has(self.endpoint) && has(self.tls))'
   type: object
```

This preserves backward compatibility for legacy single-etcd mode while still allowing the new sharded shape.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ExternalOIDCWithUpstreamParity.yaml` around lines 3166-3300, the unmanaged etcd schema currently permits empty or incomplete configs (e.g., spec.etcd.unmanaged: {} or legacy endpoint without tls when shards is absent); add validation to the parent unmanaged object so that when shards is not provided the legacy single-etcd mode requires both endpoint and tls and rejects an empty unmanaged object, and ensure the existing shards array validation remains unchanged; locate the unmanaged schema block (the object containing properties endpoint, shards, tls and the UnmanagedEtcdShardSpec), add an "x-kubernetes-validations" rule to require (a) if shards is missing then endpoint AND tls must be present and non-empty, and (b) the unmanaged object cannot be entirely empty, using self and exists/has checks referencing endpoint, tls, and shards to enforce these constraints.

api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/ClusterUpdateAcceptRisks.yaml (1)
2629-2763: ⚠️ Potential issue | 🟠 Major

Keep legacy unmanaged mode valid only when endpoint and tls are both present.

After removing the unconditional required list, managementType: Unmanaged now accepts unmanaged: {} when shards is omitted. That admits an invalid legacy single-etcd config.

Suggested fix

```diff
 type: object
+x-kubernetes-validations:
+- message: either shards or legacy endpoint/tls must be specified
+  rule: 'has(self.shards) ? true : has(self.endpoint) && has(self.tls)'
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/ClusterUpdateAcceptRisks.yaml` around lines 2629-2763, the Unmanaged legacy single-etcd mode currently allows an empty unmanaged object when shards is omitted; update the CRD validations so that when managementType == "Unmanaged" (or the top-level unmanaged block is present) and shards is not specified, both endpoint and tls must be present: add a conditional x-kubernetes-validations rule on the unmanaged object (or the parent spec) that checks if shards is empty/absent then self.endpoint exists() AND self.tls exists(), and ensure existing per-shard rules remain unchanged (reference: managementType, unmanaged, endpoint, tls, shards, UnmanagedEtcdShardSpec).

api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/GCPPlatform.yaml (1)
2712-2846: ⚠️ Potential issue | 🟠 Major

Restore CRD validation for legacy unmanaged etcd mode.

After removing the old top-level required pair, this schema now accepts .spec.etcd.unmanaged without shards and without a complete legacy {endpoint, tls} config. That lets invalid specs through API validation and defers the failure to reconciliation.

Suggested fix

```diff
 type: object
+x-kubernetes-validations:
+- message: either shards or both endpoint and tls must be specified
+  rule: has(self.shards) || (has(self.endpoint) && has(self.tls))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/GCPPlatform.yaml` around lines 2712-2846, the schema for UnmanagedEtcdSpec currently permits missing legacy fields because the top-level required constraint was removed; add a conditional validation on the parent object (the UnmanagedEtcdSpec that declares "endpoint", "shards", and "tls") to require that when "shards" is not present, both "endpoint" and "tls" must be present (e.g. add an x-kubernetes-validations rule that checks: if not self.exists(s, s == 'shards') then self.exists(e, e == 'endpoint') and self.exists(t, t == 'tls'), with an explicit message like "when shards is not specified, endpoint and tls are required"), referencing the existing properties "shards", "endpoint", and "tls" so legacy single-etcd mode is validated again.

cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/hostedclusters-Hypershift-Default.crd.yaml (1)
3214-3348: ⚠️ Potential issue | 🟠 Major

spec.etcd.unmanaged can now be empty and still validate.

Dropping the old required: [endpoint, tls] without adding a replacement object-level rule means managementType: Unmanaged now accepts {}, {endpoint: ...}, or {tls: ...}. Those specs are incomplete and will fail later instead of being rejected by the API.

Suggested fix

```diff
 type: object
+x-kubernetes-validations:
+- message: either shards or both endpoint and tls must be specified
+  rule: 'has(self.shards) || (has(self.endpoint) && has(self.tls))'
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/hostedclusters-Hypershift-Default.crd.yaml` around lines 3214-3348, the Unmanaged etcd spec currently allows empty objects because the old required: [endpoint, tls] was removed; add an object-level validation on the Unmanaged spec (the object that contains endpoint, shards, tls) via x-kubernetes-validations to enforce that either shards is present OR both endpoint and tls are present (e.g., rule: self.exists(s, s.shards) || (self.endpoint and self.tls), expressed in the CRD's CEL/validation syntax), referencing the existing properties endpoint, shards, and tls so incomplete {} or single-field specs are rejected by the API.

api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/AAA_ungated.yaml (1)
2721-2855: ⚠️ Potential issue | 🟠 Major

Reject empty or partial legacy unmanaged etcd configs.

This schema now accepts managementType: Unmanaged with unmanaged: {} and also endpoint-only / tls-only objects when shards is omitted. In legacy single-etcd mode those objects are unusable, so they should be rejected at CRD validation time.

Because this file is generated, please add the conditional validation in the source API type/markers and regenerate.

Proposed fix

```diff
 unmanaged:
   description: |-
     unmanaged specifies configuration which enables the control plane to
     integrate with an externally managed etcd cluster.
   properties:
     endpoint:
       ...
     shards:
       ...
     tls:
       ...
   type: object
+  x-kubernetes-validations:
+  - message: configure either unmanaged.shards or legacy unmanaged.endpoint and unmanaged.tls
+    rule: has(self.shards) || (has(self.endpoint) && has(self.tls))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/AAA_ungated.yaml` around lines 2721-2855, the CRD currently allows managementType: Unmanaged with an empty unmanaged object and permits legacy single-etcd objects that have only endpoint or only tls when shards is omitted; update the API type markers to add conditional validations: when managementType == "Unmanaged" require that the unmanaged spec object is non-empty (reject {}), and when shards is not specified require both unmanaged.endpoint and unmanaged.tls to be present (reject endpoint-only or tls-only); add the corresponding x-kubernetes-validation markers on the UnmanagedEtcdSpec (and the parent Unmanaged field) to enforce these rules, then regenerate the YAML so the generated file (hostedclusters...AAA_ungated.yaml) includes the new validations.

api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ExternalOIDC.yaml (1)
3045-3179: ⚠️ Potential issue | 🟠 Major

Re-add legacy unmanaged requiredness when shards is unset.

After making endpoint and tls shard-conditional, the schema no longer requires either legacy config or shard config. managementType: Unmanaged can now admit unmanaged: {}, which is invalid and will only fail later.

Suggested fix

```diff
 type: object
+x-kubernetes-validations:
+- message: endpoint and tls are required when shards is not specified
+  rule: has(self.shards) ? true : has(self.endpoint) && has(self.tls)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ExternalOIDC.yaml` around lines 3045-3179, the Unmanaged etcd schema allows an empty unmanaged config because, after making endpoint and tls conditional on shards, nothing enforces the legacy fields when managementType is Unmanaged; add a CEL validation at the same object level that enforces, when managementType == "Unmanaged", either shards is present or both endpoint and tls are present. Concretely, add an x-kubernetes-validations entry referencing managementType/unmanaged with a message like "Unmanaged must specify either shards or both endpoint and tls" and a rule such as self.managementType == 'Unmanaged' => (self.shards.exists() || (self.endpoint != null && self.tls != null)), targeting the schema that contains properties shards, endpoint, and tls.

cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/hostedcontrolplanes-Hypershift-Default.crd.yaml (1)
3131-3265: ⚠️ Potential issue | 🟠 Major

Legacy unmanaged mode no longer validates required connection fields.

After removing the old required coupling, there is no replacement rule that requires `endpoint` and `tls` when `shards` is absent. That means invalid legacy configs like `spec.etcd.unmanaged: {}` now pass CRD validation even though single-etcd mode cannot work without both fields. Please add an object-level conditional validation in the source type and regenerate this file.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/hostedcontrolplanes-Hypershift-Default.crd.yaml` around lines 3131 - 3265, Add an object-level conditional validation to the UnmanagedEtcdSpec/source type so that when shards is absent the fields endpoint and tls are required: update the Go CRD struct tags or validation markers on the UnmanagedEtcdSpec (or the parent type that defines spec.etcd.unmanaged) to express "if shards == nil then endpoint and tls are required" (use kubebuilder+validation marker for conditional/object-level validation), then regenerate the CRD so the generated YAML for fields endpoint, tls, and shards includes the corresponding x-kubernetes-validations rule enforcing that legacy single-etcd mode must include both endpoint and tls when shards is not specified.

api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/HCPEtcdBackup.yaml (1)
2777-2911: ⚠️ Potential issue | 🟠 Major

Add a one-of validation for unmanaged single-etcd vs sharded mode.

This schema now accepts invalid unmanaged specs: neither `endpoint`/`tls` nor `shards`, or both at once. Both cases should be rejected at admission instead of creating an unreconcilable HostedCluster or silently ignoring fields.

Suggested fix

```diff
   type: object
+  x-kubernetes-validations:
+  - message: endpoint and tls are required when shards is not specified
+    rule: has(self.shards) || (has(self.endpoint) && has(self.tls))
+  - message: endpoint and tls must not be set when shards is specified
+    rule: '!has(self.shards) || (!has(self.endpoint) && !has(self.tls))'
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/HCPEtcdBackup.yaml` around lines 2777 - 2911, The schema currently allows invalid unmanaged etcd specs when neither or both modes are set; add a CEL validation on the parent object (the UnmanagedEtcdSpec represented by the block containing endpoint, shards, tls) to enforce exactly-one-of: either shards is present (self.shards != null) OR legacy single-etcd mode is present (both self.endpoint != null and self.tls != null), but not both; implement this as an x-kubernetes-validations rule referencing self.shards, self.endpoint and self.tls that returns true only when (self.shards != null && self.endpoint == null && self.tls == null) || (self.shards == null && self.endpoint != null && self.tls != null), and include a clear message like "must specify either shards or endpoint+tls (mutually exclusive)".

api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ClusterUpdateAcceptRisks.yaml (1)
2707-2846: ⚠️ Potential issue | 🟠 Major

Reject empty unmanaged etcd configs.

After removing the top-level `required: [endpoint, tls]`, this schema no longer requires either the legacy single-etcd fields or `shards`. `spec.etcd.managementType: Unmanaged` with `spec.etcd.unmanaged: {}` now passes CRD validation and will fail later.

Suggested object-level validation

```diff
 unmanaged:
   description: |-
     unmanaged specifies configuration which enables the control
     plane to integrate with an externally managed etcd cluster.
   properties:
     endpoint: ...
     tls: ...
+  x-kubernetes-validations:
+  - message: either shards or endpoint/tls must be set for unmanaged etcd
+    rule: has(self.shards) || (has(self.endpoint) && has(self.tls))
   type: object
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ClusterUpdateAcceptRisks.yaml` around lines 2707 - 2846, The unmanaged etcd schema allows an empty object because the top-level required for endpoint/tls was removed; update the UnmanagedEtcdSpec validation to require either a non-empty shards array or the legacy single-etcd fields (endpoint and tls). Concretely, add an x-kubernetes-validations rule on the unmanaged object (the "unmanaged" schema) that enforces: either self.shards != null && self.shards.size() >= 1 OR (self.endpoint != null && self.tls != null), and keep the existing shard-level validations for resourcePrefixes; reference the "unmanaged", "shards", "endpoint", and "tls" schema entries when adding this rule.
♻️ Duplicate comments (3)
control-plane-operator/controllers/hostedcontrolplane/v2/etcd/statefulset.go (1)
45-50: ⚠️ Potential issue | 🔴 Critical

Use the canonical discovery Service name for named shards.

Named shards are created as `etcd-discovery-<shard>`, not `etcd-<shard>-discovery`. The current value breaks both `sts.Spec.ServiceName` and the peer DNS names in `ETCD_INITIAL_CLUSTER`.

Suggested fix

```diff
 if defaultShard.Name == "default" {
 	sts.Name = "etcd"
 	sts.Spec.ServiceName = "etcd-discovery"
 } else {
 	sts.Name = fmt.Sprintf("etcd-%s", defaultShard.Name)
-	sts.Spec.ServiceName = fmt.Sprintf("etcd-%s-discovery", defaultShard.Name)
+	sts.Spec.ServiceName = fmt.Sprintf("etcd-discovery-%s", defaultShard.Name)
 }
@@
 if defaultShard.Name == "default" {
 	podPrefix = "etcd"
 	discoveryService = "etcd-discovery"
 } else {
 	podPrefix = fmt.Sprintf("etcd-%s", defaultShard.Name)
-	discoveryService = fmt.Sprintf("etcd-%s-discovery", defaultShard.Name)
+	discoveryService = fmt.Sprintf("etcd-discovery-%s", defaultShard.Name)
 }
```

Also applies to: 69-80
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@control-plane-operator/controllers/hostedcontrolplane/v2/etcd/statefulset.go` around lines 45 - 50, The service name for named shards is constructed incorrectly; when defaultShard.Name != "default" set sts.Name to fmt.Sprintf("etcd-%s", defaultShard.Name) but sts.Spec.ServiceName must use the canonical pattern "etcd-discovery-<shard>" (not "etcd-<shard>-discovery"), and the same correction must be applied to the other occurrence around the ETCD_INITIAL_CLUSTER construction (the block that builds peer DNS names and service references); update sts.Spec.ServiceName and any code that formats peer/service DNS (referencing defaultShard.Name, sts.Name, and ETCD_INITIAL_CLUSTER assembly) to use "etcd-discovery-%s" for named shards.
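One way to make this class of naming drift structurally impossible is to centralize the name derivation. A sketch, with illustrative function names rather than the repository's actual helpers:

```go
import "fmt"

// Hypothetical helpers centralizing shard resource naming so the StatefulSet,
// its headless Service reference, and peer DNS names cannot drift apart.
func etcdStatefulSetName(shard string) string {
	if shard == "default" {
		return "etcd"
	}
	return fmt.Sprintf("etcd-%s", shard)
}

func etcdDiscoveryServiceName(shard string) string {
	if shard == "default" {
		return "etcd-discovery"
	}
	return fmt.Sprintf("etcd-discovery-%s", shard)
}
```

With both call sites routed through one helper, a wrong pattern can only exist in one place, and a single unit test covers every consumer.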
api/hypershift/v1beta1/hostedcluster_types.go (2)

1916-1926: ⚠️ Potential issue | 🟠 Major

The catch-all prefix rule is still "at least one", not "exactly one".

`self.exists(s, '/' in s.resourcePrefixes)` permits multiple shards with `"/"`, even though the doc and error message say there must be exactly one. That ambiguity leaks directly into controller-side shard selection.

Suggested fix

```diff
-// +kubebuilder:validation:XValidation:rule="self.exists(s, '/' in s.resourcePrefixes)",message="exactly one shard must have '/' prefix"
+// +kubebuilder:validation:XValidation:rule="self.filter(s, '/' in s.resourcePrefixes).size() == 1",message="exactly one shard must have '/' prefix"
@@
-// +kubebuilder:validation:XValidation:rule="self.exists(s, '/' in s.resourcePrefixes)",message="exactly one shard must have '/' prefix"
+// +kubebuilder:validation:XValidation:rule="self.filter(s, '/' in s.resourcePrefixes).size() == 1",message="exactly one shard must have '/' prefix"
```

Also applies to: 2078-2088
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/hypershift/v1beta1/hostedcluster_types.go` around lines 1916 - 1926, The XValidation rule on the Shards field currently uses self.exists(...) which allows multiple shards to include "/" but the requirement is exactly one; replace the rule with a CEL expression that asserts exactly one shard has "/" (for example use self.filter(s, '/' in s.resourcePrefixes).size() == 1) and update the XValidation message accordingly; apply the same change to the other duplicate Shards validation instance elsewhere in the file.
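The `exists` vs `filter(...).size() == 1` distinction mirrors what an equivalent Go check would do. A hypothetical helper (the shard type and field names are local stand-ins, not the real API structs) that could back a unit test of the intended semantics:

```go
import "fmt"

// etcdShard is a stand-in for the real API type; only the field used by the
// rule is modeled here.
type etcdShard struct {
	Name             string
	ResourcePrefixes []string
}

// validateCatchAllShard enforces "exactly one", matching the corrected CEL
// rule rather than the current "at least one" behavior.
func validateCatchAllShard(shards []etcdShard) error {
	count := 0
	for _, s := range shards {
		for _, p := range s.ResourcePrefixes {
			if p == "/" {
				count++
				break
			}
		}
	}
	if count != 1 {
		return fmt.Errorf("exactly one shard must include %q in resourcePrefixes, found %d", "/", count)
	}
	return nil
}
```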
2062-2089: ⚠️ Potential issue | 🟠 Major

Reject mixed or empty unmanaged etcd configuration.

`endpoint`/`tls` are documented as legacy single-shard fields, but nothing currently enforces either "shards" or "endpoint + tls". As written, this accepts both modes at once and also accepts an unmanaged spec with neither configured.

Suggested fix

```diff
+// +kubebuilder:validation:XValidation:rule="has(self.shards) || (self.endpoint != '' && has(self.tls.clientSecret.name))",message="either shards must be specified, or both endpoint and tls are required"
+// +kubebuilder:validation:XValidation:rule="!has(self.shards) || (self.endpoint == '' && !has(self.tls.clientSecret.name))",message="endpoint and tls must not be set when shards is specified"
 type UnmanagedEtcdSpec struct {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/hypershift/v1beta1/hostedcluster_types.go` around lines 2062 - 2089, UnmanagedEtcdSpec currently allows empty or mixed configs; add kubebuilder XValidation rules on UnmanagedEtcdSpec to enforce that exactly one mode is chosen: either Shards is present (len>0) OR Endpoint is set (non-empty) — but not both — and when Shards is empty Endpoint must be provided; update the UnmanagedEtcdSpec validation to include an XValidation rule that enforces mutual exclusivity between Shards and Endpoint/TLS (XOR) and another rule ensuring Endpoint is non-empty when Shards is absent, referencing the UnmanagedEtcdSpec type and its fields Endpoint, TLS, and Shards so the API server rejects mixed or empty unmanaged etcd configurations.
🟡 Minor comments (3)
api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/ClusterVersionOperatorConfiguration.yaml-2452-2519 (1)
2452-2519: ⚠️ Potential issue | 🟡 Minor

Mirror the restore snapshot add/remove guard on per-shard storage.

`spec.etcd.managed.storage` prevents adding or removing `restoreSnapshotURL` after creation, but the duplicated per-shard storage schema does not. That lets users make day-2 changes that won't be honored once the shard has already bootstrapped its PV.

Suggested fix

```diff
 required:
 - type
 type: object
+x-kubernetes-validations:
+- message: restoreSnapshotURL cannot be added or removed after creation
+  rule: has(self.restoreSnapshotURL) == has(oldSelf.restoreSnapshotURL)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/ClusterVersionOperatorConfiguration.yaml` around lines 2452 - 2519, The per-shard storage schema's restoreSnapshotURL must enforce the same add/remove guard as spec.etcd.managed.storage: update the per-shard storage property named restoreSnapshotURL to include x-kubernetes-list-type: set and the same x-kubernetes-validations as the top-level managed.storage (a max size of 1 rule like rule: self.size() <= 1, an immutability rule like rule: self == oldSelf, and a URL-scheme validation like rule: self.size() == 0 || self[0].matches('^(https|s3)://.*')) so users cannot add/remove or change the URL after initial creation; locate the restoreSnapshotURL block in the per-shard storage definition and add those validations to match the behavior of spec.etcd.managed.storage.restoreSnapshotURL.
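If this is fixed at the source rather than by hand-editing the generated YAML, it would likely take the shape of a CEL transition rule (one that references `oldSelf`, so it is evaluated only on updates) on the per-shard storage type. A sketch with an assumed type and field layout:

```go
// Sketch: transition rules referencing oldSelf only run on UPDATE, which is
// exactly the "cannot add or remove after creation" semantic needed here.
// The type and field names are assumptions, not the real API definitions.

// +kubebuilder:validation:XValidation:rule="has(self.restoreSnapshotURL) == has(oldSelf.restoreSnapshotURL)",message="restoreSnapshotURL cannot be added or removed after creation"
type ShardStorageSpec struct {
	// +optional
	RestoreSnapshotURL []string `json:"restoreSnapshotURL,omitempty"`
}
```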
docs/content/reference/aggregated-docs.md-17732-17733 (1)

17732-17733: ⚠️ Potential issue | 🟡 Minor

Reference links appear to target internal paths.

Line 17732-Line 17733 references `.spec/...` files, which are typically not available in published docs. This will create dead references for readers.

Suggested doc fix

```diff
-- Implementation Plan: `.spec/001-etcd-shard/implementation-plan.md`
-- Requirements: `.spec/001-etcd-shard/requirements.md`
+- Implementation Plan: (link to published design doc/KEP in repository docs)
+- Requirements: (link to published requirements doc in repository docs)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/content/reference/aggregated-docs.md` around lines 17732 - 17733, The doc references use internal repo paths `.spec/001-etcd-shard/implementation-plan.md` and `.spec/001-etcd-shard/requirements.md` which will be dead in published docs; update those link targets in the aggregated-docs entry so they point to public/relative published pages or remove them and replace with a note describing that the implementation plan and requirements are internal, e.g., change the link targets to the public equivalents or to a single external/permalink for the ETCD shard design, and update the link text (Implementation Plan / Requirements) accordingly in aggregated-docs.md.

docs/content/reference/api.md-6735-6760 (1)
6735-6760: ⚠️ Potential issue | 🟡 Minor

Add semantic descriptions for `EtcdShardPriority` enum values.

All four enum values are documented with empty descriptions, which makes priority-based behavior hard to interpret from API docs. Please define each value's operational meaning (e.g., scheduling/backup expectations) in the source type comments so generated docs are self-explanatory.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/content/reference/api.md` around lines 6735 - 6760, The docs show empty descriptions for the EtcdShardPriority enum; update the source Go type comments for the EtcdShardPriority enum (the declaration named EtcdShardPriority used by ManagedEtcdShardSpec and UnmanagedEtcdShardSpec) to provide clear semantic meanings for each value: "Critical" = highest operational priority (prefer dedicated nodes, avoid eviction, highest backup frequency and fastest restore), "High" = prefer scheduling on stable nodes with frequent backups and priority for recovery, "Medium" = default operational priority with standard scheduling/backups and normal eviction tolerance, and "Low" = lowest priority (best-effort scheduling, infrequent backups, and acceptable for eviction/deprioritized recovery); regenerate docs so the table entries for Critical, High, Medium, Low contain these descriptions.
🧹 Nitpick comments (1)
control-plane-operator/controllers/hostedcontrolplane/etcd/statefulset.go (1)
91-101: Fragile string replacement may silently fail if base template changes.

The `strings.ReplaceAll` calls depend on exact string matches in the base template script. If the YAML asset is modified (e.g., formatting changes, variable reordering), these replacements will silently do nothing, leaving non-default shards with incorrect service names.

Consider either:
- Validating that the replacement actually occurred (check if string changed)
- Rebuilding the script programmatically rather than patching it
- Adding a comment documenting the exact strings expected in the base template
🛡️ Option 1: Add validation that replacement occurred
```diff
 if len(c.Args) >= 2 {
 	script := c.Args[1]
+	originalScript := script
 	// Replace ETCDCTL_ENDPOINTS
 	script = strings.ReplaceAll(script, "export ETCDCTL_ENDPOINTS=https://etcd-client:2379", fmt.Sprintf("export ETCDCTL_ENDPOINTS=https://%s:2379", clientService))
 	// Replace member add peer-urls
 	script = strings.ReplaceAll(script, "https://${HOSTNAME}.etcd-discovery.${NAMESPACE}.svc:2380", fmt.Sprintf("https://${HOSTNAME}.%s.${NAMESPACE}.svc:2380", discoveryService))
+	// For non-default shards, verify replacements actually happened
+	if shard.Name != "default" && script == originalScript {
+		// Log warning or consider returning error - service names may be incorrect
+	}
 	c.Args = []string{"-c", script}
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@control-plane-operator/controllers/hostedcontrolplane/etcd/statefulset.go` around lines 91 - 101, The current replacement logic that mutates c.Args by creating script and calling strings.ReplaceAll may silently fail if the base template changes; modify the block that manipulates c.Args (the script variable and the two strings.ReplaceAll calls) to validate that each replacement actually occurred by comparing the script before and after each ReplaceAll (or using strings.Contains to confirm expected tokens), and if a replacement did not happen, log or return an error (or fallback to a programmatic rebuild) referencing clientService and discoveryService so the failure is visible and handled rather than leaving stale placeholders in c.Args.
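A variant of Option 1 that fails loudly instead of logging is to check for the expected substring before replacing. A sketch, with a hypothetical helper name:

```go
import (
	"fmt"
	"strings"
)

// replaceAllRequired replaces old with repl and errors if old was absent,
// turning silent template drift into an explicit reconcile failure.
func replaceAllRequired(s, old, repl string) (string, error) {
	if !strings.Contains(s, old) {
		return "", fmt.Errorf("expected substring %q not found in etcd init script template", old)
	}
	return strings.ReplaceAll(s, old, repl), nil
}
```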
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 5fd96a88-afb0-49cc-b91a-c5dae1cae4a1
⛔ Files ignored due to path filters (4)
- `api/hypershift/v1beta1/zz_generated.deepcopy.go` is excluded by `!**/zz_generated*`
- `vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/hostedcluster_helpers.go` is excluded by `!**/vendor/**`, `!vendor/**`
- `vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/hostedcluster_types.go` is excluded by `!**/vendor/**`, `!vendor/**`
- `vendor/github.com/openshift/hypershift/api/hypershift/v1beta1/zz_generated.deepcopy.go` is excluded by `!**/vendor/**`, `!vendor/**`, `!**/zz_generated*`
📒 Files selected for processing (41)
- `api/hypershift/v1beta1/hostedcluster_helpers.go`
- `api/hypershift/v1beta1/hostedcluster_types.go`
- `api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/AAA_ungated.yaml`
- `api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/AutoNodeKarpenter.yaml`
- `api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ClusterUpdateAcceptRisks.yaml`
- `api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ClusterVersionOperatorConfiguration.yaml`
- `api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ExternalOIDC.yaml`
- `api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ExternalOIDCWithUIDAndExtraClaimMappings.yaml`
- `api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ExternalOIDCWithUpstreamParity.yaml`
- `api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/GCPPlatform.yaml`
- `api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/HCPEtcdBackup.yaml`
- `api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/HyperShiftOnlyDynamicResourceAllocation.yaml`
- `api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/ImageStreamImportMode.yaml`
- `api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/KMSEncryptionProvider.yaml`
- `api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedclusters.hypershift.openshift.io/OpenStack.yaml`
- `api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/AAA_ungated.yaml`
- `api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/AutoNodeKarpenter.yaml`
- `api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/ClusterUpdateAcceptRisks.yaml`
- `api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/ClusterVersionOperatorConfiguration.yaml`
- `api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/ExternalOIDC.yaml`
- `api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/ExternalOIDCWithUIDAndExtraClaimMappings.yaml`
- `api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/ExternalOIDCWithUpstreamParity.yaml`
- `api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/GCPPlatform.yaml`
- `api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/HCPEtcdBackup.yaml`
- `api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/HyperShiftOnlyDynamicResourceAllocation.yaml`
- `api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/ImageStreamImportMode.yaml`
- `api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/KMSEncryptionProvider.yaml`
- `api/hypershift/v1beta1/zz_generated.featuregated-crd-manifests/hostedcontrolplanes.hypershift.openshift.io/OpenStack.yaml`
- `cmd/cluster/core/create.go`
- `cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/hostedclusters-Hypershift-CustomNoUpgrade.crd.yaml`
- `cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/hostedclusters-Hypershift-Default.crd.yaml`
- `cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/hostedclusters-Hypershift-TechPreviewNoUpgrade.crd.yaml`
- `cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/hostedcontrolplanes-Hypershift-CustomNoUpgrade.crd.yaml`
- `cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/hostedcontrolplanes-Hypershift-Default.crd.yaml`
- `cmd/install/assets/crds/hypershift-operator/zz_generated.crd-manifests/hostedcontrolplanes-Hypershift-TechPreviewNoUpgrade.crd.yaml`
- `control-plane-operator/controllers/hostedcontrolplane/etcd/shards_test.go`
- `control-plane-operator/controllers/hostedcontrolplane/etcd/statefulset.go`
- `control-plane-operator/controllers/hostedcontrolplane/v2/etcd/statefulset.go`
- `control-plane-operator/controllers/hostedcontrolplane/v2/kas/deployment.go`
- `docs/content/reference/aggregated-docs.md`
- `docs/content/reference/api.md`
🚧 Files skipped from review as they are similar to previous changes (3)
- cmd/cluster/core/create.go
- control-plane-operator/controllers/hostedcontrolplane/etcd/shards_test.go
- control-plane-operator/controllers/hostedcontrolplane/v2/kas/deployment.go
```go
// Add shard label to pod template
if sts.Spec.Template.Labels == nil {
	sts.Spec.Template.Labels = make(map[string]string)
}
sts.Spec.Template.Labels["hypershift.openshift.io/etcd-shard"] = defaultShard.Name

// Update selector to include shard label
if sts.Spec.Selector.MatchLabels == nil {
	sts.Spec.Selector.MatchLabels = make(map[string]string)
}
sts.Spec.Selector.MatchLabels["hypershift.openshift.io/etcd-shard"] = defaultShard.Name
```
🧩 Analysis chain
🌐 Web query:
Is apps/v1 StatefulSet.spec.selector immutable after creation in Kubernetes?
💡 Result:
Yes, in Kubernetes apps/v1, StatefulSet.spec.selector is immutable after creation. Attempting to update it results in a validation error. The Kubernetes API server enforces this through validation logic in ValidateStatefulSetUpdate (pkg/apis/apps/validation/validation.go), which checks changes to spec fields other than the allowed ones like 'replicas', 'template', 'updateStrategy', etc. The selector is explicitly treated as immutable, as evidenced by ongoing migration to declarative validation with +k8s:immutable tag (PR #138080, March 2026). Historical context from issue #50808 confirms selectors for controllers (including StatefulSet) were made immutable in apps/v1 to prevent undefined behaviors. Multiple error reports confirm updates to spec.selector trigger "Forbidden: updates to statefulset spec for fields other than..." errors, where selector is not in the allowed list.
Citations:
- 1: Controller selectors should be immutable after creation kubernetes/kubernetes#50808
- 2: Migrate StatefulSet.Spec.Selector to declarative immutable validation kubernetes/kubernetes#138080
- 3: Allow editing labels/selector in pod template spec in StatefulSets kubernetes/kubernetes#90519
- 4: https://stackoverflow.com/questions/49311911/forbidden-updates-to-statefulset-spec-for-fields-other-than-replicas-templa
- 5: https://stackoverflow.com/questions/77033799/how-to-update-a-statefulset
- 6: https://github.com/kubernetes/kubernetes/blob/master/pkg/apis/apps/validation/validation.go
- 7: Subject: Inaccurate Error Message for StatefulSet Updates in Kubernetes kubernetes/kubernetes#120171
- 8: https://prabeshthapa.medium.com/upgrade-immutable-fields-in-a-k8s-statefulsets-66f3028028ad
🏁 Script executed:
```shell
cat -n control-plane-operator/controllers/hostedcontrolplane/v2/etcd/statefulset.go | head -100
```

Repository: openshift/hypershift
Length of output: 1891
🏁 Script executed:
```shell
wc -l control-plane-operator/controllers/hostedcontrolplane/v2/etcd/statefulset.go
```

Repository: openshift/hypershift
Length of output: 1891
🏁 Script executed:
```shell
# Search for the function containing this code and understand its purpose
rg -A 20 -B 10 "Add shard label to pod template" control-plane-operator/controllers/hostedcontrolplane/v2/etcd/statefulset.go
```

Repository: openshift/hypershift
Length of output: 1891
🏁 Script executed:
```shell
# Search for where this function is called
rg -n "def\|func" control-plane-operator/controllers/hostedcontrolplane/v2/etcd/statefulset.go | head -20
```

Repository: openshift/hypershift
Length of output: 1891
Remove selector mutation or implement migration-safe path for existing StatefulSets.
Mutating sts.Spec.Selector.MatchLabels will fail on already-created StatefulSets. In Kubernetes apps/v1, spec.selector is immutable after creation—the API server enforces this through validation logic that explicitly excludes selector from the allowed mutation list. Attempting updates will trigger a "Forbidden: updates to statefulset spec for fields other than..." error. Either remove the selector mutation or add conditional logic that preserves the selector for existing StatefulSets while applying it only to new ones.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@control-plane-operator/controllers/hostedcontrolplane/v2/etcd/statefulset.go`
around lines 53 - 63, The code is mutating sts.Spec.Selector.MatchLabels which
is immutable on existing StatefulSets; instead, only set the selector when
creating a new StatefulSet or when the selector is nil/empty (preserve existing
selector values for updates). Keep adding the shard label to
sts.Spec.Template.Labels (pod template) as before, but change the selector
mutation to a migration-safe check: if sts.Generation/ResourceVersion indicates
a new object or if sts.Spec.Selector == nil ||
len(sts.Spec.Selector.MatchLabels) == 0 then set
MatchLabels["hypershift.openshift.io/etcd-shard"]=defaultShard.Name, otherwise
leave sts.Spec.Selector.MatchLabels untouched so updates won’t be rejected by
the API server; reference the sts variable and the fields
sts.Spec.Selector.MatchLabels and sts.Spec.Template.Labels when making this
change.
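Putting that suggestion into code, here is a sketch of a migration-safe variant. The `CreationTimestamp.IsZero()` check distinguishes a to-be-created object from one fetched from the API server; the helper name is illustrative, not the repository's actual function:

```go
import (
	appsv1 "k8s.io/api/apps/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// applyShardLabels always relabels the pod template, but only sets the
// immutable selector on StatefulSets that have not been created yet.
func applyShardLabels(sts *appsv1.StatefulSet, shard string) {
	if sts.Spec.Template.Labels == nil {
		sts.Spec.Template.Labels = map[string]string{}
	}
	sts.Spec.Template.Labels["hypershift.openshift.io/etcd-shard"] = shard

	if sts.CreationTimestamp.IsZero() {
		if sts.Spec.Selector == nil {
			sts.Spec.Selector = &metav1.LabelSelector{}
		}
		if sts.Spec.Selector.MatchLabels == nil {
			sts.Spec.Selector.MatchLabels = map[string]string{}
		}
		sts.Spec.Selector.MatchLabels["hypershift.openshift.io/etcd-shard"] = shard
	}
}
```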
Add nil check for hcp.Spec.Etcd.Managed in KAS deployment switch to prevent panic when ManagementType is Managed but Managed spec is not set. Update TestControlPlaneComponents fixtures for the if/else to switch refactor.

Signed-off-by: Jesse Jaggars <jjaggars@redhat.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add nil check for hcp.Spec.Etcd.Managed in KAS deployment switch to prevent panic when ManagementType is Managed but Managed spec is not set. Remove ReconcileControlPlaneComponent call from etcd shard reconciliation since the CRD is managed by the v2 framework and is not available in all environments. Update TestControlPlaneComponents fixtures for the switch refactor.

Signed-off-by: Jesse Jaggars <jjaggars@redhat.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
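A self-contained sketch of the guard these commits describe, with local stand-in types since the real `hyperv1` structs are not reproduced here:

```go
package main

import "fmt"

// Local stand-ins for the real API types, for illustration only.
type ManagedEtcdSpec struct{}

type EtcdSpec struct {
	ManagementType string
	Managed        *ManagedEtcdSpec
}

// checkEtcdSpec fails fast instead of panicking when the management type is
// Managed but the Managed spec was never populated.
func checkEtcdSpec(e EtcdSpec) error {
	switch e.ManagementType {
	case "Managed":
		if e.Managed == nil {
			return fmt.Errorf("etcd management type is Managed but spec.etcd.managed is not set")
		}
	case "Unmanaged":
		// unmanaged handling elided
	}
	return nil
}

func main() {
	fmt.Println(checkEtcdSpec(EtcdSpec{ManagementType: "Managed"}))
}
```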
Test Failure Analysis Complete

Job Information
Test Failure Analysis

Error

Summary

The Root Cause

The codecov/patch gate enforces a minimum test coverage percentage on lines changed in the PR. This PR introduces a large feature (multi-shard etcd) spanning 9 new files and many modified files, but several key areas lack any test coverage:
The project-level coverage (

Recommendations
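As one concrete illustration of cheap patch coverage, a table-driven test over shard naming helpers, using the illustrative `etcdDiscoveryServiceName` from the sketch earlier in this thread rather than a real repository function, might look like:

```go
import "testing"

func TestEtcdDiscoveryServiceName(t *testing.T) {
	cases := []struct {
		shard string
		want  string
	}{
		{"default", "etcd-discovery"},
		{"workloads", "etcd-discovery-workloads"},
	}
	for _, tc := range cases {
		if got := etcdDiscoveryServiceName(tc.shard); got != tc.want {
			t.Errorf("etcdDiscoveryServiceName(%q) = %q, want %q", tc.shard, got, tc.want)
		}
	}
}
```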
Evidence
/remove-lifecycle stale
Summary
This PR implements multi-shard etcd support for HyperShift, enabling hosted clusters to distribute etcd data across multiple independent etcd clusters for improved scalability and performance.
Key Features
Changes
- CLI flags (`--etcd-shard-*`)

Related Issues
This work is part of the etcd sharding initiative to improve hosted cluster scalability.
🤖 Generated with Claude Code
Summary by CodeRabbit