Enhancement: Etcd sharding by resource kind for HyperShift #1979

jhjaggars wants to merge 4 commits into openshift:master
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. Needs approval from an approver in each of these files.
Force-pushed fdf69e3 to d188bb0
Force-pushed d188bb0 to 313f0ce
> which forbid functions on API types). It always synthesizes a default shard from the
> top-level `Storage` and `Scheduling` fields (with `ResourcePrefixes: ["/"]`,
> replicas from `controllerAvailabilityPolicy`), and appends any explicit non-default
> shards from the `Shards` list. When no
is the default shard still always required?
In the current design, yes. The rationale is that you need one, and you can't specify shards in advance for every possible API type (nor can you specify CRDs at all), so you'll need a catch-all anyhow.
> you can't specify shards in advance for every possible api type
Sharding only works for API types served by the core Kube API server, so the list is finite?
Currently it appears that we can go from just the default shard to adding a set of shards once on day 2, but not add additional shards later. Is that desirable, versus requiring the cluster to be born with all shards pre-defined?
This list is finite for a given release, yes, unless and until they change the way the KAS config works. The intention, at present, is to require clusters to be born with the shard configuration that they will have forever. Adding a new shard afterward could orphan data, and I haven't thought through or planned how to mitigate that in this enhancement.
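The catch-all default shard discussed in this thread can be sketched in Go. This is a simplified illustration, not the actual HyperShift code: the `EtcdShard` type and `effectiveShards` helper are hypothetical stand-ins for the behavior described in the quoted design text (a synthesized default shard with `ResourcePrefixes: ["/"]` always comes first, followed by any explicit shards).

```go
package main

import "fmt"

// EtcdShard is a simplified stand-in for the enhancement's shard type.
type EtcdShard struct {
	Name             string
	ResourcePrefixes []string
	Replicas         int32
}

// effectiveShards always synthesizes a catch-all default shard first
// (prefix "/", replicas derived from controllerAvailabilityPolicy),
// then appends any explicit non-default shards from the Shards list.
func effectiveShards(explicit []EtcdShard, defaultReplicas int32) []EtcdShard {
	out := []EtcdShard{{
		Name:             "default",
		ResourcePrefixes: []string{"/"},
		Replicas:         defaultReplicas,
	}}
	return append(out, explicit...)
}

func main() {
	shards := effectiveShards([]EtcdShard{
		{Name: "events", ResourcePrefixes: []string{"/events"}, Replicas: 3},
	}, 3)
	fmt.Println(len(shards), shards[0].ResourcePrefixes[0])
}
```

Because the default shard owns "/", any resource not matched by a more specific prefix still has a home, which is why the design requires it.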
- Rename "v2 framework" to "CPO component framework" throughout
- Scope hcp CLI references to self-hosted/MCE deployments
- Replace EtcdLike interface with strings.HasPrefix prefix check
- Replace per-manifest adapt functions with TemplatedProvider approach
- Update API types to match kube-api-linter output (omitzero, value types, MinItems/MinLength bounds, omitempty on list map keys)
- Drop data-policy annotation — backup determined by storage type (PVC = backed up, EmptyDir = not)
- Add parent-level CEL rule preventing shard removal once configured
- Add cross-shard duplicate resource prefix CEL validation
- Add CEL rule preventing storageClassName on EmptyDir storage
- Make scheduling mutable (no data migration needed for placement)
- Clarify replicas override controllerAvailabilityPolicy when set
- Document CPO restart idempotency for conditional registration
- Make downgrade incompatibility for sharded HCPs explicit
- Document wait-for-etcd extension mechanism for multi-shard
- Document defrag controller sidecar behavior with multiple shards
- Add ServiceMonitor and PDB to NewShardComponent registration
- Fix stale DataPolicy reference in ManagedEtcdShardStorageSpec
- Fix ResourcePrefixes godoc listing "/" as valid for non-default shards
- Update Alternative C to reflect partial template adoption

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace resourcePrefixes []string with structured EtcdShardResource type (apiGroup + resource fields with proper validation)
- Restructure ManagedEtcdShardStorageSpec as discriminated union with nested PersistentVolume spec and CEL union rule
- Change shard name validation to DNS1123 label (was DNS1035-like)
- Make storage mutable day-2 (switching PVC/EmptyDir doesn't need cluster recreation)
- Make replicas non-pointer int32 (zero value is never valid)
- Use CEL url library for endpoint validation (isURL + getScheme)
- Prevent adding shards to unsharded clusters (has(oldSelf.shards) == has(self.shards))
- Move MinProperties=1 from Scheduling field to EtcdShardSchedulingSpec struct
- Remove redundant omitempty on struct fields (omitzero only)
- Add commented-out label key/value validation on nodeSelector (CEL cost budget)
- Fix EmptyDir data loss description (survives container restarts)
- Add single-replica failure mode note
- Add shard rename envtest case
- StorageClassName MaxLength 253, DNS1123 subdomain
- Add @JoelSpeed to api-approvers
- Update last-updated to 2026-04-30

All type changes verified against kube-api-linter (0 issues).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jhjaggars: all tests passed!
```go
// +kubebuilder:validation:MinLength=0
// +kubebuilder:validation:MaxLength=253
// +kubebuilder:validation:XValidation:rule="self == '' || self.matches('^[a-z0-9]([a-z0-9-]*[a-z0-9])?([.][a-z0-9]([a-z0-9-]*[a-z0-9])?)*$')",message="apiGroup must be a valid DNS subdomain (lowercase alphanumeric, hyphens, dots)"
APIGroup string `json:"apiGroup,omitempty"`
```
Needs to be a pointer to be able to roundtrip "" as a value
```go
// apiGroup is the API group of the resource (e.g., "coordination.k8s.io"
// for leases). An empty string designates the core API group (e.g.,
// events, pods, configmaps).
// When non-empty, must be a valid DNS subdomain (RFC 1123).
```
When non-empty, must be at most 253 characters in length and consist of only lowercase alphanumeric characters, hyphens and periods. Each period separated segment must start and end with an alphanumeric character.
We have a standard phrase for explaining this; it would be good to be consistent.
```go
APIGroup string `json:"apiGroup,omitempty"`
```
```go
// resource is the plural resource name (e.g., "events", "leases",
// "configmaps"). Must be a valid DNS label (RFC 1123): lowercase
```
Add the maximum length constraint to this description too, please.
```go
// -kubebuilder:validation:XValidation:rule=`self.all(key, size(key) <= 317 && key.matches('^(([A-Za-z0-9]+(\\.[A-Za-z0-9]+)?)*[A-Za-z0-9]\\/)?(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])$'))`, message="label key must be a valid qualified name"
// -kubebuilder:validation:XValidation:rule=`self.all(key, size(self[key]) <= 63 && self[key].matches('^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?$'))`, message="label value must be 63 characters or less and consist of valid label characters"
```
These wouldn't actually work if you tried them. You can't validate map keys in CEL; to validate the keys you'd currently have to use a webhook.
Maybe it's not worth adding these, as I think they'll show up in the godoc at the moment.
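For reference, the key check the commented-out CEL rule is reaching for could be done in a webhook. Here is a minimal sketch of a qualified-name label key validator; the regexes approximate upstream Kubernetes validation and the `validLabelKey` helper is hypothetical, not the real `k8s.io/apimachinery` API:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// name part of a label key: starts/ends alphanumeric, may contain
	// '-', '_', '.' in between, and is limited to 63 characters
	nameRe = regexp.MustCompile(`^[A-Za-z0-9]([-A-Za-z0-9_.]*[A-Za-z0-9])?$`)
	// optional prefix: a DNS-1123 subdomain, limited to 253 characters
	prefixRe = regexp.MustCompile(`^[a-z0-9]([a-z0-9-]*[a-z0-9])?(\.[a-z0-9]([a-z0-9-]*[a-z0-9])?)*$`)
)

// validLabelKey reports whether key is a valid Kubernetes qualified
// name, either "name" or "prefix/name".
func validLabelKey(key string) bool {
	name := key
	if i := strings.Index(key, "/"); i >= 0 {
		prefix := key[:i]
		name = key[i+1:]
		if prefix == "" || len(prefix) > 253 || !prefixRe.MatchString(prefix) {
			return false
		}
	}
	return name != "" && len(name) <= 63 && nameRe.MatchString(name)
}

func main() {
	fmt.Println(validLabelKey("app.kubernetes.io/name")) // true
	fmt.Println(validLabelKey("-bad"))                   // false
}
```

In a real admission webhook this check would run over the `nodeSelector` map keys, which CEL transition rules cannot iterate as keys.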
Proposes etcd sharding by Kubernetes resource kind for HyperShift hosted control planes, enabling distribution of resources across multiple independent etcd deployments for improved scalability and performance.
Each etcd shard is registered as an independent `ControlPlaneComponent` within the CPO v2 component framework, inheriting all framework features automatically. KAS is configured with `--etcd-servers-overrides` to route resources to the appropriate shard.

- `NewStatefulSetComponent` with `WithAssetDir("etcd")`, inheriting priority class, topology spread, scale-to-zero, PDB, etc.
- `EtcdLike` interface: minimal framework extension for KAS dependency exclusion, priority class, and replica defaults
- `EtcdSharding` feature gate in `TechPreviewNoUpgrade`
- `ManagedEtcdShardSpec`, `UnmanagedEtcdShardSpec`, `EmptyDirEtcdStorageSpec`, `EtcdDataPolicyType` added to `hypershift.openshift.io/v1beta1`

cc @enxebre @sjenning @csrwng
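The kube-apiserver `--etcd-servers-overrides` flag mentioned above takes comma-separated `group/resource#servers` pairs, with the servers for a pair separated by semicolons (a core-group resource uses an empty group, e.g. `/events`). A small illustrative helper, with hypothetical names and endpoint hostnames, might assemble the value like this:

```go
package main

import (
	"fmt"
	"strings"
)

// shardOverride pairs a group/resource with the etcd client endpoints
// of the shard that should serve it.
type shardOverride struct {
	GroupResource string   // e.g. "/events" or "coordination.k8s.io/leases"
	Servers       []string // etcd client URLs for that shard
}

// etcdServersOverridesFlag builds the kube-apiserver flag value:
// group/resource#server;server,group/resource#server,...
func etcdServersOverridesFlag(overrides []shardOverride) string {
	parts := make([]string, 0, len(overrides))
	for _, o := range overrides {
		parts = append(parts, o.GroupResource+"#"+strings.Join(o.Servers, ";"))
	}
	return strings.Join(parts, ",")
}

func main() {
	flag := etcdServersOverridesFlag([]shardOverride{
		{GroupResource: "/events", Servers: []string{"https://etcd-events-client:2379"}},
		{GroupResource: "coordination.k8s.io/leases", Servers: []string{"https://etcd-leases-client:2379"}},
	})
	fmt.Println("--etcd-servers-overrides=" + flag)
}
```

Resources without an override continue to use the servers given by `--etcd-servers`, which matches the catch-all default shard in this design.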
ControlPlaneComponentwithin the CPO v2 component framework, inheriting all framework features automatically. KAS is configured with--etcd-servers-overridesto route resources to the appropriate shard.NewStatefulSetComponentwithWithAssetDir("etcd"), inheriting priority class, topology spread, scale-to-zero, PDB, etc.EtcdLikeinterface: minimal framework extension for KAS dependency exclusion, priority class, and replica defaultsEtcdShardingfeature gate inTechPreviewNoUpgradeManagedEtcdShardSpec,UnmanagedEtcdShardSpec,EmptyDirEtcdStorageSpec,EtcdDataPolicyTypeadded tohypershift.openshift.io/v1beta1cc @enxebre @sjenning @csrwng