
Enhancement: Etcd sharding by resource kind for HyperShift #1979

Open
jhjaggars wants to merge 4 commits into openshift:master from jhjaggars:hypershift-etcd-sharding

Conversation

@jhjaggars

Proposes etcd sharding by Kubernetes resource kind for HyperShift hosted control planes, enabling distribution of resources across multiple independent etcd deployments for improved scalability and performance.
Each etcd shard is registered as an independent ControlPlaneComponent within the CPO v2 component framework, inheriting all framework features automatically. KAS is configured with --etcd-servers-overrides to route resources to the appropriate shard.

  • Conditional component registration: shard components registered at CPO startup based on HCP spec; shard list is immutable after creation
  • Framework reuse: each shard uses NewStatefulSetComponent with WithAssetDir("etcd"), inheriting priority class, topology spread, scale-to-zero, PDB, etc.
  • EtcdLike interface: minimal framework extension for KAS dependency exclusion, priority class, and replica defaults
  • Managed + Unmanaged: supports both managed (CPO-deployed) and unmanaged (external) etcd sharding
  • Feature-gated: EtcdSharding feature gate in TechPreviewNoUpgrade
  • API additions: ManagedEtcdShardSpec, UnmanagedEtcdShardSpec, EmptyDirEtcdStorageSpec, EtcdDataPolicyType added to hypershift.openshift.io/v1beta1
  • Jira: https://issues.redhat.com/browse/OCPBUGS-TBD
    cc @enxebre @sjenning @csrwng
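The `--etcd-servers-overrides` routing mentioned above can be sketched as follows. The `shard` type and `overridesFlag` helper are illustrative, not HyperShift's actual code; the flag syntax (comma-separated overrides of the form `group/resource#servers`, with servers semicolon-separated) is kube-apiserver's documented format for this flag.

```go
package main

import (
	"fmt"
	"strings"
)

// shard pairs a group/resource with the etcd endpoints serving it.
// Illustrative stand-in for the enhancement's shard configuration.
type shard struct {
	groupResource string   // e.g. "/events" (core group) or "coordination.k8s.io/leases"
	servers       []string // etcd client URLs for this shard
}

// overridesFlag renders the value for kube-apiserver's
// --etcd-servers-overrides: overrides comma-separated, each
// "group/resource#url;url". Resources with no override stay on
// the default shard given by --etcd-servers.
func overridesFlag(shards []shard) string {
	parts := make([]string, 0, len(shards))
	for _, s := range shards {
		parts = append(parts, s.groupResource+"#"+strings.Join(s.servers, ";"))
	}
	return strings.Join(parts, ",")
}

func main() {
	fmt.Println(overridesFlag([]shard{
		{"/events", []string{"https://etcd-events:2379"}},
		{"coordination.k8s.io/leases", []string{"https://etcd-leases:2379"}},
	}))
	// → /events#https://etcd-events:2379,coordination.k8s.io/leases#https://etcd-leases:2379
}
```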

@openshift-ci openshift-ci Bot requested review from csrwng and sjenning April 22, 2026 16:11
@openshift-ci
Contributor

openshift-ci Bot commented Apr 23, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign sjenning for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jhjaggars jhjaggars force-pushed the hypershift-etcd-sharding branch 3 times, most recently from fdf69e3 to d188bb0 Compare April 23, 2026 12:36
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jhjaggars jhjaggars force-pushed the hypershift-etcd-sharding branch from d188bb0 to 313f0ce Compare April 23, 2026 12:48
Comment thread enhancements/hypershift/hypershift-etcd-sharding.md Outdated
which forbid functions on API types). It always synthesizes a default shard from the
top-level `Storage` and `Scheduling` fields (with `ResourcePrefixes: ["/"]`,
replicas from `controllerAvailabilityPolicy`), and appends any explicit non-default
shards from the `Shards` list. When no
Member


is the default shard still always required?

Author


In the current design, yes. The rationale is that you need one, and you can't specify shards in advance for every possible API type (nor can you specify CRDs at all), so you'll need a catch-all anyhow.

Contributor


> you can't specify shards in advance for every possible api type

Sharding only works for API types served by the core Kube API server, so the list is finite?

Currently it appears that we can go from just the default shard to adding a set of shards once on day 2, but not add additional shards later. Is that desirable, versus requiring that the cluster be born with all shards pre-defined?

Author


The list is finite for a given release, yes, unless and until they change the way the KAS config works. The intention, at present, is to require clusters to be born with the shard configuration that they will have forever. Adding a new shard afterward could orphan data, and I haven't thought through or planned how to mitigate that in this enhancement.
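The default-shard synthesis quoted at the top of this thread (a catch-all with `ResourcePrefixes: ["/"]`, replicas derived from `controllerAvailabilityPolicy`, followed by any explicit shards) can be sketched like this. All type and field names below are illustrative stand-ins, not the enhancement's actual API:

```go
package main

import "fmt"

// Shard is an illustrative stand-in for the enhancement's shard type.
type Shard struct {
	Name             string
	ResourcePrefixes []string
	Replicas         int32
}

// EtcdSpec is a simplified stand-in; HighlyAvailable approximates
// controllerAvailabilityPolicy.
type EtcdSpec struct {
	HighlyAvailable bool
	Shards          []Shard // explicit non-default shards, if any
}

// effectiveShards always synthesizes the catch-all default shard
// (prefix "/") first, then appends the explicitly configured shards,
// so unrouted resources always have somewhere to land.
func effectiveShards(spec EtcdSpec) []Shard {
	replicas := int32(1)
	if spec.HighlyAvailable {
		replicas = 3
	}
	out := []Shard{{Name: "etcd", ResourcePrefixes: []string{"/"}, Replicas: replicas}}
	return append(out, spec.Shards...)
}

func main() {
	for _, s := range effectiveShards(EtcdSpec{
		HighlyAvailable: true,
		Shards: []Shard{{Name: "etcd-events", ResourcePrefixes: []string{"/events"}, Replicas: 3}},
	}) {
		fmt.Println(s.Name, s.ResourcePrefixes, s.Replicas)
	}
}
```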

Comment thread enhancements/hypershift/hypershift-etcd-sharding.md Outdated
jhjaggars and others added 2 commits April 28, 2026 14:16
- Rename "v2 framework" to "CPO component framework" throughout
- Scope hcp CLI references to self-hosted/MCE deployments
- Replace EtcdLike interface with strings.HasPrefix prefix check
- Replace per-manifest adapt functions with TemplatedProvider approach
- Update API types to match kube-api-linter output (omitzero, value
  types, MinItems/MinLength bounds, omitempty on list map keys)
- Drop data-policy annotation — backup determined by storage type
  (PVC = backed up, EmptyDir = not)
- Add parent-level CEL rule preventing shard removal once configured
- Add cross-shard duplicate resource prefix CEL validation
- Add CEL rule preventing storageClassName on EmptyDir storage
- Make scheduling mutable (no data migration needed for placement)
- Clarify replicas override controllerAvailabilityPolicy when set
- Document CPO restart idempotency for conditional registration
- Make downgrade incompatibility for sharded HCPs explicit
- Document wait-for-etcd extension mechanism for multi-shard
- Document defrag controller sidecar behavior with multiple shards
- Add ServiceMonitor and PDB to NewShardComponent registration
- Fix stale DataPolicy reference in ManagedEtcdShardStorageSpec
- Fix ResourcePrefixes godoc listing "/" as valid for non-default shards
- Update Alternative C to reflect partial template adoption

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
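The `strings.HasPrefix` check that replaces the `EtcdLike` interface (per the commit message above) might look like this. The function name and the exact prefix convention are assumptions for illustration:

```go
package main

import (
	"fmt"
	"strings"
)

// isEtcdComponent sketches the prefix check that replaces the EtcdLike
// interface: any component whose name starts with "etcd" (e.g. "etcd",
// "etcd-events") is treated as an etcd shard, so KAS dependency
// exclusion, priority class, and replica defaults apply uniformly
// without a dedicated interface on each component.
func isEtcdComponent(name string) bool {
	return strings.HasPrefix(name, "etcd")
}

func main() {
	for _, n := range []string{"etcd", "etcd-events", "kube-apiserver"} {
		fmt.Println(n, isEtcdComponent(n))
	}
	// etcd true, etcd-events true, kube-apiserver false
}
```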
Comment thread enhancements/hypershift/etcd-sharding-by-resource-kind.md
Comment thread enhancements/hypershift/etcd-sharding-by-resource-kind.md Outdated
- Replace resourcePrefixes []string with structured EtcdShardResource
  type (apiGroup + resource fields with proper validation)
- Restructure ManagedEtcdShardStorageSpec as discriminated union with
  nested PersistentVolume spec and CEL union rule
- Change shard name validation to DNS1123 label (was DNS1035-like)
- Make storage mutable day-2 (switching PVC/EmptyDir doesn't need
  cluster recreation)
- Make replicas non-pointer int32 (zero value is never valid)
- Use CEL url library for endpoint validation (isURL + getScheme)
- Prevent adding shards to unsharded clusters (has(oldSelf.shards) ==
  has(self.shards))
- Move MinProperties=1 from Scheduling field to EtcdShardSchedulingSpec
  struct
- Remove redundant omitempty on struct fields (omitzero only)
- Add commented-out label key/value validation on nodeSelector (CEL
  cost budget)
- Fix EmptyDir data loss description (survives container restarts)
- Add single-replica failure mode note
- Add shard rename envtest case
- StorageClassName MaxLength 253, DNS1123 subdomain
- Add @JoelSpeed to api-approvers
- Update last-updated to 2026-04-30

All type changes verified against kube-api-linter (0 issues).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
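The CEL url-library endpoint validation mentioned in the commit message above might look like the following marker. This is a sketch only: the field name and error message are assumptions, while `isURL` and `url().getScheme()` are the Kubernetes CEL URL library functions.

```go
// Sketch: validate an unmanaged-shard endpoint with the Kubernetes CEL
// URL library rather than a hand-rolled regex. Field name is illustrative.
// +kubebuilder:validation:XValidation:rule="isURL(self) && url(self).getScheme() == 'https'",message="endpoint must be a valid https URL"
Endpoint string `json:"endpoint"`
```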
@openshift-ci
Contributor

openshift-ci Bot commented Apr 30, 2026

@jhjaggars: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

// +kubebuilder:validation:MinLength=0
// +kubebuilder:validation:MaxLength=253
// +kubebuilder:validation:XValidation:rule="self == '' || self.matches('^[a-z0-9]([a-z0-9-]*[a-z0-9])?([.][a-z0-9]([a-z0-9-]*[a-z0-9])?)*$')",message="apiGroup must be a valid DNS subdomain (lowercase alphanumeric, hyphens, dots)"
APIGroup string `json:"apiGroup,omitempty"`
Contributor


Needs to be a pointer to be able to roundtrip "" as a value

// apiGroup is the API group of the resource (e.g., "coordination.k8s.io"
// for leases). An empty string designates the core API group (e.g.,
// events, pods, configmaps).
// When non-empty, must be a valid DNS subdomain (RFC 1123).
Contributor


When non-empty, must be at most 253 characters in length and consist of only lowercase alphanumeric characters, hyphens and periods. Each period separated segment must start and end with an alphanumeric character.

We have a standard phrase for explaining this, would be good to be consistent

APIGroup string `json:"apiGroup,omitempty"`

// resource is the plural resource name (e.g., "events", "leases",
// "configmaps"). Must be a valid DNS label (RFC 1123): lowercase
Contributor


Add the maximum length constraint to this description too please

Comment on lines +339 to +340
// -kubebuilder:validation:XValidation:rule=`self.all(key, size(key) <= 317 && key.matches('^(([A-Za-z0-9]+(\\.[A-Za-z0-9]+)?)*[A-Za-z0-9]\\/)?(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])$'))`, message="label key must be a valid qualified name"
// -kubebuilder:validation:XValidation:rule=`self.all(key, size(self[key]) <= 63 && self[key].matches('^(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?$'))`, message="label value must be 63 characters or less and consist of valid label characters"
Contributor


These wouldn't actually work if you tried them. You can't validate map keys in CEL; to validate the keys you'd currently have to use a webhook.

Maybe it's not worth adding these, as I think they'll show up in the godoc at the moment.
