AUTOSCALE-558: Expose KubeletConfig on OpenShiftEC2NodeClass #7916
jkyros wants to merge 7 commits into openshift:main
Conversation
Pipeline controller notification: For optional jobs, comment … This repository is configured in: LGTM mode.
@jkyros: This pull request references AUTOSCALE-558 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Skipping CI for Draft Pull Request.
Important: Review skipped. Auto reviews are limited based on label configuration; 🚫 excluded labels (none allowed) (1). Please check the settings in the CodeRabbit UI or the ⚙️ run configuration.
📝 Walkthrough

This PR extends Karpenter's OpenShift EC2 node class configuration to support kubelet settings management. It introduces a new …

Sequence Diagram(s)

```mermaid
sequenceDiagram
actor User
participant KarpenterIgnition as KarpenterIgnition<br/>Controller
participant Management as Management Cluster<br/>(ConfigMap)
participant MachineConfig as MachineConfig<br/>System
participant Node as Provisioned<br/>Node
User->>KarpenterIgnition: Create/Update OpenshiftEC2NodeClass<br/>with spec.kubelet config
activate KarpenterIgnition
KarpenterIgnition->>KarpenterIgnition: Merge user kubelet config<br/>with base taint configuration
KarpenterIgnition->>Management: Reconcile kubelet config<br/>ConfigMap (per-NodeClass)
deactivate KarpenterIgnition
activate Management
Management->>Management: Store merged kubelet<br/>configuration as JSON
Management->>MachineConfig: ConfigMap propagates to<br/>MachineConfig system
deactivate Management
User->>KarpenterIgnition: Create NodePool referencing<br/>OpenshiftEC2NodeClass
activate KarpenterIgnition
KarpenterIgnition->>KarpenterIgnition: createInMemoryNodePool()<br/>includes kubelet ConfigRef
KarpenterIgnition->>Node: Provision new node<br/>with kubelet config reference
deactivate KarpenterIgnition
activate Node
Node->>Management: Retrieve kubelet config<br/>from ConfigMap
Node->>Node: Apply kubelet configuration<br/>(MaxPods, CPUCFSQuota, etc.)
    deactivate Node
```
🚥 Pre-merge checks: ✅ 3 passed | ❌ 2 failed

❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.
/test e2e-aws-autonode
🧹 Nitpick comments (4)
test/e2e/assets/karpenter-config-checker-ds.yaml (1)

Line 19: Consider pinning the alpine image tag for reproducibility. Same suggestion as for karpenter-kubelet-checker-ds.yaml: pin to a specific version for test stability.

♻️ Suggested fix

```diff
- image: alpine
+ image: alpine:3.21
```

🤖 Prompt for AI Agents: Verify the finding against the current code and only fix it if needed. In test/e2e/assets/karpenter-config-checker-ds.yaml at line 19, the image uses an unpinned tag (`image: alpine`); change it to a specific pinned tag (for example the same `alpine:<version>` used in karpenter-kubelet-checker-ds.yaml) to ensure reproducible tests.

test/e2e/assets/karpenter-kubelet-checker-ds.yaml (1)
Line 19: Consider pinning the alpine image tag for reproducibility. Using `alpine` without a version tag means tests may behave differently over time as the `latest` tag changes. For e2e test stability, consider pinning to a specific version.

♻️ Suggested fix

```diff
- image: alpine
+ image: alpine:3.21
```

🤖 Prompt for AI Agents: Verify the finding against the current code and only fix it if needed. In test/e2e/assets/karpenter-kubelet-checker-ds.yaml at line 19, the container image is unpinned (`image: alpine`), which can cause non-deterministic test behavior; update the image field to an explicit, immutable tag (e.g., `alpine:3.18` or another vetted version) to ensure reproducible e2e runs.

karpenter-operator/controllers/karpenterignition/karpenterignition_controller.go (1)
Lines 260-261: Hardcoded AMD64 architecture. The `Arch` field is hardcoded to `ArchitectureAMD64`. If ARM64 support is planned for Karpenter-managed nodes, this should be parameterized or derived from the NodeClass.

🤖 Prompt for AI Agents: Verify the finding against the current code and only fix it if needed. In karpenterignition_controller.go around lines 260-261, the Arch field is hardcoded to hyperv1.ArchitectureAMD64; read the architecture from the NodeClass (or accept it as a parameter to the function that builds the config), validate/normalize the value (AMD64/ARM64) with a safe default when unspecified, and replace the literal so Karpenter-managed nodes can support ARM64.
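A minimal sketch of the suggested derivation, assuming a hypothetical `Spec.Architecture` field and the usual hypershift import paths (the real API may expose this differently):

```go
package karpenterignition

import (
	hyperv1 "github.com/openshift/hypershift/api/hypershift/v1beta1"
	karpenterv1beta1 "github.com/openshift/hypershift/api/karpenter/v1beta1"
)

// nodeClassArch derives the node architecture instead of hardcoding AMD64.
// nodeClass.Spec.Architecture is a hypothetical field used for illustration.
func nodeClassArch(nodeClass *karpenterv1beta1.OpenshiftEC2NodeClass) string {
	switch nodeClass.Spec.Architecture {
	case hyperv1.ArchitectureARM64:
		return hyperv1.ArchitectureARM64
	case hyperv1.ArchitectureAMD64, "":
		return hyperv1.ArchitectureAMD64
	default:
		return hyperv1.ArchitectureAMD64 // safe fallback for unrecognized values
	}
}
```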
karpenter-operator/controllers/karpenter/assets/karpenter.hypershift.openshift.io_openshiftec2nodeclasses.yaml (1)

Lines 202-276: Consider adding validation rules for kubelet thresholds. The `spec.kubelet` schema is well-structured, but consider adding CEL validation rules to enforce invariants such as:

- `imageGCHighThresholdPercent > imageGCLowThresholdPercent`
- Threshold percentages within the 0-100 range

This would provide earlier feedback to users rather than failing at kubelet startup.

🤖 Prompt for AI Agents: Verify the finding against the current code and only fix it if needed. In the openshiftec2nodeclasses CRD around lines 202-276, add CEL rules under the spec.kubelet validation (the x-kubernetes-validations block next to the kubelet properties) to enforce 0 <= imageGCLowThresholdPercent <= 100, 0 <= imageGCHighThresholdPercent <= 100, and imageGCHighThresholdPercent > imageGCLowThresholdPercent (and similar invariants for other *ThresholdPercent fields if present).
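A sketch of what such rules could look like in the CRD schema; the exact placement of the `x-kubernetes-validations` block within this generated CRD is an assumption:

```yaml
kubelet:
  type: object
  x-kubernetes-validations:
  - rule: "!has(self.imageGCHighThresholdPercent) || (self.imageGCHighThresholdPercent >= 0 && self.imageGCHighThresholdPercent <= 100)"
    message: "imageGCHighThresholdPercent must be between 0 and 100"
  - rule: "!has(self.imageGCLowThresholdPercent) || (self.imageGCLowThresholdPercent >= 0 && self.imageGCLowThresholdPercent <= 100)"
    message: "imageGCLowThresholdPercent must be between 0 and 100"
  - rule: "!has(self.imageGCHighThresholdPercent) || !has(self.imageGCLowThresholdPercent) || self.imageGCHighThresholdPercent > self.imageGCLowThresholdPercent"
    message: "imageGCHighThresholdPercent must be greater than imageGCLowThresholdPercent"
```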
⛔ Files ignored due to path filters (4)

- api/karpenter/v1beta1/zz_generated.deepcopy.go is excluded by !**/zz_generated*
- vendor/github.com/openshift/hypershift/api/karpenter/v1beta1/karpenter_types.go is excluded by !**/vendor/**, !vendor/**
- vendor/github.com/openshift/hypershift/api/karpenter/v1beta1/kubeletconfig.go is excluded by !**/vendor/**, !vendor/**
- vendor/github.com/openshift/hypershift/api/karpenter/v1beta1/zz_generated.deepcopy.go is excluded by !**/vendor/**, !vendor/**, !**/zz_generated*
📒 Files selected for processing (14)

- api/go.mod
- api/karpenter/v1beta1/karpenter_types.go
- api/karpenter/v1beta1/kubeletconfig.go
- api/karpenter/v1beta1/kubeletconfig_test.go
- client/applyconfiguration/karpenter/v1beta1/kubeletconfiguration.go
- client/applyconfiguration/karpenter/v1beta1/openshiftec2nodeclassspec.go
- client/applyconfiguration/utils.go
- karpenter-operator/controllers/karpenter/assets/karpenter.hypershift.openshift.io_openshiftec2nodeclasses.yaml
- karpenter-operator/controllers/karpenterignition/karpenterignition_controller.go
- karpenter-operator/controllers/karpenterignition/karpenterignition_controller_test.go
- support/karpenter/karpenter.go
- test/e2e/assets/karpenter-config-checker-ds.yaml
- test/e2e/assets/karpenter-kubelet-checker-ds.yaml
- test/e2e/karpenter_test.go
/test e2e-aws-autonode
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@karpenter-operator/controllers/karpenterignition/karpenterignition_controller.go`:
- Around line 403-442: Add a deletion-handling branch in
KarpenterIgnitionReconciler.Reconcile() that checks if
openshiftEC2NodeClass.DeletionTimestamp != nil, calls
r.reconcileKubeletConfigMap(ctx, hcp, openshiftEC2NodeClass) to ensure the
per-nodeclass ConfigMap is deleted even when Spec.Kubelet is present, then
remove the OpenshiftEC2NodeClass finalizer (the same finalizer string used by
ec2_nodeclass_controller) and update the object via r.ManagementClient before
returning; this mirrors ec2_nodeclass_controller deletion flow and guarantees
cleanup happens prior to finalizer removal.
📒 Files selected for processing (2)

- karpenter-operator/controllers/karpenterignition/karpenterignition_controller.go
- karpenter-operator/controllers/karpenterignition/karpenterignition_controller_test.go
Force-pushed dd1e22c to e50993a
/test e2e-aws-autonode
@jkyros: This pull request references AUTOSCALE-558 which is a valid jira issue.
Actionable comments posted: 1
🧹 Nitpick comments (7)
test/e2e/karpenter_test.go (2)

Lines 756-759: Minimal NodePool used only for timeout calculation. The `minimalNP` with `Type: hyperv1.AWSPlatform` is a workaround for `eventuallyDaemonSetRollsOut`, which uses the NodePool's platform type to determine timeouts. This is acceptable but could benefit from a brief comment explaining the purpose.

Optional: add clarifying comment

```diff
- // nodePool arg to eventuallyDaemonSetRollsOut is only used for timeout calculation (KubeVirt gets longer)
+ // nodePool is only used by eventuallyDaemonSetRollsOut for timeout calculation
+ // (KubeVirt platform gets a longer timeout). We use AWS platform for standard timeouts.
  minimalNP := &hyperv1.NodePool{}
  minimalNP.Spec.Platform.Type = hyperv1.AWSPlatform
```

🤖 Prompt for AI Agents: Verify the finding against the current code and only fix it if needed. Add a brief comment above the minimalNP construction explaining it is a minimal NodePool used solely so eventuallyDaemonSetRollsOut can derive timeout values from the NodePool platform, and that hyperv1.AWSPlatform is set intentionally as a workaround for timeout calculation rather than representing a full NodePool.
Lines 740-761: Good use of DaemonSet for runtime kubelet verification. Using the DaemonSet's readiness probe to verify kubelet configuration is an elegant approach: if the probe fails (maxPods ≠ 500), the pods won't become ready and the test will fail.

However, the `checkerDS` DaemonSet is not cleaned up after the test. Consider adding a deferred delete to avoid leaving test resources on the cluster.

Suggested cleanup

```diff
  g.Expect(guestClient.Create(ctx, checkerDS)).To(Succeed())
  t.Logf("Created karpenter-kubelet-checker DaemonSet")
+ defer func() {
+     _ = guestClient.Delete(ctx, checkerDS)
+ }()
```

🤖 Prompt for AI Agents: Verify the finding against the current code and only fix it if needed. In test/e2e/karpenter_test.go around lines 740-761, add a deferred deletion immediately after creating checkerDS (guestClient.Delete(ctx, checkerDS), ignoring not-found errors) so the cleanup runs even if the test fails.

api/karpenter/v1beta1/karpenter_types.go (1)
Lines 42-47: Consider adding validation for percentage fields. `ImageGCHighThresholdPercent` and `ImageGCLowThresholdPercent` should logically be in the range 0-100. Adding kubebuilder validation markers would prevent invalid configurations at admission time.

Suggested validation markers

```diff
  // ImageGCHighThresholdPercent is the percent of disk usage which triggers image garbage collection.
  // +optional
+ // +kubebuilder:validation:Minimum=0
+ // +kubebuilder:validation:Maximum=100
  ImageGCHighThresholdPercent *int32 `json:"imageGCHighThresholdPercent,omitempty"`

  // ImageGCLowThresholdPercent is the percent of disk usage to which image garbage collection attempts to free.
  // +optional
+ // +kubebuilder:validation:Minimum=0
+ // +kubebuilder:validation:Maximum=100
  ImageGCLowThresholdPercent *int32 `json:"imageGCLowThresholdPercent,omitempty"`
```

🤖 Prompt for AI Agents: Verify the finding against the current code and only fix it if needed. Annotate the ImageGCHighThresholdPercent and ImageGCLowThresholdPercent fields with +kubebuilder:validation:Minimum=0 and +kubebuilder:validation:Maximum=100 so admission rejects values outside 0-100, keeping them optional pointers as declared.

test/e2e/assets/karpenter-kubelet-checker-ds.yaml (1)
Line 19: Consider pinning the alpine image tag for reproducibility. Using `alpine` without a version tag may lead to inconsistent test results if the base image changes. For E2E tests, pinning a specific version (e.g., `alpine:3.19`) ensures reproducible behavior.

Suggested fix

```diff
- image: alpine
+ image: alpine:3.19
```

🤖 Prompt for AI Agents: Verify the finding against the current code and only fix it if needed. Replace the unpinned image reference (`image: alpine`) in the DaemonSet container spec with a specific, reproducible tag (for example `image: alpine:3.19`) so the test uses a pinned Alpine version and avoids drift.

karpenter-operator/controllers/karpenterignition/karpenterignition_controller.go (1)
Lines 230-233: Log message may be misleading when the ConfigMap was already absent. `supportutil.DeleteIfNeeded` returns `(exists bool, err error)` but the `exists` value is discarded. The log at line 233 unconditionally states "Deleted kubelet config ConfigMap" even if the ConfigMap was already absent. Consider checking the return value for more accurate logging.

📝 Suggested improvement

```diff
- if _, err := supportutil.DeleteIfNeeded(ctx, r.ManagementClient, cm); err != nil {
+ existed, err := supportutil.DeleteIfNeeded(ctx, r.ManagementClient, cm)
+ if err != nil {
      return ctrl.Result{}, fmt.Errorf("failed to delete kubelet config configmap %s: %w", configMapName, err)
  }
- log.Info("Deleted kubelet config ConfigMap", "name", configMapName)
+ if existed {
+     log.Info("Deleted kubelet config ConfigMap", "name", configMapName)
+ }
```

🤖 Prompt for AI Agents: Verify the finding against the current code and only fix it if needed. Capture the exists boolean from supportutil.DeleteIfNeeded; on error return as before; if exists is true log that the ConfigMap was deleted, otherwise log that it was already absent (or skip the deletion log) so the log reflects the actual outcome.

karpenter-operator/controllers/karpenterignition/karpenterignition_controller_test.go (2)
Lines 1171-1218: Test title is misleading: it manually patches the finalizer rather than testing Reconcile. The test "When spec.kubelet is set it should add the finalizer and create the ConfigMap" calls `reconcileKubeletConfigMap` (which creates the ConfigMap) but then manually patches the finalizer at lines 1200-1203. This doesn't actually test that `Reconcile()` adds the finalizer.

Consider either:

- Renaming the test to clarify it tests `reconcileKubeletConfigMap` separately from finalizer logic
- Calling the full `Reconcile()` method to test the integrated behavior

🤖 Prompt for AI Agents: Verify the finding against the current code and only fix it if needed. Either rename the test to state it only verifies reconcileKubeletConfigMap creates the ConfigMap (keeping the manual finalizer patch), or change the test to call r.Reconcile(...) so it validates that Reconcile() both creates the ConfigMap and adds kubeletConfigFinalizer to the OpenshiftEC2NodeClass, removing the manual Patch(...) step.
Lines 1220-1256: Test does not exercise any reconciler logic. This test manually patches the finalizer removal (lines 1247-1250) without calling any reconciler method. It verifies the fake client accepts the patch but doesn't test actual controller behavior. Consider either removing this test (as it doesn't test meaningful behavior) or rewriting it to call `Reconcile()` with the appropriate setup.

🤖 Prompt for AI Agents: Verify the finding against the current code and only fix it if needed. Keep the initial nodeClass with Finalizers set and Spec.Kubelet nil, call r.Reconcile(ctx, reconcile.Request{NamespacedName: types.NamespacedName{Name: testNodeClassName}}), assert no error, fetch the object with fakeGuestClient.Get(...) and expect kubeletConfigFinalizer to be removed; alternatively remove the test if you prefer not to test Reconcile behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@karpenter-operator/controllers/karpenterignition/karpenterignition_controller.go`:
- Around line 155-178: The finalizer is removed before attempting to delete the
kubelet ConfigMap which can orphan the ConfigMap if deletion fails; modify the
branch that handles openshiftEC2NodeClass.Spec.Kubelet == nil to first call
r.reconcileKubeletConfigMap(ctx, hcp, openshiftEC2NodeClass) and return on
error, and only after that success perform the DeepCopy,
controllerutil.RemoveFinalizer(...) and r.GuestClient.Patch(...) (the same
removal logic currently in the else block). Keep the existing error wrapping
messages and the kubeletConfigFinalizer, openshiftEC2NodeClass,
r.reconcileKubeletConfigMap and r.GuestClient.Patch symbol usages so the
ConfigMap cleanup completes before the finalizer is removed.
⛔ Files ignored due to path filters (4)

- api/karpenter/v1beta1/zz_generated.deepcopy.go is excluded by !**/zz_generated*
- vendor/github.com/openshift/hypershift/api/karpenter/v1beta1/karpenter_types.go is excluded by !**/vendor/**, !vendor/**
- vendor/github.com/openshift/hypershift/api/karpenter/v1beta1/kubeletconfig.go is excluded by !**/vendor/**, !vendor/**
- vendor/github.com/openshift/hypershift/api/karpenter/v1beta1/zz_generated.deepcopy.go is excluded by !**/vendor/**, !vendor/**, !**/zz_generated*
📒 Files selected for processing (13)

- api/go.mod
- api/karpenter/v1beta1/karpenter_types.go
- api/karpenter/v1beta1/kubeletconfig.go
- api/karpenter/v1beta1/kubeletconfig_test.go
- client/applyconfiguration/karpenter/v1beta1/kubeletconfiguration.go
- client/applyconfiguration/karpenter/v1beta1/openshiftec2nodeclassspec.go
- client/applyconfiguration/utils.go
- karpenter-operator/controllers/karpenter/assets/karpenter.hypershift.openshift.io_openshiftec2nodeclasses.yaml
- karpenter-operator/controllers/karpenterignition/karpenterignition_controller.go
- karpenter-operator/controllers/karpenterignition/karpenterignition_controller_test.go
- support/karpenter/karpenter.go
- test/e2e/assets/karpenter-kubelet-checker-ds.yaml
- test/e2e/karpenter_test.go

🚧 Files skipped from review as they are similar to previous changes (3)

- api/go.mod
- client/applyconfiguration/karpenter/v1beta1/openshiftec2nodeclassspec.go
- client/applyconfiguration/utils.go
Force-pushed e50993a to 994277d
/test e2e-aws-autonode |
Actionable comments posted: 1
🧹 Nitpick comments (3)
karpenter-operator/controllers/karpenter/karpenter_controller.go (1)

Lines 295-310: Consider setting an owner reference for cleanup. The ConfigMap is created without an owner reference to the HCP, which means it won't be automatically garbage collected when the HCP is deleted. This aligns with the TODO at line 272, but should be tracked for follow-up.

💡 Suggested improvement

```diff
  func (r *Reconciler) reconcileTaintConfigMap(ctx context.Context, hcp *hyperv1.HostedControlPlane) error {
      cm := &corev1.ConfigMap{
          ObjectMeta: metav1.ObjectMeta{
              Name:      karpenterutil.KarpenterTaintConfigMapName,
              Namespace: hcp.Namespace,
          },
      }
      _, err := r.CreateOrUpdate(ctx, r.ManagementClient, cm, func() error {
          cm.Data = map[string]string{
              "config": karpenterutil.KarpenterTaintConfigManifest(),
          }
+         return controllerutil.SetControllerReference(hcp, cm, r.ManagementClient.Scheme())
-         return nil
      })
      return err
  }
```

🤖 Prompt for AI Agents: Verify the finding against the current code and only fix it if needed. Set an owner reference on the ConfigMap before calling CreateOrUpdate (e.g., controllerutil.SetControllerReference(hcp, cm, ...)), handle any error from setting the owner, and ensure the reconciler's scheme knows the HCP GVK so the owner can be set correctly.

karpenter-operator/controllers/karpenterignition/karpenterignition_controller.go (1)
Lines 502-524: Shallow merge allows a user-provided `registerWithTaints` to override the required taint. The comment at lines 513-514 states "registerWithTaints is never a user-settable field so the merge is always purely additive." However, `mergeKubeletConfigMaps` uses overlay-wins semantics, so if a user includes `registerWithTaints` in their `KubeletConfiguration`, it would silently override the required Karpenter taint.

While the API may not expose `registerWithTaints` as a field on `KubeletConfiguration`, the JSON round-trip at lines 515-522 would include any extra fields that might be present. Consider either:

- Explicitly overwriting `registerWithTaints` in the merged result (swap merge order or post-process), or
- Adding validation to reject a user-provided `registerWithTaints` at admission time

Given the current API doesn't expose this field, the risk is low, but worth noting for future-proofing.

🔧 Defensive fix to ensure the taint is always present

```diff
  merged := mergeKubeletConfigMaps(taintBase, userMap)
+ // Ensure the required taint is always present, even if userMap contains registerWithTaints
+ merged["registerWithTaints"] = taintBase["registerWithTaints"]
  manifest, err := kubeletConfigManifest(configMapName, merged)
```

🤖 Prompt for AI Agents: Verify the finding against the current code and only fix it if needed. After creating merged via mergeKubeletConfigMaps(taintBase, userMap), enforce the required field by explicitly setting merged["registerWithTaints"] = taintBase["registerWithTaints"] (or reverse the merge order) so the taint from taintBase always wins.

test/e2e/assets/karpenter-kubelet-checker-ds.yaml (1)
Lines 1-43: Privileged container is acceptable for this E2E test asset. The static analysis warnings about privileged containers (CKV_K8S_16, CKV_K8S_20, CKV_K8S_23) are expected here. This DaemonSet requires host filesystem access to verify kubelet configuration, which necessitates privileged mode. The test code scopes it to specific test nodes via `nodeSelector`.

A few minor observations:

- Unpinned image tag: using `alpine` without a version tag could lead to non-reproducible test runs. Consider pinning to a specific version (e.g., `alpine:3.19`).
- Missing resource limits: only `requests` are specified. Adding `limits` would prevent resource exhaustion in edge cases.
- Magic number coupling: the value `427` appears here and must match the test's `MaxPods` setting. Consider documenting this coupling or extracting it to a shared constant if feasible.

🔧 Suggested improvements

```diff
  containers:
  - name: karpenter-kubelet-checker
-   image: alpine
+   image: alpine:3.19
    command:
    - /bin/sleep
    - 24h
    resources:
      requests:
        cpu: 100m
        memory: 200Mi
+     limits:
+       cpu: 100m
+       memory: 200Mi
```

🤖 Prompt for AI Agents: Verify the finding against the current code and only fix it if needed. Pin the container image to a specific tag (e.g., alpine:3.19), add matching resource limits alongside the existing requests, and document or extract the hardcoded "427" in the readinessProbe command (grep 'maxPods: 427') so its coupling to the test's MaxPods setting is explicit and maintainable.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@karpenter-operator/controllers/karpenterignition/karpenterignition_controller_test.go`:
- Around line 1156-1163: The test initializes AutoNode.Provisioner using a
pointer but the field is a value type; change the initialization in the block
setting AutoNode.Provisioner from &hyperv1.ProvisionerConfig{...} to
hyperv1.ProvisionerConfig{...} so the types match. Specifically update the
Provisioner assignment in the test where AutoNode, Provisioner,
ProvisionerConfig, KarpenterConfig (with Name: hyperv1.ProvisionerKarpenter and
Platform: hyperv1.AWSPlatform) are constructed to remove the leading &.
⛔ Files ignored due to path filters (8)

- api/karpenter/v1beta1/zz_generated.deepcopy.go is excluded by !**/zz_generated*.go, !**/zz_generated*
- client/applyconfiguration/karpenter/v1beta1/kubeletconfiguration.go is excluded by !client/**
- client/applyconfiguration/karpenter/v1beta1/openshiftec2nodeclassspec.go is excluded by !client/**
- client/applyconfiguration/utils.go is excluded by !client/**
- karpenter-operator/controllers/karpenter/assets/karpenter.hypershift.openshift.io_openshiftec2nodeclasses.yaml is excluded by !karpenter-operator/controllers/karpenter/assets/*.yaml
- vendor/github.com/openshift/hypershift/api/karpenter/v1beta1/karpenter_types.go is excluded by !vendor/**, !**/vendor/**
- vendor/github.com/openshift/hypershift/api/karpenter/v1beta1/kubeletconfig.go is excluded by !vendor/**, !**/vendor/**
- vendor/github.com/openshift/hypershift/api/karpenter/v1beta1/zz_generated.deepcopy.go is excluded by !vendor/**, !**/vendor/**, !**/zz_generated*.go, !**/zz_generated*
📒 Files selected for processing (12)

- api/go.mod
- api/karpenter/v1beta1/karpenter_types.go
- api/karpenter/v1beta1/kubeletconfig.go
- api/karpenter/v1beta1/kubeletconfig_test.go
- hypershift-operator/controllers/hostedcluster/karpenter.go
- karpenter-operator/controllers/karpenter/karpenter_controller.go
- karpenter-operator/controllers/karpenter/karpenter_controller_test.go
- karpenter-operator/controllers/karpenterignition/karpenterignition_controller.go
- karpenter-operator/controllers/karpenterignition/karpenterignition_controller_test.go
- support/karpenter/karpenter.go
- test/e2e/assets/karpenter-kubelet-checker-ds.yaml
- test/e2e/karpenter_test.go

🚧 Files skipped from review as they are similar to previous changes (2)

- api/karpenter/v1beta1/kubeletconfig_test.go
- api/karpenter/v1beta1/karpenter_types.go
Force-pushed 994277d to 3ef1b25
Actionable comments posted: 1
🧹 Nitpick comments (2)
karpenter-operator/controllers/karpenter/karpenter_controller_test.go (1)

Lines 357-392: Idempotency subtest should verify repeated reconcile stability. The subtest name says idempotent, but it only reconciles once. Also, line 389 indexes `taints[0]` without a prior length assertion. Consider reconciling twice and asserting `config` is unchanged between runs, plus explicit taint length checks.

Suggested test hardening

```diff
  err := r.reconcileTaintConfigMap(ctx, hcp)
  g.Expect(err).NotTo(HaveOccurred())

  cm := &corev1.ConfigMap{}
  err = fakeManagementClient.Get(ctx, client.ObjectKey{Name: karpenterutil.KarpenterTaintConfigMapName, Namespace: namespace}, cm)
  g.Expect(err).NotTo(HaveOccurred())
+ firstConfig := cm.Data["config"]
+
+ // Run reconcile again to verify idempotent output
+ err = r.reconcileTaintConfigMap(ctx, hcp)
+ g.Expect(err).NotTo(HaveOccurred())
+ err = fakeManagementClient.Get(ctx, client.ObjectKey{Name: karpenterutil.KarpenterTaintConfigMapName, Namespace: namespace}, cm)
+ g.Expect(err).NotTo(HaveOccurred())
+ g.Expect(cm.Data["config"]).To(Equal(firstConfig))

  var cr map[string]interface{}
  err = yaml.Unmarshal([]byte(cm.Data["config"]), &cr)
  g.Expect(err).NotTo(HaveOccurred())

  spec, ok := cr["spec"].(map[string]interface{})
  g.Expect(ok).To(BeTrue())
  kubeletConfig, ok := spec["kubeletConfig"].(map[string]interface{})
  g.Expect(ok).To(BeTrue())
  taints, ok := kubeletConfig["registerWithTaints"].([]interface{})
  g.Expect(ok).To(BeTrue())
+ g.Expect(taints).To(HaveLen(len(karpenterutil.KarpenterBaseTaints)))
  taint, ok := taints[0].(map[string]interface{})
  g.Expect(ok).To(BeTrue())
  g.Expect(taint["key"]).To(Equal(karpenterutil.KarpenterBaseTaints[0].Key))
```

🤖 Prompt for AI Agents: Verify the finding against the current code and only fix it if needed. Call r.reconcileTaintConfigMap twice (same ctx and r), capture cm.Data["config"] after each reconcile and assert the config string is unchanged, and before accessing taints[0] assert the length equals len(karpenterutil.KarpenterBaseTaints) so the index is safe.

karpenter-operator/controllers/karpenterignition/karpenterignition_controller_test.go (1)
Lines 1173-1258: These subtests bypass the production finalizer flow. Lines 1201-1205 and 1248-1252 mutate finalizers directly, so they don't actually verify `Reconcile()` branch behavior. Please drive these cases through `Reconcile()` (or `reconcileDeletedNodeClass`) and assert finalizer/ConfigMap outcomes from controller logic.

🤖 Prompt for AI Agents: Verify the finding against the current code and only fix it if needed. Instead of mutating finalizers directly (nodeClass.Finalizers = append(...) / = nil), invoke the controller path (r.Reconcile with a ctrl.Request for the NodeClass, or r.reconcileDeletedNodeClass(ctx, hcp, nodeClass)) after setting up fakeGuestClient/fakeManagementClient, then assert ConfigMap existence via fakeManagementClient.Get and finalizer presence via fakeGuestClient.Get.
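A rough sketch of driving the deletion case through the reconciler, as a test fragment reusing the identifiers named in the comment above (signatures and setup are assumptions):

```go
// Sketch only: exercise Reconcile() rather than patching finalizers by hand.
// testNodeClassName, fakeGuestClient, nodeClass and kubeletConfigFinalizer are
// the identifiers referenced in the review comment.
_, err := r.Reconcile(ctx, reconcile.Request{
	NamespacedName: types.NamespacedName{Name: testNodeClassName},
})
g.Expect(err).NotTo(HaveOccurred())

got := &karpenterv1beta1.OpenshiftEC2NodeClass{}
g.Expect(fakeGuestClient.Get(ctx, client.ObjectKeyFromObject(nodeClass), got)).To(Succeed())
g.Expect(got.Finalizers).NotTo(ContainElement(kubeletConfigFinalizer))
```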
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/e2e/karpenter_test.go`:
- Around line 737-738: The test uses identical SystemReserved and KubeReserved
values which makes the kubeReserved assertion ambiguous; update the KubeReserved
map in the test (the SystemReserved/KubeReserved map literal where KubeReserved
is set) to use "cpu": "610m" and "memory": "621Mi" (leave SystemReserved as
"510m"/"521Mi") and also update the corresponding kubelet checker manifest
expectations for kubeReserved to expect 610m / 621Mi so the grep-based check
uniquely verifies kubeReserved is applied.
⛔ Files ignored due to path filters (7)

- api/karpenter/v1beta1/zz_generated.deepcopy.go is excluded by !**/zz_generated*.go, !**/zz_generated*
- client/applyconfiguration/karpenter/v1beta1/kubeletconfiguration.go is excluded by !client/**
- client/applyconfiguration/karpenter/v1beta1/openshiftec2nodeclassspec.go is excluded by !client/**
- client/applyconfiguration/utils.go is excluded by !client/**
- karpenter-operator/controllers/karpenter/assets/karpenter.hypershift.openshift.io_openshiftec2nodeclasses.yaml is excluded by !karpenter-operator/controllers/karpenter/assets/*.yaml
- vendor/github.com/openshift/hypershift/api/karpenter/v1beta1/karpenter_types.go is excluded by !vendor/**, !**/vendor/**
- vendor/github.com/openshift/hypershift/api/karpenter/v1beta1/zz_generated.deepcopy.go is excluded by !vendor/**, !**/vendor/**, !**/zz_generated*.go, !**/zz_generated*
📒 Files selected for processing (10)

- api/karpenter/v1beta1/karpenter_types.go
- hypershift-operator/controllers/hostedcluster/karpenter.go
- karpenter-operator/controllers/karpenter/karpenter_controller.go
- karpenter-operator/controllers/karpenter/karpenter_controller_test.go
- karpenter-operator/controllers/karpenterignition/karpenterignition_controller.go
- karpenter-operator/controllers/karpenterignition/karpenterignition_controller_test.go
- support/karpenter/karpenter.go
- support/karpenter/karpenter_test.go
- test/e2e/karpenter_kubelet_checker_pod.yaml
- test/e2e/karpenter_test.go

🚧 Files skipped from review as they are similar to previous changes (1)

- karpenter-operator/controllers/karpenter/karpenter_controller.go
/test e2e-aws-autonode
```go
// KubeletConfiguration configures kubelet settings for nodes provisioned by this NodeClass.
// These settings are injected into the node's ignition configuration via MachineConfig.
type KubeletConfiguration struct {
```
@jkyros I'm pretty sure we'll need to expose podPidsLimit here as well. That is currently the only option available to customers using the OCM API, so I would expect them to need it here as well.
This will have to be an openshiftec2nodeclass-specific field; the upstream ec2nodeclass doesn't support this yet, but it's in "priority/backlog": https://github.com/aws/karpenter-provider-aws/issues/7982
Not that this matters too much to us, since we don't propagate the fields to ec2nodeclass anyways, but just noting here that the API with the new field will look slightly different to customers. If upstream eventually decides to support this, I assume we would want the field naming to be similar to not be confusing.
EDIT: With that other PR, I guess naming doesn't really matter as much as I think :-)
I think #7952 is going to decide the direction here. If we're skewing from upstream already with that, I think this boils down to "what list of fields do we want exposed here", and we're deciding to more-or-less ignore upstream because our fields are already going to be weird and OpenShift-ified, right?
I added podPidsLimit and made it sync the fields Karpenter cares about back to ec2nodeclass (it apparently uses them for scheduling).
I put the "christmas tree" approach where it syncs from the upstream on top of it so we can see it. If we don't like it, I'll peel it off.
/test e2e-aws-autonode

EDIT: github bot was scaled to zero during the night and couldn't respond? 😛
```go
// KubeletConfiguration configures kubelet settings for nodes provisioned by this NodeClass.
// These settings are injected into the node's ignition configuration via MachineConfig.
type KubeletConfiguration struct {
```
I will align this with that once we get that figured out. 😄
PR needs rebase.
Force-pushed de2ca6f to 6c416be
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jkyros. The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files; approvers can indicate their approval by writing …
Force-pushed 6c416be to 31de5ae
/test e2e-aws-autonode
…eclass The Karpenter ec2nodeclass has a spec.kubelet field that we currently do not expose on OpenShiftEC2Nodeclass. We currently use AMIType=Custom, and as a result even if we did expose spec.kubelet on OpenShiftEC2Nodeclass, it wouldn't get reconciled to the node anyway. We do desire as much parity as possible with the upstream Karpenter API, so what we _can_ do is expose the spec.kubelet on OpenShiftEC2Nodeclass, then marshal that into a config that we give to the node as part of the ignition bootstrapping process. This results in the user being able to use OpenShiftEC2Nodeclass as they would a normal ec2nodeclass (with regard to KubeletConfig) while still being able to bootstrap our nodes the OpenShift way, and that is what we have done here. Signed-off-by: John Kyros <jkyros@redhat.com>
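For illustration, a NodeClass using the new field might look roughly like this; the group and kind come from the CRD asset in this PR, while the concrete values are taken from the e2e discussion and are otherwise assumptions:

```yaml
# Hypothetical example; the exact schema is defined by the CRD in this PR.
apiVersion: karpenter.hypershift.openshift.io/v1beta1
kind: OpenshiftEC2NodeClass
metadata:
  name: example
spec:
  kubelet:
    maxPods: 500
    imageGCHighThresholdPercent: 85
    imageGCLowThresholdPercent: 80
    systemReserved:
      cpu: 510m
      memory: 521Mi
    kubeReserved:
      cpu: 610m
      memory: 621Mi
```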
This is just the e2e test for KubeletConfig on OpenShiftEC2NodeClass, it configures KubeletConfig, spins up a node, then runs a daemon on that node to make sure the settings made it. Signed-off-by: John Kyros <jkyros@redhat.com>
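Per the review comments, the check works by having a privileged DaemonSet grep the rendered config on the host and only report ready when the expected value is present. A sketch of that probe, where the host mount path is an assumption and only the `maxPods: 427` target comes from the review thread:

```yaml
readinessProbe:
  exec:
    command:
    - /bin/sh
    - -c
    # Assumed host mount path; the value 427 must match the test's MaxPods setting.
    - grep -q 'maxPods: 427' /host/etc/kubernetes/kubelet.conf
  periodSeconds: 10
```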
There is a default taint config that our ignition controller uses to set the base taints on a node for first boot so workloads don't get scheduled until Karpenter okays the nodes. The ignition controller consumes configs via local object references, so it has to be in a configmap, and rather than have each ec2nodeclass have its own configmap, we made one central one. That configmap was originally created in the hypershift operator's hostedcluster controller where it sets up the karpenter component, with the thought that all of the nodepools could use it, but we ran into trouble when we exposed spec.kubelet on the OpenShiftEC2Nodeclass because that ALSO becomes a KubeletConfig and there can only be one KubeletConfig in the MachineConfig pool. So we needed to merge them for nodeclasses that had spec.kubelet configured. This just moves that default creation into the karpenter operator proper so that we can get access to its contents directly as a "constant" without having to retrieve the default one every time we reconcile. It still does not get cleaned up, we will need to handle that when we address deconfiguring autonode/karpenter. Signed-off-by: John Kyros <jkyros@redhat.com>
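A minimal sketch of that merge, assuming the `mergeKubeletConfigMaps` name from the review comments and a JSON round-trip for the typed config (both shapes are assumptions, not the actual implementation):

```go
package karpenterignition

import (
	"encoding/json"

	karpenterv1beta1 "github.com/openshift/hypershift/api/karpenter/v1beta1"
)

// mergeKubeletConfigMaps overlays user kubelet settings on the base taint
// config; overlay values win on key collisions, so required keys such as
// registerWithTaints must be re-asserted by the caller afterwards.
func mergeKubeletConfigMaps(base, overlay map[string]interface{}) map[string]interface{} {
	merged := make(map[string]interface{}, len(base)+len(overlay))
	for k, v := range base {
		merged[k] = v
	}
	for k, v := range overlay {
		merged[k] = v
	}
	return merged
}

// toMap converts the typed KubeletConfiguration into a generic map via a
// JSON round-trip so it can be merged with the base taint config.
func toMap(cfg *karpenterv1beta1.KubeletConfiguration) (map[string]interface{}, error) {
	raw, err := json.Marshal(cfg)
	if err != nil {
		return nil, err
	}
	var out map[string]interface{}
	err = json.Unmarshal(raw, &out)
	return out, err
}
```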
The original test here was very opaque: it would wait for the daemonset to come up, and if it failed, you didn't know why; you just knew that the health check on the daemon failed. I also didn't remember that the MCO splits kubeletconfig settings into multiple files, so we were looking in kubelet.conf for systemReserved settings that would never end up there. This is maybe _too_ exhaustive and I worry about having to keep this updated in the future if the list/implementation changes, but at least maybe we will catch some weirdness. Signed-off-by: John Kyros <jkyros@redhat.com>
Signed-off-by: John Kyros <jkyros@redhat.com>
Upstream Karpenter only exposes a few KubeletConfig fields that get used for bin packing (they are scheduling-related). Users generally want to use more than that, and upstream, because they aren't using a custom AMI type, they can specify those in userdata, but downstream they can't. So this is an attempt at a generator such that we can sync with both the upstream karpenter API (for parity) AND add additional fields from the kubelet, while retaining the ability to deny-list any fields we don't feel comfortable exposing. Signed-off-by: John Kyros <jkyros@redhat.com>
Signed-off-by: John Kyros <jkyros@redhat.com>
Force-pushed 31de5ae to 74da398
Rebased
Now that this is fixed, we can:
```go
var excludedFields = map[string]string{
	// Embedded TypeMeta
	"TypeMeta": "embedded type metadata",

	// Server / networking
	"EnableServer":       "server config, not user-facing",
	"StaticPodPath":      "static pod management",
	"PodLogsDir":         "pod log directory",
	"SyncFrequency":      "internal sync frequency",
	"FileCheckFrequency": "file check frequency",
	"HTTPCheckFrequency": "http check frequency",
	"StaticPodURL":       "static pod URL",
	"StaticPodURLHeader": "static pod URL header",
	"Address":            "kubelet bind address",
	"Port":               "kubelet port",
	"ReadOnlyPort":       "read-only port",

	// TLS
```
I set my "ast atrocity" that generates this from the kubernetes kubeletconfig struct up for deny-list (because I figured we'd want everything except for dangerous ones), but we could also allow-list, or expose them all and then reject them via VAP/webhook/in the controller). I just wanted to vet the generation approach.
I'm sure we could spend too long bikeshedding which ones to disable (aside from ClusterDNS) 😄
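The generation idea, sketched with reflection instead of the AST walk for brevity; everything besides the upstream `KubeletConfiguration` type and the deny-list concept above is illustrative:

```go
package main

import (
	"fmt"
	"reflect"

	kubeletconfig "k8s.io/kubelet/config/v1beta1"
)

// Deny-list: fields we refuse to expose (see the full map above).
var excludedFields = map[string]string{
	"TypeMeta": "embedded type metadata",
	// ...
}

// Emit every upstream kubelet config field that is not deny-listed, so the
// OpenShift-side API can stay in sync with upstream additions.
func main() {
	t := reflect.TypeOf(kubeletconfig.KubeletConfiguration{})
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		if reason, skip := excludedFields[f.Name]; skip {
			fmt.Printf("// skipped %s: %s\n", f.Name, reason)
			continue
		}
		fmt.Printf("%s %s `%s`\n", f.Name, f.Type, f.Tag)
	}
}
```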
@jkyros: The following tests failed, say …

Full PR test history. Your PR dashboard.
PR needs rebase.
Test Failure Analysis Complete

Job Information

Error Summary: There are two independent failures blocking this PR, neither caused by the PR's code changes.

Root Cause

- Primary issue: Merge conflicts (verify-workflows + tide). The PR branch …
- Secondary issue: TestAutoscaling/Teardown flake (e2e-aws-techpreview). The …

Critically, all functional tests passed:
Recommendations
Evidence
closing in favor of #8192
What this PR does / why we need it:
This:
- Exposes `spec.kubelet` on OpenShiftEC2NodeClass, but DOES NOT reconcile it to ec2nodeclass directly like the other fields

Because:

- `spec.Kubelet` from the ec2nodeclass does NOT get reconciled into userdata by Karpenter, so we have to stuff the KubeletConfig into the ignition/MachineConfig ourselves.

Which issue(s) this PR fixes:
Fixes AUTOSCALE-558
Special notes for your reviewer:
`systemReserved` goes into that other env file and I thought I broke it because it was missing -- wrote that to figure it out
Summary by CodeRabbit
Release Notes
New Features
Tests