Skip to content

Conversation

@dprince
Copy link
Contributor

@dprince dprince commented Dec 19, 2025

Fixes upgrade failures from 0.4 to main caused by incompatible webhook configuration changes that trigger index out of range panics during manifest merging.

When OPENSTACK_RELEASE_VERSION is bumped, the controller now:

  • Detects the version change by comparing against status.ReleaseVersion
  • Deletes all owned resources (deployments, services, serviceaccounts, configmaps)
  • Removes managed webhooks (validating and mutating configurations)
  • Requeues to recreate resources with new manifests

This one-time cleanup ensures a clean slate for incompatible upgrades where the structure of resources (especially webhooks) has changed between versions.

Adds ReleaseVersion field to OpenStackStatus to track the deployed version.

Jira: OSPRH-23865

@openshift-ci openshift-ci bot requested review from abays and rabi December 19, 2025 19:15
Comment on lines 352 to 435
func (r *OpenStackReconciler) deleteAllOwnedResources(ctx context.Context, instance *operatorv1beta1.OpenStack) error {
Log := r.GetLogger(ctx)
Log.Info("Deleting all owned resources for release version upgrade")

// Delete all owned deployments
deployments := &appsv1.DeploymentList{}
err := r.List(ctx, deployments, &client.ListOptions{Namespace: instance.Namespace})
if err != nil {
return errors.Wrap(err, "failed to list deployments")
}
for _, deployment := range deployments.Items {
if metav1.IsControlledBy(&deployment, instance) {
Log.Info("Deleting deployment", "name", deployment.Name)
err := r.Delete(ctx, &deployment)
if err != nil && !apierrors.IsNotFound(err) {
return errors.Wrapf(err, "failed to delete deployment %s", deployment.Name)
}
}
}

// Delete all owned service accounts
serviceAccounts := &corev1.ServiceAccountList{}
err = r.List(ctx, serviceAccounts, &client.ListOptions{Namespace: instance.Namespace})
if err != nil {
return errors.Wrap(err, "failed to list service accounts")
}
for _, sa := range serviceAccounts.Items {
if metav1.IsControlledBy(&sa, instance) {
Log.Info("Deleting service account", "name", sa.Name)
err := r.Delete(ctx, &sa)
if err != nil && !apierrors.IsNotFound(err) {
return errors.Wrapf(err, "failed to delete service account %s", sa.Name)
}
}
}

// Delete all owned services
services := &corev1.ServiceList{}
err = r.List(ctx, services, &client.ListOptions{Namespace: instance.Namespace})
if err != nil {
return errors.Wrap(err, "failed to list services")
}
for _, svc := range services.Items {
if metav1.IsControlledBy(&svc, instance) {
Log.Info("Deleting service", "name", svc.Name)
err := r.Delete(ctx, &svc)
if err != nil && !apierrors.IsNotFound(err) {
return errors.Wrapf(err, "failed to delete service %s", svc.Name)
}
}
}

// Delete webhooks (these are cluster-scoped and not owned, but managed by label)
valWebhooks, err := r.Kclient.AdmissionregistrationV1().ValidatingWebhookConfigurations().List(ctx, metav1.ListOptions{
LabelSelector: "openstack.openstack.org/managed=true",
})
if err != nil {
return errors.Wrap(err, "failed listing validating webhook configurations")
}
for _, webhook := range valWebhooks.Items {
Log.Info("Deleting validating webhook", "name", webhook.Name)
err := r.Kclient.AdmissionregistrationV1().ValidatingWebhookConfigurations().Delete(ctx, webhook.Name, metav1.DeleteOptions{})
if err != nil && !apierrors.IsNotFound(err) {
return errors.Wrapf(err, "failed to delete validating webhook %s", webhook.Name)
}
}

mutWebhooks, err := r.Kclient.AdmissionregistrationV1().MutatingWebhookConfigurations().List(ctx, metav1.ListOptions{
LabelSelector: "openstack.openstack.org/managed=true",
})
if err != nil {
return errors.Wrap(err, "failed listing mutating webhook configurations")
}
for _, webhook := range mutWebhooks.Items {
Log.Info("Deleting mutating webhook", "name", webhook.Name)
err := r.Kclient.AdmissionregistrationV1().MutatingWebhookConfigurations().Delete(ctx, webhook.Name, metav1.DeleteOptions{})
if err != nil && !apierrors.IsNotFound(err) {
return errors.Wrapf(err, "failed to delete mutating webhook %s", webhook.Name)
}
}

Log.Info("All owned resources deleted successfully")
return nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Be really nice if this could be done with some Generics to avoid the repetition. Does something like this work?
bshephar@80a5be2

(I don't have a OCP / OKD env to validate. So just throwing out a suggestion for you.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me try it. thanks @bshephar

@dprince dprince force-pushed the reinstall_operators branch 2 times, most recently from 5080ea0 to 73c5855 Compare January 6, 2026 14:35
Copy link
Contributor

@bshephar bshephar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, looks good.

It could probably be tightened up to avoid the runtime assertions by using client.ObjectList instead of any for the list arg. I guess all of those types should match client.ObjectList. But that could just be a nice follow up patch imo. Something like:

-func deleteOwnedResources[L any, T any](
+func deleteOwnedResources[L client.ObjectList, T any](
        ctx context.Context,
        r *OpenStackReconciler,
        instance client.Object,
@@ -359,7 +359,7 @@ func deleteOwnedResources[L any, T any](
 ) error {
        log := r.GetLogger(ctx)

-       err := r.List(ctx, any(list).(client.ObjectList), &client.ListOptions{Namespace: instance.GetNamespace()})
+       err := r.List(ctx, list, &client.ListOptions{Namespace: instance.GetNamespace()})
        if err != nil {
                return errors.Wrap(err, "failed to list resources")
        }

I think that should still work.

Happy New Year btw.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 7, 2026

@bshephar: changing LGTM is restricted to collaborators

Details

In response to this:

Nice, looks good.

It could probably be tightened up to avoid the runtime assertions by using client.ObjectList instead of any for the list arg. I guess all of those types should match client.ObjectList. But that could just be a nice follow up patch imo. Something like:

-func deleteOwnedResources[L any, T any](
+func deleteOwnedResources[L client.ObjectList, T any](
       ctx context.Context,
       r *OpenStackReconciler,
       instance client.Object,
@@ -359,7 +359,7 @@ func deleteOwnedResources[L any, T any](
) error {
       log := r.GetLogger(ctx)

-       err := r.List(ctx, any(list).(client.ObjectList), &client.ListOptions{Namespace: instance.GetNamespace()})
+       err := r.List(ctx, list, &client.ListOptions{Namespace: instance.GetNamespace()})
       if err != nil {
               return errors.Wrap(err, "failed to list resources")
       }

I think that should still work.

Happy New Year btw.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@dprince dprince marked this pull request as draft January 7, 2026 22:56
@dprince dprince force-pushed the reinstall_operators branch from 73c5855 to 5468459 Compare January 8, 2026 17:50
@dprince dprince marked this pull request as ready for review January 8, 2026 17:50
@openshift-ci openshift-ci bot requested review from olliewalsh and rebtoor January 8, 2026 17:51
@dprince
Copy link
Contributor Author

dprince commented Jan 8, 2026

/test precommit-check

Copy link
Contributor

@stuggi stuggi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@stuggi
Copy link
Contributor

stuggi commented Jan 9, 2026

could you add a comment to let golangci-lint to ignore that package name

internal/dataplane/util/image_registry_test.go:17:9: var-naming: avoid meaningless package names (revive)
package util
        ^
1 issues:
* revive: 1

@dprince dprince force-pushed the reinstall_operators branch from 5468459 to e9798cf Compare January 9, 2026 14:52
@openshift-ci openshift-ci bot removed the lgtm label Jan 9, 2026
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 9, 2026

New changes are detected. LGTM label has been removed.

@dprince
Copy link
Contributor Author

dprince commented Jan 9, 2026

could you add a comment to let golangci-lint to ignore that package name

internal/dataplane/util/image_registry_test.go:17:9: var-naming: avoid meaningless package names (revive)
package util
        ^
1 issues:
* revive: 1

done

@dprince
Copy link
Contributor Author

dprince commented Jan 11, 2026

/retest

Copy link
Contributor

@bshephar bshephar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nit.

) error {
log := r.GetLogger(ctx)

err := r.List(ctx, any(list).(client.ObjectList), &client.ListOptions{Namespace: instance.GetNamespace()})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the casting is necessary now that you have made L a client.ObjectList. I think it should work like this now:

Suggested change
err := r.List(ctx, any(list).(client.ObjectList), &client.ListOptions{Namespace: instance.GetNamespace()})
err := r.List(ctx, list, &client.ListOptions{Namespace: instance.GetNamespace()})

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 12, 2026

@bshephar: changing LGTM is restricted to collaborators

Details

In response to this:

Small nit.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 12, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bshephar, dprince, stuggi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Fixes upgrade failures from 0.4 to main caused by incompatible webhook
configuration changes that trigger index out of range panics during
manifest merging.

When OPENSTACK_RELEASE_VERSION is bumped, the controller now:
  - Detects the version change by comparing against status.ReleaseVersion
  - Deletes all owned resources (deployments, services, serviceaccounts, configmaps)
  - Removes managed webhooks (validating and mutating configurations)
  - Requeues to recreate resources with new manifests

This one-time cleanup ensures a clean slate for incompatible upgrades
where the structure of resources (especially webhooks) has changed between
versions.

Adds ReleaseVersion field to OpenStackStatus to track the deployed version.

Also, adds bounds checking to skip copying clientConfig for new webhooks
that don't exist in the current configuration, allowing the merge to
complete successfully.

Jira: OSPRH-23865

Co-authored-by: Brendan Shephard <bshephar@fedora-g16.bne-home.net>
@dprince dprince force-pushed the reinstall_operators branch from e9798cf to 2306cc6 Compare January 13, 2026 11:16
@softwarefactory-project-zuul
Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/ca7a76cdab8e43d2a4e0c7590a9eef08

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 29m 07s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 26m 19s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 34m 43s
adoption-standalone-to-crc-ceph-provider FAILURE in 1h 32m 43s
✔️ openstack-operator-tempest-multinode SUCCESS in 1h 37m 08s

@stuggi
Copy link
Contributor

stuggi commented Jan 14, 2026

recheck

@softwarefactory-project-zuul
Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/33f5fa03d0a7401e9beefb4d161fa16f

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 53m 02s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 23m 30s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 39m 09s
adoption-standalone-to-crc-ceph-provider RETRY_LIMIT in 14m 34s
✔️ openstack-operator-tempest-multinode SUCCESS in 1h 38m 39s

@dprince
Copy link
Contributor Author

dprince commented Jan 15, 2026

recheck

@softwarefactory-project-zuul
Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/f326743057c348008530c19a999888b0

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 21m 39s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 24m 06s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 29m 37s
adoption-standalone-to-crc-ceph-provider RETRY_LIMIT in 11m 32s
✔️ openstack-operator-tempest-multinode SUCCESS in 2h 06m 42s

@dprince
Copy link
Contributor Author

dprince commented Jan 15, 2026

recheck

@softwarefactory-project-zuul
Copy link

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/2055c372045043c785f59c9ab418df4a

✔️ openstack-k8s-operators-content-provider SUCCESS in 3h 22m 54s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 28m 18s
cifmw-crc-podified-edpm-baremetal FAILURE in 32m 38s
✔️ adoption-standalone-to-crc-ceph-provider SUCCESS in 3h 07m 46s
✔️ openstack-operator-tempest-multinode SUCCESS in 1h 42m 56s

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants