
Fix destroying k8s cluster on shared networks #4461

Merged: yadvr merged 2 commits into apache:4.14 from shapeblue:fix-k8s-deletion on Nov 18, 2020

@Pearl1594
Contributor

Description

Deleting a Kubernetes cluster on a shared network fails because the destroy worker attempts to clear network rules, even though no network rules are added for clusters brought up on shared networks.

2020-11-10 16:46:26,125 WARN  [c.c.k.c.a.KubernetesClusterActionWorker] (API-Job-Executor-7:ctx-46554d1f job-67 ctx-cf887921) (logid:ed80132a) Failed to remove network rules of Kubernetes cluster : c1
com.cloud.exception.ManagementServerException: No source NAT IP addresses found for network : shared1
        at com.cloud.kubernetes.cluster.actionworkers.KubernetesClusterDestroyWorker.deleteKubernetesClusterNetworkRules(KubernetesClusterDestroyWorker.java:145)
        at com.cloud.kubernetes.cluster.actionworkers.KubernetesClusterDestroyWorker.destroy(KubernetesClusterDestroyWorker.java:242)
        at com.cloud.kubernetes.cluster.KubernetesClusterManagerImpl.deleteKubernetesCluster(KubernetesClusterManagerImpl.java:1144)
        at org.apache.cloudstack.api.command.user.kubernetes.cluster.DeleteKubernetesClusterCmd.execute(DeleteKubernetesClusterCmd.java:77)
        at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:156)
        at com.cloud.api.ApiAsyncJobDispatcher.runJob(ApiAsyncJobDispatcher.java:108)
        at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:620)
        at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:48)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:55)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:102)
        at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:52)
        at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:45)
        at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:568)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
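
To make the failure mode concrete, here is a minimal, self-contained sketch. It is not CloudStack code: `Net`, `GuestType`, `deleteNetworkRules`, and `deleteNetworkRulesIfApplicable` are hypothetical stand-ins, and the IP address is made up. It assumes only what the trace shows, namely that rule cleanup fails when the network has no source NAT IP, and it shows how a guest-type check avoids that call for shared networks.

```java
/** Simplified model of the failure; none of these types are CloudStack classes. */
final class SharedNetworkDestroySketch {

    enum GuestType { Isolated, Shared }

    /** Hypothetical stand-in for a network record. */
    static final class Net {
        final String name;
        final GuestType guestType;
        final String sourceNatIp; // null when the network has no source NAT IP

        Net(String name, GuestType guestType, String sourceNatIp) {
            this.name = name;
            this.guestType = guestType;
            this.sourceNatIp = sourceNatIp;
        }
    }

    /** Mimics rule cleanup that needs a source NAT IP and fails without one. */
    static void deleteNetworkRules(Net net) {
        if (net.sourceNatIp == null) {
            throw new IllegalStateException(
                    "No source NAT IP addresses found for network : " + net.name);
        }
        // Firewall and port-forwarding rules would be removed here.
    }

    /**
     * Guarded variant: skips rule cleanup for a missing or shared network.
     * Returns true only when cleanup actually ran.
     */
    static boolean deleteNetworkRulesIfApplicable(Net net) {
        if (net != null && net.guestType != GuestType.Shared) {
            deleteNetworkRules(net);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Net shared = new Net("shared1", GuestType.Shared, null);
        Net isolated = new Net("isolated1", GuestType.Isolated, "10.0.0.1");

        try {
            deleteNetworkRules(shared); // unguarded call fails on the shared network
        } catch (IllegalStateException e) {
            System.out.println("unguarded: " + e.getMessage());
        }
        System.out.println("guarded, shared ran cleanup: " + deleteNetworkRulesIfApplicable(shared));
        System.out.println("guarded, isolated ran cleanup: " + deleteNetworkRulesIfApplicable(isolated));
    }
}
```

The guard treats a missing network the same way as a shared one, so a cluster whose network record is already gone would also skip rule cleanup in this model.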

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

How Has This Been Tested?

Deployed a cluster on a shared network and then cleaned it up

@Pearl1594 Pearl1594 requested a review from shwstppr November 11, 2020 09:45
@Pearl1594
Contributor Author

@blueorangutan package

@blueorangutan

@Pearl1594 a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔centos7 ✔centos8 ✔debian. JID-2365

Comment on lines +243 to +246
NetworkVO kubernetesClusterNetwork = networkDao.findById(kubernetesCluster.getNetworkId());
if (kubernetesClusterNetwork != null && kubernetesClusterNetwork.getGuestType() != Network.GuestType.Shared) {
    deleteKubernetesClusterNetworkRules();
}
Contributor

I know I'm not a nice guy, but can we please have new code in separate methods:

private void checkForRulesToDelete(){
    NetworkVO kubernetesClusterNetwork = networkDao.findById(kubernetesCluster.getNetworkId());
    if (kubernetesClusterNetwork != null && kubernetesClusterNetwork.getGuestType() != Network.GuestType.Shared) {
        deleteKubernetesClusterNetworkRules();
    }
}

and then

Suggested change
- NetworkVO kubernetesClusterNetwork = networkDao.findById(kubernetesCluster.getNetworkId());
- if (kubernetesClusterNetwork != null && kubernetesClusterNetwork.getGuestType() != Network.GuestType.Shared) {
-     deleteKubernetesClusterNetworkRules();
- }
+ checkForRulesToDelete();
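
One possible benefit of this extraction, beyond readability, is that the guard can be exercised on its own. The sketch below illustrates that idea under stated assumptions: `ExtractedGuardSketch`, its map-based network lookup, and the `cleanedNetworkIds` list are hypothetical stand-ins for the worker, the DAO, and the real rule cleanup, and only the method name `checkForRulesToDelete` comes from the suggestion above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of a destroy worker with the guard in its own method. */
final class ExtractedGuardSketch {

    enum GuestType { Isolated, Shared }

    /** Stand-in for a DAO lookup: network id to guest type. */
    private final Map<Long, GuestType> networksById;

    /** Records the network ids for which rule cleanup was invoked. */
    final List<Long> cleanedNetworkIds = new ArrayList<>();

    ExtractedGuardSketch(Map<Long, GuestType> networksById) {
        this.networksById = networksById;
    }

    /** Stand-in for the real rule cleanup; it only records the call. */
    private void deleteNetworkRules(long networkId) {
        cleanedNetworkIds.add(networkId);
    }

    /** The extracted guard: cleanup runs only for known, non-shared networks. */
    void checkForRulesToDelete(long networkId) {
        GuestType type = networksById.get(networkId); // null when the id is unknown
        if (type != null && type != GuestType.Shared) {
            deleteNetworkRules(networkId);
        }
    }

    /** The destroy flow keeps a single call at the point where the guard used to be inlined. */
    void destroy(long networkId) {
        // VM teardown would happen here.
        checkForRulesToDelete(networkId);
    }

    public static void main(String[] args) {
        Map<Long, GuestType> networks = new HashMap<>();
        networks.put(1L, GuestType.Isolated);
        networks.put(2L, GuestType.Shared);

        ExtractedGuardSketch worker = new ExtractedGuardSketch(networks);
        worker.destroy(1L); // isolated: cleanup is recorded
        worker.destroy(2L); // shared: cleanup is skipped
        worker.destroy(3L); // unknown network: cleanup is skipped
        System.out.println("cleanup invoked for network ids: " + worker.cleanedNetworkIds);
    }
}
```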

@Pearl1594
Contributor Author

@blueorangutan package

@blueorangutan

@Pearl1594 a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔centos7 ✔centos8 ✔debian. JID-2369

@blueorangutan

Packaging result: ✔centos7 ✔centos8 ✔debian. JID-2371

@Pearl1594
Contributor Author

@blueorangutan test

@blueorangutan

@Pearl1594 a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan

Trillian test result (tid-3158)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 31254 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4461-t3158-kvm-centos7.zip
Smoke tests completed. 86 look OK, 0 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File

@DaanHoogland
Contributor

@Pearl1594 is this really minor? K8s would not work at all on shared networks because of this, would it? cc @PaulAngus
@shwstppr can you review this please?

Contributor

@shwstppr shwstppr left a comment

LGTM

@Pearl1594
Contributor Author

@DaanHoogland It isn't a critical bug and can be pushed to the next milestone if need be. It addresses the scenario where a cluster deployed on a shared network doesn't get cleaned up or garbage collected and stays in the Destroying state indefinitely.

@shwstppr
Contributor

@Pearl1594 @DaanHoogland I think this should go to 4.14.1.0

@DaanHoogland
Contributor

@Pearl1594 can you rebase on 4.14?

@Pearl1594 Pearl1594 changed the base branch from master to 4.14 November 18, 2020 09:58
@Pearl1594
Contributor Author

Done @DaanHoogland

@DaanHoogland
Contributor

@blueorangutan package

@blueorangutan

@DaanHoogland a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔centos7 ✖centos8 ✔debian. JID-2398

@DaanHoogland
Contributor

@blueorangutan test

@DaanHoogland DaanHoogland modified the milestones: 4.15.0.0, 4.14.1.0 Nov 18, 2020
@blueorangutan

@DaanHoogland a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

Contributor

@DaanHoogland DaanHoogland left a comment

code looks good

@blueorangutan

Trillian test result (tid-3196)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 32934 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4461-t3196-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_kubernetes_clusters.py
Intermittent failure detected: /marvin/tests/smoke/test_kubernetes_supported_versions.py
Smoke tests completed. 83 look OK, 0 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File

@yadvr
Member

yadvr commented Nov 18, 2020

Lgtm

@yadvr
Member

yadvr commented Nov 18, 2020

This fixes a valid case for a supported feature; let's merge.

@yadvr yadvr merged commit 1692df4 into apache:4.14 Nov 18, 2020
5 participants