Fix k8s cluster upgrade in shared networks #4458

Merged
DaanHoogland merged 1 commit into apache:4.14 from shapeblue:fix_k8s_upgrade_shared_net
Nov 20, 2020

Conversation

@Pearl1594
Contributor

Description

Upgrading a Kubernetes cluster on a shared network is reported as successful even though the nodes are not actually upgraded.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

Screenshots (if appropriate):

How Has This Been Tested?

Prior to the fix:
Deployed a 1.16.0 k8s cluster:

k8s-master ~ # kubectl get nodes
NAME         STATUS                     ROLES    AGE     VERSION
k8s-master   Ready,SchedulingDisabled   master   6m23s   v1.16.0
k8s-node-1   Ready                      <none>   5m51s   v1.16.0

Upgraded the cluster to 1.16.3. Although the upgradeKubernetesCluster API returns a successful response, the worker nodes are not upgraded:

k8s-master ~ # kubectl get nodes
NAME         STATUS   ROLES    AGE     VERSION
k8s-master   Ready    master   8m35s   v1.16.3
k8s-node-1   Ready    <none>   8m3s    v1.16.0

Post fix, once the upgrade completes (i.e., when the upgradeKubernetesCluster API returns a success response), kubectl get nodes shows all nodes at the upgraded version, here 1.16.3:

c2-master ~ # kubectl get nodes
NAME        STATUS   ROLES    AGE   VERSION
c2-master   Ready    master   18m   v1.16.0
c2-node-1   Ready    <none>   18m   v1.16.0
c2-master ~ # kubectl get nodes
NAME        STATUS                     ROLES    AGE   VERSION
c2-master   Ready,SchedulingDisabled   master   19m   v1.16.0
c2-node-1   Ready                      <none>   18m   v1.16.0

...
c2-master ~ # kubectl get nodes
NAME        STATUS     ROLES    AGE   VERSION
c2-master   Ready      master   25m   v1.16.3
c2-node-1   NotReady   <none>   25m   v1.16.3
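
The manual check above (reading the VERSION column of kubectl get nodes) could be automated along these lines. This is only a minimal sketch and not part of the PR: the helper names node_versions and nodes_not_at are hypothetical, and it assumes the default text output of kubectl get nodes, where the node name is the first column and the version is the last.

```python
from typing import Dict, List

# Sample taken from the "prior to the fix" output in this PR description.
PRE_FIX_OUTPUT = """\
NAME         STATUS   ROLES    AGE     VERSION
k8s-master   Ready    master   8m35s   v1.16.3
k8s-node-1   Ready    <none>   8m3s    v1.16.0
"""


def node_versions(kubectl_output: str) -> Dict[str, str]:
    """Map node name -> reported version from `kubectl get nodes` text output.

    Assumes the default column layout: NAME first, VERSION last.
    """
    lines = [ln for ln in kubectl_output.splitlines() if ln.strip()]
    versions: Dict[str, str] = {}
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        versions[fields[0]] = fields[-1]
    return versions


def nodes_not_at(kubectl_output: str, target: str) -> List[str]:
    """Return the sorted names of nodes whose version differs from `target`.

    `target` is given without the leading "v", e.g. "1.16.3".
    """
    expected = "v" + target
    return sorted(
        name
        for name, version in node_versions(kubectl_output).items()
        if version != expected
    )


if __name__ == "__main__":
    # Lists the nodes that were left behind by the upgrade.
    print(nodes_not_at(PRE_FIX_OUTPUT, "1.16.3"))
```

An empty list would indicate that every node reports the target version. The sketch only compares versions; it does not look at the STATUS column, so a node that is upgraded but still NotReady would not be flagged.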

@Pearl1594
Contributor Author

@blueorangutan package

@blueorangutan

@Pearl1594 a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔centos7 ✔centos8 ✔debian. JID-2352

@Pearl1594 Pearl1594 requested a review from shwstppr November 10, 2020 18:37
@Pearl1594
Contributor Author

@blueorangutan test

@blueorangutan

@Pearl1594 a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan

Trillian test result (tid-3141)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 33913 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4458-t3141-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_kubernetes_supported_versions.py
Intermittent failure detected: /marvin/tests/smoke/test_privategw_acl.py
Smoke tests completed. 85 look OK, 1 have error(s)
Only failed tests results shown below:

Test                                                         Result   Time (s)   Test File
ContextSuite context=TestKubernetesSupportedVersion>:setup   Error    0.00       test_kubernetes_supported_versions.py

@shwstppr
Contributor

@blueorangutan test

@blueorangutan

@shwstppr a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan

Trillian test result (tid-3195)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 34490 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4458-t3195-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_kubernetes_clusters.py
Intermittent failure detected: /marvin/tests/smoke/test_privategw_acl.py
Smoke tests completed. 85 look OK, 1 have error(s)
Only failed tests results shown below:

Test                                            Result    Time (s)   Test File
test_03_deploy_and_upgrade_kubernetes_cluster   Failure   247.53     test_kubernetes_clusters.py

@DaanHoogland DaanHoogland added this to the 4.15.0.0 milestone Nov 18, 2020
@DaanHoogland
Contributor

Major issue in new functionality, @PaulAngus (IMNSHO). Do we accept this, still?

@shwstppr
Contributor

@DaanHoogland k8s clusters on shared networks were already present in 4.14. IMO, this should be rebased onto 4.14 and forward-merged to master @Pearl1594

@Pearl1594 Pearl1594 force-pushed the fix_k8s_upgrade_shared_net branch from b091178 to 0b3d449 Compare November 18, 2020 10:04
@Pearl1594 Pearl1594 changed the base branch from master to 4.14 November 18, 2020 10:04
@yadvr yadvr modified the milestones: 4.15.0.0, 4.14.1.0 Nov 18, 2020
@yadvr
Member

yadvr commented Nov 18, 2020

@blueorangutan package

@blueorangutan

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔centos7 ✖centos8 ✔debian. JID-2404

Contributor

@davidjumani davidjumani left a comment

LGTM

Contributor

@shwstppr shwstppr left a comment

LGTM

@yadvr
Member

yadvr commented Nov 19, 2020

@blueorangutan test

@blueorangutan

@rhtyd a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan

Trillian test result (tid-3208)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 30505 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4458-t3208-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_diagnostics.py
Smoke tests completed. 83 look OK, 0 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File

@DaanHoogland DaanHoogland merged commit daec77f into apache:4.14 Nov 20, 2020
@DaanHoogland DaanHoogland deleted the fix_k8s_upgrade_shared_net branch November 20, 2020 08:33

7 participants