server: Fix update capacity for hosts take long time if there are many service offerings by ustcweizhou · Pull Request #4623 · apache/cloudstack

ustcweizhou · 2021-01-27T13:10:55Z

Description

This PR fixes an issue where updating capacity for hosts takes a long time when there are many service offerings.

Steps to reproduce the issue:

(1) Create 10000 service offerings (by db changes or cloudmonkey).

(2) Check the total time of the periodic capacity check in CloudStack.

Without this patch, it takes about 2.5 seconds (2 hosts):

2021-01-15 16:10:12,793 DEBUG [c.c.a.AlertManagerImpl] (CapacityChecker:ctx-5d5f3b3b) (logid:f5eb68ba) Running Capacity Checker ...
2021-01-15 16:10:15,287 DEBUG [c.c.a.AlertManagerImpl] (CapacityChecker:ctx-5d5f3b3b) (logid:f5eb68ba) Done running Capacity Checker ...

With this patch, it takes about 1.3 seconds (2 hosts):

2021-01-15 16:12:43,604 DEBUG [c.c.a.AlertManagerImpl] (CapacityChecker:ctx-a2a7f3f1) (logid:f7e0a4c5) Running Capacity Checker ...
2021-01-15 16:12:44,927 DEBUG [c.c.a.AlertManagerImpl] (CapacityChecker:ctx-a2a7f3f1) (logid:f7e0a4c5) Done running Capacity Checker ...

With 100 hosts, the total time would be reduced from 100+ seconds to around 10 seconds.
This significantly reduces the execution time of the Prometheus exporter.
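As a quick check of the figures quoted above, the elapsed time of each run can be computed from the "Running" and "Done running" timestamps in the log lines. This is only an illustrative sketch: the helper name `elapsed_seconds` is made up for this example, and the timestamps are copied verbatim from the logs in this description.

```python
from datetime import datetime

# Timestamp layout used by the management server log lines quoted above.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two log timestamps."""
    started = datetime.strptime(start, LOG_TIME_FORMAT)
    finished = datetime.strptime(end, LOG_TIME_FORMAT)
    return (finished - started).total_seconds()


# "Running Capacity Checker" / "Done running Capacity Checker" pairs.
without_patch = elapsed_seconds("2021-01-15 16:10:12,793", "2021-01-15 16:10:15,287")
with_patch = elapsed_seconds("2021-01-15 16:12:43,604", "2021-01-15 16:12:44,927")

print(f"without patch: {without_patch:.3f} s")
print(f"with patch:    {with_patch:.3f} s")
print(f"saved:         {without_patch - with_patch:.3f} s")
```

The two runs come out at 2.494 s and 1.323 s, which matches the rounded 2.5 s and 1.3 s reported for 2 hosts.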

Types of changes

Breaking change (fix or feature that would cause existing functionality to change)
New feature (non-breaking change which adds functionality)
Bug fix (non-breaking change which fixes an issue)
Enhancement (improves an existing feature and functionality)
Cleanup (Code refactoring and cleanup, that may add test cases)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

Major
Minor

Bug Severity

Screenshots (if appropriate):

How Has This Been Tested?

…y service offerings

Steps to reproduce the issue:

(1) Create 10000 service offerings (by db changes below or cloudmonkey).

```
DROP PROCEDURE IF EXISTS cloud.insert_service_offering;

DELIMITER $$
CREATE PROCEDURE cloud.insert_service_offering()
BEGIN
    DECLARE count INT DEFAULT 10000;
    SET @offeringid = (select max(id)+1 from disk_offering);
    WHILE count > 0 DO
        INSERT INTO disk_offering (id,name,uuid,display_text,disk_size,type,created) values (@offeringid,'test-offering-wei',uuid(), 'test-offering-wei',0,'Service',now());
        INSERT INTO service_offering (id,cpu,speed,ram_size) values (@offeringid, 1, 500,256);
        SET @offeringid = @offeringid + 1;
        SET count = count - 1;
    END WHILE;
END $$
DELIMITER ;

CALL cloud.insert_service_offering();

mysql> CALL cloud.insert_service_offering();
Query OK, 0 rows affected (2 min 30.85 sec)
```

(2) Check the total time of the periodic capacity check in CloudStack.

Without this patch, it takes about 2.5 seconds (2 hosts):

```
2021-01-15 16:10:12,793 DEBUG [c.c.a.AlertManagerImpl] (CapacityChecker:ctx-5d5f3b3b) (logid:f5eb68ba) Running Capacity Checker ...
2021-01-15 16:10:15,287 DEBUG [c.c.a.AlertManagerImpl] (CapacityChecker:ctx-5d5f3b3b) (logid:f5eb68ba) Done running Capacity Checker ...
```

With this patch, it takes about 1.3 seconds (2 hosts):

```
2021-01-15 16:12:43,604 DEBUG [c.c.a.AlertManagerImpl] (CapacityChecker:ctx-a2a7f3f1) (logid:f7e0a4c5) Running Capacity Checker ...
2021-01-15 16:12:44,927 DEBUG [c.c.a.AlertManagerImpl] (CapacityChecker:ctx-a2a7f3f1) (logid:f7e0a4c5) Done running Capacity Checker ...
```

With 100 hosts, the total time would be reduced from 100+ seconds to around 10 seconds.

yadvr · 2021-01-27T14:22:58Z

@blueorangutan package

blueorangutan · 2021-01-27T14:24:23Z

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

blueorangutan · 2021-01-27T15:25:41Z

Packaging result: ✔centos7 ✖centos8 ✔debian. JID-2607

yadvr · 2021-02-01T08:45:08Z

@blueorangutan test

blueorangutan · 2021-02-01T08:46:42Z

@rhtyd a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

blueorangutan · 2021-02-01T19:19:16Z

Trillian test result (tid-3457)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 36001 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4623-t3457-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_kubernetes_clusters.py
Intermittent failure detected: /marvin/tests/smoke/test_nic.py
Smoke tests completed. 81 look OK, 2 have error(s)
Only failed tests results shown below:

| Test | Result | Time (s) | Test File |
| --- | --- | --- | --- |
| test_03_deploy_and_upgrade_kubernetes_cluster | `Failure` | 258.93 | test_kubernetes_clusters.py |
| test_01_nic | `Error` | 50.06 | test_nic.py |

yadvr · 2021-02-02T10:38:20Z

@ustcweizhou can you review the test_01_nic failure?

weizhouapache · 2021-02-02T11:10:39Z

> @ustcweizhou can you review the test_01_nic failure?

@rhtyd I ran the test on 4.14 and 4.15, and it succeeded on both.
Can you re-kick the test?

yadvr · 2021-02-02T11:12:34Z

@blueorangutan package

blueorangutan · 2021-02-02T11:13:31Z

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

shwstppr · 2021-02-02T11:14:37Z

Running failed test manually @weizhouapache @rhtyd

blueorangutan · 2021-02-02T11:55:23Z

Packaging result: ✔centos7 ✖centos8 ✔debian. JID-2633

DaanHoogland

Looks like a sensible optimisation: fetch the offerings once before looping over hosts instead of re-fetching them for each host. I think only regression tests are needed for this one.
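The pattern described in this review comment can be sketched as below. This is not the code from the patch (CloudStack itself is written in Java); it is a minimal Python illustration in which every name (`fetch_offerings`, `update_host`, the host names, and the two `update_capacity_*` functions) is hypothetical. It only shows why hoisting the fetch out of the host loop cuts the number of offering lookups from one per host to one in total.

```python
from typing import Callable, Dict, List

Offering = Dict[str, int]
OfferingMap = Dict[int, Offering]


def update_capacity_refetching(
    hosts: List[str],
    fetch_offerings: Callable[[], List[Offering]],
    update_host: Callable[[str, OfferingMap], None],
) -> None:
    """Slow variant: the full offering list is fetched again for every host."""
    for host in hosts:
        offerings_by_id = {o["id"]: o for o in fetch_offerings()}
        update_host(host, offerings_by_id)


def update_capacity_prefetched(
    hosts: List[str],
    fetch_offerings: Callable[[], List[Offering]],
    update_host: Callable[[str, OfferingMap], None],
) -> None:
    """Fast variant: fetch once, then reuse the same map for every host."""
    offerings_by_id = {o["id"]: o for o in fetch_offerings()}
    for host in hosts:
        update_host(host, offerings_by_id)


# Hypothetical stand-ins for the database query and the per-host update.
fetch_calls = 0
updated_hosts: List[str] = []


def fake_fetch_offerings() -> List[Offering]:
    global fetch_calls
    fetch_calls += 1
    return [{"id": i, "cpu": 1, "ram": 256} for i in range(10000)]


def fake_update_host(host: str, offerings_by_id: OfferingMap) -> None:
    updated_hosts.append(host)


hosts = [f"host-{n}" for n in range(5)]

update_capacity_refetching(hosts, fake_fetch_offerings, fake_update_host)
refetching_calls = fetch_calls

fetch_calls = 0
update_capacity_prefetched(hosts, fake_fetch_offerings, fake_update_host)
prefetched_calls = fetch_calls

print(f"fetches when re-fetching per host: {refetching_calls}")
print(f"fetches when prefetching once:     {prefetched_calls}")
```

Both variants update every host with the same data; the only difference is how often the offering list is loaded, which is what matters when there are thousands of offerings.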

DaanHoogland · 2021-02-02T13:57:50Z

@blueorangutan test

blueorangutan · 2021-02-02T13:58:37Z

@DaanHoogland a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

shwstppr · 2021-02-02T14:34:05Z

> @ustcweizhou can you review the test_01_nic failure?

Failing test verified manually:

==== Marvin Init Started ====

=== Marvin Parse Config Successful ===

=== Marvin Setting TestData Successful===

==== Log Folder Path: /marvin/MarvinLogs/Feb_02_2021_14_27_56_UX96H4. All logs will be available here ====

=== Marvin Init Logging Successful===

==== Marvin Init Successful ====
=== TestName: test_01_nic | Status : SUCCESS ===

blueorangutan · 2021-02-03T03:47:15Z

Trillian test result (tid-3474)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 47885 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4623-t3474-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_kubernetes_clusters.py
Smoke tests completed. 82 look OK, 1 have error(s)
Only failed tests results shown below:

| Test | Result | Time (s) | Test File |
| --- | --- | --- | --- |
| test_01_deploy_kubernetes_cluster | `Failure` | 3602.88 | test_kubernetes_clusters.py |
| test_03_deploy_and_upgrade_kubernetes_cluster | `Failure` | 251.15 | test_kubernetes_clusters.py |
| test_08_deploy_and_upgrade_kubernetes_ha_cluster | `Failure` | 166.53 | test_kubernetes_clusters.py |
| ContextSuite context=TestKubernetesCluster>:teardown | `Error` | 760.79 | test_kubernetes_clusters.py |

yadvr added this to the 4.14.1.0 milestone Jan 27, 2021

yadvr requested a review from shwstppr February 1, 2021 08:59

weizhouapache closed this Feb 2, 2021

weizhouapache reopened this Feb 2, 2021

DaanHoogland approved these changes Feb 2, 2021

shwstppr approved these changes Feb 2, 2021

yadvr merged commit 78f73c1 into apache:4.14 Feb 4, 2021
