Skip to content

server: Capacity check should take vms in Migrating state into calculation#3727

Merged
yadvr merged 1 commit into
apache:4.13from
ustcweizhou:4.13-host-capacity-migrating-vms
Jan 28, 2020
Merged

server: Capacity check should take vms in Migrating state into calculation#3727
yadvr merged 1 commit into
apache:4.13from
ustcweizhou:4.13-host-capacity-migrating-vms

Conversation

@ustcweizhou
Copy link
Copy Markdown
Contributor

Description

When we calculate a resource consumption of a host, we need to take the vms in following states into calculation: Running, Starting, Stopping, Migrating (to the host), and vms are Migrating from the host. Because, when stop a vm, the resource on host will be released when vm is stopped. When migrate a vm, the resource on destination host will be increased before migration starts, and resource on source host will be decreased after migraiton succeeds.

In cloudstack, there is a task named CapacityChecked which run every 5 minutes (capacity.check.period =300000 ms by default). It recalculates capacity of all hosts. However, it takes only vms in Running and Starting into consideration. We have faced some issues in host maintenance due to it.

Steps to reproduce the issue
(1) migrate N vms from host A to host B, cpu/ram resource increases before the migration.
(2) capacity check recalculate the capacity of hosts. used capacity of Host B will be reset to original value (not including the vms in Migrating).
(3) migrate some more vms from other host to host B, the migrations are allowed by cloudstack (because used capacity is incorrect). If the actual used memory exceed the physical memory on the host, there might be some critical issues (for example, libvirt dies)

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

Screenshots (if appropriate):

How Has This Been Tested?

@DaanHoogland
Copy link
Copy Markdown
Contributor

@blueorangutan package

@apache apache deleted a comment from blueorangutan Jan 21, 2020
@apache apache deleted a comment from blueorangutan Jan 21, 2020
@blueorangutan
Copy link
Copy Markdown

@DaanHoogland a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@apache apache deleted a comment from blueorangutan Jan 21, 2020
@apache apache deleted a comment from blueorangutan Jan 21, 2020
@blueorangutan
Copy link
Copy Markdown

Packaging result: ✔centos6 ✔centos7 ✔debian. JID-641

@DaanHoogland
Copy link
Copy Markdown
Contributor

@blueorangutan test

@blueorangutan
Copy link
Copy Markdown

@DaanHoogland a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan
Copy link
Copy Markdown

Trillian test result (tid-807)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 40427 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr3727-t807-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_vpc_redundant.py
Smoke tests completed. 76 look OK, 1 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File
test_04_rvpc_network_garbage_collector_nics Failure 325.05 test_vpc_redundant.py

Copy link
Copy Markdown
Contributor

@DaanHoogland DaanHoogland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code lgtm

@weizhouapache
Copy link
Copy Markdown
Member

thanks @DaanHoogland

@rhtyd @svenvogel can you please review ? we need one more approvals

@yadvr yadvr merged commit 71e53ab into apache:4.13 Jan 28, 2020
ustcweizhou added a commit to ustcweizhou/cloudstack that referenced this pull request Feb 28, 2020
…ation (apache#3727)

When we calculate a resource consumption of a host, we need to take the vms in following states into calculation: Running, Starting, Stopping, Migrating (to the host), and vms are Migrating from the host. Because, when stop a vm, the resource on host will be released when vm is stopped. When migrate a vm, the resource on destination host will be increased before migration starts, and resource on source host will be decreased after migraiton succeeds.

In cloudstack, there is a task named CapacityChecked which run every 5 minutes (capacity.check.period =300000 ms by default). It recalculates capacity of all hosts. However, it takes only vms in Running and Starting into consideration. We have faced some issues in host maintenance due to it.

Steps to reproduce the issue
(1) migrate N vms from host A to host B, cpu/ram resource increases before the migration.
(2) capacity check recalculate the capacity of hosts. used capacity of Host B will be reset to original value (not including the vms in Migrating).
(3) migrate some more vms from other host to host B, the migrations are allowed by cloudstack (because used capacity is incorrect). If the actual used memory exceed the physical memory on the host, there might be some critical issues (for example, libvirt dies)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants