Skip to content

indirect.agent.lb.algorithm set to roundrobin cause SSVM/CPVM not to start #3722

@andrijapanicsb

Description

@andrijapanicsb
ISSUE TYPE
  • Bug Report
COMPONENT NAME
indirect agent load balancing algorithm
CLOUDSTACK VERSION
4.13
CONFIGURATION
  • Advanced zone configuration with 1 zone, 1 pod, 1 management hosts, 1 KVM hosts, Shared storage (NFS)
  • “indirect.agent.lb.algorithm” set to “roundrobin”
  • “host” value contains comma separated list of both management servers
  • Issue Observed in env with XenServer 6.5sp1 but should not be relevant
OS / ENVIRONMENT

NA

SUMMARY

When SSVM/CPVM is destroyed, it can't be started automatically any more and the following error is shown in the mgmt error log:

2019-11-27 10:11:55,652 ERROR [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-137:ctx-19e1bf7d job-17904/job-18318 ctx-080fa0b0) (logid:d507f5ae) Invocation exception, caused by: java.lang.IndexOutOfBoundsException: fromIndex = -1 2019-11-27 10:11:55,654 INFO [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-137:ctx-19e1bf7d job-17904/job-18318 ctx-080fa0b0) (logid:d507f5ae) Rethrow exception java.lang.IndexOutOfBoundsException: fromIndex = -1 2019-11-27 10:11:55,654 DEBUG [c.c.v.VmWorkJobDispatcher] (Work-Job-Executor-137:ctx-19e1bf7d job-17904/job-18318) (logid:d507f5ae) Done with run of VM work job: com.cloud.vm.VmWorkStart for VM 3712, job origin: 17904 2019-11-27 10:11:55,654 ERROR [c.c.v.VmWorkJobDispatcher] (Work-Job-Executor-137:ctx-19e1bf7d job-17904/job-18318) (logid:d507f5ae) Unable to complete AsyncJobVO {id:18318, userId: 1, accountId: 1, instanceType: null, instanceId: null, cmd: com.cloud.vm.VmWorkStart, cmdInfo: rO0ABXNyABhjb20uY2xvdWQudm0uVm1Xb3JrU3RhcnR9cMGsvxz73gIAC0oABGRjSWRMAAZhdm9pZHN0ADBMY29tL2Nsb3VkL2RlcGxveS9EZXBsb3ltZW50UGxhbm5lciRFeGNsdWRlTGlzdDtMAAljbHVzdGVySWR0ABBMamF2YS9sYW5nL0xvbmc7TAAGaG9zdElkcQB-AAJMAAtqb3VybmFsTmFtZXQAEkxqYXZhL2xhbmcvU3RyaW5nO0wAEXBoeXNpY2FsTmV0d29ya0lkcQB-AAJMAAdwbGFubmVycQB-AANMAAVwb2RJZHEAfgACTAAGcG9vbElkcQB-AAJMAAlyYXdQYXJhbXN0AA9MamF2YS91dGlsL01hcDtMAA1yZXNlcnZhdGlvbklkcQB-AAN4cgATY29tLmNsb3VkLnZtLlZtV29ya5-ZtlbwJWdrAgAESgAJYWNjb3VudElkSgAGdXNlcklkSgAEdm1JZEwAC2hhbmRsZXJOYW1lcQB-AAN4cAAAAAAAAAABAAAAAAAAAAEAAAAAAAAOgHQAGVZpcnR1YWxNYWNoaW5lTWFuYWdlckltcGwAAAAAAAAAAHBwcHBwcHBwcHA, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 32986472974184, completeMsid: null, lastUpdated: null, lastPolled: null, created: Wed Nov 27 10:11:54 UTC 2019, removed: null}, job origin:17904 java.lang.IndexOutOfBoundsException: fromIndex = -1 at java.util.SubList.<init>(AbstractList.java:620) at java.util.RandomAccessSubList.<init>(AbstractList.java:775) at java.util.AbstractList.subList(AbstractList.java:484) at org.apache.cloudstack.agent.lb.algorithm.IndirectAgentLBRoundRobinAlgorithm.sort(IndirectAgentLBRoundRobinAlgorithm.java:44) at org.apache.cloudstack.agent.lb.IndirectAgentLBServiceImpl.getManagementServerList(IndirectAgentLBServiceImpl.java:93) at org.apache.cloudstack.secondarystorage.SecondaryStorageManagerImpl.finalizeVirtualMachineProfile(SecondaryStorageManagerImpl.java:1128) at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1110) at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:5107) at sun.reflect.GeneratedMethodAccessor233.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at com.cloud.vm.VmWorkJobHandlerProxy.handleVmWorkJob(VmWorkJobHandlerProxy.java:107) at com.cloud.vm.VirtualMachineManagerImpl.handleVmWorkJob(VirtualMachineManagerImpl.java:5270) at com.cloud.vm.VmWorkJobDispatcher.runJob(VmWorkJobDispatcher.java:102) at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:583) at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49) at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56) at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103) at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53) at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46) at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:531) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 2019-11-27 10:11:55,655 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-137:ctx-19e1bf7d job-17904/job-18318) (logid:d507f5ae) Complete async job-18318, jobStatus: FAILED, resultCode: 0, result: rO0ABXNyACNqYXZhLmxhbmcuSW5kZXhPdXRPZkJvdW5kc0V4Y2VwdGlvbgM_xZ_atCsnAgAAeHIAGmphdmEubGFuZy5SdW50aW1lRXhjZXB0aW9unl8GRwo0g-UCAAB4cgATamF2YS5sYW5nLkV4Y2VwdGlvbtD9Hz4aOxzEAgAAeHIAE2phdmEubGFuZy5UaHJvd2FibGXVxjUnOXe4ywMABEwABWNhdXNldAAVTGphdmEvbGFuZy9UaHJvd2FibGU7TAANZGV0YWlsTWVzc2FnZXQAEkxqYXZhL2xhbmcvU3RyaW5nO1sACnN0YWNrVHJhY2V0AB5bTGphdmEvbGFuZy9TdGFja1RyYWNlRWxlbWVudDtMABRzdXBwcmVzc2VkRXhjZXB0aW9uc3QAEExqYXZhL3V0aWwvTGlzdDt4cHEAfgAIdAAOZnJvbUluZGV4ID0gLTF1cgAeW0xqYXZhLmxhbmcuU3RhY2tUcmFjZUVsZW1lbnQ7AkYqPDz9IjkCAAB4cAAAABpzcgAbamF2YS5sYW5nLlN0YWNrVHJhY2VFbGVtZW50YQnFmiY23YUCAARJAApsaW5lTnVtYmVyTAAOZGVjbGFyaW5nQ2xhc3NxAH4ABUwACGZpbGVOYW1lcQB-AAVMAAptZXRob2ROYW1lcQB-AAV4cAAAAmx0ABFqYXZhLnV0aWwuU3ViTGlzdHQAEUFic3RyYWN0TGlzdC5qYXZhdAAGPGluaXQ-c3EAfgAMAAADB3QAHWphdmEudXRpbC5SYW5kb21BY2Nlc3NTdWJMaXN0cQB-AA9xAH4AEHNxAH4ADAAAAeR0ABZqYXZhLnV0aWwuQWJzdHJhY3RMaXN0cQB-AA90AAdzdWJMaXN0c3EAfgAMAAAALHQAS29yZy5hcGFjaGUuY2xvdWRzdGFjay5hZ2VudC5sYi5hbGdvcml0aG0uSW5kaXJlY3RBZ2VudExCUm91bmRSb2JpbkFsZ29yaXRobXQAJ0luZGlyZWN0QWdlbnRMQlJvdW5kUm9iaW5BbGdvcml0aG0uamF2YXQABHNvcnRzcQB-AAwAAABddAA5b3JnLmFwYWNoZS5jbG91ZHN0YWNrLmFnZW50LmxiLkluZGlyZWN0QWdlbnRMQlNlcnZpY2VJbXBsdAAfSW5kaXJlY3RBZ2VudExCU2VydmljZUltcGwuamF2YXQAF2dldE1hbmFnZW1lbnRTZXJ2ZXJMaXN0c3EAfgAMAAAEaHQAQm9yZy5hcGFjaGUuY2xvdWRzdGFjay5zZWNvbmRhcnlzdG9yYWdlLlNlY29uZGFyeVN0b3JhZ2VNYW5hZ2VySW1wbHQAIFNlY29uZGFyeVN0b3JhZ2VNYW5hZ2VySW1wbC5qYXZhdAAdZmluYWxpemVWaXJ0dWFsTWFjaGluZVByb2ZpbGVzcQB-AAwAAARWdAAmY29tLmNsb3VkLnZtLlZpcnR1YWxNYWNoaW5lTWFuYWdlckltcGx0AB5WaXJ0dWFsTWFjaGluZU1hbmFnZXJJbXBsLmphdmF0ABBvcmNoZXN0cmF0ZVN0YXJ0c3EAfgAMAAAT83EAfgAjcQB-ACRxAH4AJXNxAH4ADP____90ACZzdW4ucmVmbGVjdC5HZW5lcmF0ZWRNZXRob2RBY2Nlc3NvcjIzM3B0AAZpbnZva2VzcQB-AAwAAAArdAAoc3VuLnJlZmxlY3QuRGVsZWdhdGluZ01ldGhvZEFjY2Vzc29ySW1wbHQAIURlbGVnYXRpbmdNZXRob2RBY2Nlc3NvckltcGwuamF2YXEAfgApc3EAfgAMAAAB8nQAGGphdmEubGFuZy5yZWZsZWN0Lk1ldGhvZHQAC01ldGhvZC5qYXZhcQB-AClzcQB-AAwAAABrdAAiY29tLmNsb3VkLnZtLlZtV29ya0pvYkhhbmRsZXJQcm94eXQAGlZtV29ya0pvYkhhbmRsZXJQcm94eS5qYXZhdAAPaGFuZGxlVm1Xb3JrSm9ic3EAfgAMAAAUlnEAfgAjcQB-ACRxAH4AM3NxAH4ADAAAAGZ0ACBjb20uY2xvdWQudm0uVm1Xb3JrSm9iRGlzcGF0Y2hlcnQAGFZtV29ya0pvYkRpc3BhdGNoZXIuamF2YXQABnJ1bkpvYnNxAH4ADAAAAkd0AD9vcmcuYXBhY2hlLmNsb3Vkc3RhY2suZnJhbWV3b3JrLmpvYnMuaW1wbC5Bc3luY0pvYk1hbmQuY29udGV4dC5pbXBsLkRlZmF1bHRNYW5hZ2VkQ29udGV4dCQxdAAaRGVmYXVsdE1hbmFnZWRDb250ZXh0LmphdmF0AARjYWxsc3EAfgAMAAAAZ3QAQG9yZy5hcGFjaGUuY2xvdWRzdGFjay5tYW5hZ2VkLmNvbnRleHQuaW1wbC5EZWZhdWx0TWFuYWdlZENvbnRleHRxAH4AQ3QAD2NhbGxXaXRoQ29udGV4dHNxAH4ADAAAADVxAH4ARnEAfgBDdAAOcnVuV2l0aENvbnRleHRzcQB-AAwAAAAudAA8b3JnLmFwYWNoZS5jbG91ZHN0YWNrLm1hbmFnZWQuY29udGV4dC5NYW5hZ2VkQ29udGV4dFJ1bm5hYmxlcQB-AD9xAH4AQHNxAH4ADAAAAhNxAH4AOnEAfgA7cQB-AEBzcQB-AAwAAAH_dAAuamF2YS51dGlsLmNvbmN1cnJlbnQuRXhlY3V0b3JzJFJ1bm5hYmxlQWRhcHRlcnQADkV4ZWN1dG9ycy5qYXZhcQB-AERzcQB-AAwAAAEKdAAfamF2YS51dGlsLmNvbmN1cnJlbnQuRnV0dXJlVGFza3QAD0Z1dHVyZVRhc2suamF2YXEAfgBAc3EAfgAMAAAEfXQAJ2phdmEudXRpbC5jb25jdXJyZW50LlRocmVhZFBvb2xFeGVjdXRvcnQAF1RocmVhZFBvb2xFeGVjdXRvci5qYXZhdAAJcnVuV29ya2Vyc3EAfgAMAAACcHQALmphdmEudXRpbC5jb25jdXJyZW50LlRocmVhZFBvb2xFeGVjdXRvciRXb3JrZXJxAH4AVXEAfgBAc3EAfgAMAAAC7HQAEGphdmEubGFuZy5UaHJlYWR0AAtUaHJlYWQuamF2YXEAfgBAc3IAJmphdmEudXRpbC5Db2xsZWN0aW9ucyRVbm1vZGlmaWFibGVMaXN0_A8lMbXsjhACAAFMAARsaXN0cQB-AAd4cgAsamF2YS51dGlsLkNvbGxlY3Rpb25zJFVubW9kaWZpYWJsZUNvbGxlY3Rpb24ZQgCAy173HgIAAUwAAWN0ABZMamF2YS91dGlsL0NvbGxlY3Rpb247eHBzcgATamF2YS51dGlsLkFycmF5TGlzdHiB0h2Zx2GdAwABSQAEc2l6ZXhwAAAAAHcEAAAAAHhxAH4AYXg 2019-11-27 10:11:55,658 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-137:ctx-19e1bf7d job-17904/job-18318) (logid:d507f5ae) Publish async job-18318 complete on message bus 2019-11-27 10:11:55,658 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-137:ctx-19e1bf7d job-17904/job-18318) (logid:d507f5ae) Wake up jobs related to job-18318 2019-11-27 10:11:55,658 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-137:ctx-19e1bf7d job-17904/job-18318) (logid:d507f5ae) Update db status for job-18318 2019-11-27 10:11:55,659 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-137:ctx-19e1bf7d job-17904/job-18318) (logid:d507f5ae) Wake up jobs joined with job-18318 and disjoin all subjobs created from job- 18318 2019-11-27 10:11:55,674 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (Work-Job-Executor-137:ctx-19e1bf7d job-17904/job-18318) (logid:d507f5ae) Done executing com.cloud.vm.VmWorkStart for job-18318 2019-11-27 10:11:55,675 INFO [o.a.c.f.j.i.AsyncJobMonitor] (Work-Job-Executor-137:ctx-19e1bf7d job-17904/job-18318) (logid:d507f5ae) Remove job-18318 from job monitoring 2019-11-27 10:11:55,684 WARN [o.a.c.s.SecondaryStorageManagerImpl] (secstorage-1:ctx-f2145e12) (logid:6dbadcd6) Exception while trying to start secondary storage vm java.lang.IndexOutOfBoundsException: fromIndex = -1 at java.util.SubList.<init>(AbstractList.java:620) at java.util.RandomAccessSubList.<init>(AbstractList.java:775) at java.util.AbstractList.subList(AbstractList.java:484) at org.apache.cloudstack.agent.lb.algorithm.IndirectAgentLBRoundRobinAlgorithm.sort(IndirectAgentLBRoundRobinAlgorithm.java:44) at org.apache.cloudstack.agent.lb.IndirectAgentLBServiceImpl.getManagementServerList(IndirectAgentLBServiceImpl.java:93) at org.apache.cloudstack.secondarystorage.SecondaryStorageManagerImpl.finalizeVirtualMachineProfile(SecondaryStorageManagerImpl.java:1128) at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1110) at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:5107) at sun.reflect.GeneratedMethodAccessor233.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at com.cloud.vm.VmWorkJobHandlerProxy.handleVmWorkJob(VmWorkJobHandlerProxy.java:107) at com.cloud.vm.VirtualMachineManagerImpl.handleVmWorkJob(VirtualMachineManagerImpl.java:5270) at com.cloud.vm.VmWorkJobDispatcher.runJob(VmWorkJobDispatcher.java:102) at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:583) at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49) at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56) at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103) at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53) at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46) at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:531) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 2019-11-27 10:11:55,684 INFO [o.a.c.s.SecondaryStorageManagerImpl] (secstorage-1:ctx-f2145e12) (logid:6dbadcd6) Unable to start secondary storage vm for standby capacity, vm id : 3712, will recycle it and start a new one

STEPS TO REPRODUCE

EXPECTED RESULTS
SSVM/CPVM starts fine
ACTUAL RESULTS
one of those can't be started any more
Setting back the “indirect.agent.lb.algorithm” set to “static” will fix the problem and both SSVM/CPVM will start fine.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions