ISSUE TYPE
- (Maybe) Bug report
- Improvement Request
COMPONENT NAME
CLOUDSTACK VERSION
CONFIGURATION
Single Management server, 2 Shared Networks with non-HA VR each.
OS / ENVIRONMENT
Management Server: CentOS 7
Hypervisor: XCP-ng 8.5
SUMMARY
Here and there, we get an e-mail from the management server like this:
Health checks failed: 2 failing checks on router r-6849-VM / b8679d51-5f2b-4dfa-913c-4c73506dee2e
STEPS TO REPRODUCE
Unknown, but I suspect it might happen during VM provisioning?
EXPECTED RESULTS
Have the reason for the (supposed) failure in the email.
ACTUAL RESULTS
It only says that there are Health checks failed. But we notice no real service failure, and at first have no idea what is wrong.
Only after grepping through the management server logs, I noticed these two lines:
2023-11-02 08:57:22,320 INFO [c.c.n.r.VirtualNetworkApplianceManagerImpl] (RouterStatusMonitor-1:ctx-33ca5294) (logid:2e04f053) Found health check com.cloud.network.dao.RouterHealthCheckResultVO@28fabd51- check type: advanced,check name: dhcp_check.py, check result: false, check last update: Thu Nov 02 08:50:02 CET 2023, details: Missing elements in dhcphosts.txt -
1e:00:5a:00:21:8d 10.12.96.9 test-ci-12 which took running duration (ms) 85.2541923523
2023-11-02 08:57:22,322 INFO [c.c.n.r.VirtualNetworkApplianceManagerImpl] (RouterStatusMonitor-1:ctx-33ca5294) (logid:2e04f053) Found health check com.cloud.network.dao.RouterHealthCheckResultVO@6d617c4c- check type: advanced,check name: dns_check.py, check result: false, check last update: Thu Nov 02 08:50:02 CET 2023, details: Missing entries for VMs in /etc/hosts -
A bit above, I found this (good to have DEBUG log level on all the time), I shortened it to the relevant part:
2023-11-02 08:57:22,300 DEBUG [c.c.n.r.VirtualNetworkApplianceManagerImpl] (RouterStatusMonitor-1:ctx-33ca5294) (logid:2e04f053) Parsing and updating DB health check data for router: 6849 with data: {"basic":{"lastRun <SNIP> "message": "Missing entries for VMs in /etc/hosts -\n10.12.96.9 test-ci-12", "success": "false"}, <SNIP>
That VM was just being created. Maybe the health check ran during VM creation and got an inconsistent snapshot of the situation?
ISSUE TYPE
COMPONENT NAME
CLOUDSTACK VERSION
CONFIGURATION
Single Management server, 2 Shared Networks with non-HA VR each.
OS / ENVIRONMENT
Management Server: CentOS 7
Hypervisor: XCP-ng 8.5
SUMMARY
Here and there, we get an e-mail from the management server like this:
STEPS TO REPRODUCE
Unknown, but I suspect it might happen during VM provisioning?
EXPECTED RESULTS
Have the reason for the (supposed) failure in the email.
ACTUAL RESULTS
It only says that there are Health checks failed. But we notice no real service failure, and at first have no idea what is wrong.
Only after grepping through the management server logs, I noticed these two lines:
A bit above, I found this (good to have DEBUG log level on all the time), I shortened it to the relevant part:
That VM was just being created. Maybe the health check ran during VM creation and got an inconsistent snapshot of the situation?