Fix test flakiness due to time.time() calls#362
Conversation
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
In the reboot_test.py, the assert calls check for the exact timestamp to match. However sometimes there is a delay between when the call is made and when the assert is checked. If this delay is greater than 1 second, the timestamps diverge and the assert fails. This is a flakiness introduced by the test framework and hence we should mock it and make it more robust. Signed-off-by: arista-hpandya <hpandya@arista.com>
f5f99d4 to
a40bbc8
Compare
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
@vvolam , please help review? thanks |
vvolam
left a comment
There was a problem hiding this comment.
Overall the time.time() fix looks good. One suggestion below for a related pre-existing issue in test_execute_reboot_fail_halt_timeout.
| with ( | ||
| mock.patch("reboot._run_command") as mock_run_command, | ||
| mock.patch("time.sleep") as mock_sleep, | ||
| mock.patch("time.time", return_value=TEST_TIMESTAMP), |
There was a problem hiding this comment.
Suggestion (pre-existing issue): time.sleep is mocked here but time.monotonic() is not. The production code uses time.monotonic() for the halt timeout loop (while time.monotonic() - start_time < timeout). Since sleep is a no-op but monotonic is real, this test spins for ~60 real wall-clock seconds waiting for the loop to exit.
Consider also mocking time.monotonic to make this test fast:
| mock.patch("time.time", return_value=TEST_TIMESTAMP), | |
| mock.patch("time.time", return_value=TEST_TIMESTAMP), | |
| mock.patch("time.monotonic", side_effect=[0, 61]), |
This makes the loop iterate once (0 -> start_time=0) then exit immediately on the second call (61 > 60 timeout). It would cut this test from ~60s to near-instant.
vvolam
left a comment
There was a problem hiding this comment.
@arista-hpandya changes LGTM, one small suggestion which you could fix along with the changes.
Fixes: sonic-net/sonic-mgmt#23745
In the reboot_test.py, the assert calls check for the exact timestamp to match. However sometimes there is a delay between when the call is made and when the assert is checked. If this delay is greater than 1 second, the timestamps diverge and the assert fails. This is a flakiness introduced by the test framework and hence we should mock it and make it more robust.