[Build] Fix flaky test_clock_accuracy by using LCG workload with in-loop timing #436

Open

hughperkins wants to merge 1 commit into main from hp/fix-clock-accuracy-flaky

Collaborator

hughperkins commented Mar 30, 2026

The test was flaky (~10% failure rate) due to two issues:

- block_dim=1 caused each thread to run on a different SM with slightly different clock rates, breaking the proportionality check at high i
- Cold instruction cache on first kernel execution inflated a[0]
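To make the failure modes concrete, here is a rough sketch of what a proportionality check over the 32 timings could look like. The actual assertion in `test_clock_accuracy` is not shown in this PR excerpt, so the function name `is_proportional` and the tolerance `rel_tol` are assumptions for illustration only.

```python
def is_proportional(timings, rel_tol=0.2):
    """Return True if timings[i] is close to (i + 1) * timings[0].

    Hypothetical helper: the real test's tolerance and formula may differ.
    """
    base = timings[0]
    for i, t in enumerate(timings):
        expected = (i + 1) * base
        # Fail as soon as one entry drifts outside the relative tolerance.
        if abs(t - expected) > rel_tol * expected:
            return False
    return True
```

With ideal timings such as `[1000 * (i + 1) for i in range(32)]` the check passes. If only `a[0]` is inflated (say 1500 instead of 1000, as a cold instruction cache might cause), every later entry is compared against a baseline that is too large and the check fails, which matches the second issue listed above.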

Replace the sin-based workload with an LCG, use parallel execution within a single warp, capture timing inside the loop via a data-dependent store to prevent compiler hoisting, and warm up the instruction cache by delaying the start timestamp to j=10.
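A minimal sketch of the loop structure described above, written in plain Python so it runs without a GPU. `time.perf_counter_ns()` stands in for `qd.clock_counter()`, and the names `timed_lcg`, `seed`, and `warmup` are hypothetical; only the LCG constants and the `j == 10` / `x > 10` conditions are taken from the diff.

```python
import time

# Constants copied from the diff in this PR.
LCG_A = 1664527
LCG_C = 1013904223
LCG_M = 2147483647


def timed_lcg(seed, iterations, warmup=10):
    """Run the LCG workload and return (final_state, elapsed_ns).

    The start timestamp is taken at iteration `warmup`, not before the
    loop, so the first few iterations are excluded from the measurement.
    """
    x = seed
    start = 0
    elapsed = 0
    for j in range(iterations):
        x = (LCG_A * x + LCG_C) % LCG_M
        if j == warmup:
            start = time.perf_counter_ns()
        # The store depends on x, so the timestamp read is tied to the
        # loop body instead of being a loop-invariant statement.
        if x > 10:
            elapsed = time.perf_counter_ns() - start
    return x, elapsed
```

CPython does not hoist anything, so this only illustrates the control flow; the compiler-hoisting concern applies to the compiled GPU kernel, not to this stand-in.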

Issue: #


Fix flaky test_clock_accuracy by using LCG workload with in-loop timing

b79393e

Collaborator Author

hughperkins commented Mar 30, 2026

Technically, Opus physically wrote the code, but in practice I had to painfully spoon-feed it pretty much every line. (It wanted to serialize the loop, and I think it's more elegant to run it on a warp, considering it's a 32-size loop in the first place.) Anyway... I have read every line added in this PR, and reviewed the lines. I take responsibility for the lines added and removed in this PR, and won't blame any issues on Opus.

hughperkins marked this pull request as ready for review

March 30, 2026 18:53

hughperkins commented

tests/python/test_intrinsics.py

+                              x = (1664527 * x + 1013904223) % 2147483647
+                              if j == 10:
+                                  start = qd.clock_counter()
+                              if x > 10:

Collaborator Author

hughperkins Mar 31, 2026

needs a comment explaining why we are doing this x comparison
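As background for such a comment, the sketch below counts how often the `x > 10` guard is true for the LCG from the diff. The helper names `lcg_step` and `guard_hit_fraction` are hypothetical. Values of 10 or below are only 11 of the 2147483647 possible residues, so the guard is expected to be true on almost every iteration, while still depending on `x` at run time.

```python
# Constants copied from the diff in this PR.
LCG_A, LCG_C, LCG_M = 1664527, 1013904223, 2147483647


def lcg_step(x):
    """Advance the LCG by one step."""
    return (LCG_A * x + LCG_C) % LCG_M


def guard_hit_fraction(seed, iterations):
    """Return the fraction of iterations for which `x > 10` holds."""
    x = seed
    hits = 0
    for _ in range(iterations):
        x = lcg_step(x)
        if x > 10:
            hits += 1
    return hits / iterations
```

Calling `guard_hit_fraction(1, 1000)` gives a quick empirical check of how rarely the guard skips the store for a given seed.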

hughperkins commented

tests/python/test_intrinsics.py

+                              if j == 10:
+                                  start = qd.clock_counter()
+                              if x > 10:
+                                  a[i] = qd.clock_counter() - start

Collaborator Author

hughperkins Mar 31, 2026

needs a comment explaining why we write inside the loop, not after it.

hughperkins commented

tests/python/test_intrinsics.py

+                          start = qd.i64(0)
+                          for j in range((i + 1) * 50000):
+                              x = (1664527 * x + 1013904223) % 2147483647
+                              if j == 10:

Collaborator Author

hughperkins Mar 31, 2026

needs a comment explaining why we don't start the timer until j=10, and why we do it inside the loop

hughperkins commented

tests/python/test_intrinsics.py

+                          x = state[i]
+                          start = qd.i64(0)
+                          for j in range((i + 1) * 50000):
+                              x = (1664527 * x + 1013904223) % 2147483647

Collaborator Author

hughperkins Mar 31, 2026

needs a comment saying that this is an LCG, and why we use it

hughperkins commented

tests/python/test_intrinsics.py

+                                  start = qd.clock_counter()
+                              if x > 10:
+                                  a[i] = qd.clock_counter() - start
+                          state[i] = x

Collaborator Author

hughperkins Mar 31, 2026

needs a comment saying why we write the result back to state

hughperkins commented

tests/python/test_intrinsics.py

-                              a[i] = qd.clock_counter() - start
-                  foo()
+                          x = state[i]

Collaborator Author

hughperkins Mar 31, 2026

needs a comment saying why we read x from state initially

hughperkins commented

tests/python/test_intrinsics.py

-                  def foo():
-                      qd.loop_config(block_dim=1)
+                  def measure_sequence_timings():
                       for i in range(32):

Collaborator Author

hughperkins Mar 31, 2026

needs a comment saying the significance of shape 32
