conversions: Refactor tests to reduce dataset size #2711
Open
rjodinchr wants to merge 2 commits into
The primary goal of this commit is to improve CI tracking by introducing a new golden format that can differentiate test results based on command-line arguments. To cleanly extract and pass these arguments into the JSON result outputs, the command-line parsing infrastructure across the CTS required a significant refactoring.

Key changes include:

* Enhanced CI Tracking: Updates `ci/compare_results.py`, `ci/pocl/golden.json`, and `saveResultsToJson` to include and evaluate an `args` key. The golden JSON now uses a nested format mapping specific argument strings (e.g., `--wimpy -1`) to their expected results, allowing the CI to validate the same binary run under different parameters.
* Centralized Parsing Infrastructure: Introduces the `ParseArgsFn` callback and `runTestHarnessWithCheckAndParse`. This offloads custom argument parsing from individual test `main()` functions and safely extracts the arguments used so they can be logged by the test harness.
* Help Text Consolidation: Replaces fragmented `printUsage()` functions with unified `help` string references populated directly by the standard parsing callbacks.

[run-test: test_computeinfo]
[run-test: test_bruteforce -1 -w]
[run-test: test_cl_copy_images small_images --num-worker-threads 2 1D]
[run-test: test_image_streams 1D --num-worker-threads 2 CL_R CL_FILTER_NEAREST]
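To make the nested golden format concrete, here is a minimal sketch of a lookup keyed first by test name and then by argument string. It is an illustration only: the actual comparison lives in `ci/compare_results.py`, the real schema of `ci/pocl/golden.json` is not shown in this description, and the type aliases, the `expectedResult` helper, and the `"pass"` label are all hypothetical.

```cpp
#include <map>
#include <optional>
#include <string>

// Assumed in-memory shape (not the real schema):
// test name -> argument string -> expected result label.
using GoldenByArgs = std::map<std::string, std::string>;
using Golden = std::map<std::string, GoldenByArgs>;

// Hypothetical helper: returns the expected result for (test, args),
// or std::nullopt when the golden data has no entry for that pair.
static std::optional<std::string> expectedResult(const Golden &golden,
                                                 const std::string &test,
                                                 const std::string &args)
{
    auto testIt = golden.find(test);
    if (testIt == golden.end()) return std::nullopt;

    auto argsIt = testIt->second.find(args);
    if (argsIt == testIt->second.end()) return std::nullopt;

    return argsIt->second;
}
```

With this shape, the same binary can carry one entry per argument string, so a run with `--wimpy -1` and a run with no arguments are checked against different expectations.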
Collaborator
Author
Depends on #2706
This commit reduces the overall dataset size for conversion tests while maintaining high confidence in edge-case coverage. By replacing hardcoded arrays with a dynamically generated set of problematic corner cases, the test execution time and memory footprint are optimized without sacrificing strict conformance validation.

It also introduces an exhaustive testing mode (`-a` flag) to test 2^32 inputs when full coverage is explicitly desired, and corrects the workload reduction math for Wimpy and Embedded modes. Thread safety is guaranteed via a global `std::recursive_mutex`, and exact bit-level deduplication using `std::unordered_set` prevents redundant floating-point testing (e.g., collapsing NaNs or signed zeros).

Special Value Selection Strategies:

* Integer Types (Base and Conversions)
  Focuses on a dense cluster of known boundary triggers rather than sparse randoms. The generator populates a baseline pool (0 to 255), calculates powers of 2, 3, 5, 7, and 10 up to the sign bit, and creates bitwise combinations of repeating patterns and shifting masks. Crucially, to catch off-by-one and alignment errors, every base value generates its immediate neighbors (offsets from -3 to +3), along with their bitwise NOT and sign-bit XORs.
* Floating-Point Types
  Focuses strictly on precision loss and rounding boundaries. It seeds a core set of known difficult values (-/+ INF, NaN, subnormals, and type limits). To test exact rounding thresholds, the generator calculates the +/- 1 ULP neighbors for every seeded value via integer bitcasting. Finally, to guarantee coverage of saturation and overflow thresholds, cross-type boundary values corresponding to the specific destination type's domain limits (e.g., injecting exact char/short limits into float inputs) are dynamically added to the test set.
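The neighbor expansion and bit-level deduplication described above can be sketched roughly as follows. This is not the PR's code: the helper names are hypothetical, the sketch is fixed to 32-bit values, and it assumes IEEE 754 single precision. Stepping the bit pattern by one matches a 1 ULP step for finite nonzero values. At the edges it behaves differently (the pattern after +INF is a NaN, and zero has no smaller magnitude), so those cases are guarded or noted in comments.

```cpp
#include <cstdint>
#include <cstring>
#include <unordered_set>

// Hypothetical helper: reinterpret a float as its 32-bit pattern.
static uint32_t floatBits(float f)
{
    uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// Hypothetical helper: expand an integer seed into its -3..+3 neighbors,
// plus the bitwise NOT and sign-bit XOR of each neighbor.
static void addIntNeighbors(uint32_t seed, std::unordered_set<uint32_t> &out)
{
    const uint32_t signBit = 0x80000000u;
    for (int offset = -3; offset <= 3; ++offset)
    {
        // Unsigned arithmetic wraps, so seeds near 0 or UINT32_MAX are safe.
        uint32_t v = seed + static_cast<uint32_t>(offset);
        out.insert(v);
        out.insert(~v);
        out.insert(v ^ signBit);
    }
}

// Hypothetical helper: add a float seed and its adjacent bit patterns.
// Values are stored as bit patterns, so the set deduplicates on exact
// bits rather than on floating-point equality.
static void addFloatNeighbors(float seed, std::unordered_set<uint32_t> &out)
{
    const uint32_t bits = floatBits(seed);
    const uint32_t magnitude = bits & 0x7FFFFFFFu;

    out.insert(bits);
    // Next pattern away from zero; skipped when it would carry into the
    // sign bit. For +/-INF this neighbor is a NaN pattern.
    if (magnitude != 0x7FFFFFFFu) out.insert(bits + 1);
    // Next pattern toward zero; skipped for +/-0, which has none.
    if (magnitude != 0) out.insert(bits - 1);
}
```

In this sketch `+0.0f` and `-0.0f` stay distinct because their bit patterns differ, while repeated copies of one NaN pattern collapse to a single entry (a value-based comparison could not do that, since NaN never compares equal to itself). Whether the PR treats signed zeros and NaNs exactly this way is not confirmed by the description.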
Collaborator
I would be interested to know if you have any metrics that you can share, e.g. execution times and peak memory usage before and after your change.
Collaborator
Author
I don't have memory usage numbers, but for the execution time it is about 40 times faster.