Move some of the E2E tests for GEG & CEG into nightly runs #2036
Open
umangyadav wants to merge 2 commits into develop from
Pull Request Overview
Moves heavier GEG/CEG E2E tests out of PR CI into nightly and lowers workloads in PR to reduce runtime.
- Repoint F32/BF16 GEG/CEG suites to nightly by renaming directories/suite names from Pr* to non-Pr*.
- Reduce PR test sizes (notably batch size) and trim permutations.
- Add new F16 nightly suites and arch gating cfgs; update CMake to split PR vs nightly coverage.
Reviewed Changes
Copilot reviewed 18 out of 26 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| mlir/test/e2e/PrGemmElementwiseGemmF16SplitK.toml | Reduced PR test set to a smaller config to speed up CI |
| mlir/test/e2e/PrConvElementwiseGemmF16SplitK.toml | Reduced PR configs; switched to batchsize=1 |
| mlir/test/e2e/PrConvElementwiseGemmF16.toml | Simplified PR conv+gemm F16 suite to a small config |
| mlir/test/e2e/GemmElementwiseGemmF32SplitK.toml | Nightly: rename directory and suite name to non-Pr; keep config |
| mlir/test/e2e/GemmElementwiseGemmF32.toml | Nightly: rename directory and suite name to non-Pr |
| mlir/test/e2e/GemmElementwiseGemmF16SplitK.toml | New nightly F16 split-K gemm+gemm suite |
| mlir/test/e2e/GemmElementwiseGemmF16SplitK.cfg | New nightly gating for F16 split-K (requires mfma/wmma and atomic_add_f16) |
| mlir/test/e2e/GemmElementwiseGemmBF16SplitK.toml | Nightly: rename directory and suite name to non-Pr |
| mlir/test/e2e/GemmElementwiseGemmBF16.toml | Nightly: rename directory and suite name to non-Pr |
| mlir/test/e2e/GemmElementwiseGemmBF16.cfg | New nightly gating for BF16 (requires mfma/wmma) |
| mlir/test/e2e/ConvElementwiseGemmF32SplitK.toml | Nightly: rename directory; reduce workloads; suite name still “pr_*” |
| mlir/test/e2e/ConvElementwiseGemmF32.toml | Nightly: rename directory/suite name; reduce workloads |
| mlir/test/e2e/ConvElementwiseGemmF16SplitK.toml | New nightly F16 conv+gemm split-K suite |
| mlir/test/e2e/ConvElementwiseGemmF16SplitK.cfg | New nightly gating for F16 conv split-K (mfma/wmma and atomic_add_f16) |
| mlir/test/e2e/ConvElementwiseGemmF16.toml | New nightly F16 conv+gemm suite |
| mlir/test/e2e/ConvElementwiseGemmBF16SplitK.toml | Nightly: rename directory/suite name; reduce workloads |
| mlir/test/e2e/ConvElementwiseGemmBF16.toml | Nightly: rename directory/suite name; reduce workloads |
| mlir/test/e2e/CMakeLists.txt | Split PR vs nightly test lists accordingly |
dhernandez0 reviewed on Oct 16, 2025
dhernandez0 approved these changes on Oct 16, 2025
justinrosner approved these changes on Oct 16, 2025
Reduces PR CI runtime by keeping only FP16 CEG and GEG tests in the PR test suite. BF16, F32, and most F16 SplitK variants are moved to nightly. One small F16 SplitK config is kept in PR CI for SplitK coverage. Co-authored-by: Cursor <cursoragent@cursor.com>
Switch input random data from symmetric [-1, 1] to one-sided [0, 1] for the BF16 and F16 conv-elementwise-gemm split-K nightly variants. The symmetric range produced K-reduction outputs near zero from cancellation, which then triggered huge relDiff (>1000) on a small fraction of elements and tripped the verifier despite tiny absDiff and RMS. One-sided random eliminates the cancellation, so an explicit -absDiff_threshold gate is no longer needed. Also add -RMS_threshold 0.01 to match the gemm-elementwise-gemm SplitK pattern. Co-authored-by: Cursor <cursoragent@cursor.com>
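The cancellation effect described in the commit message above can be reproduced outside the test harness. The following is a minimal sketch in plain Python, not the project's verifier: the helper names (`to_f16`, `max_rel_diff`), the reduction length `k`, the sample count, and the definition of relDiff as `|test - ref| / |ref|` are all assumptions made for illustration. It compares a float64 reference dot product against one whose inputs were rounded to half precision, once with symmetric inputs and once with one-sided inputs.

```python
import random
import struct


def to_f16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def max_rel_diff(lo: float, hi: float, k: int = 256,
                 n_outputs: int = 500, seed: int = 0) -> float:
    """Worst relative difference over n_outputs length-k dot products.

    Hypothetical stand-in for a verifier's relDiff metric:
    |test - ref| / |ref|, where `test` uses inputs rounded to F16.
    """
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(n_outputs):
        a = [rng.uniform(lo, hi) for _ in range(k)]
        b = [rng.uniform(lo, hi) for _ in range(k)]
        ref = sum(x * y for x, y in zip(a, b))
        test = sum(to_f16(x) * to_f16(y) for x, y in zip(a, b))
        if ref != 0.0:
            worst = max(worst, abs(test - ref) / abs(ref))
    return worst


# Symmetric inputs: positive and negative products cancel, so some
# reference outputs land near zero and inflate the relative difference.
symmetric = max_rel_diff(-1.0, 1.0)

# One-sided inputs: every product is non-negative, so the reference
# output stays well away from zero.
one_sided = max_rel_diff(0.0, 1.0)

print(f"max relDiff, inputs in [-1, 1]: {symmetric:.3e}")
print(f"max relDiff, inputs in [0, 1]:  {one_sided:.3e}")
```

With one-sided inputs every term in the sum has the same sign, so the relative error of the sum cannot exceed the per-term rounding error. With symmetric inputs the absolute error is similar, but the denominator can be arbitrarily close to zero.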
Motivation
Jenkins PR CI tests are currently taking much longer, mainly due to the CEG & GEG tests. These tests run on three different data types (F32, FP16, and BF16), and also with Split-K enabled.
Some of these tests currently use a batch size of 64. For testing purposes, a smaller batch size is sufficient.
This PR aims to lower the runtime to speed up testing, both locally and on Jenkins.
Technical Details
The CEG/GEG tests cover three different dtypes. IMO it is sufficient to test only F16 in PR CI and keep F32 and BF16 in nightly.
Test Plan
Run both PR CI and nightly, measure the runtime of the E2E tests, and compare it with the current runtimes. Make sure nightly doesn't time out with these additional tests.
Test Result