add 2d inputs and copy transpose to transpose benchmark #5915
base: main
Conversation
Review updated until commit e1dcbe7

Description

Relevant files: Enhancement
PR Reviewer Guide
Here are some key observations to aid the review process:
🧪 PR contains tests
⚡ Recommended focus areas for review: Function Signature Change
Test failures
(Medium, 3) Shape mismatch in thunderfx higher_order_inplace_alias_update (thunder/tests/test_update_aliases)

| Test Name | A100 | GB200 | H100 | Source |
|---|---|---|---|---|
| thunder.tests.test_update_aliases.test_higher_order_inplace_alias_update_nvfuser_cuda_thunder.dtypes.float32 | ❌ | ❌ | ❌ | |
Greptile Overview

Greptile Summary

This PR extends the transpose benchmark to cover both copy and view transpose operations, and adds support for 2D input shapes.

Key Changes:
Impact:
Confidence Score: 4/5
Important Files Changed
Sequence Diagram

sequenceDiagram
participant Test as Test Runner
participant Gen as _generate_transpose_params()
participant Fusion as transpose_fusion()
participant Forward as transpose_fwd_fn()
participant NVF as nvFuser Runtime
Note over Test,NVF: Test Execution Flow
Test->>Gen: Generate test parameters
Gen->>Gen: For dims in (2, 3)
Gen->>Gen: Generate sizes via generate_input_sizes(dims)
Gen->>Gen: Create axes_list based on dims
Gen-->>Test: Return (size, axes, dims) tuples
Test->>Test: Parametrize with dtype, is_copy_transpose
Test->>Fusion: Define fusion (dtype, is_copy_transpose, axes, rank)
Fusion->>Fusion: Define 2 input tensors with rank dimensions
Fusion->>Fusion: Add tensors, permute, apply ReLU
alt is_copy_transpose == True
Fusion->>Fusion: Apply segment_set(T9) → T10
Fusion->>NVF: add_output(T10)
Note over Fusion,NVF: Forces copy transpose<br/>(transpose scheduler)
else is_copy_transpose == False
Fusion->>NVF: add_output(T9)
Note over Fusion,NVF: View transpose<br/>(pointwise scheduler)
end
Test->>Forward: Validate with eager_output
Forward->>Forward: torch.transpose(inputs[0] + inputs[1])
Forward->>Forward: Apply ReLU
alt is_copy_transpose == True
Forward->>Forward: Apply .contiguous()
Note over Forward: Materializes contiguous copy
end
Forward-->>Test: Return result
Test->>NVF: Execute benchmark
NVF-->>Test: Performance metrics
1 file reviewed, 1 comment
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
1 file reviewed, no comments
naoyam left a comment
LGTM
!test
1 file reviewed, no comments
@pytest.mark.parametrize("size,axes,dims", _generate_transpose_params())
@pytest.mark.parametrize("dtype", FLOAT_DTYPES)
@pytest.mark.parametrize("axes", [(0, 1), (0, 2), (1, 2)])
@pytest.mark.parametrize(
Do we need to benchmark view transpose? Should we remove it instead?
I don't know; it's not an expensive benchmark, so I'll just leave it as is in this PR.
Got it. Please work with @xwang233 for dashboard integration.
@pytest.mark.parametrize("executor", DEFAULT_EXECUTORS)
@pytest.mark.parametrize("size", generate_input_sizes(dims=3))
@pytest.mark.parametrize("size,axes,dims", _generate_transpose_params())
IIRC, I used 3D inputs to match the C++ benchmark. If 2D inputs are sufficient for benchmarking, we should remove the 3D benchmarking. This should also simplify the dashboard for this benchmark.
We should keep 3D for the different axes.
Co-authored-by: Priya Mishra <52657555+Priya2698@users.noreply.github.com>
1 file reviewed, 2 comments
benchmarks/python/test_transpose.py
# Add a segment_set to avoid pre-segment passes setting the output as a view
# of the input without any data movement, which leads to the pointwise
# scheduler being used instead of the transpose scheduler.
# Alternatively, we could expose OptimizationPassGuard to the Python frontend
# and disable pre-segmentation passes to enforce a contiguous output, so the
# transpose scheduler would be used.
Break these long comments into multiple lines for better readability
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
1 file reviewed, no comments
Priya2698 left a comment
LGTM. If you find that the view transpose variant is not meaningful anymore, please remove it in a future follow-up.
Two extensions to the transpose benchmark in benchmarks/python/test_transpose.py:

(1) Adds coverage for copy vs. view transpose

Previously, we only exercised view transpose, which returns a non-contiguous tensor and is handled by the pointwise scheduler. As a result, the transpose scheduler was never actually exercised.

This PR adds .contiguous() to enforce a contiguous output layout, which triggers a copy-based transpose. For manually defined fusions, a segment_set was added to the fusion to prevent the pre-segmentation pass (AllocationDomainPass) from changing the transpose output layout, ensuring the copy transpose path is taken.

For view transpose, the output has an allocation domain of (iS11{i0}, iS10{i1}), which is the same as the input. Final fusion is:

For copy transpose, the output is T6, which has a transposed allocation domain: (iS12{i1}, iS13{i0}). Final fusion is:
(2) Generalizes fusion input ranks to include 2D

Previously, fusion inputs were limited to 3D shapes, with roughly 100 test cases per data type. This PR expands coverage to include 2D input shapes as well.
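The copy vs. view distinction at the heart of extension (1) can be illustrated with a small NumPy sketch. NumPy stands in for the tensor framework here purely as an analogy (the benchmark itself uses nvFuser and PyTorch): a plain transpose is a non-contiguous view over the same storage, while forcing contiguity materializes a copy, which is the data movement the transpose scheduler targets.

```python
import numpy as np

# A small 2D input; the benchmark's real shapes are much larger.
a = np.arange(12, dtype=np.float32).reshape(3, 4)

view_t = a.T                        # view transpose: no data movement
copy_t = np.ascontiguousarray(a.T)  # copy transpose: materializes the data

assert not view_t.flags["C_CONTIGUOUS"]  # view output is non-contiguous
assert copy_t.flags["C_CONTIGUOUS"]      # copy output is contiguous
assert np.shares_memory(view_t, a)       # view aliases the input storage
assert not np.shares_memory(copy_t, a)   # copy owns fresh storage
```

This mirrors why the benchmark needs .contiguous() (and segment_set in the manual fusion): without it, the "transpose" reduces to a metadata change and there is no memory traffic for the transpose scheduler to optimize.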