add 2d inputs and copy transpose to transpose benchmark #5915

liqiangxl · 2026-02-03T15:09:22Z

Two extensions to transpose benchmark in benchmarks/python/test_transpose.py

(1) Adds coverage for copy vs. view transpose
Previously, we only exercised view transpose, which returns a non-contiguous tensor and the pointwise scheduler is used. As a result, the transpose scheduler was never actually used.
This PR adds .contiguous() to enforce a contiguous output layout, which triggers a copy-based transpose.
For manually defined fusions, a segment_set was added to the fusion to avoid the pre-segmentation pass (AllocationDomainPass) changing transpose output layout to ensure the copy transpose path is taken.

For view transpose, the output has a allocation domain of (iS11{i0}, iS10{i1}) which is same as input

T5_g_float[iS10{i1}, iS11{i0}]
 logical domain : (iS10{i1}, iS11{i0})
 allocation domain : (iS11{i0}, iS10{i1})
 contiguity: t t
 loop domain : (iS10{i1}, iS11{i0})

Final fusion is:

Segmented_Fusion{ 
groups: 
  pointwise{0, 1, 2, 3}
edges: 

group details:
g{(pointwise)
group id: 0
inputs:
  T0_g_float[iS0{i0}, iS1{i1}] float
  T1_g_float[iS12{i0}, iS13{i1}] float
outputs:
  T5_g_float[iS10{i1}, iS11{i0}] float


T2_l_float[iS4{i0}, iS5{i1}]
   = T0_g_float[iS0{i0}, iS1{i1}]
   + T1_g_float[iS12{i0}, iS13{i1}];
(0)
T3_l_float[iS7{i1}, iS6{i0}]
   = Set.Permute( T2_l_float[iS4{i0}, iS5{i1}], cache_op=Streaming )
(1)
T4_l_bool[iS8{i1}, iS9{i0}]
   = T3_l_float[iS7{i1}, iS6{i0}]
   > double(0);
(2)
T5_g_float[iS10{i1}, iS11{i0}]
   = where(T4_l_bool[iS8{i1}, iS9{i0}]
  , T3_l_float[iS7{i1}, iS6{i0}]
  , double(0));
(3)
}

} //Segmented_Fusion

For copy transpose, the output is T6, it has a transposed allocation domain : (iS12{i1}, iS13{i0}):

T6_g_float[iS12{i1}, iS13{i0}]
   = SegmenterSet( T5_l_float[iS10{i1}, iS11{i0}] )

T5_l_float[iS10{i1}, iS11{i0}]
 logical domain : (iS10{i1}, iS11{i0})
 contiguity: t t
 loop domain : (iS10{i1}, iS11{i0})
T6_g_float[iS12{i1}, iS13{i0}]
 logical domain : (iS12{i1}, iS13{i0})
 allocation domain : (iS12{i1}, iS13{i0})
 contiguity: t t
 loop domain : (iS12{i1}, iS13{i0})

Final fusion is:

Segmented_Fusion{ 
groups: 
  transpose{0, 1, 2, 3, 4}
edges: 

group details:
g{(transpose)
group id: 0
inputs:
  T0_g_float[iS0{i0}, iS1{i1}] float
  T1_g_float[iS14{i0}, iS15{i1}] float
outputs:
  T6_g_float[iS12{i1}, iS13{i0}] float


T2_l_float[iS4{i0}, iS5{i1}]
   = T0_g_float[iS0{i0}, iS1{i1}]
   + T1_g_float[iS14{i0}, iS15{i1}];
(0)
T3_g_float[iS7{i1}, iS6{i0}]
   = Set.Permute( T2_l_float[iS4{i0}, iS5{i1}], cache_op=Streaming )
(1)
T4_g_bool[iS8{i1}, iS9{i0}]
   = T3_g_float[iS7{i1}, iS6{i0}]
   > double(0);
(2)
T5_g_float[iS10{i1}, iS11{i0}]
   = where(T4_g_bool[iS8{i1}, iS9{i0}]
  , T3_g_float[iS7{i1}, iS6{i0}]
  , double(0));
(3)
T6_g_float[iS12{i1}, iS13{i0}]
   = SegmenterSet( T5_g_float[iS10{i1}, iS11{i0}] )
(4)
}

} //Segmented_Fusion

(2) Generalizes fusion input ranks to 2D
Previously, fusion inputs were limited to 3D shapes, with roughly 100 test cases per data type. This PR expands coverage to include 2D input shapes as well.

github-actions · 2026-02-03T15:11:49Z

Review updated until commit e1dcbe7

Description

Extends transpose benchmark to support both 2D and 3D tensor inputs
Adds copy vs view transpose testing using contiguous() and segment_set
Implements segment_set to enforce transpose scheduler over pointwise scheduler
Parameterizes tests to cover all axes combinations for both 2D and 3D cases

Changes walkthrough

Relevant files

Enhancement

test_transpose.py `Extend transpose benchmark with 2D inputs and copy transpose` benchmarks/python/test_transpose.py Modified transpose_fusion to accept is_copy_transpose and rank parameters Added segment_set logic to enforce copy transpose path when needed Extended transpose_fwd_fn to handle both copy and view transpose cases Updated test parameterization to include 2D/3D dims and copy/view transpose flags Added _generate_transpose_params helper for comprehensive test coverage	+65/-22

PR Reviewer Guide

Here are some key observations to aid the review process:

🧪 PR contains tests
⚡ Recommended focus areas for review
Function Signature Change The transpose_fusion function now requires two new mandatory parameters (is_copy_transpose and rank). While all current calls have been updated, this is a breaking API change that could affect external users of this function. def transpose_fusion( fd: FusionDefinition, dtype: DataType, is_copy_transpose: bool, axes: list, rank: int, ): Parameter Validation The new is_copy_transpose parameter is used to control both the segment_set logic and the contiguous() call in transpose_fwd_fn. Ensure this parameter is properly validated and handles edge cases correctly. if is_copy_transpose: T10 = fd.ops.segment_set(T9) fd.add_output(T10) else: fd.add_output(T9)

Test failures

(Medium, 3) Shape mismatch in thunderfx higher_order_inplace_alias_update (thunder/tests/test_update_aliases)

Test Name A100 GB200 H100 Source

thunder.tests.test_update_aliases.test_higher_order_inplace_alias_update_nvfuser_cuda_thunder.dtypes.float32 ❌ ❌ ❌

greptile-apps · 2026-02-03T15:29:17Z

Greptile Overview

Greptile Summary

This PR extends the transpose benchmark to cover both copy and view transpose operations, and adds support for 2D input shapes.

Key Changes:

Added is_copy_transpose parameter to distinguish between copy-based transpose (using .contiguous()) and view-based transpose (swapped strides only)
For copy transpose: added segment_set operation in fusion definition to prevent presegmentation passes from optimizing output to a view, ensuring the transpose scheduler is used instead of pointwise scheduler
Generalized input tensor rank from fixed 3D to parameterized 2D/3D via new rank parameter
Introduced _generate_transpose_params() helper to generate test parameters for both 2D (axes: (0,1)) and 3D (axes: (0,1), (0,2), (1,2)) cases
Updated both nvFuser and baseline benchmark tests to support the new parameters

Impact:

Significantly expands test coverage by adding copy transpose path and 2D shapes
Ensures transpose scheduler is properly exercised during benchmarking
The changes maintain backward compatibility with existing 3D test cases

Confidence Score: 4/5

This PR is safe to merge with minimal risk - changes are well-scoped to benchmark expansion
The implementation correctly extends the transpose benchmark with proper parameter handling for both copy and view transpose modes, and 2D/3D input shapes. The logic for conditionally applying segment_set and .contiguous() is sound. Previous review comments about formatting/spelling have been noted but don't affect functionality.
No files require special attention

Important Files Changed

Filename	Overview
benchmarks/python/test_transpose.py	Extended transpose benchmark with 2D input support and copy vs. view transpose coverage; no critical issues found

Sequence Diagram

sequenceDiagram
    participant Test as Test Runner
    participant Gen as _generate_transpose_params()
    participant Fusion as transpose_fusion()
    participant Forward as transpose_fwd_fn()
    participant NVF as nvFuser Runtime
    
    Note over Test,NVF: Test Execution Flow
    
    Test->>Gen: Generate test parameters
    Gen->>Gen: For dims in (2, 3)
    Gen->>Gen: Generate sizes via generate_input_sizes(dims)
    Gen->>Gen: Create axes_list based on dims
    Gen-->>Test: Return (size, axes, dims) tuples
    
    Test->>Test: Parametrize with dtype, is_copy_transpose
    Test->>Fusion: Define fusion (dtype, is_copy_transpose, axes, rank)
    
    Fusion->>Fusion: Define 2 input tensors with rank dimensions
    Fusion->>Fusion: Add tensors, permute, apply ReLU
    
    alt is_copy_transpose == True
        Fusion->>Fusion: Apply segment_set(T9) → T10
        Fusion->>NVF: add_output(T10)
        Note over Fusion,NVF: Forces copy transpose<br/>(transpose scheduler)
    else is_copy_transpose == False
        Fusion->>NVF: add_output(T9)
        Note over Fusion,NVF: View transpose<br/>(pointwise scheduler)
    end
    
    Test->>Forward: Validate with eager_output
    Forward->>Forward: torch.transpose(inputs[0] + inputs[1])
    Forward->>Forward: Apply ReLU
    
    alt is_copy_transpose == True
        Forward->>Forward: Apply .contiguous()
        Note over Forward: Materializes contiguous copy
    end
    
    Forward-->>Test: Return result
    Test->>NVF: Execute benchmark
    NVF-->>Test: Performance metrics

greptile-apps

_{1 file reviewed, 1 comment}

_{Edit Code Review Agent Settings | Greptile}

benchmarks/python/test_transpose.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

greptile-apps

_{1 file reviewed, no comments}

_{Edit Code Review Agent Settings | Greptile}

naoyam

LGTM

liqiangxl · 2026-02-03T17:19:57Z

!test

greptile-apps

_{1 file reviewed, no comments}

_{Edit Code Review Agent Settings | Greptile}

Priya2698 · 2026-02-03T18:02:37Z

benchmarks/python/test_transpose.py

+@pytest.mark.parametrize("size,axes,dims", _generate_transpose_params())
 @pytest.mark.parametrize("dtype", FLOAT_DTYPES)
-@pytest.mark.parametrize("axes", [(0, 1), (0, 2), (1, 2)])
+@pytest.mark.parametrize(


Do we need to benchmark view transpose? Should we remove it instead?

I don't know, it's not an expensive benchmark, so I just leave it as is in this PR.

Got it. Please work with @xwang233 for dashboard integration.

Priya2698 · 2026-02-03T18:03:55Z

benchmarks/python/test_transpose.py


 @pytest.mark.parametrize("executor", DEFAULT_EXECUTORS)
-@pytest.mark.parametrize("size", generate_input_sizes(dims=3))
+@pytest.mark.parametrize("size,axes,dims", _generate_transpose_params())


IIRC, I used 3D inputs to match C++ benchmark. If 2D inputs are sufficient for benchmarking, we should remove the 3D benchmarking. This should also simplify the dashboard for this benchmark

should keep 3D for different axes

benchmarks/python/test_transpose.py

Co-authored-by: Priya Mishra <52657555+Priya2698@users.noreply.github.com>

greptile-apps

_{1 file reviewed, 2 comments}

_{Edit Code Review Agent Settings | Greptile}

benchmarks/python/test_transpose.py

greptile-apps · 2026-02-03T19:30:16Z

benchmarks/python/test_transpose.py

+    # add segmenter set to avoid presegment passes setting the output as a view of the input without any data movement. It leads to pointwise instead of transpose scheduler.
+    #we can also expose OptimizationPassGuard to python frontend and disable presegmentation passes to enforce output to be contiguous and then transpose scheduler will be used.


Break these long comments into multiple lines for better readability

_{Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!}

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

greptile-apps

_{1 file reviewed, no comments}

_{Edit Code Review Agent Settings | Greptile}

Priya2698

LGTM. If you find that the view transpose variant is not meaningful anymore, please remove it in a future follow-up.

liqiangxl added 2 commits February 3, 2026 07:08

add 2d inputs and copy transpose to transpose benchmark

2de4a10

clean

180c9b3

clean

56de94a

liqiangxl marked this pull request as ready for review February 3, 2026 15:26

liqiangxl requested review from Priya2698 and naoyam February 3, 2026 15:26

greptile-apps bot reviewed Feb 3, 2026

View reviewed changes

benchmarks/python/test_transpose.py Outdated Show resolved Hide resolved

Apply suggestions from code review

1ceac27

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

greptile-apps bot reviewed Feb 3, 2026

View reviewed changes

naoyam approved these changes Feb 3, 2026

View reviewed changes

Merge branch 'main' into llu/extend_transpose_benchmark

241d692

greptile-apps bot reviewed Feb 3, 2026

View reviewed changes

Priya2698 requested changes Feb 3, 2026

View reviewed changes

Priya2698 reviewed Feb 3, 2026

View reviewed changes

benchmarks/python/test_transpose.py Outdated Show resolved Hide resolved

Update benchmarks/python/test_transpose.py

bb136c3

Co-authored-by: Priya Mishra <52657555+Priya2698@users.noreply.github.com>

greptile-apps bot reviewed Feb 3, 2026

View reviewed changes

Update benchmarks/python/test_transpose.py

e1dcbe7

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

greptile-apps bot reviewed Feb 3, 2026

View reviewed changes

Priya2698 approved these changes Feb 4, 2026

View reviewed changes

		# add segmenter set to avoid presegment passes setting the output as a view of the input without any data movement. It leads to pointwise instead of transpose scheduler.
		#we can also expose OptimizationPassGuard to python frontend and disable presegmentation passes to enforce output to be contiguous and then transpose scheduler will be used.

add 2d inputs and copy transpose to transpose benchmark #5915

Are you sure you want to change the base?

add 2d inputs and copy transpose to transpose benchmark #5915

Conversation

liqiangxl commented Feb 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Feb 3, 2026 • edited by xwang233 Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Changes walkthrough

PR Reviewer Guide

Test failures

Uh oh!

greptile-apps bot commented Feb 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Overview

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Sequence Diagram

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

naoyam left a comment

Choose a reason for hiding this comment

Uh oh!

liqiangxl commented Feb 3, 2026

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

Priya2698 Feb 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

liqiangxl Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

Priya2698 Feb 4, 2026

Choose a reason for hiding this comment

Uh oh!

Priya2698 Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

liqiangxl Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

greptile-apps bot Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Uh oh!

Priya2698 left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

liqiangxl commented Feb 3, 2026 •

edited

Loading

github-actions bot commented Feb 3, 2026 •

edited by xwang233

Loading

greptile-apps bot commented Feb 3, 2026 •

edited

Loading

Priya2698 Feb 3, 2026 •

edited

Loading