@wujingyue
Collaborator

for multi-GPU debugging. Multi-GPU scheduling happens before segmentation and the shardings are encoded as loop transforms.

@wujingyue wujingyue requested a review from Priya2698 February 3, 2026 06:27
@wujingyue
Collaborator Author

!test

@github-actions

github-actions bot commented Feb 3, 2026

Review updated until commit d19ab85

Description

  • Enhanced segmented fusion debugging output for multi-GPU scenarios

  • Changed completeFusion()->printMath() to completeFusion()->print() so the dump includes tensor transforms, not only arithmetic

  • Replaced \n with std::endl and added a trailing newline

  • Better support for debugging loop transforms and shardings

Changes walkthrough

Relevant files
Enhancement
fusion_segmenter.cpp
Enhanced SegmentedFusion::print() debug output

csrc/fusion_segmenter.cpp

  • Modified SegmentedFusion::print() to use std::endl instead of \n
  • Changed completeFusion()->printMath() to completeFusion()->print() for more detailed output
  • Added an extra newline at the end for better formatting
  • Extended the debug output to include transforms for multi-GPU debugging
  • +4/-3

    PR Reviewer Guide

    Here are some key observations to aid the review process:

    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review
    Debug output change

    The change from printMath() to print() in the SegmentedFusion::print() function is intended to show transforms for multi-GPU debugging. While this appears to be a straightforward debug improvement, the reviewer should verify that print() indeed provides the expected transform information and that the output format is appropriate for debugging multi-GPU scheduling scenarios.

    completeFusion()->print();

    @greptile-apps
    Contributor

    greptile-apps bot commented Feb 3, 2026

    Greptile Overview

    Greptile Summary

    Changed SegmentedFusion::print() to call completeFusion()->print() instead of completeFusion()->printMath(), enabling tensor transform printing when NVFUSER_DUMP=segmented_fusion is set.

    Key changes:

    • Replaced printMath() with print() to include tensor transforms in debug output
    • Replaced \n with std::endl in the stream output; the characters written are the same, but std::endl also flushes the stream
    • Aligns with multi-GPU debugging needs where shardings are encoded as loop transforms

    Context:
    Multi-GPU scheduling occurs before segmentation, and the shardings are encoded as loop transforms. The previous printMath() method only printed arithmetic expressions without transform information, limiting debugging visibility. The new approach using print() includes the IrTransformPrinter output, which is essential for understanding multi-GPU scheduling decisions.

    Confidence Score: 5/5

    • This PR is safe to merge with no risk - it only affects debug output formatting
    • The change is minimal and well-scoped, affecting only debug output when NVFUSER_DUMP=segmented_fusion is set. The modification switches from printMath() (arithmetic only) to print() (includes transforms), which directly supports the stated goal of multi-GPU debugging. No functional behavior changes, no test modifications needed, and the change aligns perfectly with existing code patterns in the codebase.
    • No files require special attention

    Important Files Changed

    Filename: csrc/fusion_segmenter.cpp
    Overview: Replaced printMath() with print() to include tensor transforms in debug output, improving multi-GPU debugging visibility

    Sequence Diagram

    sequenceDiagram
        participant User
        participant Runtime as FusionKernelRuntime
        participant SF as SegmentedFusion
        participant CF as CompleteFusion
        participant Printer as IrTransformPrinter
        
        User->>Runtime: Set NVFUSER_DUMP=segmented_fusion
        Runtime->>Runtime: isDebugDumpEnabled(FusionSegments)?
        Runtime->>SF: print()
        SF->>SF: debug() << header
        SF->>CF: completeFusion()->print()
        CF->>CF: Print inputs & outputs
        CF->>CF: Print kernel expressions
        CF->>Printer: IrTransformPrinter.handle(this)
        Note over Printer: NEW: Prints tensor transforms<br/>for multi-GPU debugging
        Printer-->>CF: Transform details
        CF-->>SF: Complete fusion output
        SF->>SF: debug() << footer
        SF->>SF: debug() << this (segmented info)
    

    Contributor

    @greptile-apps greptile-apps bot left a comment


    1 file reviewed, no comments


    Collaborator

    @Priya2698 Priya2698 left a comment


    LGTM otherwise.

    @wujingyue
    Collaborator Author

    !build

    Contributor

    @greptile-apps greptile-apps bot left a comment


    1 file reviewed, no comments
