Skip to content

Conversation

@naoyam
Copy link
Collaborator

@naoyam naoyam commented Jan 31, 2026

Gather allows non-gathered indices to have smaller output dimensions, which complicates indexing and is not yet supported by TensorIndexer and is supported only by the legacy indexer. Note that takeAlongAxis, which is a limited case of gather, is supported.

The motivation is to remove the legacy indexer. This is the only remaining fallback case.

One way to support it is to decompose it into a takeAlongAxis and slice. For now, this PR disables codegen of gather and delegates to ExprEval.

Note that the cross-entropy benchmark does use gather rather than takeAlongAxis. There's a pending change needed in Thunder. See #3924 (comment). While this is a perf regression, at this point I think it'd more important to remove the large technical debt.

In a follow-up PR, I'll remove the legacy indexer. This PR just inserts an assertion that no fallback is necessary, which should be true by the scheduler changes.

@naoyam
Copy link
Collaborator Author

naoyam commented Jan 31, 2026

!test

@github-actions
Copy link

github-actions bot commented Jan 31, 2026

Review updated until commit 617f257

Description

  • Dropped codegen support for non-exact gather operations due to indexing complexity

  • Added validation checks to reject unsupported gather operations in scheduler

  • Updated test expectations to use ExprEval scheduler instead of PointWise for gather patterns

  • Replaced gather calls with takeAlongAxis in test files and examples

Changes walkthrough

Relevant files
Bug fix
indexing.cpp
Add validation check for unsupported operations                   

csrc/id_model/indexing.cpp

  • Added NVF_ERROR check for unsupported fusion operations in
    TensorIndexer constructor
  • This ensures unsupported operations like gather are caught early
  • +2/-0     
    registry.cpp
    Reject non-exact gather operations in scheduler                   

    csrc/scheduler/registry.cpp

  • Added check to reject scheduling of fusions with non-exact gather
    operations
  • Prevents compilation of unsupported gather patterns
  • +10/-0   
    Enhancement
    expr_eval_sched.cpp
    Exclude GatherOp from ExprEvalScheduler                                   

    csrc/scheduler/expr_eval_sched.cpp

  • Added GatherOp to the list of unsupported operations for
    ExprEvalScheduler
  • Gather operations now fall back to ExprEval instead of being compiled
  • +1/-0     
    Tests
    test_gather.cpp
    Update gather tests for dropped codegen support                   

    tests/cpp/test_gather.cpp

  • Updated test expectations from PointWise to ExprEval scheduler for
    gather patterns
  • Disabled two gather tests with DISABLED_ prefix
  • Added explanatory comments about dropped codegen support
  • +5/-3     
    test_persistent_buffer.cpp
    Replace gather with takeAlongAxis in buffer test                 

    tests/cpp/test_persistent_buffer.cpp

  • Replaced gather operation with takeAlongAxis in buffer test
  • Added comment explaining the change due to codegen limitations
  • +3/-1     
    test_reduction.cpp
    Replace gather with takeAlongAxis in reduction test           

    tests/cpp/test_reduction.cpp

  • Changed gather call to takeAlongAxis in cross-entropy loss test
  • Maintains functionality while avoiding unsupported gather codegen
  • +1/-1     

    PR Reviewer Guide

    Here are some key observations to aid the review process:

    🧪 PR contains tests
    ⚡ Recommended focus areas for review
    Error Handling

    The addition of NVF_ERROR(isSupported(id_model.fusion())) in TensorIndexer constructor could cause runtime failures for fusions containing gather operations. This should be validated to ensure proper error messages and graceful handling.

    NVF_ERROR(isSupported(id_model.fusion()));
    Scheduler Rejection Logic

    The new check for non-exact gather operations rejects scheduling entirely. This should be validated to ensure it correctly identifies and handles all cases of non-exact gather operations while allowing exact gather operations to proceed.

    if (std::ranges::any_of(
            ir_utils::getOpsOfType<GatherOp>(fusion),
            [](GatherOp* gather) { return !gather->exactSizes(); })) {
      scheduler_debug_utils::canScheduleRejectReason(
          scheduler_type, "Non-exact gather ops");
      return false;
    }

    Test failures

    • (Medium, 3) Shape mismatch in Thunder higher-order inplace alias update tests (test_update_aliases.py)

      Test Name A100 GB200 H100 Source
      thunder.tests.test_update_aliases.test_higher_order_inplace_alias_update_nvfuser_cuda_thunder.dtypes.float32
    • (Medium, 1) Thunder nvFuser CUDA autograd mismatch in NanoGPT test (runner: dlcluster_h100)

      Test Name H100 Source
      thunder.tests.test_networks.test_nanogpt_complete_autograd_nvfuser_cuda_thunder.dtypes.float32

    @naoyam
    Copy link
    Collaborator Author

    naoyam commented Feb 3, 2026

    !test

    @naoyam naoyam requested a review from jjsjann123 February 3, 2026 18:19
    @naoyam naoyam marked this pull request as ready for review February 3, 2026 18:20
    @greptile-apps
    Copy link
    Contributor

    greptile-apps bot commented Feb 3, 2026

    Greptile Overview

    Greptile Summary

    This PR drops codegen support for non-exact gather operations (where exactSizes() returns false) while maintaining support for takeAlongAxis (exact gather). Non-exact gather operations now fall back to ExprEval scheduler for eager evaluation instead of code generation.

    Key changes:

    • Added validation in TensorIndexer constructor to reject fusions with non-exact gather ops
    • Added rejection check in scheduler registry to prevent scheduling non-exact gather operations
    • Registered GatherOp with ExprEval scheduler to enable fallback execution
    • Updated tests to use takeAlongAxis instead of gather where possible
    • Disabled two tests that specifically test non-exact gather functionality

    The changes are consistent with the stated goal of removing the legacy indexer, which was the only component supporting non-exact gather indexing.

    Confidence Score: 5/5

    • This PR is safe to merge with minimal risk
    • The changes are well-structured and comprehensive, covering all necessary layers (indexer, scheduler, and tests). The fallback to ExprEval ensures no functionality is lost, and the migration from gather to takeAlongAxis in tests demonstrates the recommended pattern for users.
    • No files require special attention

    Important Files Changed

    Filename Overview
    csrc/id_model/indexing.cpp Added validation check to ensure TensorIndexer only processes fusions without non-exact gather operations
    csrc/scheduler/expr_eval_sched.cpp Added GatherOp to the list of operations that can be scheduled with ExprEval scheduler
    csrc/scheduler/registry.cpp Added rejection check to prevent scheduling non-exact gather operations (those with exactSizes() == false)

    Sequence Diagram

    sequenceDiagram
        participant User as Fusion API
        participant Gather as GatherOp
        participant Registry as Scheduler Registry
        participant ExprEval as ExprEval Scheduler
        participant TensorIndexer as TensorIndexer
        
        User->>Gather: Create GatherOp with exactSizes flag
        
        alt exactSizes == true (takeAlongAxis)
            Gather->>Registry: Request scheduling
            Registry->>Registry: Check exactSizes()
            Registry->>TensorIndexer: Validate fusion support
            TensorIndexer->>TensorIndexer: isSupported() passes
            Registry-->>User: Schedule with standard schedulers
        else exactSizes == false (gather)
            Gather->>Registry: Request scheduling
            Registry->>Registry: Check exactSizes()
            Registry->>Registry: Reject non-exact gather
            Registry->>ExprEval: Delegate to ExprEval scheduler
            ExprEval->>ExprEval: Execute via eager evaluation
            ExprEval-->>User: Return results (no codegen)
        end
    
    Loading

    Copy link
    Contributor

    @greptile-apps greptile-apps bot left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    3 files reviewed, no comments

    Edit Code Review Agent Settings | Greptile

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

    Labels

    None yet

    Projects

    None yet

    Development

    Successfully merging this pull request may close these issues.

    1 participant