-
Notifications
You must be signed in to change notification settings - Fork 77
Drop codegen support of gather (but not takeAlongAxis) #5907
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
!test |
|
Review updated until commit 617f257 Description
|
| Relevant files | |||||||
|---|---|---|---|---|---|---|---|
| Bug fix |
| ||||||
| Enhancement |
| ||||||
| Tests |
|
PR Reviewer Guide
Here are some key observations to aid the review process:
| 🧪 PR contains tests |
| ⚡ Recommended focus areas for review |
Error Handling
|
Test failures
-
(Medium, 3)
Shape mismatch in Thunder higher-order inplace alias update tests (test_update_aliases.py)Test Name A100 GB200 H100 Source thunder.tests.test_update_aliases.test_higher_order_inplace_alias_update_nvfuser_cuda_thunder.dtypes.float32 ❌ ❌ ❌ -
(Medium, 1)
Thunder nvFuser CUDA autograd mismatch in NanoGPT test (runner: dlcluster_h100)Test Name H100 Source thunder.tests.test_networks.test_nanogpt_complete_autograd_nvfuser_cuda_thunder.dtypes.float32 ❌
|
!test |
Greptile OverviewGreptile SummaryThis PR drops codegen support for non-exact gather operations (where Key changes:
The changes are consistent with the stated goal of removing the legacy indexer, which was the only component supporting non-exact gather indexing. Confidence Score: 5/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant User as Fusion API
participant Gather as GatherOp
participant Registry as Scheduler Registry
participant ExprEval as ExprEval Scheduler
participant TensorIndexer as TensorIndexer
User->>Gather: Create GatherOp with exactSizes flag
alt exactSizes == true (takeAlongAxis)
Gather->>Registry: Request scheduling
Registry->>Registry: Check exactSizes()
Registry->>TensorIndexer: Validate fusion support
TensorIndexer->>TensorIndexer: isSupported() passes
Registry-->>User: Schedule with standard schedulers
else exactSizes == false (gather)
Gather->>Registry: Request scheduling
Registry->>Registry: Check exactSizes()
Registry->>Registry: Reject non-exact gather
Registry->>ExprEval: Delegate to ExprEval scheduler
ExprEval->>ExprEval: Execute via eager evaluation
ExprEval-->>User: Return results (no codegen)
end
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3 files reviewed, no comments
Gather allows non-gathered indices to have smaller output dimensions, which complicates indexing and is not yet supported by TensorIndexer and is supported only by the legacy indexer. Note that takeAlongAxis, which is a limited case of gather, is supported.
The motivation is to remove the legacy indexer. This is the only remaining fallback case.
One way to support it is to decompose it into a takeAlongAxis and slice. For now, this PR disables codegen of gather and delegates to ExprEval.
Note that the cross-entropy benchmark does use gather rather than takeAlongAxis. There's a pending change needed in Thunder. See #3924 (comment). While this is a perf regression, at this point I think it'd more important to remove the large technical debt.
In a follow-up PR, I'll remove the legacy indexer. This PR just inserts an assertion that no fallback is necessary, which should be true by the scheduler changes.