@rdspring1
Collaborator

This PR removes the nvfuser Python module, the corresponding pybind11 C++ bindings, and all references to them in csrc. The version is bumped to 0.2.36.

@rdspring1 rdspring1 added the Direct Bindings label (Python extension with direct mapping to NvFuser CPP objects) Feb 3, 2026
@rdspring1
Collaborator Author

!test

@github-actions

github-actions bot commented Feb 3, 2026

Review updated until commit 11be6d7

Description

  • Removes legacy nvfuser python module and pybind11 bindings from build system

  • Eliminates all source files and configuration for legacy python bindings

  • Adds NVF_API to function declarations in type.h and multidevice/utils.h

  • Updates version to 0.2.36 and modifies documentation to reflect removed serialization

Changes walkthrough

Relevant files
Enhancement
5 files
CMakeLists.txt
Remove legacy nvfuser python library build configuration 
+3/-141 
utils.py
Remove nvfuser._C extension handling, keep only direct version
+1/-5     
version.txt
Bump version from 0.2.35 to 0.2.36                                             
+1/-1     
type.h
Add NVF_API to UnaryOpType and TernaryOpType operator<< declarations
+2/-2     
utils.h
Add NVF_API to getShardedLogicalAxis function declaration
+2/-1     
Documentation
1 files
Serde.md
Add note about serialization being disabled due to removed legacy bindings
+3/-1     
Miscellaneous
1 files
fusion_segmenter.cpp
Update test reference URL to new location in direct tests
+1/-1     
Additional files
34 files
options.cpp +0/-2     
options.h +0/-2     
fusion_kernel_runtime.cpp +0/-12   
fusion_record.cpp +0/-952 
fusion_record.h +0/-124 
README.md +0/-210 
__init__.py +0/-649 
__init__.pyi +0/-4     
benchmark_utils.py +0/-160 
__init__.py +0/-9     
__init__.py +0/-13   
normalization.py +0/-725 
nvfuser_version.py +0/-69   
pytorch_utils.py +0/-190 
fusion_cache.cpp +0/-953 
fusion_cache.h +0/-320 
fusion_definition.cpp +0/-769 
fusion_definition.h +0/-389 
fusion_record.h +0/-3675
fusion_state.cpp +0/-297 
fusion_state.h +0/-143 
multidevice_bindings.cpp +0/-103 
python_bindings.cpp +0/-4196
python_bindings.h +0/-27   
python_bindings_extension.cpp +0/-18   
schedule_bindings.cpp +0/-517 
segmentation.cpp +0/-369 
segmentation.h +0/-246 
translation.cpp +0/-1484
translation.h +0/-20   
translation_utils.cpp +0/-80   
translation_utils.h +0/-300 
test_import.py +0/-17   
env_options.yaml +0/-12   

PR Reviewer Guide

Here are some key observations to aid the review process:

🧪 PR contains tests
⚡ Recommended focus areas for review
Build System Changes

The build_extension method now handles only the nvfuser_direct._C_DIRECT extension; support for the legacy nvfuser._C extension has been removed. This affects the build system and any users who still depend on the old extension name.

if ext.name == "nvfuser_direct._C_DIRECT":
    self.copy_library(ext, "libnvfuser_direct")
    self.copy_shared_library("libnvfuser_codegen.so")
else:
CMake Build Configuration

The entire nvfuser python library build section has been removed from CMakeLists.txt, including the python API version generation, library compilation, and installation rules. Verify that this does not break existing build workflows or documentation.

  # ------------------------------------------------
  # build nvfuser direct python library
  # ------------------------------------------------
  # nvfuser direct bindings API sources
  set(NVFUSER_PYTHON_DIRECT_SRCS)
  list(APPEND NVFUSER_PYTHON_DIRECT_SRCS
    ${NVFUSER_PYTHON_DIRECT_BINDINGS}/extension.cpp
    ${NVFUSER_PYTHON_DIRECT_BINDINGS}/bindings.cpp
    ${NVFUSER_PYTHON_DIRECT_BINDINGS}/enum.cpp
    ${NVFUSER_PYTHON_DIRECT_BINDINGS}/heuristic_params.cpp
    ${NVFUSER_PYTHON_DIRECT_BINDINGS}/ir.cpp
    ${NVFUSER_PYTHON_DIRECT_BINDINGS}/internal_ir.cpp
    ${NVFUSER_PYTHON_DIRECT_BINDINGS}/lru_cache.cpp
    ${NVFUSER_PYTHON_DIRECT_BINDINGS}/multidevice.cpp
    ${NVFUSER_PYTHON_DIRECT_BINDINGS}/ops.cpp
    ${NVFUSER_PYTHON_DIRECT_BINDINGS}/cutlass.cpp
    ${NVFUSER_PYTHON_DIRECT_BINDINGS}/runtime.cpp
    ${NVFUSER_PYTHON_DIRECT_BINDINGS}/schedule.cpp
    ${NVFUSER_PYTHON_DIRECT_BINDINGS}/id_model.cpp
    ${NVFUSER_PYTHON_DIRECT_BINDINGS}/profile.cpp
    ${NVFUSER_PYTHON_DIRECT_BINDINGS}/direct_utils.cpp
    ${NVFUSER_PYTHON_DIRECT_BINDINGS}/python_translate.cpp
    ${NVFUSER_PYTHON_COMMON}/distributed_tensor.cpp
    ${NVFUSER_PYTHON_COMMON}/python_utils.cpp
    ${NVFUSER_PYTHON_COMMON}/translation_names.cpp
  )
  add_library(nvf_py_direct_internal OBJECT ${NVFUSER_PYTHON_DIRECT_SRCS})

  # setup python API version
  add_custom_command(
    OUTPUT ${NVFUSER_PYTHON_DIR}/nvfuser_direct/version.py
    COMMAND
    "${Python_EXECUTABLE}" -c \"from pathlib import Path\; Path('${NVFUSER_PYTHON_DIR}/tools/gen_nvfuser_version.py') .touch() \"
    COMMAND
    "${Python_EXECUTABLE}" ${NVFUSER_PYTHON_DIR}/tools/gen_nvfuser_version.py nvfuser_direct
    DEPENDS ${NVFUSER_PYTHON_DIR}/tools/gen_nvfuser_version.py
    DEPENDS ${NVFUSER_PYTHON_DIR}/version.txt
    WORKING_DIRECTORY ${NVFUSER_PYTHON_DIR}/tools/
  )
  add_custom_target(
    gen_nvfuser_direct_version ALL
    DEPENDS ${NVFUSER_PYTHON_DIR}/nvfuser_direct/version.py
  )
  add_dependencies(nvf_py_direct_internal gen_nvfuser_direct_version)

  # NOTE: For any future extension, change PYTHON_DIRECT_EXTENSION to another
  # name other than EXTENSION_NAME.
  target_compile_definitions(nvf_py_direct_internal PRIVATE
    "-DTORCH_CUDA_BUILD_MAIN_LIB"
    "-DC10_BUILD_MAIN_LIB=1"
    PYTHON_DIRECT_EXTENSION=_C_DIRECT
  )

  add_library(nvfuser_direct MODULE $<TARGET_OBJECTS:nvf_py_direct_internal>)
  target_compile_definitions(nvfuser_direct PRIVATE
    "-DTORCH_CUDA_BUILD_MAIN_LIB"
    "-DC10_BUILD_MAIN_LIB=1"
    PYTHON_DIRECT_EXTENSION=_C_DIRECT
  )

  if(NOT MSVC)
    target_compile_options(nvf_py_direct_internal PRIVATE -Wall -Wno-unused-function)
    target_compile_options(nvf_py_direct_internal PRIVATE -Werror)

    # Add function/data sections for dead code elimination
    target_compile_options(nvf_py_direct_internal PRIVATE
      "-ffunction-sections"
      "-fdata-sections"
    )

    set(NVF_LIB_SUFFIX ".so")
  else()
    set(NVF_LIB_SUFFIX ".pyd")
  endif()

  set_target_properties(nvf_py_direct_internal PROPERTIES
    C_STANDARD ${NVFUSER_C_STANDARD}
    CUDA_STANDARD ${NVFUSER_CUDA_STANDARD}
    CXX_STANDARD ${NVFUSER_CPP_STANDARD}
    CXX_STANDARD_REQUIRED ON
    CXX_VISIBILITY_PRESET hidden
    INSTALL_RPATH
    "$ORIGIN/lib:$ORIGIN/../nvidia/cuda_runtime/lib:$ORIGIN/../nvidia/cuda_nvrtc/lib:$ORIGIN/../../nvidia/cuda_cupti/lib:$ORIGIN/../torch/lib"
    POSITION_INDEPENDENT_CODE Yes
    VISIBILITY_INLINES_HIDDEN Yes
  )
  set_target_properties(nvfuser_direct PROPERTIES
    C_STANDARD ${NVFUSER_C_STANDARD}
    CUDA_STANDARD ${NVFUSER_CUDA_STANDARD}
    CXX_STANDARD ${NVFUSER_CPP_STANDARD}
    CXX_STANDARD_REQUIRED ON
    CXX_VISIBILITY_PRESET hidden
    INSTALL_RPATH
    "$ORIGIN/lib:$ORIGIN/../nvfuser_common/lib:$ORIGIN/../nvidia/cuda_runtime/lib:$ORIGIN/../nvidia/cuda_nvrtc/lib:$ORIGIN/../../nvidia/cuda_cupti/lib:$ORIGIN/../torch/lib"
    POSITION_INDEPENDENT_CODE Yes
    SUFFIX ${NVF_LIB_SUFFIX}
    VISIBILITY_INLINES_HIDDEN Yes
  )

  target_include_directories(nvf_py_direct_internal PUBLIC ${NVFUSER_PYTHON_DIRECT_BINDINGS})
  target_include_directories(nvf_py_direct_internal PUBLIC ${NVFUSER_PYTHON_COMMON})
  target_link_libraries(nvf_py_direct_internal PRIVATE
    nvfuser_codegen
    "${TORCH_INSTALL_PREFIX}/lib/libtorch_python.so"
    pybind11::pybind11 pybind11::headers
    CUDA::cupti
  )
  if (NVFUSER_USE_CUTLASS)
    target_link_libraries(nvf_py_direct_internal PRIVATE nvf_cutlass)
  endif()

  target_link_libraries(nvfuser_direct PRIVATE
    nvf_py_direct_internal
    Python::Module
  )

  # Add dead code elimination flags to reduce file size
  if(NOT MSVC)
    target_link_options(nvfuser_direct PRIVATE
      "-Wl,--gc-sections"
      "-Wl,--as-needed"
      $<$<CONFIG:Release>:-s>
    )
  endif()

  set_target_properties(nvfuser_direct PROPERTIES
    INSTALL_RPATH "$ORIGIN:$ORIGIN/../build:$ORIGIN/../nvfuser_common/lib"
  )
  install(TARGETS nvfuser_direct DESTINATION lib)
endif()

set(JIT_TEST_SRCS)
list(APPEND JIT_TEST_SRCS
  ${NVFUSER_ROOT}/tests/cpp/kernel_db/test_nvfuser_kernel_db_open.cpp
  ${NVFUSER_ROOT}/tests/cpp/kernel_db/test_nvfuser_kernel_db_query.cpp
  ${NVFUSER_ROOT}/tests/cpp/kernel_db/test_nvfuser_kernel_db_write.cpp
  ${NVFUSER_ROOT}/tests/cpp/test_abstract_tensor.cpp
  ${NVFUSER_ROOT}/tests/cpp/test_alias.cpp
  ${NVFUSER_ROOT}/tests/cpp/test_alias_analysis.cpp
  ${NVFUSER_ROOT}/tests/cpp/test_allocation_domain.cpp
  ${NVFUSER_ROOT}/tests/cpp/test_allocation_order_inference.cpp
  ${NVFUSER_ROOT}/tests/cpp/test_bfs.cpp
  ${NVFUSER_ROOT}/tests/cpp/test_ca_root_domain_map.cpp
  ${NVFUSER_ROOT}/tests/cpp/test_circular_buffering.cpp
  ${NVFUSER_ROOT}/tests/cpp/test_circular_buffering_ping_pong.cpp
  ${NVFUSER_ROOT}/tests/cpp/test_combined_inner_outer_reduction.cpp
  ${NVFUSER_ROOT}/tests/cpp/test_compute_at_map.cpp
  ${NVFUSER_ROOT}/tests/cpp/test_compute_with.cpp
  ${NVFUSER_ROOT}/tests/cpp/test_contiguity_id_model.cpp
  ${NVFUSER_ROOT}/tests/cpp/test_driver_api.cpp
  ${NVFUSER_ROOT}/tests/cpp/test_dynamic_transform.cpp
  ${NVFUSER_ROOT}/tests/cpp/test_embedding_node.cpp
  ${NVFUSER_ROOT}/tests/cpp/test_evaluator.cpp
  ${NVFUSER_ROOT}/tests/cpp/test_exceptions.cpp
  ${NVFUSER_ROOT}/tests/cpp/test_expr_simplifier.cpp
  ${NVFUSER_ROOT}/tests/cpp/test_expr_sort.cpp
  ${NVFUSER_ROOT}/tests/cpp/test_fusion_hash.cpp
  ${NVFUSER_ROOT}/tests/cpp/test_gather.cpp
Runtime Dependencies

Includes for python_frontend headers and the debug code for PythonDefinitionSegments have been removed. This changes the runtime dependencies and drops a debugging aid, so verify that runtime functionality is unaffected and that users do not rely on the removed debug output.

#include <c10/cuda/CUDAGuard.h>

#include "fusion.h"
#include "fusion_profiler.h"
#include "fusion_segmenter.h"
#include "host_ir/lowering.h"
#include "host_ir/passes.h"
#include "instrumentation.h"
#include "ir/base_nodes.h"
#include "preseg_passes/pre_segmenter.h"
#include "runtime/executor.h"
#include "runtime/executor_dispatch.h"
#include "runtime/fusion_cache_utils.h"
#include "scheduler/heuristic.h"
#include "serde/fusion_cache_generated.h"
#include "type.h"

Test failures

  • (Medium, 3) Thunder thunderfx path returns scalar instead of vector in higher-order inplace alias update (thunder.tests.test_update_aliases)

    Test Name A100 GB200 H100 Source
    thunder.tests.test_update_aliases.test_higher_order_inplace_alias_update_nvfuser_cuda_thunder.dtypes.float32
  • (Medium, 3) ModuleNotFoundError for 'nvfuser' in tests.python.direct.test_import

    Test Name A100 GB200 H100 Source
    tests.python.direct.test_import.test_import_conflict_direct_then_nvfuser

@greptile-apps
Contributor

greptile-apps bot commented Feb 3, 2026

Greptile Overview

Greptile Summary

This PR removes the legacy nvfuser Python module and its associated pybind11 bindings. The version is bumped to 0.2.36.

Major changes:

  • Removed entire python/python_frontend/ directory containing legacy pybind11 bindings
  • Removed nvfuser Python package (649 lines from __init__.py, utilities, contrib modules)
  • Cleaned up CMakeLists.txt by removing nvfuser target and associated build configuration
  • Removed csrc/serde/fusion_record.cpp (952 lines) and associated headers
  • Removed Python frontend debug options from csrc/options.h
  • Updated python/utils.py to only build nvfuser_direct extension

Issues found:

  • Several files still import from the removed nvfuser module, which will cause import errors
  • tests/python/utils/utils.py imports from nvfuser
  • Documentation files in doc/dev/python_scheduling/ (6 files) still import from nvfuser

These files need to be updated to import from nvfuser_direct instead to maintain functionality.
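One way to locate the remaining imports before merging is a small AST-based scan. The following is a minimal sketch, not part of this PR: the helper names `find_legacy_imports` and `scan_tree` are hypothetical, and it only assumes that the legacy package is named `nvfuser` and the replacement is `nvfuser_direct`, as stated above.

```python
import ast
from pathlib import Path


def find_legacy_imports(source: str, legacy: str = "nvfuser") -> list[int]:
    """Return the line numbers of statements that import the legacy package."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0:
            names = [node.module or ""]
        else:
            continue
        # Match "nvfuser" and "nvfuser.sub", but not "nvfuser_direct".
        if any(n == legacy or n.startswith(legacy + ".") for n in names):
            hits.append(node.lineno)
    return sorted(hits)


def scan_tree(root: str) -> dict[str, list[int]]:
    """Map each .py file under root to the lines that import the legacy package."""
    report = {}
    for path in Path(root).rglob("*.py"):
        try:
            lines = find_legacy_imports(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that cannot be read or parsed
        if lines:
            report[str(path)] = lines
    return report
```

Calling `scan_tree("tests/python")` or `scan_tree("doc/dev/python_scheduling")` from the repository root would list the files and line numbers that still need attention. Parsing with `ast` avoids false positives from comments and string literals that a plain text search would report.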

Confidence Score: 3/5

  • This PR has breaking import errors that need to be fixed before merging
  • The PR successfully removes the legacy bindings infrastructure, but leaves broken imports in test utilities and documentation that will cause runtime failures. The core changes are clean, but the incomplete migration of imports prevents this from being merge-ready.
  • tests/python/utils/utils.py and all files in doc/dev/python_scheduling/ need their imports updated from nvfuser to nvfuser_direct

Important Files Changed

Filename Overview
tests/python/direct/test_import.py Removed test for nvfuser import conflict
python/utils.py Removed nvfuser._C extension from build configuration
CMakeLists.txt Removed nvfuser Python bindings and legacy pybind11 targets
tests/python/utils/utils.py Still imports from nvfuser module which no longer exists
doc/dev/python_scheduling/autotune_utils.py Still imports from nvfuser module which no longer exists

Sequence Diagram

sequenceDiagram
    participant User
    participant Python
    participant nvfuser_direct
    participant libnvfuser_codegen
    
    Note over nvfuser: Legacy module REMOVED
    
    User->>Python: import nvfuser_direct
    Python->>nvfuser_direct: Load _C_DIRECT extension
    nvfuser_direct->>libnvfuser_codegen: Link to shared library
    libnvfuser_codegen-->>nvfuser_direct: Core codegen functionality
    nvfuser_direct-->>Python: Module loaded
    Python-->>User: Success
    
    Note over User,Python: Old workflow (NOW BROKEN):
    User->>Python: import nvfuser
    Python-->>User: ModuleNotFoundError: No module named 'nvfuser'
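Downstream code that wants a clearer failure than the bare import error in the diagram could check for the module first. This is a hedged sketch using only the standard library; `require_direct_bindings` is a hypothetical helper name, and the hint text is an assumption based on the PR description.

```python
import importlib
import importlib.util


def require_direct_bindings(name: str = "nvfuser_direct"):
    """Import and return the named top-level module, or fail with a migration hint."""
    # find_spec locates a top-level module without importing it.
    if importlib.util.find_spec(name) is None:
        raise ModuleNotFoundError(
            f"{name!r} is not installed. The legacy 'nvfuser' module was "
            "removed; import from 'nvfuser_direct' instead."
        )
    return importlib.import_module(name)
```

A caller would write `nvf = require_direct_bindings()` once at startup instead of scattering `try`/`except` blocks around individual imports.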

Contributor

@greptile-apps greptile-apps bot left a comment

7 files reviewed, 2 comments

@greptile-apps
Contributor

greptile-apps bot commented Feb 3, 2026

Additional Comments (2)

python/utils.py
stale reference to removed python_frontend directory


tools/env-config/env_options.yaml
stale entries for removed debug options python_definition_segments and python_frontend_debug should be removed

@rdspring1
Collaborator Author

!build

Contributor

@greptile-apps greptile-apps bot left a comment

8 files reviewed, 1 comment

@greptile-apps
Contributor

greptile-apps bot commented Feb 3, 2026

Additional Comments (1)

python/utils.py
outdated comment references nvfuser._C which was removed in this PR

    # "nvfuser_codegen" into "./nvfuser_direct/lib", and the former will be "nvfuser_direct._C_DIRECT".

@rdspring1 rdspring1 force-pushed the remove_legacy_bindings branch from 7d0a9ba to 904ab90 Compare February 4, 2026 05:13
@rdspring1
Collaborator Author

!test

Contributor

@greptile-apps greptile-apps bot left a comment

8 files reviewed, 1 comment

@rdspring1
Collaborator Author

!test

Contributor

@greptile-apps greptile-apps bot left a comment

5 files reviewed, 7 comments

@greptile-apps
Contributor

greptile-apps bot commented Feb 4, 2026

Additional Comments (7)

tests/python/utils/utils.py
The nvfuser module has been removed in this PR. Update to import from nvfuser_direct instead:

from nvfuser_direct import FusionCache, FusionDefinition, DataType, clone, Tensor

doc/dev/python_scheduling/autotune_utils.py
The nvfuser module has been removed. Update to import from nvfuser_direct:

from nvfuser_direct import FusionCache, FusionDefinition

doc/dev/python_scheduling/autotune_pointwise.py
Update to use nvfuser_direct instead of removed nvfuser module:

from nvfuser_direct import FusionDefinition, SchedulerType, DataType

doc/dev/python_scheduling/autotune_inner_reduction.py
Update to use nvfuser_direct:

from nvfuser_direct import FusionDefinition, SchedulerType, DataType, ParallelType

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!


doc/dev/python_scheduling/autotune_persistent.py
Update imports to use nvfuser_direct:

from nvfuser_direct import FusionCache, FusionDefinition, SchedulerType, DataType
from nvfuser_direct.pytorch_utils import torch_dtype_to_nvfuser_dtype

doc/dev/python_scheduling/autotune_matmul.py
Update to use nvfuser_direct:

from nvfuser_direct import FusionDefinition, SchedulerType

doc/dev/python_scheduling/profile_matmul.py
Update imports to use nvfuser_direct instead of removed nvfuser module
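The import updates suggested above follow one mechanical pattern, so they could be applied with a small rewrite helper. The sketch below is illustrative only: `rewrite_imports` is a hypothetical name, and a textual rename does not guarantee that every imported symbol (for example `FusionCache`) is actually exported by nvfuser_direct, so each rewritten file still needs to be checked by running it.

```python
import re

# Matches the package name in "import nvfuser", "from nvfuser import ..." and
# "from nvfuser.sub import ..." at the start of a line. The negative lookahead
# leaves names such as "nvfuser_direct" untouched.
_LEGACY_IMPORT = re.compile(r"^([ \t]*(?:from|import)[ \t]+)nvfuser(?!\w)", re.MULTILINE)


def rewrite_imports(source: str) -> str:
    """Return source with legacy nvfuser imports pointed at nvfuser_direct."""
    return _LEGACY_IMPORT.sub(r"\g<1>nvfuser_direct", source)
```

Because lines that already import nvfuser_direct do not match, the helper is idempotent and can be re-run safely on partially migrated files.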


Labels

Direct Bindings: Python extension with direct mapping to NvFuser CPP objects.
