Enable graph trainer passes for agnostic accelerators #2968
Conversation
The following ciflow label(s) have been added but CI has not been triggered yet because the workflows are awaiting approval:

Once a maintainer approves the workflows (scroll to the bottom of the PR page), the corresponding CI jobs will be triggered automatically. Please ping one of the reviewers if you do not have access to approve and run workflows.
```python
        ngpu=8,
        disabled=_JIT_AOT_DISABLED,
    ),
),
```
```python
import torch._inductor.fx_passes.node_runtime_estimation as node_runtime_estimation
import torch._inductor.fx_passes.overlap_scheduling as overlap_scheduling

# Save the original estimation, logging, and alignment hooks so they can be restored later.
original_estimate_roofline_runtime_ms = getattr(
    overlap_scheduling, "estimate_roofline_runtime_ms", None
)
original_estimate_runtime_analytical = getattr(
    overlap_scheduling, "estimate_runtime_analytical", None
)
original_log_compute_estimations = getattr(
    node_runtime_estimation, "_log_compute_estimations", None
)
scheduler_cls = getattr(overlap_scheduling, "OverlapScheduler", None)
original_align = None
if scheduler_cls is not None:
    original_align = getattr(
        scheduler_cls,
        "_align_compute_nodes_runtime_estimations_across_all_distributed_ranks",
        None,
    )

try:
    # Replace runtime estimation with constant stubs and turn estimation logging
    # and cross-rank alignment into no-ops.
    if original_estimate_roofline_runtime_ms is not None:
        overlap_scheduling.estimate_roofline_runtime_ms = lambda node: 1e-3
    if original_estimate_runtime_analytical is not None:
        overlap_scheduling.estimate_runtime_analytical = lambda node: 1e-3
    if original_log_compute_estimations is not None:
        node_runtime_estimation._log_compute_estimations = (
            lambda compute_nodes, benchmarked_estimations, analytical_estimations: None
        )
    if scheduler_cls is not None and original_align is not None:
        scheduler_cls._align_compute_nodes_runtime_estimations_across_all_distributed_ranks = (
            lambda self: None
        )
```
This seems much more like a change that should happen upstream in OverlapScheduler; patching it from here is brittle. Consider making the change there.

If not, please at least extract all of this into a helper. It's okay to assume OverlapScheduler will contain those methods, since upstream refactors will break this code anyway.
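A minimal sketch of what such a helper might look like as a context manager, following the suggestion above. The helper name `stub_runtime_estimations` is hypothetical, and the sketch assumes the installed torch build exposes the module attributes referenced in the diff above:

```python
import contextlib

import torch._inductor.fx_passes.node_runtime_estimation as node_runtime_estimation
import torch._inductor.fx_passes.overlap_scheduling as overlap_scheduling


@contextlib.contextmanager
def stub_runtime_estimations(stub_ms: float = 1e-3):
    """Temporarily replace runtime-estimation hooks with constant stubs.

    Hypothetical helper; assumes OverlapScheduler and the estimation functions
    exist with the names used in the diff above.
    """
    scheduler_cls = overlap_scheduling.OverlapScheduler
    align_name = "_align_compute_nodes_runtime_estimations_across_all_distributed_ranks"
    # Record the original callables so they can be restored afterwards.
    originals = {
        "roofline": overlap_scheduling.estimate_roofline_runtime_ms,
        "analytical": overlap_scheduling.estimate_runtime_analytical,
        "log": node_runtime_estimation._log_compute_estimations,
        "align": getattr(scheduler_cls, align_name),
    }
    try:
        # Constant estimates and no-op logging/alignment for the duration.
        overlap_scheduling.estimate_roofline_runtime_ms = lambda node: stub_ms
        overlap_scheduling.estimate_runtime_analytical = lambda node: stub_ms
        node_runtime_estimation._log_compute_estimations = lambda *args, **kwargs: None
        setattr(scheduler_cls, align_name, lambda self: None)
        yield
    finally:
        # Always restore, even if compilation raises.
        overlap_scheduling.estimate_roofline_runtime_ms = originals["roofline"]
        overlap_scheduling.estimate_runtime_analytical = originals["analytical"]
        node_runtime_estimation._log_compute_estimations = originals["log"]
        setattr(scheduler_cls, align_name, originals["align"])
```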
Agreed; this belongs in the overlap scheduler rather than being patched in torchtitan. I removed the duplicated inline patching and opened a PR against torch. Once it's merged, I will ping for further review.
This PR keeps the existing CUDA path and adds non-CUDA fallbacks for graph trainer passes to avoid scheduler estimation/alignment issues.
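A minimal sketch of how such a fallback gate might look, assuming the CUDA path is detected via `torch.cuda.is_available()` (the actual detection logic in the PR may differ, and `_needs_estimation_fallback` is an illustrative name):

```python
import torch


def _needs_estimation_fallback() -> bool:
    # Keep the default behavior on CUDA; take the fallback on every other
    # accelerator backend (hypothetical gate, name is illustrative).
    return not torch.cuda.is_available()
```

When the gate returns True, the compile step could be wrapped in a stubbing helper like the one sketched above so the overlap scheduler skips benchmarking and cross-rank alignment on backends where those are unsupported.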