
[Ascend] add vendor:ascend backend#44

Open
Sans1J wants to merge 3 commits into flagos-ai:main from Sans1J:main

Conversation


@Sans1J Sans1J commented Mar 2, 2026

Description

Add the vendor:ascend backend. The branch has been synchronized with the latest code, and pre-commit has been run locally.

Fixes # (issue)

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Please list the changes introduced in this PR:

  • Change A
  • Change B

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Author

Sans1J commented Mar 2, 2026

Here is the TE-FL-related configuration for running Qwen3-0.6B. Please check whether it meets your expectations.

FlagOS-related arguments for the unified computation and communication backends:

```yaml
te_fl_prefer: vendor # flagos  # enable flagos:triton in TransformerEngine-FL
te_fl_per_op: "rmsnorm_fwd=vendor:ascend|flagos"  # per-op backend selection for TransformerEngine-FL
te_fl_allow_vendors: "ascend"  # allowed vendors for TransformerEngine-FL
te_fl_deny_vendors: "nvidia"  # denied vendors for TransformerEngine-FL
enable_flag_gems: False # True  # enable FlagGems to replace torch ops for distributed training
flag_gems_log_path: xxx/TE-FL-test/FlagScale/log_gems/gems.log  # path of the FlagGems log
flag_gems_unused: [to, copy]  # ops excluded from FlagGems replacement
distributed_backend: nccl # flagcx  # enable FlagCX for distributed training
```
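One way to read the prefer / per-op / allow / deny flags together is as a small fallback resolver. The sketch below is purely illustrative and is NOT the actual TE-FL manager in FlagScale (its real resolution order may differ); the op names and (kind, vendor) pairs are taken from the config and log lines in this thread.

```python
def resolve_backend(op, prefer, per_op, allow_vendors, deny_vendors, available):
    """Pick an implementation for `op` -- illustrative sketch only.

    ASSUMPTION: this models the config flags above, not the real
    TE-FL resolver.  `available` maps op name -> list of
    (kind, vendor) implementations, e.g. ("vendor", "ascend") or
    ("flagos", None).  `per_op` maps op name -> a fallback chain
    parsed from strings like "rmsnorm_fwd=vendor:ascend|flagos".
    """
    chain = per_op.get(op)
    if chain is None:
        # Global preference first, then the other kind as fallback.
        chain = [prefer, "flagos" if prefer == "vendor" else "vendor"]
    for choice in chain:
        kind, _, vendor = choice.partition(":")
        for impl_kind, impl_vendor in available.get(op, []):
            if impl_kind != kind:
                continue
            # Vendor allow/deny lists only apply to vendor-backed impls.
            if impl_vendor and impl_vendor in deny_vendors:
                continue
            if impl_vendor and allow_vendors and impl_vendor not in allow_vendors:
                continue
            return impl_kind, impl_vendor
    return None
```

Under this reading, an op with no vendor:ascend implementation registered (only a flagos one) would fall through to the flagos entry even with `te_fl_prefer: vendor`, which would be consistent with the `multi_tensor_adam` log line below.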

Here is the TE-FL-related information from the logs printed during model training.

```
[default0]:training ...
[default0]:Setting rerun_state_machine.current_iteration to 0...
[default0]:[before the start of training step] datetime: 2026-02-27 09:26:42
[default0]:ninja: no work to do.
[default0]:[2026-02-27 09:27:15,708 TE-FL manager.py:417 INFO] Op 'multi_tensor_adam' using 'default.flagos' (kind=flagos, vendor=None)
[default0]:[WARNING] Please DO NOT tune args ['num_warps']!
[default0]:[WARNING] Please DO NOT tune args ['num_warps']!
```
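To check which TE-FL ops were exercised across a full run, the selection lines can be pulled out of the training log. A minimal sketch: the sample line is copied from the excerpt above, and `training.log` is a placeholder path, not the real log location.

```shell
# Write one sample line in the format printed by the TE-FL manager
# (taken from the log excerpt above); in practice, point grep at the
# real training log instead of this placeholder file.
printf "%s\n" \
  "[default0]:[2026-02-27 09:27:15,708 TE-FL manager.py:417 INFO] Op 'multi_tensor_adam' using 'default.flagos' (kind=flagos, vendor=None)" \
  > training.log

# Count each distinct op/backend selection recorded in the log.
grep -oE "Op '[^']+' using '[^']+'" training.log | sort | uniq -c
```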

Collaborator

lxd-cumt commented Mar 2, 2026

Here is the TE-FL related configuration information for running Qwen3-0.6B. Please check if it meets your expectations.

FlagOS-Related Arguments for unified computation backend and communication backend

```yaml
te_fl_prefer: vendor # flagos  # enable flagos:triton in TransformerEngine-FL
te_fl_per_op: "rmsnorm_fwd=vendor:ascend|flagos"  # per-op backend selection for TransformerEngine-FL
te_fl_allow_vendors: "ascend"  # allowed vendors for TransformerEngine-FL
te_fl_deny_vendors: "nvidia"  # denied vendors for TransformerEngine-FL
enable_flag_gems: False # True  # enable FlagGems to replace torch ops for distributed training
flag_gems_log_path: xxx/TE-FL-test/FlagScale/log_gems/gems.log  # path of the FlagGems log
flag_gems_unused: [to, copy]  # ops excluded from FlagGems replacement
distributed_backend: nccl # flagcx  # enable FlagCX for distributed training
```

Here is the TE-FL-related information from the logs printed during model training.

```
[default0]:training ...
[default0]:Setting rerun_state_machine.current_iteration to 0...
[default0]:[before the start of training step] datetime: 2026-02-27 09:26:42
[default0]:ninja: no work to do.
[default0]:[2026-02-27 09:27:15,708 TE-FL manager.py:417 INFO] Op 'multi_tensor_adam' using 'default.flagos' (kind=flagos, vendor=None)
[default0]:[WARNING] Please DO NOT tune args ['num_warps']!
[default0]:[WARNING] Please DO NOT tune args ['num_warps']!
```

  • Could you provide a complete training log so we can verify whether other TE-related ops were executed, such as generic_temm?
  • Why is te_fl_prefer: vendor configured instead of flagos?
  • If te_fl_prefer: vendor, why did multi_tensor_adam execute the FlagOS implementation?
  • Why is enable_flag_gems: false? This will prevent some torch ops from being replaced with FlagGems.

Collaborator

lxd-cumt commented Mar 3, 2026

Please set transformer_impl: transformer_engine to use TransformerEngine-FL, and additionally adapt multiple FlagOS ops in TE-FL.
Recommended config:

```yaml
transformer_impl: transformer_engine
te_fl_prefer: flagos
enable_flag_gems: true
flag_gems_log_path: xxx
```
