
[Ascend] add vendor:ascend backend#44

Open
Sans1J wants to merge 3 commits into flagos-ai:main from Sans1J:main

Conversation


@Sans1J Sans1J commented Mar 2, 2026

Description

Add the vendor:ascend backend. The branch has been synchronized with the latest code, and pre-commit has been run locally.

Fixes # (issue)

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Please list the changes introduced in this PR:

  • Change A
  • Change B

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Author

Sans1J commented Mar 2, 2026

Here is the TE-FL-related configuration for running Qwen3-0.6B. Please check whether it meets your expectations.

FlagOS-related arguments for the unified computation and communication backends:

```yaml
te_fl_prefer: vendor # flagos  # enable flagos:triton in TransformerEngine-FL
te_fl_per_op: "rmsnorm_fwd=vendor:ascend|flagos"  # per-op backend selection for TransformerEngine-FL
te_fl_allow_vendors: "ascend"  # allowed vendors for TransformerEngine-FL
te_fl_deny_vendors: "nvidia"  # denied vendors for TransformerEngine-FL
enable_flag_gems: False # True  # enable FlagGems to replace torch ops for distributed training
flag_gems_log_path: xxx/TE-FL-test/FlagScale/log_gems/gems.log  # path of the FlagGems log
flag_gems_unused: [to, copy]  # ops excluded from FlagGems replacement
distributed_backend: nccl # flagcx  # enable FlagCX for distributed training
```
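One way to read the prefer / per-op / allow / deny flags together is as a small fallback resolver. The sketch below is purely illustrative and is NOT the actual TE-FL manager in FlagScale (its real resolution order may differ); the op names and (kind, vendor) pairs are taken from the config and log lines in this thread.

```python
def resolve_backend(op, prefer, per_op, allow_vendors, deny_vendors, available):
    """Pick an implementation for `op` -- illustrative sketch only.

    ASSUMPTION: this models the config flags above, not the real
    TE-FL resolver.  `available` maps op name -> list of
    (kind, vendor) implementations, e.g. ("vendor", "ascend") or
    ("flagos", None).  `per_op` maps op name -> a fallback chain
    parsed from strings like "rmsnorm_fwd=vendor:ascend|flagos".
    """
    chain = per_op.get(op)
    if chain is None:
        # Global preference first, then the other kind as fallback.
        chain = [prefer, "flagos" if prefer == "vendor" else "vendor"]
    for choice in chain:
        kind, _, vendor = choice.partition(":")
        for impl_kind, impl_vendor in available.get(op, []):
            if impl_kind != kind:
                continue
            # Vendor allow/deny lists only apply to vendor-backed impls.
            if impl_vendor and impl_vendor in deny_vendors:
                continue
            if impl_vendor and allow_vendors and impl_vendor not in allow_vendors:
                continue
            return impl_kind, impl_vendor
    return None
```

Under this reading, an op with no vendor:ascend implementation registered (only a flagos one) would fall through to the flagos entry even with `te_fl_prefer: vendor`, which would be consistent with the `multi_tensor_adam` log line below.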

Here is the TE-FL-related information from the logs printed during model training.

```
[default0]:training ...
[default0]:Setting rerun_state_machine.current_iteration to 0...
[default0]:[before the start of training step] datetime: 2026-02-27 09:26:42
[default0]:ninja: no work to do.
[default0]:[2026-02-27 09:27:15,708 TE-FL manager.py:417 INFO] Op 'multi_tensor_adam' using 'default.flagos' (kind=flagos, vendor=None)
[default0]:[WARNING] Please DO NOT tune args ['num_warps']!
[default0]:[WARNING] Please DO NOT tune args ['num_warps']!
```
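To check which TE-FL ops were exercised across a full run, the selection lines can be pulled out of the training log. A minimal sketch: the sample line is copied from the excerpt above, and `training.log` is a placeholder path, not the real log location.

```shell
# Write one sample line in the format printed by the TE-FL manager
# (taken from the log excerpt above); in practice, point grep at the
# real training log instead of this placeholder file.
printf "%s\n" \
  "[default0]:[2026-02-27 09:27:15,708 TE-FL manager.py:417 INFO] Op 'multi_tensor_adam' using 'default.flagos' (kind=flagos, vendor=None)" \
  > training.log

# Count each distinct op/backend selection recorded in the log.
grep -oE "Op '[^']+' using '[^']+'" training.log | sort | uniq -c
```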

Collaborator

lxd-cumt commented Mar 2, 2026

Here is the TE-FL related configuration information for running Qwen3-0.6B. Please check if it meets your expectations.

FlagOS-Related Arguments for unified computation backend and communication backend

```yaml
te_fl_prefer: vendor # flagos  # enable flagos:triton in TransformerEngine-FL
te_fl_per_op: "rmsnorm_fwd=vendor:ascend|flagos"  # per-op backend selection for TransformerEngine-FL
te_fl_allow_vendors: "ascend"  # allowed vendors for TransformerEngine-FL
te_fl_deny_vendors: "nvidia"  # denied vendors for TransformerEngine-FL
enable_flag_gems: False # True  # enable FlagGems to replace torch ops for distributed training
flag_gems_log_path: xxx/TE-FL-test/FlagScale/log_gems/gems.log  # path of the FlagGems log
flag_gems_unused: [to, copy]  # ops excluded from FlagGems replacement
distributed_backend: nccl # flagcx  # enable FlagCX for distributed training
```

Here is the TE-FL-related information from the logs printed during model training.

```
[default0]:training ...
[default0]:Setting rerun_state_machine.current_iteration to 0...
[default0]:[before the start of training step] datetime: 2026-02-27 09:26:42
[default0]:ninja: no work to do.
[default0]:[2026-02-27 09:27:15,708 TE-FL manager.py:417 INFO] Op 'multi_tensor_adam' using 'default.flagos' (kind=flagos, vendor=None)
[default0]:[WARNING] Please DO NOT tune args ['num_warps']!
[default0]:[WARNING] Please DO NOT tune args ['num_warps']!
```

  • Could you provide a complete training log so we can verify whether other TE-related ops were executed, such as generic_temm?
  • Why is te_fl_prefer: vendor configured instead of flagos?
  • If te_fl_prefer: vendor, why did multi_tensor_adam execute the FlagOS implementation?
  • Why is enable_flag_gems: false? This will prevent some torch ops from being replaced with FlagGems.

Collaborator

lxd-cumt commented Mar 3, 2026

Please set transformer_impl: transformer_engine to use TransformerEngine-FL, and additionally adapt multiple FlagOS ops in TE-FL.
Recommended config:

```yaml
transformer_impl: transformer_engine
te_fl_prefer: flagos
enable_flag_gems: true
flag_gems_log_path: xxx
```
