Closed
Commits
319 commits
8efd60b
fixed spelling mistakes
lcskrishna Dec 15, 2020
3fdb8db
update readme and minor changes
lcskrishna Dec 16, 2020
663d5a4
Merge pull request #38 from lcskrishna/cl/rocm-hipify-revamp
Dec 17, 2020
5bae299
skip the unit tests
lcskrishna Dec 31, 2020
41bbf93
missing import statement
lcskrishna Dec 31, 2020
76e4e05
Merge pull request #41 from lcskrishna/cl/skip-tests
lcskrishna Dec 31, 2020
ff232fb
Fix reduce_block_into_lanes for multi_tensor_l2norm for ROCm
Nov 28, 2020
d061bf2
Merge pull request #42 from sarunyap/reduce-block-fix
jeffdaily Jan 18, 2021
dcc7b51
Merge remote-tracking branch 'upstream/master'
jeffdaily Jan 18, 2021
2332c4d
update setup.py to more closely align with upstream
jeffdaily Jan 18, 2021
4ebf2b9
missing #include <c10/cuda/CUDAGuard.h>
jeffdaily Jan 18, 2021
13c8d15
skip failing tests on ROCm
jeffdaily Jan 18, 2021
85b56d0
Merge pull request #43 from ROCmSoftwarePlatform/IFU-2021-01-18
jeffdaily Jan 19, 2021
5baa68d
use __launch_bounds__ for multi_tensor_apply (#44)
jeffdaily Jan 21, 2021
c1e88fa
fix cross-compiled ROCm builds when no GPUs detected (#45)
jeffdaily Jan 21, 2021
3f49dbf
fix bugs in syncbn (#46)
jeffdaily Jan 25, 2021
fbb8cd9
Revert "pass all TensorListMetadata as pointer to pinned host memory …
jeffdaily Feb 25, 2021
dde39c9
Merge pull request #47 from ROCmSoftwarePlatform/revert_workaround
sunway513 Mar 4, 2021
c285a67
Merge remote-tracking branch 'upstream/master' into IFU-2020-03-04
jeffdaily Mar 4, 2021
107f1ff
Merge pull request #48 from ROCmSoftwarePlatform/IFU-2020-03-04
jeffdaily Mar 4, 2021
799785a
Make torch version check numeric
jithunnair-amd Jun 25, 2021
95797c8
Merge pull request #50 from ROCmSoftwarePlatform/numeric_torch_versio…
jeffdaily Jun 25, 2021
955256d
enable --distributed_lamb for rocm
jeffdaily Aug 31, 2021
02ada95
Merge pull request #52 from ROCmSoftwarePlatform/add_distributed_fuse…
jithunnair-amd Aug 31, 2021
888e72a
work around hipify not finding headers
jeffdaily Sep 1, 2021
37d8410
Merge pull request #53 from ROCmSoftwarePlatform/hipify_workaround_in…
jithunnair-amd Sep 1, 2021
e57c84e
Enable group batch norm (--bnp) on ROCm (only bn_group = 1) (#51)
Sep 7, 2021
297ab21
in multi tensor apply, skip empty tensors (#54)
jeffdaily Oct 4, 2021
f79993d
Merge remote-tracking branch 'upstream/master' into IFU-master-2021-1…
Oct 15, 2021
1fd257e
Enable the following modules in apex/contrib:
athitten Oct 19, 2021
8091b3e
Fix the hipification issues for cublasGemmEx by adding rocblas_gemm_ex
Oct 19, 2021
203e323
scaled_upper_triang_masked_softmax_cuda and scaled_masked_softmax_cud…
Oct 19, 2021
93f3a3b
Revert back to the test_fused_optimizer.py in upstream to solve multi…
Oct 19, 2021
d36b3c6
Revert test_fused_layer_norm.py to prevent from missing torch.cuda.is…
Oct 20, 2021
88eee5f
updates to MHA, compilation still broken
jeffdaily Oct 21, 2021
c3ec935
apex definition of macro conflicts with pytorch macro WARP_SHFL_XOR
jeffdaily Oct 21, 2021
964e61f
Enable MLP unit tests on ROCm
Oct 26, 2021
aee9f00
Revert "Enable MLP unit tests on ROCm"
Oct 27, 2021
ba0e5fa
Hipify self_multihead_attn_bias_additive_mask.
Oct 28, 2021
8bdbb50
Hipify encdec_multihead_attn
Oct 29, 2021
325246e
Update README.md
sunway513 Oct 29, 2021
6141618
Hipify self_multihead_attn_bias
Oct 29, 2021
8318142
Hipify self_multihead_attn
Oct 29, 2021
9319318
Fix namespace for pybind11
Oct 29, 2021
4b15f64
Trigger Build
Nov 2, 2021
62f0696
Update setup.py
hubertlu-tw Nov 2, 2021
9f89976
Merge pull request #56 from ROCmSoftwarePlatform/dev/hubertlu/multihe…
hubertlu-tw Nov 2, 2021
5c79a27
THCDeviceUtils.cuh -> ATen/cuda/DeviceUtils.cuh (#1173)
crcrpar Sep 24, 2021
abb6e5b
cleanup missing THCDeviceUtils.cuh header (#1177)
xwang233 Sep 28, 2021
f386852
Enable Distributed FusedLAMB
athitten Nov 18, 2021
1549855
Add unit tests for Apex extensions and distributed Apex
Nov 19, 2021
bcf9d06
Bug fix for self_multihead_attn_norm_add
Nov 19, 2021
405956c
Update run_rocm.sh
hubertlu-tw Nov 22, 2021
51b402d
include iostream (#1144)
xwang233 Aug 20, 2021
3f3da21
Update run_rocm_distributed.sh
Dec 1, 2021
08e88b1
Enable Distributed FusedLAMB (#57)
athitten Dec 1, 2021
2228f1b
Merge remote-tracking branch 'origin/master' into dev/hubertlu/unit_t…
Dec 2, 2021
1436a66
Merge remote-tracking branch 'origin/master' into IFU-master-2021-10-15
Dec 2, 2021
541da7a
Merge pull request #58 from ROCmSoftwarePlatform/dev/hubertlu/unit_tests
hubertlu-tw Dec 2, 2021
1e0f9bc
Enable all supported CUDA extensions using --cuda_ext flag (#59)
jithunnair-amd Dec 2, 2021
39a65c9
Add IS_ROCM_PYTORCH if statement for some newly-added extensions
Dec 3, 2021
79a2d20
Merge remote-tracking branch 'origin/master' into IFU-master-2021-10-15
Dec 3, 2021
2155dab
remove THC headers/functions (#1192)
crcrpar Oct 18, 2021
fec3141
Replace THCudaCheck with C10_CUDA_CHECK
Dec 6, 2021
cc92a4b
Merge pull request #55 from ROCmSoftwarePlatform/IFU-master-2021-10-15
jithunnair-amd Dec 8, 2021
7990651
Merge remote-tracking branch 'upstream/master' into IFU-master-2021-1…
Dec 9, 2021
692e195
Update README.md
hubertlu-tw Dec 9, 2021
d11ddcc
Add fused mixed precision lamb optimizer. (#1237)
kevinstephano Dec 9, 2021
9615983
Remove `THCState` from `apex/contrib/multihead_attn` (#1239)
crcrpar Dec 9, 2021
cf0b0f0
Fix some bugs related to THCState and cutlass
Dec 9, 2021
67ded2e
Remove deprecated THC/THC.h
Dec 13, 2021
d150afd
Skip failing unit tests (#61)
hubertlu-tw Dec 14, 2021
68364b4
Conditionally define autocast_dtypes for different torch versions
Dec 14, 2021
db92ee1
Merge pull request #64 from ROCmSoftwarePlatform/IFU-master-2021-12-08
jithunnair-amd Dec 14, 2021
8f5ae43
Remove debug print statement
athitten Jan 21, 2022
151d150
Fix bn_addrelu's bitmask type error (#67)
Jan 25, 2022
1cb3da8
Optimize layer normalization for AMD GPUs (#66)
hubertlu-tw Jan 25, 2022
cfe106d
Update ATen/CUDAGeneratorImpl.h to ATen/cuda/CUDAGeneratorImpl.h to r…
jithunnair-amd Jan 26, 2022
5de49cc
Cherry-pick b2fdf9c from upstream Apex and resolve conflicts (#68)
jithunnair-amd Jan 28, 2022
980d5f4
Fix torch._softmax_backward_data arguments
Feb 16, 2022
7bef81f
Updated the handling of CUDAGeneratorImpl.h to new path
pruthvistony Mar 11, 2022
b6a1f48
Add rocblas_alt_impl flag for bwd rocblas calls in MHA (#70)
athitten Mar 18, 2022
063d720
Add rocblas_alt_impl flag for backprop in MLP (#71)
hubertlu-tw Mar 23, 2022
5ecad14
Make rocblas_gemm_flags_fp16_alt_impl in MHA and MLP backward compati…
hubertlu-tw Apr 6, 2022
29b3631
Cherry-picked the commit from upstream for faster --fast_multihead_at…
hubertlu-tw Apr 13, 2022
dd584a5
Added support for memory format API(torch.channels_last) in GBN (#72)
Mahathi-Vatsal Apr 14, 2022
27a4734
Apex transformer (#77)
hubertlu-tw Apr 15, 2022
c14cfb1
FusedRMSNorm/"T5LayerNorm" based on FusedLayerNorm (#1274)
eqy Feb 4, 2022
fceec07
fix and generate docs for FusedRMSNorm (#1285)
eqy Feb 7, 2022
4792170
[FusedRMSNorm doc] document where epsilon is added (#1295)
stas00 Feb 11, 2022
d755f1f
Fix some bugs
Apr 15, 2022
28c5638
Optimize HostRMSNormGradient and HostApplyRMSNorm for AMD GPUs
Apr 15, 2022
8df1b6b
Fix NaN issues in FusedRMSNorm
Apr 15, 2022
cf77e9b
Make rocblas_gemm_flags_fp16_alt_impl backward-compat for new naming …
hubertlu-tw May 31, 2022
0df6c4c
Update test_fused_layer_norm.py
Jul 29, 2022
016c8d4
Merge remote-tracking branch 'origin/dev/hubertlu/FusedRMSNorm'
Jul 29, 2022
795a5e5
Merge remote-tracking branch 'upstream/master' into IFU-master-2022-0…
Jul 29, 2022
bbf2c8d
Unskip run_transformer unit tests
Jul 29, 2022
038ed99
Fix some compiling errors
Jul 29, 2022
c97ebfa
Enable FusedRMSNorm (#78)
hubertlu-tw Aug 5, 2022
51783cc
Revert code changes to multihead_attn tests
Aug 8, 2022
cb8b7a8
Merge remote-tracking branch 'origin/master' into IFU-master-2022-07-29
Aug 8, 2022
57dea7f
Fix the cuda-specific transformer utils for ROCm
Aug 8, 2022
87fc412
Skip the failing unit tests from the FusedRMSNorm PR (#85)
hubertlu-tw Aug 8, 2022
4cfbe05
Add a wrapper to skip flaky unit tests.
Aug 8, 2022
1b7b02e
Un-skip some tests and skip some flaky tests
Aug 8, 2022
975a0e5
Skip some flaky unit tests
Aug 9, 2022
8a8eb34
Skip a flaky unit test
Aug 9, 2022
ced59fc
Update L0 unit test script
Aug 9, 2022
4d56745
Remove run_pyprof_data and run_pyprof_nvtx unit tests
Aug 9, 2022
f1f28ff
Merge remote-tracking branch 'origin/dev/hubertlu/flaky_tests' into I…
Aug 9, 2022
cebbb04
Remove some comments in run_test.py
Aug 9, 2022
12ff0e2
Merge remote-tracking branch 'origin/dev/hubertlu/flaky_tests' into I…
Aug 9, 2022
cc5f83b
Skip a failing test introduced by an upstream PyTorch regression
Aug 10, 2022
96850df
Merge pull request #80 from ROCmSoftwarePlatform/IFU-master-2022-07-29
jithunnair-amd Aug 15, 2022
c662c70
Enable --peer_memory and --nccl_p2p extensions for ROCm
Aug 22, 2022
fd0f763
Fixed peer halo exchange module test
thorjohnsen Aug 15, 2022
40e1536
add customized fused op index multiplication (#1438)
BaoHhhhhhan Aug 1, 2022
ebb4e88
Enable --focal_loss and --index_mul_2d_cuda extensions on ROCm
Aug 23, 2022
a27b4e4
cached cast fix (#90)
hubertlu-tw Aug 26, 2022
9187ea1
Merge remote-tracking branch 'origin/master' into dev/hubertlu/focal_…
Sep 7, 2022
bc64ee8
Keep --peer_memory and --nccl_p2p CUDA-compatible
Sep 7, 2022
a53b441
Merge pull request #87 from ROCmSoftwarePlatform/dev/hubertlu/apex_pe…
jithunnair-amd Sep 8, 2022
ae5ca67
Enable --transducer extension for ROCm (#88)
hubertlu-tw Sep 8, 2022
7a34431
Merge branch 'master' into dev/hubertlu/focal_loss_and_index_mul_2d_cuda
jithunnair-amd Sep 8, 2022
5acb8d0
Merge pull request #91 from ROCmSoftwarePlatform/dev/hubertlu/focal_l…
jithunnair-amd Sep 8, 2022
89f5722
Faster build (#95)
hubertlu-tw Sep 19, 2022
719215b
Make index_mul_2d extension backward compatible for Atomic header inc…
hubertlu-tw Sep 21, 2022
9ebc53e
Consider both contiguous and channels_last tensors for FusedSGD (#97)
hubertlu-tw Dec 6, 2022
4dcf30a
Unskip some unit tests related to issue #82 (#98)
hubertlu-tw Dec 6, 2022
e90ba51
Fix a bug in fused_dense_cuda on ROCm
Dec 9, 2022
d63b5d1
Add fused_dense in the extension unit test script
Dec 9, 2022
6e453f1
Merge pull request #99 from ROCmSoftwarePlatform/dev/hubertlu/fused_d…
kkHuang-amd Dec 10, 2022
f05aaca
Update register keyword handling for C++17 (#100)
pruthvistony Dec 20, 2022
14db5c2
Updating BLOCK_SIZE to 1024 in all optimizers. (#103)
aspanday Jan 25, 2023
56c283b
Luise/gbn optimization (#105)
luise1030 Feb 13, 2023
b047a1f
Grid optimization - Chunk_Size optimization. (#104)
aspanday Feb 15, 2023
03d70c4
Cherry-picks some commits to replace torch.Tensor and remove dependen…
hubertlu-tw Mar 1, 2023
7a42877
Add FusedLARS optimizer (#109)
luise1030 Mar 23, 2023
1892147
Update rccl header include path (#110)
pruthvistony Mar 30, 2023
10c7482
Adding pyproject.toml file (#112)
pruthvistony Jun 20, 2023
8fc9b21
Changes to support hipblas migration (#113)
pruthvistony Aug 11, 2023
e4d2186
Revert "Changes to support hipblas migration (#113)"
pruthvistony Sep 6, 2023
3ba7192
Merge pull request #116 from ROCmSoftwarePlatform/revert_hipblas
sunway513 Sep 6, 2023
432ec5a
Adding version.txt with 1.1.0 (#121)
ramcherukuri Oct 27, 2023
1346a15
remove HCC references (#122)
jeffdaily Nov 14, 2023
4fa061d
Rel1.1.0 cherrypick master (#124)
ramcherukuri Dec 14, 2023
d835a88
Moving version to 1.2.0 (#126)
pruthvistony Dec 15, 2023
ba2cc25
moving from rocBLAS to hipBLAS
ramcherukuri Jan 12, 2024
aa0d0d2
Add setting of env flag when apex is turned on (#130)
pragupta Jan 24, 2024
608fe53
Batchnorm support (#129)
ramcherukuri Jan 24, 2024
7e78656
Moving master to version 1.3.0 (#131)
ramcherukuri Jan 26, 2024
1170a77
adding hipblas v2 changes
ramcherukuri Jan 27, 2024
92af951
ramcherukuri Jan 27, 2024
6873b49
Changes to support HIPBLAS V1 and V2
ramcherukuri Jan 31, 2024
e208242
ramcherukuri Feb 1, 2024
335b147
Remove dead code
ramcherukuri Feb 2, 2024
d78ca9a
Remove unused hipOperationToRocOperation function
ramcherukuri Feb 2, 2024
99f5ec4
Merge branch 'master' into hpiblas-changes
ramcherukuri Feb 2, 2024
565a344
Merge pull request #127 from ROCm/hpiblas-changes
ramcherukuri Feb 2, 2024
9143459
Update README.md
jithunnair-amd May 20, 2024
a4d9af3
change compute type for F16 wrapper around cublas GEMMEx (#133)
suachong Jun 17, 2024
35c3474
Add ROCm version to version so it reflects in wheel name
jithunnair-amd Jun 25, 2024
bc680e2
support megatron seq_len > 4096 (#135)
ramcherukuri Jul 26, 2024
4d04ae6
Fix the build break (#136)
pruthvistony Jul 30, 2024
f065f5e
Hipblaslt support (#137)
ramcherukuri Oct 31, 2024
85d8a97
Updated setup.py to fix indent issue (#139)
BLOrange-AMD Nov 4, 2024
007e472
Revert hipblaslt (#141)
jagadish-amd Nov 11, 2024
73b7bca
Update version to 1.6.0a0 since 1.5.0 branch has been cut (#142)
jithunnair-amd Nov 20, 2024
7b7a3a7
Support hipblasLT (#144)
jagadish-amd Dec 6, 2024
46955b8
A fused `apply_rotary_pos_emb` implementation for Megatron-Core (#1746)
yaox12 Nov 14, 2023
dad8b4f
fix a bug in fused rope (#1750)
yaox12 Nov 16, 2023
e533ab5
Avoid `.contiguous()` in fused RoPE (#1751)
yaox12 Nov 23, 2023
1a04a39
[FusedRoPE] Fuse type conversion and cos/sin (#1752)
yaox12 Nov 29, 2023
69e7d88
Fused RoPE for `thd` format (#1756)
yaox12 Jan 12, 2024
035830f
Add 2D Fused RoPE (#1784)
yaox12 Apr 19, 2024
b30c03b
build: add fused_rotary_position_embedding in setup.py
caaatch22 Dec 23, 2024
7cd3447
Merge pull request #148 from caaatch22/fused_rope
jeffdaily Jan 3, 2025
9046c99
`Tensor.type()` -> `Tensor.scalar_type()` (#1855) (#147)
pruthvistony Jan 7, 2025
bf0f077
[test] update fix of #1859 (#1860) (#152)
amd-sriram Jan 21, 2025
60c1b8a
add setter for virtual world size (#1541) (#151)
amd-sriram Jan 21, 2025
e2a2796
Replaced amp function with autocast in mlp class (#153)
amd-sriram Jan 21, 2025
bb8dad8
Making fp16_utils tests run (#154)
amd-sriram Jan 21, 2025
90488c3
HipblasLT runtime (#145)
jagadish-amd Jan 22, 2025
3a5b941
Replaced amp function with torch autocast (#155)
amd-sriram Jan 23, 2025
ab24a29
Fused adam - added parameters - capturable, master weights, grad scal…
amd-sriram Jan 23, 2025
90ae2f9
Added unscale_grads to transformer Grad scaler (#157)
amd-sriram Jan 23, 2025
12dd820
Added torch check and release GIL in focal loss (#158)
amd-sriram Jan 28, 2025
084f047
Added torch check and release GIL in index_mul_2d (#159)
amd-sriram Jan 29, 2025
9f3b006
Added Release GIL, removed 2 skip statements for UT that used to fail …
amd-sriram Jan 29, 2025
6fc10c3
Reduced tolerance for lower precision data f16 and bf16. (#161)
amd-sriram Jan 30, 2025
59c1741
Update README.md (#162)
amd-sriram Jan 31, 2025
a467615
Bump version to 1.7.0
jithunnair-amd Feb 13, 2025
d711ff7
Append apex wheel name to include apex commit it is built on (#163)
ethanwee1 Feb 13, 2025
8af1d2a
Feature fused gradient accum (#164)
amd-sriram Feb 19, 2025
27017a4
Altered the HIPBLAS to CUBLAS (#169)
amd-sriram Feb 19, 2025
73201b7
Added fused_bias_swiglu kernel function and the test cases to test th…
amd-sriram Feb 24, 2025
0e16e6e
Reduced the tolerance of UT to -4, auto-initialize the maxthreads from…
amd-sriram Feb 26, 2025
b8ad311
Fix setup.py to read environment variable instead of parsing rocminfo…
amd-sriram Feb 27, 2025
9cce8f9
Fixed the unit test of fused dense by updating the parameters in the …
amd-sriram Mar 12, 2025
8051f20
replacing blas with blaslt in fused_weight_gradient_dense (#179)
amd-sriram Mar 12, 2025
6fd8b50
Feature distributed fused adam (#184)
amd-sriram Mar 21, 2025
ccb59d6
Ported distributed fused lamb from upstream repo. Add support for par…
amd-sriram Mar 21, 2025
386ecea
for distributed fused adam, add condition to remove nccl_allocator on…
amd-sriram Mar 24, 2025
a34f5c3
Building nccl_allocator only for pytorch 2.6 branch (#189)
amd-sriram Mar 25, 2025
b06b4c3
Change the location of the fused_dense unit tests. Fix the code for t…
amd-sriram Apr 15, 2025
1c44f5d
Include run_transformer UTs in the run_rocm.sh file (#194)
amd-sriram Apr 15, 2025
1667e85
Fix transformer unit tests (#195)
amd-sriram Apr 17, 2025
f06c72a
Update README.md (#196)
amd-sriram Apr 17, 2025
ab44c00
Update README.md (#198)
amd-sriram Apr 24, 2025
667af65
[ROCm] Use at::empty to manage workspace memory to avoid hip runtime …
RuibinCheung Apr 25, 2025
87e3bb0
Update version.txt (#203)
amd-sriram Apr 25, 2025
65f4584
Update the condition for building the NCCL allocator, PyTorch should …
amd-sriram Apr 26, 2025
09ffa0a
Update distributed fused adam - integrate Pipeline operations and sup…
amd-sriram Apr 26, 2025
6468501
upgrade matplotlib to resolve setuptools_scm error. (#213)
amd-sriram Apr 29, 2025
6729b2b
Update fused layer norm code from upstream apex repo. The intra-warp …
amd-sriram May 8, 2025
a31598c
Fix unit tests for transformer, fused dense, mlp (#218)
amd-sriram May 13, 2025
81eb2fb
Reset torch default device to cpu after running the amp unit tests. (…
amd-sriram May 19, 2025
89c37c8
change epilogue parameter for hipblaslt matmul in cuda kernel for fus…
amd-sriram Jun 3, 2025
7f38d9d
Do not use warpSize as a constexpr in nhwc_batch_norm_kernel.h
iassiour Jul 6, 2025
a8ba9a6
Merge pull request #227 from ROCm/amd/dev/iassiour/SWDEV-541770
iassiour Jul 8, 2025
d533e3f
[master] Added AITER as a submodule and use in fused_rope.py (#222)
amd-sriram Jul 8, 2025
95c7ed2
Replacing c10_warp_size with platform based warp_size values (#228)
amd-sriram Jul 8, 2025
7d9b032
Fixing the C10_warpsize issue. Replacing the macros with at::cuda::wa…
amd-sriram Jul 9, 2025
ed2d044
Apex extensions import test (#245)
amd-sriram Jul 10, 2025
6e23ced
correct the approach to get to the apex folder from the test file (#248)
amd-sriram Jul 11, 2025
99c6242
Replaced warpsize with C10_WARP_SIZE (#249)
amd-sriram Jul 11, 2025
19eed3c
Disabling Aiter Installation in default build (#254)
amd-sriram Jul 11, 2025
051cba7
Fix warp size (#256)
amd-sriram Jul 15, 2025
61431e1
Update version.txt (#261)
amd-sriram Jul 22, 2025
7221c68
Update README.md (#262)
amd-sriram Jul 22, 2025
1e9236f
Fix build error (#264)
pragupta Jul 22, 2025
62c94ed
reset parameters for FusedDenseGeluDense similar to FusedDense to mak…
amd-sriram Jul 28, 2025
4b03581
update the param_id calculation so that it works on both CPX and SPX …
amd-sriram Aug 11, 2025
053a9b1
Update README.md (#273)
amd-sriram Oct 3, 2025
34160b8
Update version.txt (#274)
amd-sriram Oct 3, 2025
2190fba
Update aiter submodule to latest commit (#275)
amd-sriram Oct 3, 2025
4a04a64
add code to read BUILD_VERSION env variable, so that it is used inste…
amd-sriram Nov 21, 2025
b986681
Update version to 1.10.0 (#282)
amd-sriram Nov 21, 2025
267d397
Update README.md (#289)
amd-sriram Nov 24, 2025
cfaba56
Pow implementation is very expensive on AMD CDNA4. (#292)
JohnNikolay84 Jan 22, 2026
95043e3
[REDUX] Refactor Apex build process to use the PyTorch JIT extension …
jithunnair-amd Jan 26, 2026
e74e09a
Bump version from 1.10.0 to 1.11.0 (#293)
amd-sriram Jan 28, 2026
9495986
Port fused_conv_bias_relu to ROCm (#295)
amd-sriram Feb 4, 2026
31254da
add details of fused_conv_bias_relu in table of modules and fix error…
amd-sriram Feb 11, 2026
e17d1ed
Add new apex module to jit load system (#294)
amd-sriram Feb 18, 2026
4b5ca60
Create custom python operators for MixedFusedLayerNorm and MixedFused…
amd-sriram Feb 26, 2026
6269a50
Update README with release notes for version 1.11.0 (#310)
amd-sriram Mar 2, 2026
4fe55b9
CI: Added GHA CI workflow (#303)
leo-automation Mar 11, 2026
8504790
Add USE_ROCM
jithunnair-amd Apr 20, 2026
202 changes: 202 additions & 0 deletions .github/workflows/rocm-ci.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,202 @@
name: Apex ROCm CI

on:
  pull_request:
    types: [opened, synchronize, ready_for_review]
    branches:
      - master
      - release/1.8.0
      - release/1.9.0
      - release/1.10.0
  workflow_dispatch:
    inputs:
      apex_gitref:
        description: 'Apex branch or commit SHA to build'
        required: false
        default: 'master'
        type: string
      docker_image:
        description: 'Docker image to use'
        required: false
        default: 'rocm/pytorch:latest'
        type: string
      run_extension:
        description: 'Run Extension Import tests'
        required: false
        default: true
        type: boolean
      run_l0:
        description: 'Run L0 tests'
        required: false
        default: true
        type: boolean
      run_contrib:
        description: 'Run Contrib tests'
        required: false
        default: true
        type: boolean
      run_halo:
        description: 'Run Peer Halo Exchange tests'
        required: false
        default: true
        type: boolean
      run_syncbn:
        description: 'Run Distributed Synced BatchNorm tests'
        required: false
        default: true
        type: boolean

env:
  DOCKER_IMAGE: ${{ inputs.docker_image || 'rocm/pytorch:latest' }}

jobs:
  build:
    name: Build Apex Wheel
    runs-on: build-only-apex
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          # Uses the specified branch on manual runs; defaults to the PR/Push context otherwise
          ref: ${{ github.event_name == 'workflow_dispatch' && inputs.apex_gitref || '' }}
          submodules: recursive

      - name: Pull Docker Image
        run: |
          docker pull ${{ env.DOCKER_IMAGE }}

      - name: Start Background Docker Container
        run: |
          docker run -d --name apex-build-container \
            -v ${{ github.workspace }}:/workspace -w /workspace \
            ${{ env.DOCKER_IMAGE }} sleep infinity

      - name: Build Apex Wheel
        run: |
          docker exec apex-build-container bash -c "
            pip install --upgrade pip
            pip install build ninja wheel packaging

            python3 -m build --wheel --no-isolation -C--build-option=--cpp_ext -C--build-option=--cuda_ext

            chown -R $(id -u):$(id -g) dist/
          "

      - name: Run Extension Import tests
        if: ${{ github.event_name != 'workflow_dispatch' || inputs.run_extension }}
        run: |
          docker exec apex-build-container bash -c "
            set -eo pipefail

            pip install expecttest onnxscript
            pip install dist/apex-*.whl

            cd tests
            python3 test_extension_import.py 2>&1 | tee ../extension_import_results.log
          "

      - name: Cleanup Build Container
        if: always()
        run: docker rm -f apex-build-container

      - name: Upload Wheel Artifact
        uses: actions/upload-artifact@v4
        with:
          name: apex-wheel
          path: dist/*.whl
          retention-days: 7

  test:
    name: Run Unit Tests
    timeout-minutes: 720
    runs-on: linux-apex-mi325-8
    needs: build
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          ref: ${{ github.event_name == 'workflow_dispatch' && inputs.apex_gitref || '' }}
          submodules: recursive

      - name: Download Wheel Artifact
        uses: actions/download-artifact@v4
        with:
          name: apex-wheel
          path: dist/

      - name: Pull Docker Image
        run: |
          docker pull ${{ env.DOCKER_IMAGE }}

      - name: Start Background Docker Container
        run: |
          docker run -d --name apex-test-container \
            --device=/dev/kfd --device=/dev/dri --group-add=video --ipc=host \
            -e OMP_NUM_THREADS=8 \
            -e TORCH_NCCL_ASYNC_ERROR_HANDLING=1 \
            -e NCCL_DEBUG=WARN \
            -v ${{ github.workspace }}:/workspace -w /workspace \
            ${{ env.DOCKER_IMAGE }} sleep infinity

      - name: Install Dependencies and Built Wheel
        run: |
          docker exec apex-test-container bash -c "
            set -e
            pip install expecttest onnxscript
            pip install dist/apex-*.whl
          "

      - name: Run L0 tests
        if: ${{ (always()) && (github.event_name != 'workflow_dispatch' || inputs.run_l0) }}
        run: |
          docker exec apex-test-container bash -c "
            set -eo pipefail
            cd tests/L0
            sh run_rocm.sh 2>&1 | tee ../../L0_results.log
          "

      - name: Run Contrib tests
        if: ${{ (success() || failure()) && (github.event_name != 'workflow_dispatch' || inputs.run_contrib) }}
        run: |
          docker exec apex-test-container bash -c "
            set -eo pipefail
            cd apex/contrib/test
            python3 run_rocm_extensions.py 2>&1 | tee ../../../contrib_results.log
          "

      - name: Run Peer Halo Exchange tests
        if: ${{ (success() || failure()) && (github.event_name != 'workflow_dispatch' || inputs.run_halo) }}
        run: |
          docker exec apex-test-container bash -c "
            set -eo pipefail
            export HSA_FORCE_FINE_GRAIN_PCIE=1
            export HSA_ENABLE_SDMA=0
            torchrun --nproc_per_node 8 apex/contrib/peer_memory/peer_halo_exchange_module_tests.py 2>&1 | tee halo_results.log
          "

      - name: Run Distributed Synced BatchNorm tests
        if: ${{ (success() || failure()) && (github.event_name != 'workflow_dispatch' || inputs.run_syncbn) }}
        run: |
          docker exec apex-test-container bash -c "
            set -eo pipefail
            cd tests/distributed/synced_batchnorm
            sh unit_test.sh 2>&1 | tee ../../../syncbn_results.log
          "

      - name: Fix Artifact Permissions
        if: always()
        run: |
          docker exec apex-test-container bash -c "chown -R $(id -u):$(id -g) *.log"

      - name: Cleanup Background Container
        if: always()
        run: docker rm -f apex-test-container

      - name: Upload Test Logs
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: test-logs
          path: |
            *.log
          retention-days: 14
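The `if:` gates on the test steps above combine two rules: later suites still run when an earlier suite failed (`success() || failure()`, which excludes only cancellation), and on manual `workflow_dispatch` runs each suite can be switched off via its boolean input. A rough Python model of that gate (the function name and shape are illustrative, not part of the workflow):

```python
def should_run(job_status: str, event_name: str, input_enabled: bool = True) -> bool:
    """Model of the step gate:
    (success() || failure()) && (event_name != 'workflow_dispatch' || input)."""
    not_cancelled = job_status in ("success", "failure")  # success() || failure()
    requested = event_name != "workflow_dispatch" or input_enabled
    return not_cancelled and requested

# Contrib tests still run after a failed L0 suite on a pull request...
print(should_run("failure", "pull_request"))              # True
# ...but are skipped when a manual run unchecks its input...
print(should_run("success", "workflow_dispatch", False))  # False
# ...and never run if the job was cancelled.
print(should_run("cancelled", "pull_request"))            # False
```

The L0 step instead uses `always()`, so it is the only suite that would also run after a cancellation.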
7 changes: 7 additions & 0 deletions .gitignore
Expand Up @@ -145,3 +145,10 @@ dmypy.json

# Cython debug symbols
cython_debug/
*.hip
*_hip.*
*hip*


#file temporarily created for build process
apex/git_version_info_installed.py
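The three new ignore patterns are progressively broader; `*hip*` in fact matches every basename the first two do, since hipify drops `.hip` files and `*_hip.*` siblings next to the CUDA sources. For a single path component, git's glob syntax behaves like Python's `fnmatch`, which makes the overlap easy to check (the filenames below are made-up examples, not files from this repo):

```python
from fnmatch import fnmatch

# Hypothetical hipify outputs alongside an ordinary source file.
names = ["multi_tensor_apply.hip", "fused_dense_hip.cpp", "hipify_log.txt", "setup.py"]
patterns = ["*.hip", "*_hip.*", "*hip*"]

for name in names:
    hits = [p for p in patterns if fnmatch(name, p)]
    print(name, hits)
# Every name matched by "*.hip" or "*_hip.*" is also matched by "*hip*";
# only "setup.py" escapes all three patterns.
```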
7 changes: 3 additions & 4 deletions .gitmodules
@@ -1,7 +1,6 @@
 [submodule "apex/contrib/csrc/multihead_attn/cutlass"]
 	path = apex/contrib/csrc/multihead_attn/cutlass
 	url = https://github.com/NVIDIA/cutlass.git
-	branch = v1.2.0
-[submodule "apex/contrib/csrc/cudnn-frontend"]
-	path = apex/contrib/csrc/cudnn-frontend
-	url = https://github.com/NVIDIA/cudnn-frontend.git
+[submodule "third_party/aiter"]
+	path = third_party/aiter
+	url = https://github.com/ROCm/aiter
1 change: 1 addition & 0 deletions .jenkins/docker/build.sh
@@ -0,0 +1 @@
sudo docker build . --rm -t apex
1 change: 1 addition & 0 deletions .jenkins/docker/launch.sh
@@ -0,0 +1 @@
sudo docker run -it -v $HOME:/data --rm --privileged --device=/dev/dri --device=/dev/kfd --network host --group-add video apex
7 changes: 7 additions & 0 deletions Dockerfile
@@ -0,0 +1,7 @@
ARG FROM_IMAGE=lcskrishna/rocm-pytorch:rocm3.3_ubuntu16.04_py3.6_pytorch_bfloat16_mgpu

FROM ${FROM_IMAGE}
RUN \
git clone --recursive https://github.com/ROCmSoftwarePlatform/apex.git && \
cd apex && \
python3.6 setup.py install --cpp_ext --cuda_ext
2 changes: 2 additions & 0 deletions MANIFEST.in
@@ -0,0 +1,2 @@
recursive-include apex/contrib/csrc *
recursive-include apex/csrc *
17 changes: 17 additions & 0 deletions Makefile
@@ -0,0 +1,17 @@
PYTHON = python3
PIP = $(PYTHON) -m pip

clean: # This will remove ALL build folders.
	@test -d build/ && echo "Deleting build folder" || true
	@test -d build/ && rm -r build/ || true
	@test -d dist/ && echo "Deleting dist folder" || true
	@test -d dist/ && rm -r dist/ || true
	@test -d apex.egg-info/ && echo "Deleting apex.egg-info folder" || true
	@test -d apex.egg-info/ && rm -r apex.egg-info/ || true

	$(PYTHON) scripts/clean.py # remove the apex extensions installed at torch extensions folder

aiter:
	$(PIP) uninstall -y aiter
	cd third_party/aiter && $(PIP) install . --no-build-isolation --no-deps
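The clean target leans on the `test -d X && … || true` idiom so a missing folder never aborts make. The same defensive removal, sketched in Python (the helper name and folder list mirror the Makefile but are illustrative):

```python
import shutil
import tempfile
from pathlib import Path

def clean(root: str, folders=("build", "dist", "apex.egg-info")) -> list[str]:
    """Remove each folder under root if it exists; silently skip missing ones."""
    deleted = []
    for name in folders:
        path = Path(root) / name
        if path.is_dir():        # the `test -d` guard
            shutil.rmtree(path)  # the `rm -r`
            deleted.append(name)
    return deleted

# Demo in a throwaway directory where only "build" exists.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "build").mkdir()
    print(clean(tmp))  # ['build'] -- 'dist' and 'apex.egg-info' are skipped without error
```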
