Please see the comment in the PR where we enabled the --peer_memory and --nccl_p2p extensions: #87 (comment)
Some tests fail sporadically on ROCm when running the following test script:
cd apex/contrib/peer_memory
torchrun --nproc_per_node 2 peer_halo_exchange_module_tests.py
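
Because the failures are sporadic, a single run may pass. A minimal sketch of a rerun helper follows; `count_failures` and the default run count are hypothetical and not part of apex. It only assumes that the test command exits non-zero on failure.

```python
import subprocess
import sys


def count_failures(cmd, runs=20):
    """Run `cmd` `runs` times and return how many runs exited non-zero.

    Hypothetical helper for reproducing sporadic failures; not part of apex.
    """
    failures = 0
    for i in range(runs):
        # Capture output so passing runs stay quiet; print it only on failure.
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failures += 1
            print(f"run {i + 1}/{runs} failed (exit code {result.returncode})",
                  file=sys.stderr)
            # Show only the tail of each stream to keep the log readable.
            print(result.stdout[-2000:], file=sys.stderr)
            print(result.stderr[-2000:], file=sys.stderr)
    return failures
```

From within apex/contrib/peer_memory, calling `count_failures(["torchrun", "--nproc_per_node", "2", "peer_halo_exchange_module_tests.py"])` would give a rough failure count over 20 runs.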