Description
What did you find confusing? Please describe.
I installed sagemaker with pip install --upgrade sagemaker and am trying to use distributed model parallelism with PyTorch. However, I'm unable to import smdistributed.
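A minimal sketch for reproducing the check, assuming a plain local Python environment: it uses only the standard library's importlib.util.find_spec to report whether the top-level packages can be located, without importing them. The helper name is_module_available is hypothetical and not part of sagemaker or smdistributed.

```python
import importlib.util


def is_module_available(name: str) -> bool:
    """Return True if the top-level module `name` can be located.

    The module is not imported; only the import machinery's finders run.
    `is_module_available` is an illustrative helper, not a SageMaker API.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec can raise for broken or partially initialized modules;
        # treat those cases as "not available".
        return False


if __name__ == "__main__":
    # Report which of the two packages this interpreter can see.
    for module in ("sagemaker", "smdistributed"):
        status = "found" if is_module_available(module) else "missing"
        print(f"{module}: {status}")
```

In the situation described here, this would be expected to report sagemaker as found and smdistributed as missing.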
The docs at https://sagemaker.readthedocs.io/en/stable/api/training/smp_versions/v1.2.0/smd_model_parallel_pytorch.html don't include installation instructions for smdistributed. How do I install it? Thank you!
I am also looking at https://docs.amazonaws.cn/en_us/sagemaker/latest/dg/model-parallel-customize-training-script-pt.html, which directs me to https://sagemaker.readthedocs.io/en/stable/api/training/smp_versions/v1.2.0/smd_model_parallel_common_api.html#smp.init to initialize the SageMaker distributed environment. But again, I'm not sure how to get the smdistributed library.
https://github.com/aws/amazon-sagemaker-examples has some smdistributed examples but doesn't provide clear installation instructions. The environment.yml in that repo seems to indicate that all that's needed is sagemaker, which I have installed.
Describe how documentation can be improved
I could not find clear installation instructions for smdistributed. Would it be possible to add them?