Add support for --gpus options in Rocker (shm merged separately)#304
Add support for --gpus options in Rocker (shm merged separately)#304SamratThapa120 wants to merge 5 commits intoosrf:mainfrom
Conversation
tfoote
left a comment
There was a problem hiding this comment.
Overall both of these would be great. I think that the --shm-size pass through completely makes sense. it can be merged quickly if we add tests.
The --gpus option should be thought about a bit more as to how it will dovetail with the current --nvidia option. And hopefully can replace the current recommendation for --device /dev/dri/card0
src/rocker/extensions.py
Outdated
|
|
||
| def get_docker_args(self, cliargs): | ||
| args = '' | ||
| shm_size = cliargs.get('gpus', None) |
There was a problem hiding this comment.
Looks like some copy and paste here
setup.py
Outdated
| 'expose = rocker.extensions:Expose', | ||
| 'git = rocker.git_extension:Git', | ||
| 'group_add = rocker.extensions:GroupAdd', | ||
| 'shm_size = rocker.extensions:ShmSize', |
There was a problem hiding this comment.
This should be in alphabetical order.
|
Thank you for the review. I have made the requested changes, and added tests for both new features.
In addition to replace the current recommendation for --device /dev/dri/card0, the new |
|
@tfoote I have also modified the |
tfoote
left a comment
There was a problem hiding this comment.
Thanks for adding the tests. This looks like it's basically ready to merge. The interactions passing through the gpu arguments to --nvidia is a great way to get them to work together. I have one request for the code organization, but then this looks good to go.
src/rocker/extensions.py
Outdated
| help="Set the size of the shared memory for the container (e.g., 512m, 1g).") | ||
|
|
||
|
|
||
| class Gpus(RockerExtension): |
There was a problem hiding this comment.
With the cross coupling can you move this to the nvidia_extensions file instead? It's not explicitly nvidia but I'd like to keep them colocated. And that file could potentially be renamed in the future with it's expanded scope. The X11 class is already not nvidia tied.
test/test_extension.py
Outdated
| args = p.get_docker_args(mock_cliargs) | ||
| self.assertIn('--shm-size 12g', args) | ||
|
|
||
| class GpusExtensionTest(unittest.TestCase): |
There was a problem hiding this comment.
This would also go into the test_nvidia.py in parallel with moving the implementation
|
@tfoote Thank you. I have made the requested changes. |
tfoote
left a comment
There was a problem hiding this comment.
Thanks for the cleanup.
Unfortunately I tried testing this out on my non-nvidia machine and was quite disappointed to discover that it doesn't work. On deeper reading the --gpus option only works with NVIDIA gpus, and not generic gpus or integrated ones.
https://docs.docker.com/reference/cli/docker/container/run/#gpus
The --gpus flag allows you to access NVIDIA GPU resources. First you need to install the [nvidia-container-runtime](https://nvidia.github.io/nvidia-container-runtime/).
So I think that it makes sense to just collapse this into the nvidia plugin implementation. I'd like to have the argument be --nvidia-gpus but we can provide an alias to --gpus to mirror the upstream command line syntax.
And to support your use case of using nvidia without the content being added into the container we could extend the policy to support NVIDIA_GLVND_VALID_VERSIONS with None as a valid option.
Potentially as these tools get more integrated this might be able to become a default policy. A shortcut option --nvidia-no-glvnd that sets that policy would be reasonable to consider, but I'm not sure it's worth the extra complexity.
|
|
||
| @staticmethod | ||
| def register_arguments(parser, defaults={}): | ||
| parser.add_argument('--gpus', |
There was a problem hiding this comment.
| parser.add_argument('--gpus', | |
| parser.add_argument('--nvidia-gpus', '--gpus', |
And this would move up into the nvidia plugin.
|
I've separated out the shm extension and merged it in #306 so now it's just about cleaning up the nvidia gpu options. |
This PR introduces support for two new options,
--shm_sizeand--gpus, which are equivalent to the respective arguments in Docker. These options provide greater flexibility and usability when customizing container environments, particularly for resource-intensive and GPU-dependent workloads.