Add vfio mode to generate CDI specs for NVIDIA passthrough GPUs by elezar · Pull Request #315 · NVIDIA/nvidia-container-toolkit

elezar · 2024-01-29T16:02:38Z

This change adds a vfio mode to the nvcdi API (and to the nvidia-ctk cdi generate command).

---
cdiVersion: 0.5.0
kind: nvidia.com/pgpu
devices:
    - name: "0"
      containerEdits:
        deviceNodes:
            - path: /dev/vfio/5
              hostPath: /dev/vfio/5
containerEdits:
    env:
        - NVIDIA_VISIBLE_DEVICES=void
    deviceNodes:
        - path: /dev/vfio/vfio
          hostPath: /dev/vfio/vfio
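
For illustration only, the shape of the spec above could be produced along these lines. This is a minimal sketch, not the code in this PR: the struct types are local stand-ins that mirror only the fields visible in the example, `buildVFIOSpec` is a hypothetical helper, and the output is encoded as JSON to stay within the standard library.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// deviceNode, containerEdits, device and spec are local stand-ins that mirror
// only the fields visible in the example spec; they are not the real CDI types.
type deviceNode struct {
	Path     string `json:"path"`
	HostPath string `json:"hostPath,omitempty"`
}

type containerEdits struct {
	Env         []string     `json:"env,omitempty"`
	DeviceNodes []deviceNode `json:"deviceNodes,omitempty"`
}

type device struct {
	Name           string         `json:"name"`
	ContainerEdits containerEdits `json:"containerEdits"`
}

type spec struct {
	Version        string         `json:"cdiVersion"`
	Kind           string         `json:"kind"`
	Devices        []device       `json:"devices"`
	ContainerEdits containerEdits `json:"containerEdits"`
}

// buildVFIOSpec (hypothetical) maps each IOMMU group to one device named by
// its index and adds the shared /dev/vfio/vfio node as a common edit.
func buildVFIOSpec(iommuGroups []int) spec {
	s := spec{
		Version: "0.5.0",
		Kind:    "nvidia.com/pgpu",
		ContainerEdits: containerEdits{
			Env:         []string{"NVIDIA_VISIBLE_DEVICES=void"},
			DeviceNodes: []deviceNode{{Path: "/dev/vfio/vfio", HostPath: "/dev/vfio/vfio"}},
		},
	}
	for i, group := range iommuGroups {
		path := fmt.Sprintf("/dev/vfio/%d", group)
		s.Devices = append(s.Devices, device{
			Name: fmt.Sprintf("%d", i),
			ContainerEdits: containerEdits{
				DeviceNodes: []deviceNode{{Path: path, HostPath: path}},
			},
		})
	}
	return s
}

func main() {
	enc := json.NewEncoder(os.Stdout)
	enc.SetIndent("", "  ")
	if err := enc.Encode(buildVFIOSpec([]int{5})); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Calling `buildVFIOSpec([]int{5})` yields one device named "0" with the node /dev/vfio/5, matching the example.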

elezar · 2024-01-29T16:03:43Z

@cdesiniotis if this is still valid, please rebase or close.

elezar · 2025-01-15T15:38:49Z

@varunrsekar would this be interesting for what you're looking at in NVIDIA/k8s-dra-driver-gpu#183?

varunrsekar · 2025-01-15T22:18:18Z

pkg/nvcdi/lib-vfio.go

+// GetCommonEdits returns common edits for ALL devices.
+// Note, currently there are no common edits.
+func (l *vfiolib) GetCommonEdits() (*cdi.ContainerEdits, error) {
+	return &cdi.ContainerEdits{ContainerEdits: &specs.ContainerEdits{}}, nil


This is missing a device node for /dev/vfio/vfio:

&cdi.ContainerEdits{
	ContainerEdits: &specs.ContainerEdits{
		DeviceNodes: []*specs.DeviceNode{
			&specs.DeviceNode{
				Path: "/dev/vfio/vfio",
			},
		},
	},
}

varunrsekar · 2025-01-15T22:19:55Z

> @varunrsekar would this be interesting for what you're looking at in NVIDIA/k8s-dra-driver-gpu#183?

Yeah, this will help!

cdesiniotis · 2025-01-15T23:23:12Z

@varunrsekar I am no longer working on this -- feel free to take ownership of this PR if you need it.

coveralls · 2025-07-07T12:06:16Z

Pull Request Test Coverage Report for Build 16119484166

Details

69 of 81 (85.19%) changed or added relevant lines in 4 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage increased (+0.3%) to 34.398%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
| --- | --- | --- | --- |
| pkg/nvcdi/mode.go | 0 | 1 | 0.0% |
| pkg/nvcdi/lib.go | 6 | 8 | 75.0% |
| pkg/nvcdi/lib-vfio.go | 59 | 68 | 86.76% |

Totals
Change from base Build 16115644541:	0.3%
Covered Lines:	4565
Relevant Lines:	13271

💛 - Coveralls

ArangoGutierrez

2 non-blocking nits

pkg/nvcdi/lib-vfio.go

Copilot

Pull Request Overview

This PR adds a new VFIO mode to the nvcdi API and the nvidia-ctk generate command to support NVIDIA GPU passthrough via VFIO.

Introduces a ModeVfio constant and integrates it into the mode resolver.
Adds WithPCILib option and dependency on nvpci for querying PCI devices.
Implements vfiolib to generate VFIO-based CDI specs, including unit tests.

Reviewed Changes

Copilot reviewed 6 out of 10 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| pkg/nvcdi/options.go | Import `nvpci` and add `WithPCILib` option |
| pkg/nvcdi/mode.go | Register `ModeVfio` in the list of supported modes |
| pkg/nvcdi/lib.go | Wire up `nvpcilib` in `New` for `ModeVfio` |
| pkg/nvcdi/lib-vfio.go | Implement `vfiolib` and VFIO device spec generation |
| pkg/nvcdi/lib-vfio_test.go | Add unit tests for the new VFIO mode |
| go.mod | Bump `github.com/NVIDIA/go-nvlib` for PCI library support |

Comments suppressed due to low confidence (3)

pkg/nvcdi/lib-vfio.go:93

[nitpick] Consider using %q instead of %v to quote the invalid ID in the error message for clarity ("invalid channel ID %q: %w").

			return nil, fmt.Errorf("invalid channel ID %v: %w", id, err)

pkg/nvcdi/lib-vfio_test.go:28

Tests cover positive VFIO scenarios but don’t verify that non-vfio-pci devices are filtered out or that error paths (e.g., invalid ID parsing, PCI library failures) behave as expected. Consider adding cases for those.

func TestModeVfio(t *testing.T) {

pkg/nvcdi/lib-vfio.go:33

[nitpick] The field group in vfioDevice is ambiguous — consider renaming it to iommuGroup to make its purpose clearer.

	group   int
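
As background for the suggested name, the group in question is the IOMMU group, which on Linux is typically exposed as the target of a PCI device's iommu_group symlink in sysfs. The following is a rough sketch under that assumption; `parseIOMMUGroup` and `vfioGroupPath` are hypothetical names, not functions from this PR or from go-nvlib.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strconv"
)

// parseIOMMUGroup extracts the numeric group from the target of an
// iommu_group symlink, for example "../../../kernel/iommu_groups/5".
func parseIOMMUGroup(linkTarget string) (int, error) {
	group, err := strconv.Atoi(filepath.Base(linkTarget))
	if err != nil {
		return 0, fmt.Errorf("invalid IOMMU group in %q: %w", linkTarget, err)
	}
	return group, nil
}

// vfioGroupPath returns the legacy VFIO group device node for a group.
func vfioGroupPath(iommuGroup int) string {
	return fmt.Sprintf("/dev/vfio/%d", iommuGroup)
}

func main() {
	group, err := parseIOMMUGroup("../../../kernel/iommu_groups/5")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(vfioGroupPath(group))
}
```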

pkg/nvcdi/options.go

pkg/nvcdi/lib-vfio.go

varunrsekar · 2025-10-11T00:13:44Z

@elezar Sorry, this completely missed my radar. Are you driving this to completion?
I've opened a new WIP PR, NVIDIA/k8s-dra-driver-gpu#668, where I can consume the outcome of this PR.

elezar · 2025-11-05T12:35:20Z

Thanks @varunrsekar. I have rebased this. Please take a look.

varunrsekar

@elezar Thanks for this change. This looks good to me. Just had some minor comments.

Also as a follow-up:
We'd also want to support IOMMUFD devices (https://lists.nongnu.org/archive/html/qemu-devel/2022-06/msg01294.html):

The legacy way is /dev/vfio/vfio and /dev/vfio/<group>. The IOMMUFD way is: /dev/iommu and /dev/vfio/devices/<vfioX>
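
One way such a follow-up might choose between the two layouts is sketched below. This is purely illustrative: the `vfioBackend` type and `deviceNodePaths` are hypothetical, and only the four paths come from the comment above.

```go
package main

import "fmt"

// vfioBackend selects which VFIO userspace interface the nodes belong to.
type vfioBackend int

const (
	backendLegacy  vfioBackend = iota // /dev/vfio/vfio and /dev/vfio/<group>
	backendIOMMUFD                    // /dev/iommu and /dev/vfio/devices/<vfioX>
)

// deviceNodePaths (hypothetical) returns the node shared by all devices and
// the per-device node. For the legacy backend id is the IOMMU group number;
// for IOMMUFD it is the vfio device name, such as "vfio0".
func deviceNodePaths(backend vfioBackend, id string) (common, perDevice string) {
	switch backend {
	case backendIOMMUFD:
		return "/dev/iommu", "/dev/vfio/devices/" + id
	default:
		return "/dev/vfio/vfio", "/dev/vfio/" + id
	}
}

func main() {
	common, dev := deviceNodePaths(backendLegacy, "5")
	fmt.Println(common, dev)
	common, dev = deviceNodePaths(backendIOMMUFD, "vfio0")
	fmt.Println(common, dev)
}
```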

varunrsekar · 2025-11-17T21:01:36Z

pkg/nvcdi/lib-vfio.go

+
+	var vfioDevices []*vfioDevice
+	for i, dev := range devices {
+		if dev.Driver != "vfio-pci" {


I recently found out that this driver is different for GB200/GH200 systems (nvgrace-gpu-vfio-pci), so this check may need to account for it.

cdesiniotis · 2025-11-18

On a related note, I am working on updating our vfio-manage script (which binds GPUs to the vfio-pci driver) to account for this new VFIO module required for Grace: NVIDIA/k8s-driver-manager#128. Hopefully we can push the relevant abstractions to go-nvlib so that our various clients don't need to re-implement the logic to detect which module name to use.

varunrsekar · 2025-11-18

Yes, having this in go-nvlib would be very useful!
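
Until such an abstraction exists, the driver check could be widened roughly as follows. This is a sketch only: `pciDevice` is a local stand-in rather than the go-nvlib type, `filterVFIODevices` is a hypothetical helper, and the set of driver names is limited to the two mentioned in this thread.

```go
package main

import "fmt"

// pciDevice is a local stand-in carrying only the fields this sketch needs;
// it is not the go-nvlib nvpci type.
type pciDevice struct {
	Address string
	Driver  string
}

// vfioDrivers lists kernel drivers treated as VFIO passthrough drivers.
// "nvgrace-gpu-vfio-pci" is the name mentioned for GB200/GH200 systems.
var vfioDrivers = map[string]bool{
	"vfio-pci":             true,
	"nvgrace-gpu-vfio-pci": true,
}

// filterVFIODevices keeps only devices bound to one of the known VFIO drivers.
func filterVFIODevices(devices []pciDevice) []pciDevice {
	var out []pciDevice
	for _, dev := range devices {
		if vfioDrivers[dev.Driver] {
			out = append(out, dev)
		}
	}
	return out
}

func main() {
	devices := []pciDevice{
		{Address: "0000:01:00.0", Driver: "vfio-pci"},
		{Address: "0000:02:00.0", Driver: "nvidia"},
		{Address: "0009:01:00.0", Driver: "nvgrace-gpu-vfio-pci"},
	}
	for _, dev := range filterVFIODevices(devices) {
		fmt.Println(dev.Address, dev.Driver)
	}
}
```

A filter of this shape also lends itself to the negative test cases suggested earlier in the review, since devices bound to other drivers can be asserted as excluded.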

varunrsekar · 2025-11-17T21:19:09Z

pkg/nvcdi/lib-vfio.go

+func (l *vfioDevice) GetDeviceSpecs() ([]specs.Device, error) {
+	path := fmt.Sprintf("/dev/vfio/%d", l.group)
+	deviceSpec := specs.Device{
+		Name: fmt.Sprintf("%d", l.index),


Can you add a prefix to this name? I can imagine that other non-GPU VFIO devices might also need something like this, so we'd need a prefix to differentiate the device types. Something like vfio-<deviceType>-<deviceIndex>, e.g. vfio-gpu-0 or gpu-vfio-0.

elezar · 2025-11-20

Would you prefer that over an explicit class by default? What about:

nvidia.com/gpu.vfio=0

or

nvidia.com/vfio.gpu=0

E.g.:

$ sudo nvidia-ctk cdi list
k8s.gpu.nvidia.com/device=gpu-0
k8s.gpu.nvidia.com/device=gpu-1
nvidia.com/gpu.vfio=0
nvidia.com/gpu.vfio=1

varunrsekar · 2025-11-20

Would it be possible for our container runtime to allocate from the different classes? I remember I couldn't get it to work when I attempted a new class.
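
To compare the two naming options, it may help to see how a fully qualified CDI device name of the form vendor/class=name splits into its parts. The helpers below are hypothetical and written only for this illustration; they perform no validation beyond checking that each part is non-empty.

```go
package main

import (
	"fmt"
	"strings"
)

// qualifiedName joins vendor, class and device name as vendor/class=name.
func qualifiedName(vendor, class, name string) string {
	return vendor + "/" + class + "=" + name
}

// parseQualifiedName splits vendor/class=name into its three parts.
func parseQualifiedName(s string) (string, string, string, error) {
	kind, name, ok := strings.Cut(s, "=")
	if !ok || name == "" {
		return "", "", "", fmt.Errorf("missing device name in %q", s)
	}
	vendor, class, ok := strings.Cut(kind, "/")
	if !ok || vendor == "" || class == "" {
		return "", "", "", fmt.Errorf("missing vendor or class in %q", s)
	}
	return vendor, class, name, nil
}

func main() {
	// Option 1: encode the device type in the name, keeping the existing class.
	fmt.Println(qualifiedName("nvidia.com", "pgpu", "vfio-gpu-0"))
	// Option 2: encode the device type in an explicit class.
	vendor, class, name, err := parseQualifiedName("nvidia.com/gpu.vfio=0")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(vendor, class, name)
}
```

With a prefix, consumers have to parse the device name to learn the device type; with an explicit class, the type is part of the kind and the name stays a plain index.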

elezar assigned cdesiniotis Jan 29, 2024

varunrsekar reviewed Jan 15, 2025

elezar force-pushed the vfio-cdi-mode branch from 9921c02 to d91b1e9 Compare July 7, 2025 12:03

elezar force-pushed the vfio-cdi-mode branch from d91b1e9 to 0e79c46 Compare July 7, 2025 12:10

elezar requested a review from cdesiniotis July 7, 2025 12:10

elezar marked this pull request as ready for review July 7, 2025 12:10

ArangoGutierrez requested review from ArangoGutierrez and Copilot July 7, 2025 12:27

elezar changed the title ~~Add 'vfio' mode to pkg/nvcdi for generating CDI specs for NVIDIA pass…~~ Add vfio mode to generate CDI specs for NVIDIA passthrough GPUs Jul 7, 2025

ArangoGutierrez reviewed Jul 7, 2025

elezar force-pushed the vfio-cdi-mode branch from 0e79c46 to 81047b1 Compare July 7, 2025 14:16

elezar requested a review from varunrsekar July 7, 2025 14:20

elezar self-assigned this Jul 7, 2025

ArangoGutierrez requested review from ArangoGutierrez and Copilot July 7, 2025 14:26

Copilot AI reviewed Jul 7, 2025

varunrsekar mentioned this pull request Oct 15, 2025

Support VFIO passthrough NVIDIA/k8s-dra-driver-gpu#668

Merged

elezar force-pushed the vfio-cdi-mode branch from 81047b1 to b6b7cd1 Compare November 5, 2025 12:30

Add 'vfio' mode to pkg/nvcdi for generating CDI specs for NVIDIA pass…

517859c

…through GPUs Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>

elezar force-pushed the vfio-cdi-mode branch from b6b7cd1 to 517859c Compare November 5, 2025 12:34

elezar modified the milestone: next-minor Nov 5, 2025

elezar added this to the v1.19.0 milestone Nov 5, 2025

varunrsekar reviewed Nov 17, 2025

5 participants
