
rdp role: make GPU acceleration physical/VM/none-aware; never apply networking sysctls on physical machines #9

Description

@catinspace-au

Summary

The rdp role assumes a virtio-gpu VM throughout its GPU/acceleration logic and applies TCP networking sysctls unconditionally (no VM gating). On a physical machine, this causes a silent fallback to software (CPU) encoding and pushes VM-oriented network tuning onto bare metal.

Two asks, scoped to the rdp role only:

  1. Make GPU acceleration physical-/VM-/none-aware (detect the actual GPU and install the matching VA-API stack; degrade gracefully when there is no usable GPU).
  2. Never apply networking sysctl changes on a physical machine — gate them on the VM fact the repo already uses elsewhere.

Observed on a real host

Bare-metal HP Dragonfly G4 laptop (systemd-detect-virt reports none) with an Intel Iris Xe iGPU (i915). After the role ran:

  • vainfo failed — iHD_drv_video.so was absent. The role installed mesa-va-drivers (Gallium/VirGL state tracker, for virtio-gpu), not the Intel media driver, so VA-API never initialised.
  • No GStreamer VA encoder element present → gnome-remote-desktop (GNOME 50, system "Remote Login" mode) fell back to software RFX. The Nice=-10 priority drop-in was masking that CPU cost.
  • The TCP/MTU sysctl drop-ins (99-rdp-optimization.conf, 99-rdp-mtu.conf) were applied on bare metal.

Manually installing intel-media-va-driver-non-free and gstreamer1.0-plugins-bad, adding the grd user to the render and video groups, and removing the sysctl drop-ins produced confirmed hardware H.264 encoding (the iGPU VCS engine was active during a session).


Problem 1 — GPU/VA-API driver selection is virtio-gpu-only

ansible/roles/rdp/tasks/vaapi.yml installs only mesa-va-drivers + vainfo (Ubuntu) / mesa-va-drivers + libva-utils (Fedora). The file comment even states it's "the Gallium VA-API state tracker for virtio-gpu". That is correct for VirGL VMs and for physical AMD (radeonsi), but wrong for physical Intel (needs intel-media-va-driver / iHD) and for NVIDIA (needs nvidia-vaapi-driver).

The role also never installs the GStreamer VA encoder plugin that grd 50 needs (vah264enc / vah264lpenc, from gstreamer1.0-plugins-bad on Ubuntu). Without it, HW encoding cannot happen even with a working driver.

Proposed: detect GPU type and branch:

| Target | VA driver | Notes |
|---|---|---|
| virtio-gpu (VM) | mesa-va-drivers | current behaviour |
| Intel (physical) | intel-media-va-driver[-non-free] | iHD; Iris Xe etc. |
| AMD (physical) | mesa-va-drivers | radeonsi/RADV |
| NVIDIA (physical) | nvidia-vaapi-driver | or document as unsupported |
| none / no /dev/dri | (none) | skip; software encode |
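The table above could be driven by data rather than per-vendor task copies. A minimal sketch, using only the Ubuntu package names mentioned in this issue; `rdp_vaapi_packages` and `rdp_gpu_target` are hypothetical names, and `rdp_gpu_target` is assumed to be set by a separate detection step. NVIDIA is mapped to an empty list here (the "document as unsupported" option), on the understanding that nvidia-vaapi-driver is built on NVDEC (decode) and is not expected to expose the H.264 encode entrypoint grd needs; verify that before relying on it.

```yaml
# --- Hypothetical vars file, e.g. roles/rdp/vars/Ubuntu.yml ---
# Keys are the assumed values of rdp_gpu_target; package names come from this issue.
rdp_vaapi_packages:
  virtio: [mesa-va-drivers, vainfo, gstreamer1.0-plugins-bad]
  # intel-media-va-driver-non-free is in Ubuntu's multiverse component.
  intel: [intel-media-va-driver-non-free, vainfo, gstreamer1.0-plugins-bad]
  amd: [mesa-va-drivers, vainfo, gstreamer1.0-plugins-bad]
  # Treated as unsupported for HW encode in this sketch (see lead-in).
  nvidia: []
  # No usable GPU: skip VA-API entirely, grd uses software encode.
  none: []
---
# --- Hypothetical task, e.g. in roles/rdp/tasks/vaapi.yml ---
- name: Install the VA-API driver and GStreamer VA encoder plugin for the detected GPU
  ansible.builtin.package:
    name: "{{ rdp_vaapi_packages[rdp_gpu_target] }}"
    state: present
  when:
    - ansible_facts['distribution'] == 'Ubuntu'
    # Empty list means "no HW encode expected": install nothing.
    - rdp_vaapi_packages[rdp_gpu_target] | length > 0
```

A Fedora vars file would need its own package names (the current role uses mesa-va-drivers + libva-utils there); they are deliberately left out rather than guessed.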

Plus install the GStreamer VA encoder plugin (gstreamer1.0-plugins-bad on Ubuntu) wherever a HW encoder is expected. Detection can use PCI vendor (lspci/sysfs vendor IDs: 0x8086 Intel, 0x1002 AMD, 0x10de NVIDIA, 0x1af4 virtio) combined with the existing ansible_virtualization_role.
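The detection described above might look like the following sketch. It reads the vendor ID behind each DRM render node from sysfs (VA-API needs a render node, so no renderD* entry naturally maps to "none") and combines it with the existing ansible_virtualization_role fact. The register and fact names (`rdp_drm_vendors`, `rdp_gpu_target`) are hypothetical, and the priority order on hybrid machines (Intel before AMD before NVIDIA) is an assumption.

```yaml
# Hypothetical tasks, e.g. roles/rdp/tasks/detect_gpu.yml.
# Produces rdp_gpu_target in: virtio | intel | amd | nvidia | none.
- name: Read vendor IDs of DRM render nodes from sysfs
  # Each file holds a value like "0x8086". A non-matching glob makes cat fail,
  # so stderr is discarded and "|| true" keeps the task green on GPU-less hosts.
  ansible.builtin.shell: cat /sys/class/drm/renderD*/device/vendor 2>/dev/null || true
  register: rdp_drm_vendors
  changed_when: false

- name: Classify the GPU from vendor IDs and the virtualization role
  ansible.builtin.set_fact:
    rdp_gpu_target: >-
      {%- set ids = rdp_drm_vendors.stdout_lines | map('trim') | list -%}
      {%- if (ansible_virtualization_role | default('NA')) == 'guest' and '0x1af4' in ids -%}virtio
      {%- elif '0x8086' in ids -%}intel
      {%- elif '0x1002' in ids -%}amd
      {%- elif '0x10de' in ids -%}nvidia
      {%- else -%}none
      {%- endif -%}
```

Reading sysfs avoids a dependency on pciutils (lspci) being installed; the same vendor IDs listed in this issue apply either way.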

Problem 2 — HW-encode enablement targets the wrong systemd unit

ansible/roles/rdp/files/vaapi-check writes GRD_DEBUG=vkva-renderer into /etc/systemd/user/gnome-remote-desktop.service.d/vaapi.conf — the per-user service. Hosts using Remote Login run the system service (/usr/lib/systemd/system/gnome-remote-desktop.service), so this override never takes effect there.

Proposed: detect which mode is active (system Remote Login vs per-user Desktop Sharing) and write the override to the matching unit. Also re-evaluate whether GRD_DEBUG=vkva-renderer is still the right mechanism — recent grd auto-selects vah264enc when the VA encoder is present, so simply ensuring the driver+plugin exist may be sufficient.
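One way to target the right unit is sketched below. It assumes that an enabled system-level gnome-remote-desktop.service indicates Remote Login mode, and it keeps the existing GRD_DEBUG=vkva-renderer value only because vaapi-check writes it today (whether it is still needed is the open question above). `rdp_grd_system_enabled` and `rdp_grd_dropin_dir` are hypothetical names.

```yaml
# Hypothetical tasks: place the HW-encode drop-in next to the grd unit that actually runs.
- name: Check whether the system-level grd unit is enabled (Remote Login mode)
  ansible.builtin.command: systemctl is-enabled gnome-remote-desktop.service
  register: rdp_grd_system_enabled
  changed_when: false
  failed_when: false   # non-zero rc just means "not enabled"

- name: Choose the drop-in directory for the active grd mode
  ansible.builtin.set_fact:
    rdp_grd_dropin_dir: >-
      {{ '/etc/systemd/system/gnome-remote-desktop.service.d'
         if rdp_grd_system_enabled.stdout == 'enabled'
         else '/etc/systemd/user/gnome-remote-desktop.service.d' }}

- name: Ensure the drop-in directory exists
  ansible.builtin.file:
    path: "{{ rdp_grd_dropin_dir }}"
    state: directory
    mode: "0755"

- name: Write the HW-encode override for the matching unit
  ansible.builtin.copy:
    dest: "{{ rdp_grd_dropin_dir }}/vaapi.conf"
    content: |
      [Service]
      Environment=GRD_DEBUG=vkva-renderer
    mode: "0644"

- name: Reload the system manager so the system-level drop-in takes effect
  # User managers read the user drop-in on their next start (e.g. next login).
  ansible.builtin.systemd:
    daemon_reload: true
  when: rdp_grd_system_enabled.stdout == 'enabled'
```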

Problem 3 — GPU device access uses world-open 0666 instead of group membership

ansible/roles/rdp/tasks/gpu_groups.yml deploys /etc/udev/rules.d/99-gpu-open-access.rules with SUBSYSTEM=="drm", MODE="0666", making all DRM devices world read/write. That's a least-privilege/security smell, especially on physical multi-user machines.

Proposed: for the GPU-using service account (the grd system user in Remote Login mode), add it to the render/video groups instead of opening the device to the world; if a udev rule is still wanted, scope it to KERNEL=="renderD*" rather than all of drm.
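A least-privilege replacement could look like this sketch. The service account name gnome-remote-desktop is an assumption (check the distribution's package), and the user task is guarded so that the user module never creates that account if it is missing. The scoped rule is shown only as a comment, with GROUP/MODE values chosen as an assumption.

```yaml
# Hypothetical tasks replacing 99-gpu-open-access.rules (SUBSYSTEM=="drm", MODE="0666").
- name: Check that the grd service account exists (ASSUMED name)
  ansible.builtin.command: id -u gnome-remote-desktop
  register: rdp_grd_user
  changed_when: false
  failed_when: false

- name: Add the grd service account to the render and video groups
  # New group membership applies after grd is restarted.
  ansible.builtin.user:
    name: gnome-remote-desktop
    groups: [render, video]
    append: true
  when: rdp_grd_user.rc == 0

- name: Remove the world-writable DRM udev rule
  ansible.builtin.file:
    path: /etc/udev/rules.d/99-gpu-open-access.rules
    state: absent
  register: rdp_old_gpu_rule

- name: Reload udev rules and re-apply them to DRM devices
  ansible.builtin.command: "{{ item }}"
  loop:
    - udevadm control --reload
    - udevadm trigger --subsystem-match=drm
  when: rdp_old_gpu_rule.changed

# If a rule is still wanted, scope it to render nodes only, e.g.:
#   SUBSYSTEM=="drm", KERNEL=="renderD*", GROUP="render", MODE="0660"
```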

Problem 4 — networking sysctls applied to physical machines (no VM gating)

ansible/roles/rdp/tasks/tcp.yml (BBR, fq, 16 MB socket buffers, netdev_max_backlog=5000) and ansible/roles/rdp/tasks/mtu.yml (tcp_mtu_probing=1, ip_no_pmtu_disc=0) are gated only on:

when:
  - ansible_facts['distribution'] in ['Fedora', 'Ubuntu']
  - has_gnome

No virtualization check, so they land on bare metal. Networking sysctl changes must not be applied to physical machines. The repo already has the right idiom — vm_optimizer gates with ansible_virtualization_role == 'guest'.

Proposed: add ansible_virtualization_role == 'guest' to the when: lists in tcp.yml and mtu.yml (or move these RDP network tunings into vm_optimizer). On physical hosts, skip them entirely.
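A sketch of the gating, assuming the role pulls these files in from a main task list; with import_tasks, the when: list is applied to every task in the imported file. The cleanup task assumes the drop-ins live in /etc/sysctl.d, which the issue does not state explicitly. Note that ansible_virtualization_role can be 'host' on a physical machine with KVM loaded, which is why the gate tests for 'guest' rather than for the absence of virtualization.

```yaml
# Hypothetical excerpt from roles/rdp/tasks/main.yml.
- name: RDP TCP tuning (VM guests only)
  ansible.builtin.import_tasks: tcp.yml
  when: &rdp_net_gate
    - ansible_facts['distribution'] in ['Fedora', 'Ubuntu']
    - has_gnome
    - ansible_virtualization_role == 'guest'

- name: RDP MTU tuning (VM guests only)
  ansible.builtin.import_tasks: mtu.yml
  when: *rdp_net_gate

- name: Remove RDP network sysctl drop-ins from non-guest hosts (ASSUMED path)
  # Deleting the files does not reset values already loaded into the running
  # kernel; those return to their defaults at the next boot.
  ansible.builtin.file:
    path: "/etc/sysctl.d/{{ item }}"
    state: absent
  loop:
    - 99-rdp-optimization.conf
    - 99-rdp-mtu.conf
  when: ansible_virtualization_role | default('NA') != 'guest'
```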

Problem 5 — Nice=-10 priority drop-in is a software-encode band-aid

ansible/roles/rdp/files/grd-priority.conf (deployed by service.yml) compensates for CPU-bound software RFX. Once HW encode is correctly provisioned it's unnecessary.

Proposed: tie the priority boost to the software-encode/none path only; skip it when HW encode is active.
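Tying the drop-in to the encode path could be as simple as the sketch below. `rdp_gpu_target` is assumed to come from a detection step (with 'none' meaning software encode), and `rdp_grd_priority_dropin` is a hypothetical placeholder for wherever service.yml currently installs grd-priority.conf.

```yaml
# Hypothetical sketch for roles/rdp/tasks/service.yml.
- name: Manage the grd Nice=-10 drop-in according to the encode path
  vars:
    # Placeholder path: substitute the destination service.yml uses today.
    rdp_grd_priority_dropin: /etc/systemd/system/gnome-remote-desktop.service.d/grd-priority.conf
    rdp_sw_encode: "{{ (rdp_gpu_target | default('none')) == 'none' }}"
  block:
    - name: Deploy the priority drop-in on the software-encode path
      ansible.builtin.copy:
        src: grd-priority.conf
        dest: "{{ rdp_grd_priority_dropin }}"
        mode: "0644"
      register: rdp_prio_added
      when: rdp_sw_encode | bool

    - name: Remove the priority drop-in when HW encode is provisioned
      ansible.builtin.file:
        path: "{{ rdp_grd_priority_dropin }}"
        state: absent
      register: rdp_prio_removed
      when: not (rdp_sw_encode | bool)

    - name: Reload systemd if the drop-in changed
      ansible.builtin.systemd:
        daemon_reload: true
      when: rdp_prio_added.changed or rdp_prio_removed.changed
```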


Acceptance criteria

  • GPU detection branches on physical/VM/none and installs the correct VA-API driver per vendor (Intel iHD, AMD/virtio mesa, NVIDIA bridge).
  • GStreamer VA encoder plugin installed where HW encode is expected.
  • HW-encode override written to the correct unit for both Remote Login (system) and Desktop Sharing (user) modes.
  • GPU access via group membership (or renderD*-scoped rule), not world-0666 on all DRM.
  • tcp.yml and mtu.yml only run when ansible_virtualization_role == 'guest'; physical machines get no RDP networking sysctls.
  • Nice=-10 priority applied only on the software-encode/none path.
  • Idempotent and verified on: physical Intel, physical AMD, virtio-gpu VM, and a no-GPU host.
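Part of the verification could be automated with post-run checks like the ones below (idempotence is still best confirmed by a second run reporting no changes). This is a sketch: it assumes the first render node is /dev/dri/renderD128 and that a hypothetical `rdp_gpu_target` fact is 'none' on hosts where no HW encode is expected. vainfo's --display drm mode avoids needing a running display server.

```yaml
# Hypothetical post-run assertions for the acceptance criteria.
- name: Probe for a GStreamer VA H.264 encoder element
  ansible.builtin.shell: gst-inspect-1.0 vah264enc || gst-inspect-1.0 vah264lpenc
  register: rdp_gst_va
  changed_when: false
  failed_when: false

- name: Query VA-API on the first render node (ASSUMED renderD128)
  ansible.builtin.command: vainfo --display drm --device /dev/dri/renderD128
  register: rdp_vainfo
  changed_when: false
  failed_when: false

- name: Assert HW encode prerequisites where a GPU was detected
  ansible.builtin.assert:
    that:
      - rdp_gst_va.rc == 0
      - rdp_vainfo.rc == 0
      # Matches both VAEntrypointEncSlice and VAEntrypointEncSliceLP.
      - "'VAEntrypointEncSlice' in rdp_vainfo.stdout"
  when: (rdp_gpu_target | default('none')) != 'none'

- name: Look for RDP network sysctl drop-ins
  ansible.builtin.find:
    paths: /etc/sysctl.d
    patterns: "99-rdp-*.conf"
  register: rdp_sysctl_dropins

- name: Assert physical hosts carry no RDP network sysctls
  ansible.builtin.assert:
    that:
      - rdp_sysctl_dropins.matched == 0
  when: ansible_virtualization_role | default('NA') != 'guest'
```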

Scope: rdp role only.

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
