@phip1611 phip1611 commented Jan 5, 2026

This change updates our Garden Linux patch set to Cloud Hypervisor v50.0, upgrading from v48.0 (v49.0 was skipped). In line with our established process, this requires review and ideally three approvals. After the merge, the base branch (that we merge into) will be renamed to gardenlinux, and the current main branch will be renamed to gardenlinux-v48 via the GitHub UI.

libvirt and libvirt-tests changes

How I Created This and What I've Changed

  • I took the original commits
  • The order and grouping are mostly preserved, as is the original authorship
  • I regrouped them (add A -> add B -> fix A became add A + fix A -> add B)
  • I only squashed where keeping dedicated commits close to each other didn't make sense
  • For every commit, I ran cargo check && cargo +nightly fmt --all && cargo clippy --all-targets --tests && cargo nextest run
    • Achieved via: git rebase -i --exec 'cargo check && cargo +nightly fmt --all && cargo clippy --all-targets --tests && cargo nextest run' HEAD~<N>
  • Therefore, most commits contain changes that are not in the original commits. These mostly stem from tightened clippy lints in v50.0
  • I didn't experience any major problems, just a lot of small things
  • I could drop more than 30 commits from the original gardenlinux branch as they are already upstream
  • I added a new unit test for the memory range partition code: at first I couldn't fix that code because I didn't understand it, and the unit test helped me.
  • I changed most of the old commit messages
    • to include the On-behalf-of: SAP marker
    • to satisfy the latest gitlint rules
  • I verified that this runs in the latest libvirt-tests

Hints for Reviewers

  • It is not really possible to review this nicely. I think looking at a few commit ranges here and there is a good starting point
  • Look at the original git commit history and try to match the commits with the ones of this PR (the commit messages are mostly the same)
  • It might help you to look at the v49 PR ([DON'T MERGE] gardenlinux-v49 #40) as an intermediate step. I also used this one to prepare the upgrade to v50

@phip1611 phip1611 self-assigned this Jan 5, 2026
@phip1611 phip1611 force-pushed the next-gardenlinux-v50 branch from ba0eb48 to 882e0b3 Compare January 5, 2026 17:12
@phip1611 phip1611 changed the title [DON'T MERGE] Rebase of gardenlinux Patchset to v50.0 [DON'T MERGE] Port gardenlinux Patchset to v50.0 Jan 5, 2026
@phip1611 phip1611 mentioned this pull request Jan 5, 2026
3 tasks
@phip1611 phip1611 changed the title [DON'T MERGE] Port gardenlinux Patchset to v50.0 [DON'T MERGE] Port gardenlinux Patchset v48.0 -> v50.0 Jan 5, 2026
@phip1611 phip1611 force-pushed the next-gardenlinux-v50 branch from 882e0b3 to 1eb4fca Compare January 6, 2026 07:45
phip1611 added a commit to phip1611/libvirt-tests that referenced this pull request Jan 6, 2026
Since PR #7525 (Dec 2025)[0], Cloud Hypervisor uses a virtual Cargo manifest and
the main package was moved to the `./cloud-hypervisor` subdirectory. This change
is effective since v50. Since we updated our Cloud Hypervisor to v50 [1], we
need to adjust that here.

[0] cloud-hypervisor/cloud-hypervisor#7525
[1] cyberus-technology/cloud-hypervisor#60

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
phip1611 added a commit to phip1611/libvirt-tests that referenced this pull request Jan 6, 2026
Since PR #7525 (Dec 2025)[0], Cloud Hypervisor uses a virtual Cargo manifest and
the main package was moved to the `./cloud-hypervisor` subdirectory. This change
is effective since v50, which we just upgraded to [1].

[0] cloud-hypervisor/cloud-hypervisor#7525
[1] cyberus-technology/cloud-hypervisor#60

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
@phip1611 phip1611 force-pushed the next-gardenlinux-v50 branch from 1eb4fca to 9d7ac56 Compare January 6, 2026 12:06
@phip1611 phip1611 changed the title [DON'T MERGE] Port gardenlinux Patchset v48.0 -> v50.0 Port gardenlinux Patchset v48.0 -> v50.0 Jan 6, 2026
phip1611 and others added 12 commits January 6, 2026 13:57
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
Remove irrelevant/annoying CI here to accelerate development.

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
To check gitlint locally, one can run:

gitlint --commits "HEAD~2..HEAD"

which for example checks the last two commits.

Although this is just our kinda private (but public) fork, people might
cherry-pick commits from us for whatever reason. So we should have
proper commit style.

On-behalf-of: SAP philipp.schuster@sap.com
TL;DR: Fix for long rebuilds locally when testing things.

The release profile is optimized for maximum performance,
sacrificing build speed. Local development and testing require
frequent rebuilds, but the dev profile is way too slow for
"real testing", so this profile is a sweet spot and helps to
investigate things.

Instead of `cargo run --release`, one can now run
`cargo run --profile optimized-dev`.

# Measurements

Measurements were done using
`$ [cargo clean;] time cargo build --profile release|optimized-dev` and
rustc 1.89. I've used the `time`-builtin from zsh.

Note that user time is much higher as we now have more threads
(codegen units). The total time is much shorter, though.

## Clean Build

Speedup of 56%.

- `$ time cargo build --release`:
  `109,67s user 13,64s system 211% cpu 58,343 total`
- `$ time cargo build --profile optimized-dev`:
  `185,41s user 14,92s system 528% cpu 37,876 total`

## Incremental Build

Speedup of 153%.

- `$ time cargo build --release`:
  `37,58s user 1,53s system 117% cpu 33,356 total`
- `$ time cargo build --profile optimized-dev`:
  `47,62s user 1,71s system 373% cpu 13,220 total`
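
For reference, the percentages can be recomputed from the wall-clock totals as `old / new - 1`. The sketch below does that. The name `speedup_percent` is made up for this example, and this definition of "speedup" is an assumption, since the message does not spell out the formula. With the totals listed above (decimal commas written as points), it yields roughly 54% and 152%, close to the stated figures.

```rust
/// Relative speedup in percent: how much faster `new_secs` is than `old_secs`.
/// Assumed definition: (old / new - 1) * 100.
fn speedup_percent(old_secs: f64, new_secs: f64) -> f64 {
    (old_secs / new_secs - 1.0) * 100.0
}

fn main() {
    // Wall-clock totals copied from the measurements in the commit message.
    let clean = speedup_percent(58.343, 37.876);
    let incremental = speedup_percent(33.356, 13.220);

    println!("clean build:       about {clean:.0}% faster");
    println!("incremental build: about {incremental:.0}% faster");
}
```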

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
With debug symbols, we will get better backtraces and can
improve our experience debugging. The only downside is larger
binary size which is negligible in our case. There are no
implications for the performance.

Stripped:   3.9M
Unstripped: 4.7M

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
Current (squashed) state of:
 https://github.com/cloud-hypervisor/cloud-hypervisor/pull/7033/commits

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
---

vm-migration: Add support for downtime limits

Add handling of migration timeout failures to provide more flexible
live migration options. Implement downtime limiting logic to minimize
service disruptions. Support for setting downtime thresholds and
migration timeouts.

Signed-off-by: Jinrong Liang <cloudliang@tencent.com>
Signed-off-by: Songqian Li <sionli@tencent.com>

docs: Add migration parameters to live migration document

Updated live migration documentation to include migration timeout
controls and downtime limits.

Signed-off-by: Jinrong Liang <cloudliang@tencent.com>
Signed-off-by: Songqian Li <sionli@tencent.com>

tests: Add downtime and migration timeout tests

Signed-off-by: Jinrong Liang <cloudliang@tencent.com>
Signed-off-by: Songqian Li <sionli@tencent.com>
This allows attaching FDs provided by the management layer to virtio-net
devices on the live-migration receiver side.

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
Also see [0] for more info.

[0] https://docs.kernel.org/virt/kvm/api.html#the-kvm-run-structure

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
No need to grab the lock multiple times in this short period
of time. The lock is held anyway for the duration of the long
operation (KVM_RUN).

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
These are the prerequisites for the upcoming (quick and dirty)
solution to the problem that we might miss some events.

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
A common scenario for a VMM to regain control over the vCPU thread from
the hypervisor is to interrupt the vCPU. A use-case might be the `pause`
API call of CHV.

VMMs using KVM as hypervisor must use signals for this interception,
i.e., a thread sends a signal to the vCPU thread. Sending and handling
these signals is inherently racy because the signal sender does not know
if the receiving thread is currently in the KVM_RUN [0] call, or
executing userspace VMM code.

If we are in kernel space in KVM_RUN, things are easy as KVM just exits
with -EINTR. For user-space this is more complicated. For example, it
might happen that we receive a signal but the vCPU thread was about to
go into the KVM_RUN system call as next instruction. There is no more
opportunity to check for any pending signal flag or similar.

KVM offers the `immediate_exit` flag [1] as part of the KVM_RUN
structure for that. The signal handler of a vCPU is supposed to set this
flag, to ensure that we do not miss any events. If the flag is set,
KVM_RUN will exit immediately [2].

We will miss signals to the vCPU if the vCPU thread is in userspace VMM
code and we do not use the `immediate_exit` flag.

We must have access to the KVM_RUN data structure when the signal
handler executes in a vCPU thread's context and set the
`immediate_exit` [1] flag. This way, the next invocation of KVM_RUN
exits immediately and the userspace VMM code can do the normal event
handling.

We must not use any shared locks between the normal vCPU thread VMM
code and the signal handler, as otherwise we might end up in deadlocks.

The signal handler therefore needs its dedicated mutable version of
KVM_RUN.

This commit introduces a (very hacky but good enough for a PoC) solution
to this problem.

[0] https://docs.kernel.org/virt/kvm/api.html#kvm-run
[1] https://docs.kernel.org/virt/kvm/api.html#the-kvm-run-structure
[2] https://elixir.bootlin.com/linux/v6.12/source/arch/x86/kvm/x86.c#L11566
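
The race and the role of `immediate_exit` can be illustrated with a small model. This is only a sketch under assumptions: `KvmRunModel`, `on_signal`, `run`, and `acknowledge` are hypothetical names, no real KVM ioctl or signal handler is involved, and the code is not the implementation from this commit. It shows that a lock-free store from the "signal handler" is enough for the next `KVM_RUN` to return immediately, so the event is not lost.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Hypothetical stand-in for the `immediate_exit` byte of the shared KVM_RUN structure.
struct KvmRunModel {
    immediate_exit: AtomicU8,
}

#[derive(Debug, PartialEq)]
enum RunResult {
    /// The guest ran until a regular VM exit.
    GuestExited,
    /// The call returned right away, as KVM_RUN does with -EINTR.
    Interrupted,
}

impl KvmRunModel {
    fn new() -> Self {
        Self { immediate_exit: AtomicU8::new(0) }
    }

    /// What the signal handler would do: a single atomic store, no shared lock.
    fn on_signal(&self) {
        self.immediate_exit.store(1, Ordering::SeqCst);
    }

    /// Model of KVM_RUN: the flag is checked before the guest is entered.
    fn run(&self) -> RunResult {
        if self.immediate_exit.load(Ordering::SeqCst) != 0 {
            return RunResult::Interrupted;
        }
        RunResult::GuestExited
    }

    /// In this model, the VMM clears the flag once it has handled the event.
    fn acknowledge(&self) {
        self.immediate_exit.store(0, Ordering::SeqCst);
    }
}

fn main() {
    let kvm_run = KvmRunModel::new();

    // The signal arrives while the vCPU thread is still in userspace VMM code,
    // right before it would call KVM_RUN again.
    kvm_run.on_signal();

    // The next run does not enter the guest, so the VMM can handle the event.
    assert_eq!(kvm_run.run(), RunResult::Interrupted);
    kvm_run.acknowledge();
    assert_eq!(kvm_run.run(), RunResult::GuestExited);
    println!("the signal was not lost");
}
```

Using an atomic instead of a mutex mirrors the requirement above that the signal handler and the regular vCPU code must not share locks.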

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
amphi and others added 19 commits January 6, 2026 13:57
When using multiple tcp connections during live migration, the main
thread spawns multiple worker threads to send data. When one of those
workers encountered an error, the VMM would panic. With these changes
worker threads will report errors to the main thread which can then stop
the live migration without panicking.
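
A minimal sketch of this pattern, using only the Rust standard library. The names `WorkerError`, `send_chunk`, and `run_workers` are hypothetical and the actual data transfer is replaced by a stub; the real code in this commit may be structured differently.

```rust
use std::sync::mpsc;
use std::thread;

#[derive(Debug, PartialEq)]
struct WorkerError {
    worker: usize,
    reason: String,
}

/// Hypothetical stand-in for sending data over one TCP connection.
fn send_chunk(worker: usize, should_fail: bool) -> Result<(), WorkerError> {
    if should_fail {
        return Err(WorkerError { worker, reason: "connection reset".to_string() });
    }
    Ok(())
}

/// Spawns the workers and collects their results instead of letting them panic.
fn run_workers(num_workers: usize, failing_worker: Option<usize>) -> Result<(), WorkerError> {
    let (tx, rx) = mpsc::channel();
    let handles: Vec<_> = (0..num_workers)
        .map(|id| {
            let tx = tx.clone();
            thread::spawn(move || {
                let result = send_chunk(id, failing_worker == Some(id));
                // Report the outcome; ignore the case where nobody listens anymore.
                let _ = tx.send(result);
            })
        })
        .collect();
    // Drop the original sender so the receive loop ends once all workers are done.
    drop(tx);

    let mut first_error = None;
    for result in rx {
        if let Err(e) = result {
            if first_error.is_none() {
                first_error = Some(e);
            }
        }
    }
    for handle in handles {
        let _ = handle.join();
    }
    match first_error {
        Some(e) => Err(e),
        None => Ok(()),
    }
}

fn main() {
    match run_workers(4, Some(2)) {
        Ok(()) => println!("all data sent"),
        Err(e) => println!("migration stopped: worker {} failed: {}", e.worker, e.reason),
    }
}
```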

On-behalf-of: SAP sebastian.eydam@sap.com
Signed-off-by: Sebastian Eydam <sebastian.eydam@cyberus-technology.de>
This is a pre-requisite for the following commit which puts the
migration into a dedicated thread. It allows the VMM to react to
migration events (success/failure).

The commit series was inspired by @ljcore [0] but was changed quite
significantly.

[0] cloud-hypervisor#7038

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
This puts the send-migration action into a dedicated thread. This
means:

1. The send-migration call will exit sooner (just trigger the
   migration)
2. Other API calls will not be possible as the VM's ownership is
   transferred from the VMM to the migration thread. E.g., hotplugging
   won't work (which is good).
3. If the migration causes the VMM process to crash, this currently
   can't be observed. A mechanism to query the migration status doesn't
   exist.
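
The ownership transfer in point 2 can be sketched as follows. All names (`Vm`, `VmState`, `Vmm`, `hotplug`) are hypothetical and the migration itself is a stub. `wait_for_migration` is not part of the described change, since no status query mechanism exists yet; it is only there so the example joins its thread.

```rust
use std::thread::{self, JoinHandle};

struct Vm {
    name: String,
}

/// The VMM either owns the VM or has handed it over to the migration thread.
enum VmState {
    Owned(Vm),
    Migrating(JoinHandle<Result<(), String>>),
    Gone,
}

struct Vmm {
    state: VmState,
}

impl Vmm {
    /// Only triggers the migration; returns before the transfer has finished.
    fn send_migration(&mut self) -> Result<(), String> {
        match std::mem::replace(&mut self.state, VmState::Gone) {
            VmState::Owned(vm) => {
                // `vm` is moved into the thread: the VMM no longer has access to it.
                let handle = thread::spawn(move || -> Result<(), String> {
                    println!("migrating {}", vm.name);
                    Ok(())
                });
                self.state = VmState::Migrating(handle);
                Ok(())
            }
            other => {
                self.state = other;
                Err("no VM available to migrate".to_string())
            }
        }
    }

    /// Example of an API call that needs the VM and therefore fails while migrating.
    fn hotplug(&mut self, device: &str) -> Result<(), String> {
        match &self.state {
            VmState::Owned(vm) => {
                println!("hotplugging {device} into {}", vm.name);
                Ok(())
            }
            _ => Err(format!("cannot hotplug {device}: VM is not owned by the VMM")),
        }
    }

    /// Illustration only: joins the migration thread.
    fn wait_for_migration(&mut self) -> Result<(), String> {
        match std::mem::replace(&mut self.state, VmState::Gone) {
            VmState::Migrating(handle) => handle
                .join()
                .map_err(|_| "migration thread panicked".to_string())?,
            other => {
                self.state = other;
                Err("no migration in progress".to_string())
            }
        }
    }
}

fn main() {
    let mut vmm = Vmm { state: VmState::Owned(Vm { name: "vm0".to_string() }) };
    vmm.send_migration().expect("VM is owned, so the migration can start");
    println!("hotplug during migration: {:?}", vmm.hotplug("net1"));
    println!("migration result: {:?}", vmm.wait_for_migration());
}
```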

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
The commit prepares to properly handle API events during ongoing
live-migrations. The VmInfo call is currently not working when a VM is
migrating. This will be addressed in a follow-up as part of the
statistics about ongoing live-migrations.

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
Once we have a mechanism to query the progress of an ongoing
live-migration, we can remove this workaround.

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
Allocating a device ID is crucial for assigning a specific ID to a
device. We need this to implement configurable PCI BDF.

Signed-off-by: Pascal Scholz <pascal.scholz@cyberus-technology.de>
On-behalf-of: SAP pascal.scholz@sap.com
Next to tests for `allocate_device_bdf`, we introduce a new constructor
`new_without_address_manager`, only available in the test build. As
there is no way to instantiate an `AddressManager` in the tests, we use
this constructor to work around this.

Signed-off-by: Pascal Scholz <pascal.scholz@cyberus-technology.de>
On-behalf-of: SAP pascal.scholz@sap.com
Updates all config structs in order to make the new config option
available to all PCI devices. Additionally, update the parser so the new
option becomes available on the CLI.

Signed-off-by: Pascal Scholz <pascal.scholz@cyberus-technology.de>
On-behalf-of: SAP pascal.scholz@sap.com
Signed-off-by: Pascal Scholz <pascal.scholz@cyberus-technology.de>
On-behalf-of: SAP pascal.scholz@sap.com
Signed-off-by: Pascal Scholz <pascal.scholz@cyberus-technology.de>
On-behalf-of: SAP pascal.scholz@sap.com
We use `VecDeque` to sort devices implicitly. Devices whose config
contains a fixed BDF are added to the front, while those without a BDF
given are added to the back. Processing the `VecDeque` sequentially
from first to last then ensures that no clashes occur when assigning
BDFs to devices. Otherwise, we could end up assigning a BDF required by
one device's config to a device without a fixed BDF.
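
The ordering idea can be sketched like this. `DeviceConfig`, `order_devices`, and `assign_ids` are hypothetical names, and the BDF is reduced to a device ID in the range 0..32 on a single bus, which is an assumption made to keep the example short.

```rust
use std::collections::{BTreeSet, VecDeque};

#[derive(Debug, Clone, PartialEq)]
struct DeviceConfig {
    name: &'static str,
    /// `Some(id)` if the config requests a fixed device ID, `None` otherwise.
    device_id: Option<u8>,
}

/// Devices with a fixed ID go to the front, all others to the back.
fn order_devices(configs: Vec<DeviceConfig>) -> VecDeque<DeviceConfig> {
    let mut queue = VecDeque::new();
    for cfg in configs {
        if cfg.device_id.is_some() {
            queue.push_front(cfg);
        } else {
            queue.push_back(cfg);
        }
    }
    queue
}

/// Processes the queue from first to last, so fixed IDs are reserved first.
fn assign_ids(queue: VecDeque<DeviceConfig>) -> Result<Vec<(&'static str, u8)>, String> {
    let mut used = BTreeSet::new();
    let mut assigned = Vec::new();
    for cfg in queue {
        let id = match cfg.device_id {
            Some(id) => {
                if !used.insert(id) {
                    return Err(format!("device ID {id} requested twice"));
                }
                id
            }
            None => {
                // Pick the lowest ID that is still free.
                let id = (0u8..32)
                    .find(|id| !used.contains(id))
                    .ok_or_else(|| "no free device ID".to_string())?;
                used.insert(id);
                id
            }
        };
        assigned.push((cfg.name, id));
    }
    Ok(assigned)
}

fn main() {
    let configs = vec![
        DeviceConfig { name: "net0", device_id: None },
        DeviceConfig { name: "disk0", device_id: Some(0) },
    ];
    // Without the reordering, net0 would grab ID 0 before disk0 could claim it.
    let assigned = assign_ids(order_devices(configs)).expect("no clash expected");
    println!("{assigned:?}");
}
```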

Signed-off-by: Pascal Scholz <pascal.scholz@cyberus-technology.de>
On-behalf-of: SAP pascal.scholz@sap.com
TLS connections have a TLS server (the endpoint that listens for a
connection) and a TLS client (the endpoint that initiates the
connection). This commit adds the code for the client side, which will
be the source host.

On-behalf-of: SAP sebastian.eydam@sap.com
Signed-off-by: Sebastian Eydam <sebastian.eydam@cyberus-technology.de>
This is the TLS server side, which will be the live migration target.

On-behalf-of: SAP sebastian.eydam@sap.com
Signed-off-by: Sebastian Eydam <sebastian.eydam@cyberus-technology.de>
Also, it seems like AsRawFd should be avoided:
https://rust-lang.github.io/rfcs/3128-io-safety.html

On-behalf-of: SAP sebastian.eydam@sap.com
Signed-off-by: Sebastian Eydam <sebastian.eydam@cyberus-technology.de>
This allows (more or less) transparent usage of TLS encrypted TCP
connections.

On-behalf-of: SAP sebastian.eydam@sap.com
Signed-off-by: Sebastian Eydam <sebastian.eydam@cyberus-technology.de>
For TLS we need certificates (and a key for the TLS server). This
commit adds parameters for that and encrypts the connection with TLS if
the necessary parameters are provided.

Co-authored-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
On-behalf-of: SAP sebastian.eydam@sap.com
Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
Signed-off-by: Sebastian Eydam <sebastian.eydam@cyberus-technology.de>
The ReadVolatile and WriteVolatile implementations of TlsStream were
very slow, mainly because they allocated a large buffer on each
invocation. The TlsStreamWrapper carries a buffer that it uses for
ReadVolatile and WriteVolatile and that is allocated once on creation.
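
The allocate-once idea can be sketched with the standard library alone. This is an assumption-laden model: `BufferedStream` and `read_via_scratch` are hypothetical names, and `std::io::Read` stands in for the TLS stream and the `ReadVolatile` trait, which are not reproduced here.

```rust
use std::io::{Cursor, Read};

/// Hypothetical wrapper that owns one scratch buffer for its whole lifetime.
struct BufferedStream<S: Read> {
    stream: S,
    scratch: Vec<u8>,
}

impl<S: Read> BufferedStream<S> {
    fn new(stream: S, capacity: usize) -> Self {
        // The only allocation: done once, on creation.
        Self { stream, scratch: vec![0u8; capacity] }
    }

    /// Reads into the scratch buffer first, then copies into `dst`.
    /// No allocation happens on this path.
    fn read_via_scratch(&mut self, dst: &mut [u8]) -> std::io::Result<usize> {
        let len = dst.len().min(self.scratch.len());
        let n = self.stream.read(&mut self.scratch[..len])?;
        dst[..n].copy_from_slice(&self.scratch[..n]);
        Ok(n)
    }
}

fn main() -> std::io::Result<()> {
    let source = Cursor::new(b"guest memory".to_vec());
    let mut stream = BufferedStream::new(source, 8);

    let mut dst = [0u8; 16];
    let mut total = 0;
    loop {
        let n = stream.read_via_scratch(&mut dst[total..])?;
        if n == 0 {
            break;
        }
        total += n;
    }
    println!("read {total} bytes using a single scratch buffer");
    Ok(())
}
```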

On-behalf-of: SAP sebastian.eydam@sap.com
Signed-off-by: Sebastian Eydam <sebastian.eydam@cyberus-technology.de>
Removes all the unwraps from the TLS code to make sure the VMM doesn't
panic.

On-behalf-of: SAP sebastian.eydam@sap.com
Signed-off-by: Sebastian Eydam <sebastian.eydam@cyberus-technology.de>
This is a pre-requisite to allow multiple connections simultaneously,
such as:
- start migration (blocking)
- query migration stats

On-behalf-of: SAP sebastian.eydam@sap.com
Signed-off-by: Sebastian Eydam <sebastian.eydam@cyberus-technology.de>
@phip1611 phip1611 force-pushed the next-gardenlinux-v50 branch from 9d7ac56 to 481edf2 Compare January 6, 2026 12:57
@tpressure

I have tested this with https://gitlab.cyberus-technology.de/cyberus/cloud/libvirt/-/merge_requests/58 and it passes our test pipeline

@hertrste hertrste merged commit daaea60 into cyberus-technology:next-gardenlinux-v50-base Jan 6, 2026
10 checks passed
phip1611 added a commit to phip1611/libvirt-tests that referenced this pull request Jan 7, 2026
Since PR #7525 (Dec 2025)[0], Cloud Hypervisor uses a virtual Cargo manifest and
the main package was moved to the `./cloud-hypervisor` subdirectory. This change
is effective since v50, which we just upgraded our Cloud Hypervisor patchset
to [1].

[0] cloud-hypervisor/cloud-hypervisor#7525
[1] cyberus-technology/cloud-hypervisor#60

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
github-merge-queue bot pushed a commit to cyberus-technology/libvirt-tests that referenced this pull request Jan 7, 2026
Since PR #7525 (Dec 2025)[0], Cloud Hypervisor uses a virtual Cargo manifest and
the main package was moved to the `./cloud-hypervisor` subdirectory. This change
is effective since v50, which we just upgraded our Cloud Hypervisor patchset
to [1].

[0] cloud-hypervisor/cloud-hypervisor#7525
[1] cyberus-technology/cloud-hypervisor#60

Signed-off-by: Philipp Schuster <philipp.schuster@cyberus-technology.de>
On-behalf-of: SAP philipp.schuster@sap.com
@phip1611 phip1611 deleted the next-gardenlinux-v50 branch January 8, 2026 16:34