
feat: add memory ballooning support#1688

Merged
kroese merged 4 commits into dockur:master from lukakama:memory-ballooning
May 6, 2026
Conversation

Contributor

lukakama commented Apr 22, 2026

Summary

This PR enables the opt-in dynamic memory ballooning support provided by the base QEMU image (PR qemus/qemu#1012) for Windows VMs.

Currently tested on Windows 11 guests.

Closes #751

Changes

Startup script — Added execution of the base image's ballooning.sh script to initialize ballooning.

Guest driver installation — All Windows unattended installations now include a setup step that installs the VirtIO Balloon service (blnsvr.exe -i); a sketch of such a step follows below.
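
For illustration, this kind of unattended-setup step is typically a SynchronousCommand entry in the answer file. The fragment below is a hedged sketch, not necessarily what this PR ships: the drive letter and Balloon\w11\amd64 path assume the virtio-win ISO layout for a Windows 11 x64 guest, and the wcm namespace is declared elsewhere in a full unattend.xml.

```xml
<!-- Hypothetical sketch: install the VirtIO Balloon service at first logon. -->
<FirstLogonCommands>
  <SynchronousCommand wcm:action="add">
    <Order>1</Order>
    <CommandLine>cmd /c "E:\Balloon\w11\amd64\blnsvr.exe" -i</CommandLine>
    <Description>Install VirtIO Balloon service</Description>
  </SynchronousCommand>
</FirstLogonCommands>
```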

Contributor

kroese commented Apr 23, 2026

Great work! I will look into it soon.


dragetd left a comment


I looked over it (not too in detail) and it looks reasonable. Cool feature. :)

Comment thread on src/memory-ballooning.py (outdated)

```python
    """Convert QMP uint64 value to signed int64 (QMP returns 18446744073709551615 for -1)."""
    return ctypes.c_int64(val).value

def get_host_ram_info() -> Optional[tuple[int, int, int]]:
```
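
For context, the helper above relies on ctypes' 64-bit two's-complement truncation. A minimal standalone sketch of the same conversion (the function name to_int64 is illustrative, not taken from the PR):

```python
import ctypes

def to_int64(val: int) -> int:
    # QMP encodes negative int64 values as unsigned 64-bit integers,
    # so 2**64 - 1 (18446744073709551615) really means -1.
    return ctypes.c_int64(val).value  # truncates to 64 bits, reinterprets as signed

assert to_int64(18446744073709551615) == -1
assert to_int64(42) == 42
```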

This method is used to check if there is free memory on the host.

/proc/meminfo always reports the total physical memory and ignores cgroup limits, right? So if I limit the container's RAM to something significantly less than my host memory, the balloon will never reduce the RAM usage, will it?

One would have to check whether cgroups are configured and read the configured RAM limit from somewhere under /sys/fs/cgroup. That might be fiddly; the alternative would be to at least document this.

Something like:
"If the container has a Docker or Kubernetes memory limit, ballooning still uses host memory pressure and may not react before the container reaches its own limit."

Contributor Author

lukakama Apr 30, 2026


Yes, container memory limits are not accounted for in /proc/meminfo, which always reports host-wide memory stats. The same applies to PSI stats.

I initially wrote the implementation considering just static container limits (which could be dealt with by tuning the RAM_SIZE environment variable at boot), but I agree that ballooning becomes much more useful if it can react to dynamic container limits updated at runtime.

I also think that building a full dual-handling system for both host and container memory pressure could get very fiddly. Maybe a good trade-off would be to just cap the target max guest memory size using the container memory overhead and limits from cgroupfs, with some margin (128 MB?), something like:

target_max = min(target_max, container_max_mem - (container_used_mem - container_cache_mem - guest_total_memory_usage) - container_margin)

This should account for the QEMU and controller footprints, and it should reliably prevent OOM kills if the container limits are shrunk.

I will also add a watcher on the cgroupfs files to trigger a main-loop iteration on changes, as the polling interval could be too slow to prevent OOMs. As you pointed out, I'll definitely add a note to the documentation mentioning that if the container limit is suddenly shrunk below current usage, the kernel OOM killer will likely hit before ballooning has time to reclaim the memory from the guest.
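
A minimal sketch of that cap, assuming cgroup v2 file names (memory.max, memory.current, and the "file" field of memory.stat as the cache share); the function and variable names mirror the formula above and are illustrative:

```python
from pathlib import Path

CGROUP = Path("/sys/fs/cgroup")
MARGIN = 128 * 1024 * 1024  # the proposed ~128 MB safety margin

def read_stat_field(name: str) -> int:
    # memory.stat is a flat "key value" list; the "file" key is the page-cache share.
    for line in (CGROUP / "memory.stat").read_text().splitlines():
        key, value = line.split()
        if key == name:
            return int(value)
    return 0

def cap_guest_target(target_max: int, guest_ram_rss: int) -> int:
    # guest_ram_rss: memory attributable to guest RAM itself (see below).
    raw = (CGROUP / "memory.max").read_text().strip()
    if raw == "max":
        return target_max  # no container limit configured
    container_max = int(raw)
    container_used = int((CGROUP / "memory.current").read_text())
    container_cache = read_stat_field("file")
    # Overhead = everything the container uses besides guest RAM and cache
    # (QEMU heap, the controller process itself, ...).
    overhead = container_used - container_cache - guest_ram_rss
    return min(target_max, container_max - overhead - MARGIN)
```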

Contributor Author


I pushed an updated implementation with container cap handling.

To calculate a reliable container overhead, it uses the real-time RSS of the QEMU memory regions that back guest RAM.

I also improved the integral error clamping, to speed up reactions to host memory pressure changes when sitting at the boundaries.
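
A sketch of how that RSS could be measured, assuming the guest memory appears as a named QEMU mapping (e.g. a memfd called pc.ram) in /proc/<pid>/smaps; the mapping name is an assumption here, not something confirmed by this PR:

```python
import re
from pathlib import Path

def guest_ram_rss(qemu_pid: int, region_hint: str = "pc.ram") -> int:
    """Sum the Rss of QEMU mappings whose name matches the guest-RAM backend."""
    total_kib = 0
    in_region = False
    for line in Path(f"/proc/{qemu_pid}/smaps").read_text().splitlines():
        if re.match(r"^[0-9a-f]+-[0-9a-f]+ ", line):
            # Mapping header: "start-end perms offset dev inode  pathname"
            in_region = region_hint in line
        elif in_region and line.startswith("Rss:"):
            total_kib += int(line.split()[1])
    return total_kib * 1024  # smaps reports sizes in kB
```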

Contributor

kroese commented May 4, 2026

I still didn't find the time to look at this in depth, sorry. But I feel that a lot of the code in this pull request should ideally go to https://github.com/qemus/qemu instead.

This Windows container is just a very thin layer on top of that base image, and if ballooning were implemented in the base image, it would also work for all the other images built on top of it (like the macOS container).

So ideally, only the asset .xml files (which execute the balloon server during boot) should be in this pull request.

@lukakama lukakama force-pushed the memory-ballooning branch from 5fd3b9d to 958b508 on May 5, 2026 at 16:43
Contributor Author

lukakama commented May 5, 2026

I moved the ballooning controller (and related changes) to a new PR qemus/qemu#1012.

This PR has been refactored and rebased to just enable ballooning support provided by https://github.com/qemus/qemu for Windows guests.

It is just missing the updated qemux/qemu image version.

kroese added 2 commits May 6, 2026 07:51
Removed section on enabling dynamic memory allocation and related variables.
Contributor

kroese commented May 6, 2026

Great! I merged the assets, so everything will be prepared for a new qemu image. I just reverted the readme changes for now, because they're not applicable yet. Thanks for your effort!

@kroese kroese merged commit 0ea6396 into dockur:master May 6, 2026
2 checks passed

Linked issue: [Feature]: Dynamic RAM allocation for Windows VM