@rgarcia rgarcia commented Dec 24, 2025

Summary

This PR adds support for running systemd-based OCI images, enabling a VM experience where systemd is PID 1 and manages the full system.

Motivation

Previously, hypeman only supported "exec mode" where the Go init binary runs as PID 1 and executes the container entrypoint directly. This works great for Docker-style single-process containers, but doesn't support images designed to run a full init system.

With this change, you can now run images like jrei/systemd-ubuntu:22.04 and get a full Linux system experience:

  • systemctl works
  • journalctl works
  • SSH and other system services can run
  • The VM feels like an EC2 instance

How it works

Auto-detection

The mode is auto-detected from the image's effective command (ENTRYPOINT followed by CMD):

  • If it is /sbin/init, /lib/systemd/systemd, or another path ending in /init → systemd mode
  • Otherwise → exec mode (existing behavior)
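
A minimal sketch of how such detection could look, assuming the first element of the combined ENTRYPOINT and CMD decides the mode. The signature matches the one visible in the diff; the exact match list (including /usr/lib/systemd/systemd) is an assumption for illustration, not the PR's actual rule set.

```go
package main

import (
	"fmt"
	"strings"
)

// IsSystemdImage reports whether the image's effective command looks like an
// init system. The matching rules below are illustrative assumptions.
func IsSystemdImage(entrypoint, cmd []string) bool {
	// Copy into a fresh slice so the caller's entrypoint is never written to.
	effective := make([]string, 0, len(entrypoint)+len(cmd))
	effective = append(effective, entrypoint...)
	effective = append(effective, cmd...)
	if len(effective) == 0 {
		return false
	}

	first := effective[0]
	switch first {
	case "/sbin/init", "/lib/systemd/systemd", "/usr/lib/systemd/systemd":
		return true
	}
	// Catch-all mentioned in the diff comment: any path ending in /init.
	return strings.HasSuffix(first, "/init")
}

func main() {
	fmt.Println(IsSystemdImage(nil, []string{"/sbin/init"}))                 // true
	fmt.Println(IsSystemdImage(nil, []string{"nginx", "-g", "daemon off;"})) // false
}
```

Keying on the first element only means an image whose ENTRYPOINT is a wrapper script falls back to exec mode even if the wrapper eventually starts systemd.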

Boot Flow

Kernel → init.sh (mount /proc /sys /dev) → Go init binary
                                              ↓
                                    ┌─────────┴─────────┐
                                    ↓                   ↓
                              Exec Mode           Systemd Mode
                                    ↓                   ↓
                            Run entrypoint      chroot + exec /sbin/init
                            as child process    (systemd becomes PID 1)
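
The branch at the bottom of the diagram might be sketched as follows. The mode strings, the helper names, and reading INIT_MODE from the environment are assumptions for illustration; the PR passes INIT_MODE through the config disk, and the real init performs mounts, networking, and agent setup before this point.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// Mode names are assumptions; the PR only says INIT_MODE selects the mode.
const (
	modeExec    = "exec"
	modeSystemd = "systemd"
)

// selectMode maps an INIT_MODE value to a boot mode, defaulting to exec mode
// so that unknown or missing values keep the existing behavior.
func selectMode(initMode string) string {
	if initMode == modeSystemd {
		return modeSystemd
	}
	return modeExec
}

// handOffToSystemd chroots into the prepared rootfs and replaces the current
// process image with /sbin/init. Because exec keeps the PID, a caller running
// as PID 1 hands PID 1 to systemd. It only returns on failure.
func handOffToSystemd(rootfs string) error {
	if err := syscall.Chroot(rootfs); err != nil {
		return fmt.Errorf("chroot %s: %w", rootfs, err)
	}
	if err := os.Chdir("/"); err != nil {
		return fmt.Errorf("chdir /: %w", err)
	}
	return syscall.Exec("/sbin/init", []string{"/sbin/init"}, os.Environ())
}

func main() {
	// Hypothetical source of the mode: an environment variable.
	mode := selectMode(os.Getenv("INIT_MODE"))
	fmt.Println("boot mode:", mode)
	// handOffToSystemd is deliberately not called here: it needs root and a
	// populated rootfs, and it would replace this process.
}
```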

Guest Agent

In systemd mode, the guest-agent is installed as a systemd service (hypeman-agent.service) that starts automatically. This enables hypeman exec, hypeman cp, and other remote operations.

Key Changes

| File | Description |
| --- | --- |
| lib/system/init/*.go | New Go-based init binary with modular boot phases |
| lib/system/init/init.sh | Shell wrapper (Go runtime needs /proc /sys /dev before main()) |
| lib/images/systemd.go | IsSystemdImage() auto-detection from CMD |
| lib/instances/configdisk.go | Passes INIT_MODE to guest via config disk |
| integration/systemd_test.go | Full E2E test |

Testing

# Run the systemd integration test
make test TEST=TestSystemdMode

The test:

  1. Pulls jrei/systemd-ubuntu:22.04
  2. Verifies IsSystemdImage() detects it correctly
  3. Boots a VM
  4. Waits for guest-agent to be ready
  5. Verifies:
    • PID 1 is systemd
    • /opt/hypeman/guest-agent exists
    • hypeman-agent.service is active
    • journalctl -u hypeman-agent works

Demo

# Run a systemd VM
hypeman run --name demo jrei/systemd-ubuntu:22.04

# Wait a few seconds for boot, then:
hypeman exec demo cat /proc/1/comm
# → systemd

hypeman exec demo systemctl status hypeman-agent
# → active (running)

hypeman exec demo journalctl -u hypeman-agent --no-pager -n 5
# → shows agent logs

Note

Enables full systemd VMs while preserving existing exec mode, and replaces the shell init with a modular Go init embedded in initrd.

  • New lib/system/init/* Go init (mounts, config, network, GPU, volumes) with mode_exec and mode_systemd (injects hypeman-agent.service, chroot, exec /sbin/init)
  • Auto-detect systemd images via images.IsSystemdImage(); configdisk.go writes INIT_MODE accordingly
  • Initrd build revamped to embed init.bin and init.sh wrapper; updates staleness hash; Makefile builds and embeds guest-agent and init
  • Exec API: adds wait_for_agent (seconds) in ExecRequest; guest client adds retryable vsock dial and WaitForAgent option
  • Tests: new integration/systemd_test.go, unit tests for detection, updated exec tests/log assertions; docs updated in lib/system/README.md

Written by Cursor Bugbot for commit d79a403.

Replace shell-based init script with Go binary that supports two modes:

## Exec Mode (existing behavior)
- Go init runs as PID 1
- Starts guest-agent in background
- Runs container entrypoint as child process
- Used for standard Docker images (nginx, python, etc.)

## Systemd Mode (new)
- Auto-detected when image CMD is /sbin/init or /lib/systemd/systemd
- Go init sets up rootfs, then chroots and execs systemd
- Systemd becomes PID 1 and manages the full system
- guest-agent runs as a systemd service (hypeman-agent.service)
- Enables EC2-like experience: ssh, systemctl, journalctl all work

## Key changes:
- lib/system/init/: New Go-based init binary with modular boot phases
- lib/images/systemd.go: IsSystemdImage() auto-detection from CMD
- lib/instances/configdisk.go: Passes INIT_MODE to guest
- lib/system/init/init.sh: Shell wrapper to mount /proc /sys /dev
  before Go runtime (Go requires these during initialization)
- integration/systemd_test.go: Full E2E test verifying:
  - systemd is PID 1
  - hypeman-agent.service is active
  - journalctl works for viewing logs

## Boot flow:
1. Kernel loads initrd with busybox + Go init + guest-agent
2. init.sh mounts /proc, /sys, /dev (Go runtime needs these)
3. init.sh execs Go init binary
4. Go init mounts overlay rootfs, configures network, copies agent
5. Based on INIT_MODE: exec mode (run entrypoint) or systemd mode (chroot + exec /sbin/init)
@rgarcia rgarcia changed the title feat: add systemd mode for EC2-like VMs feat: add systemd mode for full VM experience Dec 24, 2025
// - Any path ending in /init
func IsSystemdImage(entrypoint, cmd []string) bool {
	// Combine to get the actual command that will run
	effective := append(entrypoint, cmd...)

Slice append may corrupt caller's entrypoint data

IsSystemdImage combines the slices with append(entrypoint, cmd...), a classic Go pitfall. If entrypoint has spare capacity in its backing array, append writes the cmd elements into that capacity instead of allocating a new array, overwriting data that any other slice sharing the array can see. The function is called with imageInfo.Entrypoint and imageInfo.Cmd, which are then reused for metadata generation. The safe pattern is append([]string(nil), entrypoint...) followed by appending cmd, which never writes into the caller's backing array.
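
The aliasing can be reproduced in isolation. In this sketch, combineUnsafe and combineSafe are hypothetical helper names; the first mirrors the flagged pattern and the second uses the suggested copy-first pattern.

```go
package main

import "fmt"

// combineUnsafe mirrors the flagged pattern: when entrypoint has spare
// capacity, append writes cmd into entrypoint's backing array.
func combineUnsafe(entrypoint, cmd []string) []string {
	return append(entrypoint, cmd...)
}

// combineSafe copies entrypoint into a new backing array before appending,
// so the caller's array is never written to.
func combineSafe(entrypoint, cmd []string) []string {
	out := append([]string(nil), entrypoint...)
	return append(out, cmd...)
}

func main() {
	backing := []string{"/sbin/init", "placeholder"}
	entrypoint := backing[:1] // len 1, cap 2: one element of spare capacity

	_ = combineSafe(entrypoint, []string{"--flag"})
	fmt.Println(backing[1]) // placeholder

	_ = combineUnsafe(entrypoint, []string{"--flag"})
	fmt.Println(backing[1]) // --flag (the shared array was overwritten)
}
```

Whether the real call site is affected depends on whether imageInfo.Entrypoint ever has capacity beyond its length; the copy-first pattern removes the question.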


rgarcia added a commit to onkernel/hypeman-cli that referenced this pull request Dec 24, 2025
By default, waits up to 30 seconds for the guest agent to become ready.
This prevents immediate failures when the VM is still booting.

Use --wait-for-agent=0 to fail immediately (old behavior).

Depends on: onkernel/hypeman#50
@@ -0,0 +1,418 @@
---
Contributor Author


will remove before merge

}

// dropToShell drops to an interactive shell for debugging when boot fails
func dropToShell() {
Contributor Author


not sure if this is actually useful... opus took some liberties





Plan document should be removed before merge

The author marked this planning document for removal in the PR discussion (@rgarcia: "will remove before merge"). The .cursor/plans/ directory, which holds development planning notes, is being committed to the repository and isn't listed in .gitignore. The file contains internal development documentation and TODO tracking that shouldn't be part of the production codebase.


@rgarcia rgarcia requested a review from sjmiller609 December 24, 2025 14:08