
Refactor image build, create multi-arch images, drop Builder usage#347

Open
sairon wants to merge 11 commits into master from use-no-builder

Conversation


@sairon sairon commented Mar 3, 2026

This PR fundamentally changes how our images are built. The usage of the Builder container is dropped in favor of "native" build using BuildKit with docker/build-push-action.

Dockerfiles are now the single source of truth for all labels and build arguments - the build metadata (version, date, architecture, repository) is passed via --build-arg and consumed directly in the Dockerfile's LABEL instruction, removing the need for external label injection.
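
A minimal sketch of that pattern (the label keys follow OCI conventions; the exact ARG and label names used in the PR may differ):

```dockerfile
# Build metadata injected by the workflow via --build-arg
ARG BUILD_VERSION
ARG BUILD_DATE
ARG BUILD_ARCH
ARG BUILD_REPOSITORY

# Labels live in the Dockerfile itself, so no external label injection is needed
LABEL \
    org.opencontainers.image.version="${BUILD_VERSION}" \
    org.opencontainers.image.created="${BUILD_DATE}" \
    org.opencontainers.image.source="${BUILD_REPOSITORY}" \
    io.hass.arch="${BUILD_ARCH}"
```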

Build caching uses GitHub Actions cache as the primary backend, with inline cache metadata embedded in pushed images as a fallback for cache reuse across git refs (since GHA cache is scoped per branch/tag). Registry images are verified with cosign before being used as cache sources.
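
Roughly, the cache wiring in docker/build-push-action looks like this (illustrative values only; the actual scopes and registry refs are assumptions):

```yaml
- name: Build and push
  uses: docker/build-push-action@v6
  with:
    push: true
    cache-from: |
      type=gha
      type=registry,ref=ghcr.io/home-assistant/base:latest
    cache-to: |
      type=gha,mode=max
      type=inline
    # The registry ref is only trusted as a cache source after signature
    # verification in a preceding step, e.g. cosign verify <image> with the
    # expected certificate identity and OIDC issuer.
```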

Images are compressed with zstd (level 9) instead of gzip, reducing image size and improving pull times on registries and runtimes that support it.
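
With buildx this boils down to the image export options, along these lines (a sketch):

```yaml
- name: Build and push with zstd layers
  uses: docker/build-push-action@v6
  with:
    outputs: type=image,push=true,compression=zstd,compression-level=9,force-compression=true
```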

Multi-arch support is handled by building per-architecture images in parallel on native runners (amd64 on ubuntu-24.04, aarch64 on ubuntu-24.04-arm), then combining them into a single manifest list using docker buildx imagetools.
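
The merge step can be sketched as a workflow step like this (tag names are illustrative):

```yaml
- name: Combine per-arch images into a manifest list
  run: |
    docker buildx imagetools create \
      --tag ghcr.io/home-assistant/base:3.21 \
      ghcr.io/home-assistant/base:3.21-amd64 \
      ghcr.io/home-assistant/base:3.21-aarch64
```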

The reusable builder workflow (.github/workflows/reuseable-builder.yml) and the build-image composite action (.github/actions/build-image/) are designed to be generic enough to be extracted to the original home-assistant/builder repo, replacing the current docker-in-docker approach with a simpler, more cacheable workflow.

Thanks to the caching, the builder workflow now also runs on push to the master branch, keeping the GHA cache warm for release builds without adding significant CI cost.

sairon added 2 commits March 3, 2026 18:52

sairon commented Mar 3, 2026

The build failures for Python are expected - it's a chicken-and-egg problem. Without having ghcr.io/home-assistant/base in the registry, we can't use it as the base image. We had similar problems with PR builds in the past when bumping Alpine versions while updating the base image matrix for Python at the same time.

The builds were tested in my fork, so I'd say the CI can be ignored here - after merge, the base image should be published before the Python builds and everything should pass.

@sairon sairon requested review from agners, edenhaus and frenck March 3, 2026 18:03
sairon added a commit to home-assistant/builder that referenced this pull request Mar 4, 2026
A reference implementation is in home-assistant/docker-base#347.
@agners agners left a comment


Looks quite good to me.

I wonder if it will feel easier to follow what exactly is happening. The old build.yaml was a kinda nice summary of all parameters. We now have some in the builder workflow, and some in the Dockerfile. But tradeoffs... We'll see.

@sairon sairon requested a review from agners March 4, 2026 11:32

sairon commented Mar 4, 2026

> I wonder if it will feel easier to follow what exactly is happening. The old build.yaml was a kinda nice summary of all parameters. We do have some in the builder workflow, and some in the Dockerfile now. But tradeoffs... We'll see.

The builder workflow now essentially supplies only the build date, version, and source repository, which were (or should have been) dynamically generated anyway. For example, for the Python images, the builder injects these args:

BUILD_VERSION=2026.03.31
BUILD_ARCH=amd64
BUILD_DATE=2026-03-03 17:13:27+00:00
BUILD_REPOSITORY=https://github.com/sairon/ha-docker-base
BASE_IMAGE=ghcr.io/sairon/base
BASE_VERSION=3.21

The BASE_* args are special here because of the matrix; most images will have a single static BUILD_FROM.
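
For illustration, the matrixed base can be consumed like this (a sketch, not the exact Dockerfile; the defaults shown are assumptions):

```dockerfile
ARG BASE_IMAGE=ghcr.io/home-assistant/base
ARG BASE_VERSION=3.21
FROM ${BASE_IMAGE}:${BASE_VERSION}
```

Images outside the matrix would instead declare a single static `ARG BUILD_FROM` and `FROM ${BUILD_FROM}`.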

Where we make a small trade-off is the dependency versions, which were nicely in a single place before - but that's nothing git blame/git log can't help with.


sairon commented Mar 4, 2026

FTR, home-assistant/builder#273 needs to be merged first, and the references to the gha-builder branch updated here.

sairon added 3 commits March 5, 2026 12:02
Because the Cosign subject is derived from the running workflow, we need to run
the action using Cosign in a local workflow instead of calling a reusable
workflow from another repo.
@@ -0,0 +1,173 @@
name: Reusable workflow for single multi-arch image build
Member


"image" is already singular, so I think "single" is only confusing.

Suggested change
name: Reusable workflow for single multi-arch image build
name: Reusable workflow to build a multi-arch image


RUN \
set -x \
&& if [ -z "${TARGETARCH}" ]; then \
Member


This is only needed for non-buildx builds, right?

It seems that buildx has been the default builder since Docker 23, so for a while already. And for our users we require that version for the zstd support as well. So I'd suggest simply rejecting the build if buildx isn't used (exit 1 in this if clause, maybe with a hint that buildx is required for TARGETARCH to be set).
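
The suggested rejection could look something like this (a sketch of the reviewer's idea, not code from the PR):

```dockerfile
RUN set -x \
    && if [ -z "${TARGETARCH}" ]; then \
        echo "TARGETARCH is not set - building requires buildx (the default since Docker 23)" >&2; \
        exit 1; \
    fi
```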
