refactor(build): unify image build graph for cache reuse #390

Merged Mar 18, 2026: drew merged 15 commits into main from simplify-image-build-cache-an

@drew commented Mar 17, 2026

Summary

Simplify the two-image container build pipeline around one shared Docker build graph so gateway and cluster builds reuse the same Rust dependency cache instead of recompiling overlapping workspace state separately.

Related Issue

N/A

Changes

  • replace the split gateway and cluster Dockerfiles with a shared deploy/docker/Dockerfile.images build graph
  • add a shared docker-build-image.sh helper and route local, publish, and fast-deploy flows through it
  • make supervisor fast deploy build from the shared supervisor-builder target so it reuses the same cache path as full image builds
  • update architecture docs and the cluster debug skill to reflect the new shared image build layout

Testing

  • mise run pre-commit passes
  • Unit tests added/updated
  • E2E tests added/updated (if applicable)

Additional validation:

  • bash -n tasks/scripts/docker-build-image.sh tasks/scripts/docker-build-component.sh tasks/scripts/docker-build-cluster.sh tasks/scripts/docker-publish-multiarch.sh tasks/scripts/cluster-deploy-fast.sh tasks/scripts/cluster-bootstrap.sh tasks/scripts/cluster-push-component.sh
  • RUSTC_WRAPPER= cargo check --workspace
  • docker buildx build --check -f deploy/docker/Dockerfile.images --target gateway .
  • docker buildx build --check -f deploy/docker/Dockerfile.images --target cluster .
  • docker buildx build --check -f deploy/docker/Dockerfile.images --target supervisor-builder .
  • mise run pre-commit (fails on existing workspace lint warnings outside this change)
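
The three `--check` invocations above differ only in the target, so they can be looped. A minimal sketch, assuming it is run from the repository root; the function name `check_image_targets` and the `DRY_RUN` switch are hypothetical and not part of this PR:

```shell
# Hypothetical helper: lint every target of the shared Dockerfile.
# Set DRY_RUN=1 to print the commands instead of running them.
check_image_targets() {
  dockerfile="deploy/docker/Dockerfile.images"
  for target in gateway cluster supervisor-builder; do
    # ${DRY_RUN:+echo} expands to "echo" only when DRY_RUN is non-empty.
    ${DRY_RUN:+echo} docker buildx build --check -f "$dockerfile" --target "$target" . || return 1
  done
}
```

Calling `DRY_RUN=1 check_image_targets` prints the three commands listed above; without `DRY_RUN` it runs them and stops at the first failing target.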

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)
  • Architecture docs updated (if applicable)

@drew self-assigned this Mar 17, 2026
@drew force-pushed the simplify-image-build-cache-an branch from 707e362 to 4a2920b March 18, 2026 00:58
@drew requested a review from a team as a code owner March 18, 2026 00:58
drew added 9 commits March 18, 2026 14:12
Signed-off-by: Drew Newberry <anewberry@nvidia.com>
- Add scratch export stage (supervisor-output) reducing export from
  968 MB / 10s to 14 MB / 0.1s
- Unify builder drivers: supervisor now uses desktop-linux driver when
  not cross-compiling, sharing BuildKit cache with gateway builds
- Split monolithic COPY crates/ into per-target workspace stages so
  sandbox changes don't invalidate gateway builds and vice versa
- Remove LTO from default release profile for fast local linking;
  CI restores codegen-units=1 via CARGO_CODEGEN_UNITS build arg
- Set workspace version to 0.0.0 so stub builds in deps stage are
  clearly distinguishable from real versioned builds
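
The scratch export stage in the first bullet keeps the export small because only that stage's contents leave the builder. A rough sketch of such an export, assuming the `supervisor-output` target and the `CARGO_CODEGEN_UNITS` build arg named in this PR; the function name, the `DRY_RUN` switch, and the default `out/supervisor` destination are hypothetical, and the real helper may pass different flags:

```shell
# Hypothetical wrapper: export only the supervisor-output stage to a local directory.
export_supervisor() {
  dest="${1:-out/supervisor}"              # assumed default destination
  codegen_units="${CARGO_CODEGEN_UNITS:-}" # CI would set this to 1

  # Build the command in the positional parameters so quoting is preserved.
  set -- docker buildx build -f deploy/docker/Dockerfile.images \
    --target supervisor-output --output "type=local,dest=$dest"

  # Forward the build arg only when the caller set it.
  if [ -n "$codegen_units" ]; then
    set -- "$@" --build-arg "CARGO_CODEGEN_UNITS=$codegen_units"
  fi

  # With DRY_RUN set, print the command instead of running it.
  ${DRY_RUN:+echo} "$@" .
}
```

For example, `DRY_RUN=1 CARGO_CODEGEN_UNITS=1 export_supervisor /tmp/sup` prints the command a CI job might run, while a local call without `CARGO_CODEGEN_UNITS` leaves the default profile untouched.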
Explain how to read each column and what to look for when
reviewing fast-deploy benchmark results.
The supervisor committed-tree fingerprint was missing
deploy/docker/Dockerfile.images, and neither gateway nor supervisor
included tasks/scripts/docker-build-image.sh. Changes to these files
(e.g. from rebasing main) would not trigger a rebuild.

Align the git ls-tree paths with the matches_* functions so committed
and uncommitted changes are detected consistently.
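
The fix described above amounts to driving both checks from one path list. A minimal sketch of that idea, not the code in cluster-deploy-fast.sh: the function names and the `sha256sum`-based hash are hypothetical, the `crates` entry is an assumption, and only the Dockerfile and helper-script paths come from this commit message.

```shell
# Paths whose changes should trigger a supervisor rebuild (partly assumed).
supervisor_paths() {
  printf '%s\n' \
    crates \
    deploy/docker/Dockerfile.images \
    tasks/scripts/docker-build-image.sh
}

# Uncommitted side: does a changed file fall under one of the tracked paths?
matches_supervisor() {
  file="$1"
  for p in $(supervisor_paths); do
    case "$file" in
      "$p"|"$p"/*) return 0 ;;
    esac
  done
  return 1
}

# Committed side: hash the tree entries for the same paths.
# Assumes GNU sha256sum is available and that the paths contain no spaces.
committed_fingerprint() {
  git ls-tree HEAD -- $(supervisor_paths) | sha256sum | cut -d' ' -f1
}
```

Because both functions read `supervisor_paths`, adding a path in one place updates committed and uncommitted detection together.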
@drew force-pushed the simplify-image-build-cache-an branch from a687416 to 7784b50 March 18, 2026 21:12
drew added 4 commits March 18, 2026 14:15
…erfiles

This ARG was declared early but never referenced, causing every layer
below it to be cache-busted whenever the image tag changed. Removing
it lets dependency-install and toolchain layers stay cached across
tag changes.
- Remove docker-build-cluster.sh: helm packaging is now inlined into
  docker-build-image.sh when the target is 'cluster'
- Remove docker-build-component.sh: the gateway case was a passthrough
  to docker-build-image.sh; the CI case is now docker-build-ci.sh
- Simplify docker-publish-multiarch.sh: remove --mode flag since only
  'registry' mode remains after ECR removal
- Remove dead docker:publish:cluster:multiarch ECR task from docker.toml
- Update all callers (cluster-deploy-fast, cluster-bootstrap,
  cluster-push-component, remote-deploy, publish.toml)

The build entry point is now docker-build-image.sh for all Rust targets
(gateway, cluster, supervisor-builder, supervisor-output) and
docker-build-ci.sh for the CI image.
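
The resulting split can be summarized as a small dispatch. This is an illustrative sketch only: the function `build_entry_point` is hypothetical, the `ci` target name is an assumption, and the `tasks/scripts/` location of docker-build-ci.sh is assumed from where the other scripts live.

```shell
# Hypothetical lookup: which script builds a given target after this PR.
build_entry_point() {
  case "$1" in
    gateway|cluster|supervisor-builder|supervisor-output)
      # All Rust targets share one helper and one Dockerfile.
      echo "tasks/scripts/docker-build-image.sh" ;;
    ci)
      # Assumed name for the CI image target.
      echo "tasks/scripts/docker-build-ci.sh" ;;
    *)
      echo "unknown build target: $1" >&2
      return 1 ;;
  esac
}
```

Rejecting unknown targets up front keeps a typo from silently falling through to a default build.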
@drew added the test:e2e (Requires end-to-end coverage) label Mar 18, 2026
@drew merged commit a912848 into main Mar 18, 2026
9 checks passed
@drew deleted the simplify-image-build-cache-an branch March 18, 2026 22:01
Labels

test:e2e Requires end-to-end coverage

2 participants