ci: cache docker layers in GitHub Actions cache across runs #2563
Stacked on the npm-cache-mount PR. This PR exports the docker layer cache (including the npm cache mount populated by the parent commit) to GitHub Actions cache, so a fresh runner starts with a warm cache instead of an empty one.

## How

1. `docker-compose.test.yml`: explicit `image:` field on each buildable service, matching what compose's auto-naming produces (`<project>-<service>:latest`). This is required so that `docker buildx bake` (used by bake-action below) produces tags addressable by the subsequent `docker compose up -d`.
2. All three workflows that build docker images: `docker compose build` swapped for `docker/bake-action@v6` with `cache-from=type=gha` and `cache-to=type=gha,mode=max`. cypress keeps `postgres.no-cache=true` to preserve the existing stale-migration-file safeguard.

## Why not the simpler COMPOSE_BAKE=true + x-bake approach

`COMPOSE_BAKE=true` does delegate `docker compose build` to bake, but the compose-to-bake serialization layer **silently drops `x-bake:` fields entirely**. Confirmed locally with `docker compose ... build --print`: the bake JSON payload has no `cache-from`/`cache-to`. The build succeeds but no cache is read or written. This was verified on a real CI run that produced cold-baseline timings on both a cache-populate and a supposed-cache-hit batch.

`docker/bake-action` bypasses that serialization. It also needs the `image:` fields above, because going straight to bake (rather than through compose) means bake doesn't know the compose project name and would tag the images dangling.

## Verified locally

- `docker buildx bake -f docker-compose.test.yml --print` shows correct tags (`polis-test-<service>:latest`) for every buildable service.
- Second build after a fresh `docker rmi` reports `CACHED` for every Dockerfile RUN step and `importing cache manifest`, so cache import + load both work end-to-end.
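Item 2 under How can be sketched roughly as the workflow fragment below. This is an illustration under assumptions, not the PR's actual diff: the job name, runner, step names, the `*.` target wildcard, and the `source`, `files`, and `load` inputs are guesses at how the step would be wired. Only `docker/bake-action@v6`, the two cache settings, and `postgres.no-cache=true` come from the description.

```yaml
# Hypothetical CI job; names and layout are assumptions for illustration.
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # The gha cache backend generally needs a buildx builder other than
      # the default docker driver, hence the explicit setup step.
      - uses: docker/setup-buildx-action@v3

      - name: Build images with bake
        uses: docker/bake-action@v6
        with:
          source: .                       # assumption: build from the checked-out tree
          files: docker-compose.test.yml  # bake reads targets from the compose file
          load: true                      # assumption: load images for `docker compose up -d`
          set: |
            *.cache-from=type=gha
            *.cache-to=type=gha,mode=max
            postgres.no-cache=true

      - name: Start services
        run: docker compose -f docker-compose.test.yml up -d
```

The `postgres.no-cache=true` line would apply only to the cypress workflow, per the description; the other two workflows would presumably omit it.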
Closing: the GitHub Actions cache backend isn't a fit for our docker image sizes.
What I learned across the iterations
Numbers (final iteration, with
| Workflow | Baseline | Cache write | Cache read |
|---|---|---|---|
| E2E | ~16m | 14m 09s* | 21m 27s |
| Server | ~4m 10s | 5m 34s | 23m 04s |
| Delphi | ~7m 55s | 12m 43s | 14m 33s |
\* The E2E "win" on cache write was incidental: bake collapsed the workflow's two sequential `docker compose build` calls (the `--no-cache` postgres one plus the full build) into one parallel pass. That accounts for the roughly 1m 50s saving against the baseline and is unrelated to caching. Worth doing as a small follow-up PR.
What would actually win
- A registry-based cache backend (requires a registry — out of scope here).
- Self-hosted runners with persistent local disk.
- Aggressive image size reduction (separate project).
mode=min was considered. It would cache only the final image per target — cheaper export, but no partial-hit benefit. For our workload (many independent services; PRs typically touch one) partial hits are what we'd most want, so mode=min is also a loser.
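For comparison, a registry-based backend would change only the `set:` lines of the bake step. The fragment below is a hedged sketch: `registry.example.com/polis/build-cache` is a placeholder, the `delphi` target name is assumed from the service mentioned in this PR, and a registry login step (not shown) would be needed before it. Each target gets its own cache ref so that targets do not overwrite one another's cache.

```yaml
# Hypothetical variant of the bake step using a registry cache backend.
# The registry host and repository below are placeholders, not real values.
- name: Build images with bake (registry cache)
  uses: docker/bake-action@v6
  with:
    source: .
    files: docker-compose.test.yml
    load: true
    set: |
      delphi.cache-from=type=registry,ref=registry.example.com/polis/build-cache:delphi
      delphi.cache-to=type=registry,ref=registry.example.com/polis/build-cache:delphi,mode=max
```

Swapping `mode=max` for `mode=min` here would export only the final image layers per target, which is the trade-off weighed above.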
What's preserved
The within-build npm cache mount in #2562 still provides robustness during npm fetch retries (the original ECONNRESET fix doesn't depend on this PR). Closing this PR doesn't lose that.
Summary
Stacked on #2562. Together they fix the ECONNRESET flake and cut docker-build time on cache hits.
`RUN --mount=type=cache,target=/root/.npm` on every npm Dockerfile (the cache infrastructure).
How
1. `docker-compose.test.yml`: each buildable service gets an explicit `image: ${COMPOSE_PROJECT_NAME:-polis-test}-<service>:latest`. This matches what compose's auto-naming was already producing, but makes the tags addressable by `docker buildx bake` (used below).
2. Three workflows (`cypress-tests.yml`, `jest-server-test.yml`, `python-ci.yml`): the `docker compose build` step is replaced with `docker/bake-action@v6`. Cache config is passed via `set:`. `cypress-tests.yml` also keeps `postgres.no-cache=true` (Docker layer caching has been known to retain stale migration files).
The subsequent `docker compose up -d` finds the bake-built images because the `image:` tags match what compose expects.
Why not `COMPOSE_BAKE=true` + `x-bake:` (the prior attempt)
`COMPOSE_BAKE=true` does delegate `docker compose build` to bake, but the compose-to-bake serialization layer silently drops `x-bake:` fields entirely. Confirmed by `docker compose ... build --print`: the bake JSON payload has no `cache-from`/`cache-to` whatsoever. The build succeeds but no cache is read or written. A real CI run with that approach produced cold-baseline timings on both a supposed cache-populate batch and a supposed cache-hit batch; the "speedup" was zero because the cache was never engaged.
`docker/bake-action` bypasses that serialization and applies `--set` directly, so cache config arrives intact.
No-op for local devs
Adding `image:` to compose services is the only externally visible change for non-CI users. It doesn't change behavior; it just locks in the tag that compose was already auto-generating, so it's equivalent for everyone running `docker compose ...` locally.
Expected impact
(Build step compresses to ~30-60s on cache hit, per handoff doc estimate. First run after merge writes the cache; subsequent runs benefit.)
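For reference, the `image:` pinning this description relies on amounts to something like the compose fragment below. The service name `server` and its build context are hypothetical; only the tag pattern `${COMPOSE_PROJECT_NAME:-polis-test}-<service>:latest` comes from the description.

```yaml
# docker-compose.test.yml (fragment); service name and build context are hypothetical.
services:
  server:
    build:
      context: ./server
    # Same tag compose would auto-generate: <project>-<service>:latest.
    # Pinning it lets bake produce a tag that `docker compose up -d` can find.
    image: ${COMPOSE_PROJECT_NAME:-polis-test}-server:latest
```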
Verified locally
- `docker buildx bake -f docker-compose.test.yml --print` shows correct tags (`polis-test-<service>:latest`) for every buildable service, including the delphi ECR tag for delphi.
- After `docker rmi`, a second `docker buildx bake` against a local cache backend reports #10-#16 `CACHED` for every Dockerfile RUN step and `importing cache manifest from local:...`, so cache import + load both work end-to-end.
CI verification (cache-populate + cache-hit batches) is in progress on this PR. Description will be edited with the numbers once both batches complete.
Stacked PR note
Base is `jc/ci-econnreset-fix` (the #2562 branch). When #2562 merges, GitHub auto-retargets this PR to `edge`. With jj, edits to the base commit propagate via `jj rebase` / `jj squash`.