fix(docker): bake DeepFace/Facenet weights + self-healing cache volume #104
Merged
ahmetabdullahgultekin merged 1 commit on May 28, 2026
Closes the 4th recurrence of feedback_readonly_rootfs_cache_dirs (DeepFace + Numba + UniFace, now MiniFASNet). With a `read_only: true` rootfs and the cache named volume owned by root:root, DeepFace running as uid 100 silently failed to download MiniFASNet weights on first inference, collapsing the anti-spoof verdict to a false-positive. Today's hot-fix manually docker-cp'd the .pth files into the live volume; that fix was load-bearing on operator memory and would have vanished on the next `docker volume rm`.

Defense in depth, two layers:

1. Image bake-in. New `model-fetcher` build stage downloads the four critical weight files with SHA256 verification:
   - facenet512_weights.h5 3f76b51...
   - centerface.onnx 77e394b...
   - 2.7_80x80_MiniFASNetV2.pth a5eb02e...
   - 4_0_0_80x80_MiniFASNetV1SE.pth 84ee1d3...

   All four match upstream (serengil/deepface_models, Star-Clouds/CenterFace, minivision-ai/Silent-Face-Anti-Spoofing) and the running container's live SHAs. COPY'd into the runtime stage at /opt/baked-models/.deepface with --chown=100:101.

2. Entrypoint shim (deploy/entrypoint.sh). Runs as root, chowns any externally-mounted /tmp/.deepface cache volume to 100:101, seeds missing weight files from the baked /opt/baked-models layer (so a wiped named volume self-heals on next boot), then drops to uid 100 via gosu before exec'ing the CMD. Idempotent + best-effort.

Pins the app user UID/GID to 100/101 explicitly so host-side chown matches across rebuilds (the previous --system numbering was implicit and drifted).

Companion changes:

- .env.example documents DEEPFACE_FACENET512_SHA256 (required runtime pin per PR #102 `DEEPFACE_SHA256_REQUIRED=true`) plus the three other SHAs for audit reference.
- docker-compose.prod.yml comments document that the `biometric_models` volume is now self-healing and `docker volume rm` is safe (operator no longer has to remember the manual docker-cp dance).

Coordinated with parent PR (OPERATOR_ACTIONS_2026-05-12.md item 11) which gives the post-merge cleanup runbook.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced May 12, 2026
Summary
Closes the 4th recurrence of `feedback_readonly_rootfs_cache_dirs` (prior offenders: DeepFace, Numba, UniFace; now MiniFASNet). Today's hot-fix manually `docker cp`'d the two MiniFASNet `.pth` weights into the running container's volume; that fix was load-bearing on operator memory and would have vanished on the next `docker volume rm`.

This PR shifts the fix to the image layer + entrypoint shim so the volume becomes self-healing and operator memory is no longer a dependency.
Why the bug exists in the first place
`docker-compose.prod.yml` runs the bio container with:

- a `read_only: true` rootfs
- the named volume `biometric_models` mounted at `/tmp/.deepface`
- a non-root app user (uid 100, created by `adduser --system`)

Docker creates the named volume owned by `root:root`. When DeepFace 0.0.98 tries to download `2.7_80x80_MiniFASNetV2.pth` on first inference, it cannot write as uid 100, silently falls back, and the anti-spoof verdict collapses to a false-positive. Team A (PR forthcoming) is fixing the runtime error-path; this PR fixes the build-time + ops layer.
What changed
1. `Dockerfile`: new `model-fetcher` builder stage

Downloads the four critical weights with SHA256 verification, then COPYs them into the runtime stage at `/opt/baked-models/.deepface` with `--chown=100:101`. Build is reproducible because each `curl` is followed by `sha256sum -c` against an ARG-pinned hash.

| File | SHA256 | Upstream |
| --- | --- | --- |
| `facenet512_weights.h5` | `3f76b5117a9ca574d536af8199e6720089eb4ad3dc7e93534496d88265de864f` | `serengil/deepface_models@v1.0` |
| `centerface.onnx` | `77e394b51108381b4c4f7b4baf1c64ca9f4aba73e5e803b2636419578913b5fe` | `Star-Clouds/CenterFace@master` |
| `2.7_80x80_MiniFASNetV2.pth` | `a5eb02e1843f19b5386b953cc4c9f011c3f985d0ee2bb9819eea9a142099bec0` | `minivision-ai/Silent-Face-Anti-Spoofing@master` |
| `4_0_0_80x80_MiniFASNetV1SE.pth` | `84ee1d37d96894d5e82de5a57df044ef80a58be2b218b5ed7cdfd875ec2f5990` | `minivision-ai/Silent-Face-Anti-Spoofing@master` |

All four match the running container's live SHAs (captured via `docker exec biometric-api sha256sum ...`) AND cross-verify against upstream, confirmed by `curl | sha256sum` from the host before opening this PR.
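The download-then-verify step can be sketched as a small shell helper. This is an illustration only: `verify_sha256` is an assumed name, and the PR's actual Dockerfile commands are not reproduced here.

```shell
#!/bin/sh
# Sketch: verify a file against a pinned SHA256 using `sha256sum -c`.
# verify_sha256 is an illustrative name, not taken from the PR's Dockerfile.
verify_sha256() {
    file="$1"
    expected="$2"
    # sha256sum -c reads "<hash><two spaces><path>" lines from stdin and
    # exits non-zero on any mismatch or unreadable file.
    printf '%s  %s\n' "$expected" "$file" | sha256sum -c - >/dev/null 2>&1
}
```

In a build stage this would follow each download, along the lines of `curl -fsSL -o "$f" "$WEIGHTS_URL" && verify_sha256 "$f" "$PINNED_SHA"`, where `WEIGHTS_URL` and `PINNED_SHA` are placeholder names; a mismatch then fails the build instead of baking an unexpected file.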
2. `Dockerfile`: pinned uid/gid 100/101 explicitly

The previous `adduser --system` left numbering implicit. Now:

    RUN addgroup --system --gid 101 app \
     && adduser --system --ingroup app --uid 100 app

So host-side `chown -R 100:101 /var/lib/docker/volumes/...` always matches the in-container app user across rebuilds.
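Whether a path actually carries the pinned numeric owner can be checked with `stat`. A minimal sketch, assuming GNU or BusyBox `stat -c` and an illustrative helper name `has_owner`:

```shell
#!/bin/sh
# Sketch: check that a path is owned by the expected numeric uid:gid.
# has_owner is an illustrative name; stat -c is the GNU/BusyBox form.
has_owner() {
    path="$1"
    expected="$2"   # numeric owner, e.g. "100:101"
    actual=$(stat -c '%u:%g' "$path" 2>/dev/null) || return 1
    [ "$actual" = "$expected" ]
}
```

Comparing numeric ids rather than names is the point of the pin: the host has no `app` user, so only `100:101` is meaningful on both sides of the volume mount.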
3. `deploy/entrypoint.sh` (new): self-healing cache shim

Runs as root, performs two idempotent best-effort operations, then drops to uid 100 via `gosu`:

- Chown `/tmp/.deepface` to `100:101`, so any externally-mounted root-owned named volume doesn't shadow the baked weights.
- Seed missing weight files from `/opt/baked-models/.deepface/weights/` into the cache dir, so a fresh `docker volume rm` repopulates the four critical files on the next boot without operator intervention.

Both steps fail-soft (`|| true`); the entrypoint never blocks container startup.
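The two operations can be sketched as fail-soft shell functions. This is a sketch under assumptions, not the PR's actual `deploy/entrypoint.sh`: the function names are illustrative, and the per-file chown after copying is a choice made here so seeded files do not stay root-owned.

```shell
#!/bin/sh
# Sketch of the chown + seed logic described above; not the PR's actual
# deploy/entrypoint.sh. Function names are illustrative.

# Best-effort recursive chown of the cache root; never fails the caller.
fix_cache_ownership() {
    cache_root="$1"
    owner="$2"
    chown -R "$owner" "$cache_root" 2>/dev/null || true
}

# Copy each baked weight file that is missing from the cache directory.
# Existing files are left untouched, so repeated runs are idempotent.
seed_missing_weights() {
    baked_dir="$1"
    cache_dir="$2"
    owner="$3"
    mkdir -p "$cache_dir" 2>/dev/null || true
    for src in "$baked_dir"/*; do
        [ -f "$src" ] || continue
        dest="$cache_dir/$(basename "$src")"
        if [ ! -e "$dest" ]; then
            cp "$src" "$dest" 2>/dev/null || true
            # Assumption: hand each seeded file to the app user right away.
            chown "$owner" "$dest" 2>/dev/null || true
        fi
    done
    return 0
}
```

With the paths from this PR, an entrypoint built on these would call `fix_cache_ownership /tmp/.deepface 100:101`, then `seed_missing_weights /opt/baked-models/.deepface/weights /tmp/.deepface/.deepface/weights 100:101`, and finish with `exec gosu 100:101 "$@"` so the CMD runs unprivileged.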
4. `.env.example`

Documents the runtime SHA pin `DEEPFACE_FACENET512_SHA256`, required by PR #102 (`DEEPFACE_SHA256_REQUIRED=true`). The three other SHAs are documented inline for audit reference (DeepFace 0.0.98 has no integrity hook for centerface / MiniFASNet today, so they are documented, not enforced).
5. `docker-compose.prod.yml`

Inline comment documents the new semantics: the volume is now self-healing, `docker volume rm` is safe, and removing the volume mount entirely is also safe (the image-baked layer would be served directly).
Test plan
- Build with `--no-cache` and confirm all four `sha256sum -c` checks pass during the `model-fetcher` stage.
- `docker run --rm <image> ls -la /opt/baked-models/.deepface/weights/` returns the four files owned by 100:101.
- Wipe the volume and confirm it self-heals:
  - `docker compose -f docker-compose.prod.yml --env-file .env.prod down biometric-api`
  - `docker volume rm biometric-processor_biometric_models`
  - `docker compose -f docker-compose.prod.yml --env-file .env.prod up -d biometric-api`
  - `docker exec biometric-api stat -c '%u:%g' /tmp/.deepface/.deepface/weights/facenet512_weights.h5` returns `100:101`.
- A `/verify` call against the testbed completes without `recommended_action=block` due to missing MiniFASNet.
- `DEEPFACE_FACENET512_SHA256` is set to the value documented above (Team A PR #102, "fix(verify): enforce anti-spoof block + EAR + aged-threshold + SHA-pin + verify-challenge (2026-05-12 ML review)", enforces this).
- `/api/v1/health` returns 200 within 60s of `up -d`.

Operator notes
Coordinated with parent PR (FIVUCSAS / `fix/2026-05-12-bake-mini-fasnet-models`) which adds Operator Action item 11 to `OPERATOR_ACTIONS_2026-05-12.md` with the post-merge cleanup runbook. No prod rebuild from this PR; the operator owns deployment.

Out of scope (intentionally not in this PR)
- `Dockerfile.gpu` / `Dockerfile.optimized` parity (not used in prod today; deferred until those paths are reactivated).

Memory references
- `feedback_readonly_rootfs_cache_dirs` (4th sighting)
- `feedback_env_file_docker` (PR body commands all use `--env-file .env.prod`)
- `feedback_git_push` (used bare `git push -u origin <branch>`)

🤖 Generated with Claude Code