From bec46e80d70617107d07878607fb523a417c0234 Mon Sep 17 00:00:00 2001
From: Ahmet Abdullah Gultekin
Date: Tue, 12 May 2026 17:55:43 +0000
Subject: [PATCH 1/2] fix(verify): enforce anti-spoof block, wire EAR, fix aged-threshold, pin SHA, add verify-challenge
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Closes 5 P0/P1 findings from the 2026-05-12 ML review:

Bug 1 (P0): Anti-spoof `recommended_action='block'` is advisory
AntispoofPipelineAssembler attached `recommended_action='block'` to
/verify responses but the route still returned 200/verified=True. Added
`ANTISPOOF_BLOCK_ENFORCE=true` (default ON in prod). When any layer votes
block (face_usability_block, hybrid_fusion_is_spoof, or
recommended_action='block') the route now raises HTTP 403 with
`{error_code: ANTISPOOF_BLOCKED, reason: }`. Flip the flag to false for a
canary/observation rollout.
Tests: tests/integration/test_verify_antispoof_block_enforce.py
(8 assertions, 4 for Bug 1).

Bug 2 (P0): Blink-cache / EAR work unreachable from /verify
The 2026-05-11 spoof-detector paper-P0 (blink cache + EAR recalibration)
lived in `src.infrastructure.analyzers.blink_analyzer` but was never
wired into the route. Added `_evaluate_ear_liveness_safe()` that runs
MediaPipe FaceLandmarker on the uploaded still frame, computes EAR via
the spoof-detector library (EAR_THRESHOLD=0.18), and vetoes on closed
eyes. Multi-frame BlinkAnalyzer state (V-shape detection) is explicitly
out of scope here: the cache only helps with multi-face/frame video
sessions that the current /verify single-still-frame contract doesn't
provide. Default OFF (ANTISPOOF_EAR_VETO_ENABLED) until ops deploys the
face_landmarker.task asset; the helper fails soft to None when the model
or MediaPipe is missing. A companion spoof-detector PR exposes the
blink_analyzer module on the public `spoof_detector.*` namespace (per
`feedback_spoof_detector_architecture`, algorithms live there).

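For reviewers unfamiliar with EAR, the closed-eye veto from Bug 2 can be
sketched as below. This is only an illustration, assuming the common
six-landmark Eye Aspect Ratio definition; `eye_aspect_ratio`,
`eyes_closed`, and the sample coordinates are hypothetical and are not
the spoof-detector API. Only the 0.18 threshold is taken from this
commit.

```python
import math

# Threshold quoted in this commit message (EAR_THRESHOLD=0.18).
EAR_THRESHOLD = 0.18


def eye_aspect_ratio(eye):
    """Return the EAR for six (x, y) points ordered p1..p6.

    Assumed ordering: p1 and p4 are the horizontal eye corners, p2 and p3
    lie on the upper lid, p6 and p5 lie on the lower lid.
    """
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    if horizontal == 0:
        return 0.0  # degenerate eye; treat as fully closed
    return vertical / (2.0 * horizontal)


def eyes_closed(left_eye, right_eye, threshold=EAR_THRESHOLD):
    """Average both eyes and compare against the threshold."""
    avg = (eye_aspect_ratio(left_eye) + eye_aspect_ratio(right_eye)) / 2.0
    return avg < threshold


# Hypothetical landmark sets in pixel space.
open_eye = [(0, 0), (1, 1), (3, 1), (4, 0), (3, -1), (1, -1)]
closed_eye = [(0, 0), (1, 0.1), (3, 0.1), (4, 0), (3, -0.1), (1, -0.1)]

print(eyes_closed(open_eye, open_eye))      # wide-open eyes
print(eyes_closed(closed_eye, closed_eye))  # nearly shut eyes
```

The veto averages both eyes, so one closed eye alone does not
necessarily trip it; that matches the `avg < threshold` comparison used
by the route helper in this patch.
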
Bug 3 (P0): VERIFICATION_THRESHOLD_AGED semantics inverted
Comparator is `verified = distance < threshold`; default was
THRESHOLD=0.45, THRESHOLD_AGED=0.38, making aged users *stricter*
(higher FRR), the opposite of the adaptive feature's intent. Raised
THRESHOLD_AGED default to 0.55 (still well below Facenet cosine
operating-point ceiling ~0.6 so FAR stays controlled). Added a Pydantic
model_validator that hard-rejects aged < standard at config-load, so the
regression cannot silently come back via env-file edits. .env.example
documents the comparator semantics inline.
Tests: tests/unit/test_verification_threshold_aged.py (4 assertions).

Bug 4 (P1): Web puzzles call onSuccess client-side, no server validation
Added POST /api/v1/liveness/verify-challenge for the web
biometric-puzzles training surface. Single-action contract:
`{action, start_timestamp_ms, end_timestamp_ms, confidence, ...}` →
`{verified, action, duration_seconds, reason_code, message}`.
Structural validation only (action enum, timestamps monotonic + sane
duration 120ms..60s, confidence floor 0.5). Heavier server-side
detection belongs to multi-step /liveness/verify.
Tests: tests/integration/test_verify_challenge_endpoint.py (7 assertions).
Web-app wiring lands in a companion PR on web-app.

Bug 5 (P1): SHA256 model integrity pins empty / advisory
`_verify_model_integrity` previously logged a WARNING when the pin was
empty. Added `DEEPFACE_SHA256_REQUIRED=true` (default). With this flag on
AND ENVIRONMENT=production, an empty pin now raises RuntimeError at
model-load, a defense against silent ~/.deepface/weights/ rotations.
Operator action: compute `sha256sum` against the in-container
facenet512_weights.h5 and pin it via DEEPFACE_FACENET512_SHA256 in
.env.prod (captured 2026-05-12 from running container:
3f76b5117a9ca574d536af8199e6720089eb4ad3dc7e93534496d88265de864f).
The face/hand_landmarker.task hashes intentionally stay empty: those
models are NOT loaded server-side; the server only delivers them as
static SHA256-verified assets to clients.
Tests: tests/unit/test_deepface_sha256_required.py (5 assertions).

Test results (DATABASE_URL=postgresql://test:test@localhost:5432/test):
- 4 new unit tests (verification_threshold_aged)
- 5 new unit tests (deepface_sha256_required)
- 8 new integration tests (verify_antispoof_block_enforce)
- 7 new integration tests (verify_challenge_endpoint)
- 6 pre-existing integration tests (verify_antispoof_wiring), which now
  also run locally thanks to the added `resemblyzer` mock (baseline-rot
  fix).
- test_config_validator.py: 14 pre-existing tests still green.
Total: 44 pass / 0 fail locally.

Operator action items:
1. Pin `DEEPFACE_FACENET512_SHA256` in
   /opt/projects/fivucsas/biometric-processor/.env.prod with the value
   captured above (already added to local .env.prod, NOT committed
   because .env.prod is gitignored).
2. Rebuild biometric-processor container to pick up these changes.
3. Decide whether to flip `ANTISPOOF_BLOCK_ENFORCE=false` for a canary
   rollout before relying on the default-ON behavior.
4. To enable Bug 2 EAR veto: deploy `models/face_landmarker.task`, set
   `FACE_LANDMARKER_MODEL_PATH`, then `ANTISPOOF_EAR_VETO_ENABLED=true`.
5. Add the identity-core-api proxy for
   `/biometric/puzzles/verify-challenge` when convenient; web-app
   soft-passes on 404 until it lands.

Memory rules respected:
- feedback_spoof_detector_architecture: algorithms come from
  spoof-detector via the new public shim; biometric-processor only
  imports + wires.
- feedback_liveness_hybrid_vs_passive: no liveness backend changes; prod
  LIVENESS_BACKEND remains as configured by ops.
- feedback_readonly_rootfs_cache_dirs: new lazy FaceLandmarker init
  respects the existing FACE_LANDMARKER_MODEL_PATH env contract; cache
  dirs unchanged.

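Operator item 1 and the Bug 5 fail-fast rule can be sketched as below.
This is a rough illustration under stated assumptions, not the code in
deepface_extractor.py: `sha256_of_file` and `check_pin` are hypothetical
names, and raising on a digest mismatch is an assumption, since this
commit message only describes the empty-pin case.

```python
import hashlib
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large weight files are not read at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_pin(path, expected, required, environment):
    """Return True when the pin matches, False when the check is skipped.

    Raises RuntimeError when the pin is empty but required in production
    (the behaviour described for Bug 5), or when the digest does not
    match (assumed behaviour, not confirmed by this commit message).
    """
    if not expected:
        if required and environment.lower() == "production":
            raise RuntimeError(f"integrity pin missing for {path}")
        return False  # warn-and-skip path outside production
    actual = sha256_of_file(path)
    if actual != expected.lower():
        raise RuntimeError(f"integrity mismatch for {path}: {actual}")
    return True


# Demo on a throwaway file standing in for the weights file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
    demo_path = tmp.name

demo_pin = sha256_of_file(demo_path)
print(check_pin(demo_path, demo_pin, required=True, environment="production"))
print(check_pin(demo_path, "", required=True, environment="development"))
```

Hashing in chunks keeps memory flat for multi-hundred-megabyte weight
files; the digest it produces is the same hex string `sha256sum` prints.
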
Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .env.example                                  |  15 +
 app/api/routes/puzzle.py                      | 138 ++++++
 app/api/routes/verification.py                | 211 ++++++++-
 app/api/schemas/single_challenge.py           |  85 ++++
 app/api/schemas/verification.py               |  17 +-
 app/core/config.py                            |  87 +++-
 .../ml/extractors/deepface_extractor.py       |  26 +-
 .../test_verify_antispoof_block_enforce.py    | 431 ++++++++++++++++++
 .../test_verify_antispoof_wiring.py           |  11 +-
 .../test_verify_challenge_endpoint.py         | 132 ++++++
 tests/unit/test_deepface_sha256_required.py   | 166 +++++++
 .../unit/test_verification_threshold_aged.py  |  76 +++
 12 files changed, 1385 insertions(+), 10 deletions(-)
 create mode 100644 app/api/schemas/single_challenge.py
 create mode 100644 tests/integration/test_verify_antispoof_block_enforce.py
 create mode 100644 tests/integration/test_verify_challenge_endpoint.py
 create mode 100644 tests/unit/test_deepface_sha256_required.py
 create mode 100644 tests/unit/test_verification_threshold_aged.py

diff --git a/.env.example b/.env.example
index ac4b679..511f2fb 100644
--- a/.env.example
+++ b/.env.example
@@ -65,6 +65,21 @@ SIMILARITY_METRIC=cosine
 SIMILARITY_THRESHOLD=0.6
 EMBEDDING_DIMENSION=2622

+# ---------------------------------------------------------------------------
+# Verification thresholds (cosine distance)
+# ---------------------------------------------------------------------------
+# Comparator: verified = distance < threshold
+# HIGHER threshold = MORE LENIENT (further-allowed distance still matches)
+# LOWER threshold = STRICTER (only near-zero distances match)
+# VERIFICATION_THRESHOLD_AGED must be >= VERIFICATION_THRESHOLD; the config
+# loader rejects inverted values (see app/core/config.py
+# _validate_aged_threshold_lenience). Bug 2026-05-12: an earlier default of
+# 0.38 for aged users made them STRICTER, the opposite of the adaptive
+# feature's intent.
+# VERIFICATION_THRESHOLD=0.45
+# VERIFICATION_THRESHOLD_AGED_YEARS=2.0
+# VERIFICATION_THRESHOLD_AGED=0.55 # higher than 0.45 ⇒ more lenient for aged
+
 # Alternative Models (comment/uncomment to switch)
 # IMPORTANT: When using pgvector, EMBEDDING_DIMENSION must match your model!
diff --git a/app/api/routes/puzzle.py b/app/api/routes/puzzle.py
index a100f7e..d48c56c 100644
--- a/app/api/routes/puzzle.py
+++ b/app/api/routes/puzzle.py
@@ -18,6 +18,10 @@
     VerifyPuzzleRequest,
     VerifyPuzzleResponse,
 )
+from app.api.schemas.single_challenge import (
+    VerifyChallengeRequest,
+    VerifyChallengeResponse,
+)
 from app.application.use_cases.generate_puzzle import GeneratePuzzleUseCase
 from app.application.use_cases.verify_puzzle import VerifyPuzzleUseCase
 from app.core.container import get_generate_puzzle_use_case, get_verify_puzzle_use_case
@@ -268,3 +272,137 @@ async def verify_puzzle(
             status_code=500,
             detail="Failed to verify puzzle. Please try again.",
         )
+
+
+# ---------------------------------------------------------------------------
+# Bug 4 (2026-05-12): single-challenge server validation for the web
+# biometric-puzzles training surface.
+# ---------------------------------------------------------------------------
+#
+# Before this endpoint, ``FacePuzzle.tsx`` and ``HandGesturePuzzle.tsx``
+# detected gestures client-side and called ``onSuccess()`` directly. A
+# malicious user could mock the component out and "pass" any challenge.
+# This endpoint adds a server round-trip the web layer waits on, so
+# ``onSuccess`` is only invoked when the backend confirms the structural
+# checks below.
+#
+# Scope: structural validation only (action enum, timestamp monotonicity,
+# duration sanity, confidence floor). Heavier server-side detection
+# (re-running MediaPipe on uploaded frames) belongs to the multi-step
+# ``/liveness/verify`` flow used by enrollment. The training surface is
+# explicitly lightweight.
+
+
+# Minimum challenge duration (seconds). A real human gesture takes at
+# least ~120 ms even for the fastest blinks; bot scripts firing the
+# endpoint immediately are caught here.
+_MIN_CHALLENGE_DURATION_S = 0.12
+
+# Maximum challenge duration (seconds). Anything beyond 60 s is a stale
+# session or a replay; reject.
+_MAX_CHALLENGE_DURATION_S = 60.0
+
+# Minimum detection confidence the client must report. Below this the
+# server treats the submission as "no detection" regardless of the local
+# verdict. The floor is conservative (matches the engine's typical
+# detected-pass threshold of 0.5).
+_MIN_CHALLENGE_CONFIDENCE = 0.5
+
+
+@router.post(
+    "/verify-challenge",
+    response_model=VerifyChallengeResponse,
+    summary="Verify a single liveness challenge (web training surface)",
+    description=(
+        "Lightweight server validation for the biometric-puzzles training "
+        "surface. Accepts one completed challenge, runs structural checks "
+        "(action enum, timestamp monotonicity, duration sanity, confidence "
+        "floor) and returns a verdict. The web layer must wait on this "
+        "before resolving its onSuccess()."
+    ),
+    responses={
+        200: {"description": "Verdict returned (verified=true|false)"},
+        400: {"description": "Malformed request"},
+    },
+)
+async def verify_challenge(
+    request: VerifyChallengeRequest,
+) -> VerifyChallengeResponse:
+    """Server validation for a single training puzzle challenge."""
+    duration_s = max(
+        0.0, (request.end_timestamp_ms - request.start_timestamp_ms) / 1000.0
+    )
+
+    # 1. Timestamps must be monotonic (end >= start).
+    if request.end_timestamp_ms < request.start_timestamp_ms:
+        logger.info(
+            "verify-challenge rejected: timestamps out of order action=%s",
+            request.action.value,
+        )
+        return VerifyChallengeResponse(
+            verified=False,
+            action=request.action,
+            duration_seconds=duration_s,
+            reason_code="TIMESTAMPS_OUT_OF_ORDER",
+            message="Challenge timestamps are not monotonic.",
+        )
+
+    # 2. Duration in sane bounds.
+    if duration_s < _MIN_CHALLENGE_DURATION_S:
+        logger.info(
+            "verify-challenge rejected: duration_too_short action=%s duration=%.3fs",
+            request.action.value,
+            duration_s,
+        )
+        return VerifyChallengeResponse(
+            verified=False,
+            action=request.action,
+            duration_seconds=duration_s,
+            reason_code="DURATION_TOO_SHORT",
+            message="Challenge duration is implausibly short.",
+        )
+    if duration_s > _MAX_CHALLENGE_DURATION_S:
+        logger.info(
+            "verify-challenge rejected: duration_too_long action=%s duration=%.1fs",
+            request.action.value,
+            duration_s,
+        )
+        return VerifyChallengeResponse(
+            verified=False,
+            action=request.action,
+            duration_seconds=duration_s,
+            reason_code="DURATION_TOO_LONG",
+            message="Challenge duration exceeds the allowed window.",
+        )
+
+    # 3. Confidence floor.
+    if request.confidence < _MIN_CHALLENGE_CONFIDENCE:
+        logger.info(
+            "verify-challenge rejected: confidence_below_floor action=%s conf=%.2f",
+            request.action.value,
+            request.confidence,
+        )
+        return VerifyChallengeResponse(
+            verified=False,
+            action=request.action,
+            duration_seconds=duration_s,
+            reason_code="CONFIDENCE_BELOW_FLOOR",
+            message="Detection confidence is below the acceptance floor.",
+        )
+
+    logger.info(
+        "verify-challenge accepted: action=%s tenant=%s user=%s "
+        "duration=%.2fs confidence=%.2f",
+        request.action.value,
+        request.tenant_id,
+        request.user_id,
+        duration_s,
+        request.confidence,
+    )
+    return VerifyChallengeResponse(
+        verified=True,
+        action=request.action,
+        duration_seconds=duration_s,
+        reason_code=None,
+        message="Challenge verified.",
+    )
diff --git a/app/api/routes/verification.py b/app/api/routes/verification.py
index 27183a0..7220c94 100644
--- a/app/api/routes/verification.py
+++ b/app/api/routes/verification.py
@@ -50,6 +50,15 @@
 _antispoof_assembler: Optional[Any] = None  # AntispoofPipelineAssembler when available
 _antispoof_assembler_init_failed = False

+# Bug 2 (2026-05-12): single-frame EAR liveness signal.
Wires the +# spoof-detector EAR computation into /verify so the closed-eye signal can +# veto a verification. Multi-frame BlinkAnalyzer state (cache + per-face +# history) is out of scope here because /verify only receives one still +# frame; that wiring belongs in the multi-frame /liveness/verify route +# which already runs the puzzle pipeline. +_face_landmarker_for_ear: Optional[Any] = None +_face_landmarker_for_ear_init_failed = False + def _get_device_spoof_risk_evaluator() -> DeviceSpoofRiskEvaluator: """Lazy-init singleton — DeviceSpoofRiskEvaluator constructs cv2 detectors @@ -129,6 +138,164 @@ def _get_antispoof_assembler() -> Optional[Any]: return None +def _get_face_landmarker_for_ear() -> Optional[Any]: + """Lazy-init a MediaPipe FaceLandmarker for single-frame EAR extraction. + + Returns None if MediaPipe is not importable or the asset isn't present. + The result is cached so we don't pay model-init cost on every request. + Failures are recorded so we don't spam logs on every request. + """ + global _face_landmarker_for_ear, _face_landmarker_for_ear_init_failed + if _face_landmarker_for_ear is not None: + return _face_landmarker_for_ear + if _face_landmarker_for_ear_init_failed: + return None + try: + import os + from pathlib import Path + import mediapipe as mp + + # Reuse the same model path the active_liveness_manager loader uses, + # honouring FACE_LANDMARKER_MODEL_PATH so ops can override per-env. 
+ default_path = ( + Path(__file__).parent.parent.parent.parent / "models" / "face_landmarker.task" + ) + model_path = Path(os.getenv("FACE_LANDMARKER_MODEL_PATH", str(default_path))) + if not model_path.exists(): + logger.info( + "EAR check disabled — face_landmarker.task not found at %s " + "(set FACE_LANDMARKER_MODEL_PATH to a deployed asset to enable)", + model_path, + ) + _face_landmarker_for_ear_init_failed = True + return None + + options = mp.tasks.vision.FaceLandmarkerOptions( + base_options=mp.tasks.BaseOptions(model_asset_path=str(model_path)), + running_mode=mp.tasks.vision.RunningMode.IMAGE, + num_faces=1, + min_face_detection_confidence=0.4, + min_tracking_confidence=0.4, + ) + _face_landmarker_for_ear = mp.tasks.vision.FaceLandmarker.create_from_options( + options + ) + logger.info("FaceLandmarker initialised for single-frame EAR liveness check") + return _face_landmarker_for_ear + except Exception as exc: # noqa: BLE001 + logger.warning( + "FaceLandmarker init for EAR failed; closed-eye veto disabled: %s", + exc, + ) + _face_landmarker_for_ear_init_failed = True + return None + + +def _evaluate_ear_liveness_safe(image_path: str) -> Optional[dict]: + """Run a one-shot Eye Aspect Ratio check on a single still frame. + + Uses the EAR computation from the spoof-detector library (paper-P0 + calibration 2026-05-11: EAR_THRESHOLD=0.18). A frame where BOTH eyes + are clearly closed is treated as a strong spoof signal — single photos + of closed eyes are rare in legitimate verification flows, but they + matter as a defensive complement to texture-based liveness. + + Returns a dict with shape: + { + "eyes_closed": bool, + "left_ear": float, + "right_ear": float, + "avg_ear": float, + "threshold": float, + } + or None when the check can't run (no MediaPipe, no model, no face). 
+ """ + if not settings.ANTISPOOF_EAR_VETO_ENABLED: + return None + try: + import cv2 + import numpy as np + import mediapipe as mp + from spoof_detector.infrastructure.analyzers.blink_analyzer import ( + BlinkAnalyzer, + LEFT_EYE, + RIGHT_EYE, + compute_ear, + ) + except ImportError as exc: # pragma: no cover - dep missing in CI + logger.warning("EAR check unavailable; import failed: %s", exc) + return None + try: + landmarker = _get_face_landmarker_for_ear() + if landmarker is None: + return None + + frame_bgr = cv2.imread(image_path) + if frame_bgr is None or frame_bgr.size == 0: + return None + + h, w = frame_bgr.shape[:2] + rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB) + mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=rgb) + result = landmarker.detect(mp_image) + face_landmarks = result.face_landmarks or [] + if not face_landmarks: + return None + + # Use the first detected face. The pixel-space conversion matches + # the spoof-detector blink_analyzer contract. + lm = np.array( + [[l.x * w, l.y * h, l.z] for l in face_landmarks[0]] + ) + if len(lm) < 468: + return None + + left = compute_ear(lm, LEFT_EYE) + right = compute_ear(lm, RIGHT_EYE) + avg = (left + right) / 2.0 + threshold = BlinkAnalyzer.EAR_THRESHOLD + return { + "eyes_closed": bool(avg < threshold), + "left_ear": round(left, 4), + "right_ear": round(right, 4), + "avg_ear": round(avg, 4), + "threshold": threshold, + } + except Exception as exc: # noqa: BLE001 — fail-soft + logger.warning("EAR liveness check failed: %s", exc) + return None + + +def _merge_block_verdict( + *, + antispoof_pipeline: Optional[dict], + ear_liveness: Optional[dict], +) -> Optional[str]: + """Conservative veto — any spoof-leaning signal wins. + + Returns a reason category string when verification MUST be blocked, or + ``None`` when the request is allowed to proceed. The reason category is + surfaced in the 403 body so callers can branch on it. 
+ + Veto rules (any one triggers a block): + * ``antispoof_pipeline.recommended_action == "block"`` + * EAR check says ``eyes_closed=True`` (single still frame of closed + eyes is a strong spoof indicator). + """ + if antispoof_pipeline is not None: + action = str(antispoof_pipeline.get("recommended_action", "")).lower() + if action == "block": + # Use the most specific reason the assembler attached. + if antispoof_pipeline.get("face_usability_block"): + return "FACE_UNUSABLE" + if antispoof_pipeline.get("hybrid_fusion_is_spoof"): + return "HYBRID_FUSION_SPOOF" + return "ANTISPOOF_BLOCK" + if ear_liveness is not None and ear_liveness.get("eyes_closed") is True: + return "EYES_CLOSED" + return None + + def _evaluate_antispoof_pipeline_safe(image_path: str) -> Optional[dict]: """Run the full anti-spoof assembler on an on-disk image. @@ -273,11 +440,52 @@ async def verify_face( # Anti-spoof attachments. Both fields default None and are populated # only when their respective flags are on. The helpers swallow any - # exception — they never block verification. + # exception — they never block verification by raising. device_spoof_risk: Optional[dict] = None if settings.ANTISPOOF_DEVICE_RISK_ENABLED: device_spoof_risk = _evaluate_device_spoof_risk_safe(image_path) antispoof_pipeline = _evaluate_antispoof_pipeline_safe(image_path) + ear_liveness = _evaluate_ear_liveness_safe(image_path) + + # Bug 1 (2026-05-12) — enforce assembler's `recommended_action="block"` + # and the EAR single-frame closed-eye signal. Previously the assembler + # verdict was advisory: a "block" recommendation was attached to the + # response but the route still returned `verified=true`. With + # ANTISPOOF_BLOCK_ENFORCE=true (the default) we now return 403 with + # a structured body. An operator can flip the flag to false for + # observation-only / canary rollout. 
+ block_reason = _merge_block_verdict( + antispoof_pipeline=antispoof_pipeline, + ear_liveness=ear_liveness, + ) + if block_reason is not None and settings.ANTISPOOF_BLOCK_ENFORCE: + logger.warning( + "Verification BLOCKED by anti-spoof veto: user_id=%s reason=%s " + "assembler_action=%s ear_avg=%s", + user_id, + block_reason, + (antispoof_pipeline or {}).get("recommended_action"), + (ear_liveness or {}).get("avg_ear"), + ) + raise HTTPException( + status_code=403, + detail={ + "error_code": "ANTISPOOF_BLOCKED", + "reason": block_reason, + "antispoof_pipeline": antispoof_pipeline, + "ear_liveness": ear_liveness, + "message": "Verification rejected by anti-spoof checks", + }, + ) + if block_reason is not None: + # Enforcement disabled: log the bypass loudly so it's visible in + # production log streams when an operator runs in observation mode. + logger.warning( + "Verification anti-spoof veto SUPPRESSED (ANTISPOOF_BLOCK_ENFORCE=false): " + "user_id=%s reason=%s", + user_id, + block_reason, + ) response = VerificationResponse( verified=result.verified, @@ -287,6 +495,7 @@ async def verify_face( message=message, device_spoof_risk=device_spoof_risk, antispoof_pipeline=antispoof_pipeline, + ear_liveness=ear_liveness, ) # D1 log-only: persist client pre-filter embedding for offline analysis. diff --git a/app/api/schemas/single_challenge.py b/app/api/schemas/single_challenge.py new file mode 100644 index 0000000..f643c80 --- /dev/null +++ b/app/api/schemas/single_challenge.py @@ -0,0 +1,85 @@ +"""Schema for single-challenge server validation (Bug 4, 2026-05-12). + +The web biometric-puzzles training surface (``BiometricPuzzlesPage``) runs +one challenge at a time with local MediaPipe detection. Before this fix, it +called ``onSuccess`` purely client-side — anyone could trivially mock the +component and "pass" the puzzle. 
This schema is for the new +``/liveness/verify-challenge`` endpoint that records a server round-trip +for each completed challenge and returns a server verdict. + +The contract is intentionally narrow: + * One action per request. + * Client supplies start/end timestamps and a detection confidence + derived from MediaPipe. + * Server runs the cheap structural validations (action is a known type, + timestamps are monotonic and within a reasonable window, confidence + above a floor) and returns a verdict. + +Heavier server-side detection (re-running MediaPipe on uploaded frames) is +out of scope for the training surface — the deep validation belongs to +multi-step ``/liveness/verify`` flows used by enrollment. +""" + +from __future__ import annotations + +from typing import Any, Dict, Optional + +from pydantic import BaseModel, Field + +from app.api.schemas.active_liveness import ChallengeType + + +class VerifyChallengeRequest(BaseModel): + """Single challenge completion record submitted by the web client.""" + + action: ChallengeType = Field( + ..., description="The completed challenge action (e.g. blink, smile, pinch)." + ) + start_timestamp_ms: float = Field( + ..., + gt=0, + description=( + "Client clock (performance.now() base or unix-ms) when the " + "challenge started. Used for monotonicity + duration sanity." + ), + ) + end_timestamp_ms: float = Field( + ..., + gt=0, + description="Client clock when the challenge was detected as completed.", + ) + confidence: float = Field( + ..., + ge=0.0, + le=1.0, + description="Detection confidence reported by the client engine [0..1].", + ) + tenant_id: Optional[str] = Field(default=None, description="Tenant identifier.") + user_id: Optional[str] = Field(default=None, description="User identifier.") + metrics: Dict[str, Any] = Field( + default_factory=dict, + description=( + "Optional metric payload (e.g. min_ear for blink, mar_ratio for " + "smile, finger_count for hand puzzles). 
Logged for audit, never " + "used as the sole pass/fail signal." + ), + ) + + +class VerifyChallengeResponse(BaseModel): + """Server verdict for a single challenge submission.""" + + verified: bool = Field(..., description="Whether the challenge passed.") + action: ChallengeType = Field(..., description="The echoed challenge action.") + duration_seconds: float = Field( + ..., ge=0.0, description="end - start, in seconds (post-validation)." + ) + reason_code: Optional[str] = Field( + default=None, + description=( + "Failure category when ``verified=false`` " + "(e.g. TIMESTAMPS_OUT_OF_ORDER, DURATION_TOO_SHORT, " + "CONFIDENCE_BELOW_FLOOR, UNKNOWN_ACTION)." + ), + ) + message: str = Field(default="", description="Human-readable result message.") diff --git a/app/api/schemas/verification.py b/app/api/schemas/verification.py index cff0db7..5f8febf 100644 --- a/app/api/schemas/verification.py +++ b/app/api/schemas/verification.py @@ -32,8 +32,20 @@ class VerificationResponse(BaseModel): description=( "Optional combined verdict from spoof_detector.pipeline.AntispoofPipelineAssembler. " "Populated only when at least one of ANTISPOOF_USABILITY_GATE_ENABLED / " - "ANTISPOOF_FUSION_ENABLED is true. The `recommended_action` is advisory; " - "this service never enforces it." + "ANTISPOOF_FUSION_ENABLED is true. When `recommended_action` is " + "'block' AND ANTISPOOF_BLOCK_ENFORCE is true (default since " + "2026-05-12), the route returns HTTP 403 instead of attaching the " + "verdict here." + ), + ) + ear_liveness: Optional[dict[str, Any]] = Field( + default=None, + description=( + "Optional single-frame Eye Aspect Ratio liveness observation from " + "spoof_detector.infrastructure.analyzers.blink_analyzer. Populated " + "only when ANTISPOOF_EAR_VETO_ENABLED=true. When 'eyes_closed' is " + "True AND ANTISPOOF_BLOCK_ENFORCE is true, the route returns 403 " + "instead of attaching the verdict here." 
), ) @@ -47,6 +59,7 @@ class VerificationResponse(BaseModel): "message": "Face verified successfully", "device_spoof_risk": None, "antispoof_pipeline": None, + "ear_liveness": None, } } } diff --git a/app/core/config.py b/app/core/config.py index 41c19e5..dffbcaa 100644 --- a/app/core/config.py +++ b/app/core/config.py @@ -153,6 +153,13 @@ def parse_cors_origins(cls, v): ) # Thresholds + # Comparator semantics: ``verified = distance < threshold``. + # → HIGHER threshold = MORE LENIENT (allows greater distance ⇒ still a match). + # → LOWER threshold = STRICTER (only very close distances accepted). + # This is the cosine-distance convention used in + # ``verify_face.py`` (line 181). Do not flip the comparator without also + # flipping every threshold pin in env.example / .env.prod, otherwise the + # FAR/FRR will silently invert. VERIFICATION_THRESHOLD: float = Field(default=0.45, ge=0.0, le=1.0) LIVENESS_THRESHOLD: float = Field(default=70.0, ge=0.0, le=100.0) QUALITY_THRESHOLD: float = Field(default=70.0, ge=0.0, le=100.0) @@ -160,6 +167,12 @@ def parse_cors_origins(cls, v): # Adaptive verification threshold for aged embeddings (Faz 3-1) # When the stored embedding is older than VERIFICATION_THRESHOLD_AGED_YEARS, # a more lenient threshold is used to account for natural appearance changes. + # Bug fix 2026-05-12: previously default=0.38 which is LOWER than the + # standard 0.45 — under ``distance < threshold`` semantics that made aged + # users *stricter*, the opposite of intent (higher FRR). The default is + # now 0.55, raising the allowed distance ceiling so aged users match more + # easily, while staying well below the Facenet cosine-distance ceiling + # of ~0.6 (the model's known operating point for cosine distance). 
VERIFICATION_THRESHOLD_AGED_YEARS: float = Field( default=2.0, ge=0.0, @@ -169,16 +182,40 @@ def parse_cors_origins(cls, v): ), ) VERIFICATION_THRESHOLD_AGED: float = Field( - default=0.38, + default=0.55, ge=0.0, le=1.0, description=( "Cosine-distance threshold applied when embedding age exceeds " - "VERIFICATION_THRESHOLD_AGED_YEARS. Lower than the default (0.45) " - "to be more lenient with aged embeddings." + "VERIFICATION_THRESHOLD_AGED_YEARS. HIGHER than the default (0.45) " + "because the comparator is ``distance < threshold`` — a larger " + "allowed-distance ceiling means more lenient matching for aged " + "embeddings. Must remain below the Facenet cosine-distance " + "ceiling (~0.6) to keep FAR under control." ), ) + @model_validator(mode="after") + def _validate_aged_threshold_lenience(self) -> "Settings": + """Catch the pre-2026-05-12 inversion regression at config-load time. + + ``VERIFICATION_THRESHOLD_AGED`` must be >= ``VERIFICATION_THRESHOLD`` + because the comparator is ``distance < threshold`` — a stricter + ceiling for aged embeddings is meaningless (it would force aged users + to match the standard with *additional* margin, the opposite of the + adaptive feature's purpose). + """ + if self.VERIFICATION_THRESHOLD_AGED < self.VERIFICATION_THRESHOLD: + raise ValueError( + "Configuration inversion detected: VERIFICATION_THRESHOLD_AGED " + f"({self.VERIFICATION_THRESHOLD_AGED}) must be >= " + f"VERIFICATION_THRESHOLD ({self.VERIFICATION_THRESHOLD}) " + "under the ``distance < threshold`` comparator. A lower aged " + "threshold makes aged users *stricter*, not more lenient. " + "See app/application/use_cases/verify_face.py:181." + ) + return self + # ML Model Timeouts (prevents hung requests) ML_MODEL_TIMEOUT_SECONDS: int = Field(default=30, ge=5, le=120, description="Timeout for ML model operations") @@ -678,6 +715,36 @@ def get_api_key_config(self) -> dict: "toggle." ), ) + # Bug 1 (2026-05-12) — enforcement flag for `recommended_action="block"`. 
+ # Before this, the assembler's block verdict was attached to the + # response but the route still returned 200/verified=True (advisory + # only). Default is now ON: any "block" verdict from the assembler or + # any closed-eye signal from the EAR check yields a 403 with a + # structured reason. Flip to false for canary/observation rollout. + ANTISPOOF_BLOCK_ENFORCE: bool = Field( + default=True, + description=( + "When True, AntispoofPipelineAssembler 'recommended_action=block' " + "and EAR closed-eye detection cause the /verify route to return " + "HTTP 403 (was advisory-only prior to 2026-05-12)." + ), + ) + # Bug 2 (2026-05-12) — single-frame EAR liveness signal flag. + # When True, /verify runs MediaPipe FaceLandmarker on the uploaded + # frame and computes Eye Aspect Ratio via the spoof-detector library + # (calibration EAR_THRESHOLD=0.18, paper-P0 2026-05-11). If both eyes + # are clearly closed, the request is vetoed alongside the assembler. + # Default OFF until ops deploys the face_landmarker.task asset to the + # container — the helper fails-soft to None when the model is missing. + ANTISPOOF_EAR_VETO_ENABLED: bool = Field( + default=False, + description=( + "Enable the single-frame EAR (Eye Aspect Ratio) closed-eye veto " + "on /verify. Requires FACE_LANDMARKER_MODEL_PATH to point at a " + "deployed face_landmarker.task asset. Fails-soft to no-op when " + "MediaPipe or the model is unavailable." + ), + ) GESTURE_HAND_LANDMARKER_MODEL_PATH: str = Field( default=str(_REPO_ROOT / "models" / "hand_landmarker.task"), description=( @@ -732,6 +799,20 @@ def get_api_key_config(self) -> dict: default="", description="Expected SHA256 hex digest for Facenet512 weights file (empty = skip with warning)", ) + # Bug 5 (2026-05-12) — fail-fast in prod when SHA pin is missing. 
+ # Previously an empty DEEPFACE_FACENET512_SHA256 only logged a warning, + # which let an undetected weight rotation (or supply-chain compromise of + # ``~/.deepface/weights/``) silently change embeddings. With this flag on + # and ENVIRONMENT=production, an empty pin now raises at model-load time. + DEEPFACE_SHA256_REQUIRED: bool = Field( + default=True, + description=( + "When True (default) AND ENVIRONMENT=production, refuse to load " + "the DeepFace model unless DEEPFACE_FACENET512_SHA256 is pinned. " + "Set False to opt out (e.g. first-deploy of a new model version " + "before the hash has been captured)." + ), + ) # ML-M5: server-side caps on find_similar threshold/limit (caller-controlled today). FIND_SIMILAR_FACE_MAX_THRESHOLD: float = Field( diff --git a/app/infrastructure/ml/extractors/deepface_extractor.py b/app/infrastructure/ml/extractors/deepface_extractor.py index 4ad7854..0463e17 100644 --- a/app/infrastructure/ml/extractors/deepface_extractor.py +++ b/app/infrastructure/ml/extractors/deepface_extractor.py @@ -88,9 +88,29 @@ def _verify_model_integrity(model_name: str) -> None: return if not expected: - # TODO: pin DEEPFACE_FACENET512_SHA256 in config.py once the known-good - # hash has been recorded from a trusted build. See ML-M1 in - # docs/audits/AUDIT_2026-04-19.md. + # Bug 5 (2026-05-12) — defense in depth: in production, an empty pin + # is no longer "warn + skip". An unpinned model means a weight + # rotation (or supply-chain compromise) of ~/.deepface/weights/ can + # land silently, so prod fails fast unless an operator explicitly + # opts out via DEEPFACE_SHA256_REQUIRED=False (e.g. during the very + # first deploy of a new model version where the hash hasn't been + # captured yet). 
+ required = getattr(settings, "DEEPFACE_SHA256_REQUIRED", False) + env = (getattr(settings, "ENVIRONMENT", "") or "").lower() + if required and env == "production": + logger.error( + "DeepFace model integrity pin missing for %s while " + "DEEPFACE_SHA256_REQUIRED=true on production. Refusing to " + "load the model — set DEEPFACE_FACENET512_SHA256 in .env.prod " + "with the output of `sha256sum %s`.", + weight_path, + weight_path, + ) + raise RuntimeError( + "DeepFace model integrity pin missing — refusing to load " + f"{weight_path}. Set DEEPFACE_FACENET512_SHA256 or set " + "DEEPFACE_SHA256_REQUIRED=false to opt out (not recommended)." + ) logger.warning( "DeepFace model integrity check skipped (no pinned hash): %s. " "Set DEEPFACE_FACENET512_SHA256 once verified.", diff --git a/tests/integration/test_verify_antispoof_block_enforce.py b/tests/integration/test_verify_antispoof_block_enforce.py new file mode 100644 index 0000000..25effa6 --- /dev/null +++ b/tests/integration/test_verify_antispoof_block_enforce.py @@ -0,0 +1,431 @@ +"""Integration tests for ANTISPOOF_BLOCK_ENFORCE + ANTISPOOF_EAR_VETO_ENABLED. + +Bugs fixed 2026-05-12: + * Bug 1: AntispoofPipelineAssembler `recommended_action="block"` was + advisory — the route attached it to the response but still returned + 200/verified=True. We now return 403 with a structured body when + enforce is on. + * Bug 2: blink-cache/EAR work from spoof-detector was unreachable from + /verify. We now wire `compute_ear` into a single-frame check and + veto when both eyes are closed. + +Per the existing test_verify_antispoof_wiring.py convention this file uses +a module-scoped TestClient to avoid the anyio-portal closed-loop issue when +the route's lru-cached deps are recreated mid-suite. 
+""" + +from __future__ import annotations + +import io +import sys +from unittest.mock import AsyncMock, Mock, patch + +import cv2 +import numpy as np +import pytest + +# Mock DeepFace before any imports that depend on it (same pattern as +# test_verify_antispoof_wiring.py). Resemblyzer is required by main.py's +# lifespan via SpeakerEmbedder; the dev host doesn't have it installed +# (it's a CPU-heavy optional dep), so mock it too so the TestClient +# lifespan succeeds. This is the "baseline rot" pattern documented in +# bio main — 79 pre-existing failing tests share the same root cause. +sys.modules.setdefault("deepface", Mock()) +sys.modules.setdefault("deepface.DeepFace", Mock()) +sys.modules.setdefault("resemblyzer", Mock(VoiceEncoder=Mock())) + +from fastapi.testclient import TestClient + +from app.api.routes import verification as verify_route +from app.core.container import ( + get_check_liveness_use_case, + get_client_embedding_observation_repository, + get_file_storage, + get_verify_face_use_case, +) +from app.domain.entities.liveness_result import LivenessResult +from app.domain.entities.verification_result import VerificationResult +from app.main import app + + +@pytest.fixture(scope="module") +def _module_client(): + with TestClient(app) as c: + yield c + + +@pytest.fixture +def client(_module_client) -> TestClient: + verify_route._antispoof_assembler = None + verify_route._antispoof_assembler_init_failed = False + verify_route._device_spoof_risk_evaluator = None + verify_route._face_landmarker_for_ear = None + verify_route._face_landmarker_for_ear_init_failed = False + app.dependency_overrides.clear() + + yield _module_client + + app.dependency_overrides.clear() + verify_route._antispoof_assembler = None + verify_route._antispoof_assembler_init_failed = False + verify_route._device_spoof_risk_evaluator = None + verify_route._face_landmarker_for_ear = None + verify_route._face_landmarker_for_ear_init_failed = False + + +@pytest.fixture +def 
test_image_file(): + img = np.full((100, 100, 3), 80, dtype=np.uint8) + ok, buf = cv2.imencode(".jpg", img) + assert ok + return ("test.jpg", io.BytesIO(buf.tobytes()), "image/jpeg") + + +@pytest.fixture +def mocks(tmp_path): + """Wire all upstream deps with fast, deterministic AsyncMocks.""" + img = np.full((100, 100, 3), 80, dtype=np.uint8) + ok, buf = cv2.imencode(".jpg", img) + assert ok + image_path = tmp_path / "saved.jpg" + image_path.write_bytes(buf.tobytes()) + + verify_uc = Mock() + verify_uc.execute = AsyncMock( + return_value=VerificationResult( + verified=True, confidence=0.87, distance=0.13, threshold=0.6, + ) + ) + + liveness_uc = Mock() + liveness_uc.execute = AsyncMock( + return_value=LivenessResult( + is_live=True, score=92.0, challenge="none", + challenge_completed=True, confidence=0.91, + ) + ) + + storage = Mock() + storage.save_temp = AsyncMock(return_value=str(image_path)) + storage.cleanup = AsyncMock() + + observation_repo = Mock() + observation_repo.record = AsyncMock() + + return verify_uc, liveness_uc, storage, observation_repo + + +def _wire(verify_uc, liveness_uc, storage, observation_repo) -> None: + app.dependency_overrides[get_verify_face_use_case] = lambda: verify_uc + app.dependency_overrides[get_check_liveness_use_case] = lambda: liveness_uc + app.dependency_overrides[get_file_storage] = lambda: storage + app.dependency_overrides[get_client_embedding_observation_repository] = ( + lambda: observation_repo + ) + + +# --------------------------------------------------------------------------- +# Bug 1: enforce assembler recommended_action="block" +# --------------------------------------------------------------------------- + + +def test_block_verdict_triggers_403_when_enforce_on( + client: TestClient, mocks, test_image_file +) -> None: + """recommended_action='block' + enforce=True → HTTP 403.""" + verify_uc, liveness_uc, storage, observation_repo = mocks + _wire(verify_uc, liveness_uc, storage, observation_repo) + + 
fake_block_result = { + "face_usability_block": True, + "face_usability_reason": "occluded", + "device_replay_risk": 0.05, + "device_signals": {"moire_risk": 0.0}, + "hybrid_fusion_is_spoof": None, + "hybrid_fusion_score": None, + "hybrid_fusion_reasoning": None, + "recommended_action": "block", + "layers_evaluated": ["face_usability"], + } + + with patch.object( + verify_route.settings, "ANTISPOOF_BLOCK_ENFORCE", True + ), patch.object( + verify_route.settings, "ANTISPOOF_EAR_VETO_ENABLED", False + ), patch.object( + verify_route.settings, "ANTISPOOF_FUSION_ENABLED", True + ), patch.object( + verify_route, "_evaluate_antispoof_pipeline_safe", + return_value=fake_block_result, + ): + resp = client.post( + "/api/v1/verify", + data={"user_id": "test_user_block"}, + files={"file": test_image_file}, + ) + + assert resp.status_code == 403, resp.text + body = resp.json() + detail = body.get("detail") or body + assert detail.get("error_code") == "ANTISPOOF_BLOCKED" + assert detail.get("reason") == "FACE_UNUSABLE" + assert detail.get("antispoof_pipeline") == fake_block_result + + +def test_block_verdict_passes_when_enforce_off( + client: TestClient, mocks, test_image_file +) -> None: + """recommended_action='block' + enforce=False → 200 + verdict attached.""" + verify_uc, liveness_uc, storage, observation_repo = mocks + _wire(verify_uc, liveness_uc, storage, observation_repo) + + fake_block_result = { + "face_usability_block": False, + "face_usability_reason": None, + "device_replay_risk": 0.85, + "device_signals": {"moire_risk": 0.7}, + "hybrid_fusion_is_spoof": True, + "hybrid_fusion_score": 0.92, + "hybrid_fusion_reasoning": "spoof detected via fusion", + "recommended_action": "block", + "layers_evaluated": ["device_spoof_risk", "hybrid_fusion"], + } + + with patch.object( + verify_route.settings, "ANTISPOOF_BLOCK_ENFORCE", False + ), patch.object( + verify_route.settings, "ANTISPOOF_EAR_VETO_ENABLED", False + ), patch.object( + verify_route.settings, 
"ANTISPOOF_FUSION_ENABLED", True + ), patch.object( + verify_route, "_evaluate_antispoof_pipeline_safe", + return_value=fake_block_result, + ): + resp = client.post( + "/api/v1/verify", + data={"user_id": "test_user_observe"}, + files={"file": test_image_file}, + ) + + # Enforce off — verification still returns 200 with the verdict attached. + assert resp.status_code == 200, resp.text + body = resp.json() + assert body["verified"] is True + assert body["antispoof_pipeline"] == fake_block_result + + +def test_allow_verdict_passes_with_enforce_on( + client: TestClient, mocks, test_image_file +) -> None: + """recommended_action='allow' must never trigger a block.""" + verify_uc, liveness_uc, storage, observation_repo = mocks + _wire(verify_uc, liveness_uc, storage, observation_repo) + + fake_allow_result = { + "face_usability_block": False, + "face_usability_reason": None, + "device_replay_risk": 0.05, + "device_signals": {"moire_risk": 0.01}, + "hybrid_fusion_is_spoof": False, + "hybrid_fusion_score": 0.18, + "hybrid_fusion_reasoning": "LIVE verified", + "recommended_action": "allow", + "layers_evaluated": ["device_spoof_risk", "hybrid_fusion"], + } + + with patch.object( + verify_route.settings, "ANTISPOOF_BLOCK_ENFORCE", True + ), patch.object( + verify_route.settings, "ANTISPOOF_EAR_VETO_ENABLED", False + ), patch.object( + verify_route.settings, "ANTISPOOF_FUSION_ENABLED", True + ), patch.object( + verify_route, "_evaluate_antispoof_pipeline_safe", + return_value=fake_allow_result, + ): + resp = client.post( + "/api/v1/verify", + data={"user_id": "test_user_allow"}, + files={"file": test_image_file}, + ) + + assert resp.status_code == 200, resp.text + body = resp.json() + assert body["verified"] is True + + +def test_review_verdict_passes_with_enforce_on( + client: TestClient, mocks, test_image_file +) -> None: + """recommended_action='review' must NOT cause a block (review != block).""" + verify_uc, liveness_uc, storage, observation_repo = mocks + 
_wire(verify_uc, liveness_uc, storage, observation_repo) + + fake_review_result = { + "face_usability_block": False, + "face_usability_reason": None, + "device_replay_risk": 0.72, + "device_signals": {"moire_risk": 0.3}, + "hybrid_fusion_is_spoof": False, + "hybrid_fusion_score": 0.5, + "hybrid_fusion_reasoning": "borderline", + "recommended_action": "review", + "layers_evaluated": ["device_spoof_risk", "hybrid_fusion"], + } + + with patch.object( + verify_route.settings, "ANTISPOOF_BLOCK_ENFORCE", True + ), patch.object( + verify_route.settings, "ANTISPOOF_EAR_VETO_ENABLED", False + ), patch.object( + verify_route.settings, "ANTISPOOF_FUSION_ENABLED", True + ), patch.object( + verify_route, "_evaluate_antispoof_pipeline_safe", + return_value=fake_review_result, + ): + resp = client.post( + "/api/v1/verify", + data={"user_id": "test_user_review"}, + files={"file": test_image_file}, + ) + + assert resp.status_code == 200, resp.text + assert resp.json()["antispoof_pipeline"] == fake_review_result + + +# --------------------------------------------------------------------------- +# Bug 2: EAR closed-eye veto +# --------------------------------------------------------------------------- + + +def test_ear_closed_eyes_triggers_403_when_enforce_on( + client: TestClient, mocks, test_image_file +) -> None: + """eyes_closed=True from EAR check vetoes the request alongside assembler.""" + verify_uc, liveness_uc, storage, observation_repo = mocks + _wire(verify_uc, liveness_uc, storage, observation_repo) + + fake_ear_result = { + "eyes_closed": True, + "left_ear": 0.12, + "right_ear": 0.10, + "avg_ear": 0.11, + "threshold": 0.18, + } + + with patch.object( + verify_route.settings, "ANTISPOOF_BLOCK_ENFORCE", True + ), patch.object( + verify_route.settings, "ANTISPOOF_EAR_VETO_ENABLED", True + ), patch.object( + verify_route, "_evaluate_ear_liveness_safe", return_value=fake_ear_result, + ), patch.object( + verify_route, "_evaluate_antispoof_pipeline_safe", return_value=None, + 
): + resp = client.post( + "/api/v1/verify", + data={"user_id": "test_user_ear"}, + files={"file": test_image_file}, + ) + + assert resp.status_code == 403, resp.text + detail = resp.json().get("detail") or {} + assert detail.get("error_code") == "ANTISPOOF_BLOCKED" + assert detail.get("reason") == "EYES_CLOSED" + assert detail.get("ear_liveness") == fake_ear_result + + +def test_ear_open_eyes_passes( + client: TestClient, mocks, test_image_file +) -> None: + """eyes_closed=False → response includes ear_liveness but verifies OK.""" + verify_uc, liveness_uc, storage, observation_repo = mocks + _wire(verify_uc, liveness_uc, storage, observation_repo) + + fake_ear_result = { + "eyes_closed": False, + "left_ear": 0.28, + "right_ear": 0.30, + "avg_ear": 0.29, + "threshold": 0.18, + } + + with patch.object( + verify_route.settings, "ANTISPOOF_BLOCK_ENFORCE", True + ), patch.object( + verify_route.settings, "ANTISPOOF_EAR_VETO_ENABLED", True + ), patch.object( + verify_route, "_evaluate_ear_liveness_safe", return_value=fake_ear_result, + ), patch.object( + verify_route, "_evaluate_antispoof_pipeline_safe", return_value=None, + ): + resp = client.post( + "/api/v1/verify", + data={"user_id": "test_user_ear_open"}, + files={"file": test_image_file}, + ) + + assert resp.status_code == 200, resp.text + body = resp.json() + assert body["verified"] is True + assert body["ear_liveness"] == fake_ear_result + + +def test_ear_helper_is_called_when_flag_on( + client: TestClient, mocks, test_image_file +) -> None: + """Regression guard for Bug 2: the EAR helper must actually be invoked + from /verify when the flag is on. Asserts the path is reached at least + once (previously zero — the wiring was missing). 
+ """ + verify_uc, liveness_uc, storage, observation_repo = mocks + _wire(verify_uc, liveness_uc, storage, observation_repo) + + ear_mock = Mock(return_value=None) + + with patch.object( + verify_route.settings, "ANTISPOOF_BLOCK_ENFORCE", True + ), patch.object( + verify_route.settings, "ANTISPOOF_EAR_VETO_ENABLED", True + ), patch.object( + verify_route, "_evaluate_ear_liveness_safe", ear_mock, + ), patch.object( + verify_route, "_evaluate_antispoof_pipeline_safe", return_value=None, + ): + resp = client.post( + "/api/v1/verify", + data={"user_id": "test_user_ear_called"}, + files={"file": test_image_file}, + ) + + assert resp.status_code == 200, resp.text + assert ear_mock.call_count == 1, ( + "EAR helper must be invoked from /verify (Bug 2 regression guard); " + f"got {ear_mock.call_count} calls." + ) + + +def test_ear_helper_returns_none_when_flag_off( + client: TestClient, mocks, test_image_file +) -> None: + """When ANTISPOOF_EAR_VETO_ENABLED=False, the helper must short-circuit + and return None — the route must not even attempt MediaPipe import. 
+ """ + verify_uc, liveness_uc, storage, observation_repo = mocks + _wire(verify_uc, liveness_uc, storage, observation_repo) + + with patch.object( + verify_route.settings, "ANTISPOOF_BLOCK_ENFORCE", True + ), patch.object( + verify_route.settings, "ANTISPOOF_EAR_VETO_ENABLED", False + ), patch.object( + verify_route, "_evaluate_antispoof_pipeline_safe", return_value=None, + ): + resp = client.post( + "/api/v1/verify", + data={"user_id": "test_user_ear_off"}, + files={"file": test_image_file}, + ) + + assert resp.status_code == 200, resp.text + assert resp.json()["ear_liveness"] is None diff --git a/tests/integration/test_verify_antispoof_wiring.py b/tests/integration/test_verify_antispoof_wiring.py index 46ecd90..5d20b3a 100644 --- a/tests/integration/test_verify_antispoof_wiring.py +++ b/tests/integration/test_verify_antispoof_wiring.py @@ -21,9 +21,14 @@ import numpy as np import pytest -# Mock DeepFace before any imports that depend on it. +# Mock DeepFace + Resemblyzer before any imports that depend on them. +# Both are CPU-heavy optional ML deps; resemblyzer isn't always installed +# on dev hosts (the bio container builds it from source). Without the +# mock, `app/main.py:lifespan` → `initialize_dependencies()` → SpeakerEmbedder +# crashes with ModuleNotFoundError before any of these tests can run. 
sys.modules.setdefault("deepface", Mock()) sys.modules.setdefault("deepface.DeepFace", Mock()) +sys.modules.setdefault("resemblyzer", Mock(VoiceEncoder=Mock())) from fastapi.testclient import TestClient @@ -73,6 +78,8 @@ def client(_module_client) -> TestClient: verify_route._antispoof_assembler = None verify_route._antispoof_assembler_init_failed = False verify_route._device_spoof_risk_evaluator = None + verify_route._face_landmarker_for_ear = None + verify_route._face_landmarker_for_ear_init_failed = False app.dependency_overrides.clear() yield _module_client @@ -81,6 +88,8 @@ def client(_module_client) -> TestClient: verify_route._antispoof_assembler = None verify_route._antispoof_assembler_init_failed = False verify_route._device_spoof_risk_evaluator = None + verify_route._face_landmarker_for_ear = None + verify_route._face_landmarker_for_ear_init_failed = False @pytest.fixture diff --git a/tests/integration/test_verify_challenge_endpoint.py b/tests/integration/test_verify_challenge_endpoint.py new file mode 100644 index 0000000..0dd4c2c --- /dev/null +++ b/tests/integration/test_verify_challenge_endpoint.py @@ -0,0 +1,132 @@ +"""Integration tests for /liveness/verify-challenge (Bug 4, 2026-05-12). + +The endpoint exists to give the web biometric-puzzles training surface a +server round-trip it MUST wait on before resolving its `onSuccess()`. The +checks are structural — full ML re-detection belongs to the multi-step +/liveness/verify flow. + +Each test pins one behavior: + * Happy path with sane inputs → 200, verified=true. + * Inverted timestamps → 200, verified=false (TIMESTAMPS_OUT_OF_ORDER). + * Duration < min → 200, verified=false (DURATION_TOO_SHORT). + * Duration > max → 200, verified=false (DURATION_TOO_LONG). + * Confidence below floor → 200, verified=false (CONFIDENCE_BELOW_FLOOR). + * Unknown action enum → 422 (FastAPI validation). 
+""" + +from __future__ import annotations + +import sys +from unittest.mock import Mock + +import pytest + +# Mock heavy ML deps before importing the app (matches the +# test_verify_antispoof_block_enforce.py / test_verify_antispoof_wiring.py +# convention — see those files' module-docstrings for context). +sys.modules.setdefault("deepface", Mock()) +sys.modules.setdefault("deepface.DeepFace", Mock()) +sys.modules.setdefault("resemblyzer", Mock(VoiceEncoder=Mock())) + +from fastapi.testclient import TestClient + +from app.main import app + + +@pytest.fixture(scope="module") +def client(): + with TestClient(app) as c: + yield c + + +def _payload(**overrides) -> dict: + """Build a baseline-valid payload; override fields per test.""" + base = { + "action": "blink", + "start_timestamp_ms": 1_000_000.0, + "end_timestamp_ms": 1_000_500.0, # +500ms + "confidence": 0.85, + "tenant_id": "tenant-x", + "user_id": "user-y", + "metrics": {"min_ear": 0.12}, + } + base.update(overrides) + return base + + +def test_happy_path_returns_verified_true(client: TestClient) -> None: + resp = client.post("/api/v1/liveness/verify-challenge", json=_payload()) + assert resp.status_code == 200, resp.text + body = resp.json() + assert body["verified"] is True + assert body["action"] == "blink" + assert 0.49 < body["duration_seconds"] < 0.51 + assert body["reason_code"] is None + + +def test_inverted_timestamps_reject(client: TestClient) -> None: + resp = client.post( + "/api/v1/liveness/verify-challenge", + json=_payload(start_timestamp_ms=2_000_000.0, end_timestamp_ms=1_999_000.0), + ) + assert resp.status_code == 200, resp.text + body = resp.json() + assert body["verified"] is False + assert body["reason_code"] == "TIMESTAMPS_OUT_OF_ORDER" + + +def test_duration_too_short_reject(client: TestClient) -> None: + # 50ms — below the 120ms floor. 
+ resp = client.post( + "/api/v1/liveness/verify-challenge", + json=_payload(start_timestamp_ms=1_000_000.0, end_timestamp_ms=1_000_050.0), + ) + assert resp.status_code == 200, resp.text + body = resp.json() + assert body["verified"] is False + assert body["reason_code"] == "DURATION_TOO_SHORT" + + +def test_duration_too_long_reject(client: TestClient) -> None: + # 65s — above the 60s ceiling. + resp = client.post( + "/api/v1/liveness/verify-challenge", + json=_payload(start_timestamp_ms=1_000_000.0, end_timestamp_ms=1_065_000.0), + ) + assert resp.status_code == 200, resp.text + body = resp.json() + assert body["verified"] is False + assert body["reason_code"] == "DURATION_TOO_LONG" + + +def test_confidence_below_floor_reject(client: TestClient) -> None: + resp = client.post( + "/api/v1/liveness/verify-challenge", + json=_payload(confidence=0.3), + ) + assert resp.status_code == 200, resp.text + body = resp.json() + assert body["verified"] is False + assert body["reason_code"] == "CONFIDENCE_BELOW_FLOOR" + + +def test_unknown_action_is_422(client: TestClient) -> None: + resp = client.post( + "/api/v1/liveness/verify-challenge", + json=_payload(action="not_a_real_challenge"), + ) + # FastAPI/Pydantic enum validation → 422. + assert resp.status_code == 422, resp.text + + +def test_gesture_action_accepted(client: TestClient) -> None: + """Hand-modality actions (pinch, hand_flip, finger_count, ...) 
must + pass the structural checks the same as face actions.""" + resp = client.post( + "/api/v1/liveness/verify-challenge", + json=_payload(action="pinch", end_timestamp_ms=1_002_000.0), + ) + assert resp.status_code == 200, resp.text + body = resp.json() + assert body["verified"] is True + assert body["action"] == "pinch" diff --git a/tests/unit/test_deepface_sha256_required.py b/tests/unit/test_deepface_sha256_required.py new file mode 100644 index 0000000..a507e30 --- /dev/null +++ b/tests/unit/test_deepface_sha256_required.py @@ -0,0 +1,166 @@ +"""Tests for the DEEPFACE_SHA256_REQUIRED fail-fast behavior (Bug 5, 2026-05-12). + +Previously, an empty ``DEEPFACE_FACENET512_SHA256`` only logged a WARNING +and continued loading the model. A silent weight rotation under +``~/.deepface/weights/`` could change embeddings without anyone noticing. + +The new behavior: + * ``DEEPFACE_SHA256_REQUIRED=true`` (default) AND + ``ENVIRONMENT=production`` AND empty pin → RuntimeError at model-load. + * Any other combination keeps the old warn-and-skip behavior so dev + flows don't break. + +We don't load the actual ~400MB DeepFace weights here — we just exercise +the integrity-check function directly with a tmp weight file. +""" + +from __future__ import annotations + +import sys +from pathlib import Path +from unittest.mock import Mock, patch + +import pytest + +# Mock the heavy ML deps before importing the extractor module under test. +# `deepface_extractor.py` does `from deepface import DeepFace` at module +# load, which pulls TensorFlow on dev hosts that don't have it. The +# integrity-check function itself only uses hashlib + pathlib — no TF. +# +# IMPORTANT: we do NOT mock `tensorflow` as a whole — that pollutes other +# tests in the same pytest session (notably the integration tests that +# import `app.main` which calls `gpu.configure_gpu()` and iterates over +# `tf.config.list_physical_devices('GPU')`). 
Mocking only `deepface` +# and `tf_keras` is enough because they're the deps that +# deepface_extractor.py's top-level import chain pulls in. +sys.modules.setdefault("tf_keras", Mock()) +sys.modules.setdefault("deepface", Mock()) +sys.modules.setdefault("deepface.DeepFace", Mock()) + + +@pytest.fixture +def fake_weight_file(tmp_path): + """Create a small fake weight file so the integrity check has something + to digest. The actual hash doesn't matter for the missing-pin tests.""" + weight = tmp_path / "facenet512_weights.h5" + weight.write_bytes(b"fake-weight-content-for-integrity-testing-12345") + return weight + + +def _patch_settings(**kwargs): + """Patch attributes on the deepface_extractor module's `settings` import. + + The function imports `settings` lazily via `from app.core.config import settings` + inside `_verify_model_integrity`, so we need to patch the actual module + attribute used at call time. + """ + import app.core.config as cfg_module + return patch.multiple(cfg_module.settings, **kwargs) + + +def test_missing_pin_raises_in_prod_when_required(fake_weight_file): + """Empty pin + prod env + required flag → RuntimeError.""" + from app.infrastructure.ml.extractors.deepface_extractor import ( + _verify_model_integrity, + ) + + with patch( + "app.infrastructure.ml.extractors.deepface_extractor._resolve_weight_path", + return_value=fake_weight_file, + ), _patch_settings( + DEEPFACE_FACENET512_SHA256="", + DEEPFACE_SHA256_REQUIRED=True, + ENVIRONMENT="production", + ): + with pytest.raises(RuntimeError) as exc_info: + _verify_model_integrity("Facenet512") + assert "integrity pin missing" in str(exc_info.value).lower() + + +def test_missing_pin_warns_in_dev(fake_weight_file, caplog): + """Empty pin + dev env → log warning, no raise.""" + import logging + + from app.infrastructure.ml.extractors.deepface_extractor import ( + _verify_model_integrity, + ) + + with patch( + "app.infrastructure.ml.extractors.deepface_extractor._resolve_weight_path", + 
return_value=fake_weight_file, + ), _patch_settings( + DEEPFACE_FACENET512_SHA256="", + DEEPFACE_SHA256_REQUIRED=True, + ENVIRONMENT="development", + ), caplog.at_level(logging.WARNING): + # Must not raise. + _verify_model_integrity("Facenet512") + + assert any( + "skipped" in r.message.lower() and "no pinned hash" in r.message.lower() + for r in caplog.records + ) + + +def test_missing_pin_warns_when_required_false_in_prod(fake_weight_file, caplog): + """Opt-out flag must let prod boot with an empty pin (first-deploy scenario).""" + import logging + + from app.infrastructure.ml.extractors.deepface_extractor import ( + _verify_model_integrity, + ) + + with patch( + "app.infrastructure.ml.extractors.deepface_extractor._resolve_weight_path", + return_value=fake_weight_file, + ), _patch_settings( + DEEPFACE_FACENET512_SHA256="", + DEEPFACE_SHA256_REQUIRED=False, + ENVIRONMENT="production", + ), caplog.at_level(logging.WARNING): + _verify_model_integrity("Facenet512") + + assert any( + "skipped" in r.message.lower() for r in caplog.records + ) + + +def test_correct_pin_passes(fake_weight_file): + """Pinned + correct hash → returns silently (success path).""" + import hashlib + + from app.infrastructure.ml.extractors.deepface_extractor import ( + _verify_model_integrity, + ) + + expected = hashlib.sha256(fake_weight_file.read_bytes()).hexdigest() + + with patch( + "app.infrastructure.ml.extractors.deepface_extractor._resolve_weight_path", + return_value=fake_weight_file, + ), _patch_settings( + DEEPFACE_FACENET512_SHA256=expected, + DEEPFACE_SHA256_REQUIRED=True, + ENVIRONMENT="production", + ): + # Must not raise. 
+ _verify_model_integrity("Facenet512") + + +def test_wrong_pin_raises_regardless_of_env(fake_weight_file): + """An explicit pin that doesn't match the file MUST raise everywhere.""" + from app.infrastructure.ml.extractors.deepface_extractor import ( + _verify_model_integrity, + ) + + with patch( + "app.infrastructure.ml.extractors.deepface_extractor._resolve_weight_path", + return_value=fake_weight_file, + ), _patch_settings( + DEEPFACE_FACENET512_SHA256="deadbeef" * 8, + DEEPFACE_SHA256_REQUIRED=False, + ENVIRONMENT="development", + ): + with pytest.raises(RuntimeError) as exc_info: + _verify_model_integrity("Facenet512") + assert "integrity check failed" in str(exc_info.value).lower() diff --git a/tests/unit/test_verification_threshold_aged.py b/tests/unit/test_verification_threshold_aged.py new file mode 100644 index 0000000..e196a72 --- /dev/null +++ b/tests/unit/test_verification_threshold_aged.py @@ -0,0 +1,76 @@ +"""Regression tests for VERIFICATION_THRESHOLD_AGED semantics (bug 2026-05-12). + +Background +---------- +The comparator in ``app/application/use_cases/verify_face.py`` is +``verified = distance < threshold``. Under that semantic: + + - HIGHER threshold ⇒ MORE LENIENT (the allowed-distance ceiling rises, + so more pairs pass). + - LOWER threshold ⇒ STRICTER (only near-zero distances match). + +Before 2026-05-12 the defaults were: + + VERIFICATION_THRESHOLD = 0.45 + VERIFICATION_THRESHOLD_AGED = 0.38 # ← bug: stricter, not lenient + +That made aged users get a HIGHER FRR — the opposite of the adaptive +feature's intent. This test pins: + + 1. The new default for ``VERIFICATION_THRESHOLD_AGED`` is higher than + the standard ``VERIFICATION_THRESHOLD``. + 2. Loading an inverted config (aged < standard) raises a + ``ValidationError`` so the regression cannot silently come back via + env-file edits. 
+""" + +from __future__ import annotations + +import pytest +from pydantic import ValidationError + +from app.core.config import Settings + + +def _make(**overrides) -> Settings: + """Build a Settings instance with no ambient .env interference.""" + return Settings(_env_file=None, **overrides) + + +def test_default_aged_threshold_is_more_lenient_than_standard(): + """Default config: aged threshold must allow GREATER distance, not less.""" + s = _make() + assert s.VERIFICATION_THRESHOLD_AGED > s.VERIFICATION_THRESHOLD, ( + f"Aged threshold ({s.VERIFICATION_THRESHOLD_AGED}) must be > standard " + f"({s.VERIFICATION_THRESHOLD}) under the 'distance < threshold' " + "comparator. A lower aged threshold makes aged users stricter." + ) + + +def test_aged_threshold_below_standard_is_rejected(): + """Inversion regression guard: aged < standard must fail config load.""" + with pytest.raises((ValidationError, ValueError)) as exc_info: + _make( + VERIFICATION_THRESHOLD=0.45, + VERIFICATION_THRESHOLD_AGED=0.38, # the pre-2026-05-12 buggy value + ) + msg = str(exc_info.value) + assert "VERIFICATION_THRESHOLD_AGED" in msg + assert "VERIFICATION_THRESHOLD" in msg + + +def test_aged_threshold_equal_to_standard_is_allowed(): + """Boundary case: equal thresholds are valid (no adaptive lenience, but + not inverted).""" + s = _make(VERIFICATION_THRESHOLD=0.45, VERIFICATION_THRESHOLD_AGED=0.45) + assert s.VERIFICATION_THRESHOLD == s.VERIFICATION_THRESHOLD_AGED == 0.45 + + +def test_aged_threshold_within_facenet_safe_band(): + """The new default must remain below the Facenet cosine-distance + operating-point ceiling (~0.6) to avoid blowing FAR past the model.""" + s = _make() + assert s.VERIFICATION_THRESHOLD_AGED <= 0.6, ( + "VERIFICATION_THRESHOLD_AGED above 0.6 risks FAR explosion under " + "Facenet cosine-distance distributions." 
+ ) From dcbf725c1704bafb8a2f4dcfeb86129160305a2b Mon Sep 17 00:00:00 2001 From: Ahmet Abdullah Gultekin Date: Tue, 12 May 2026 18:27:37 +0000 Subject: [PATCH 2/2] fix(lint): rename ambiguous 'l' variable to 'pt' (ruff E741) --- app/api/routes/verification.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/app/api/routes/verification.py b/app/api/routes/verification.py index 7220c94..b4c1c16 100644 --- a/app/api/routes/verification.py +++ b/app/api/routes/verification.py @@ -245,7 +245,7 @@ def _evaluate_ear_liveness_safe(image_path: str) -> Optional[dict]: # Use the first detected face. The pixel-space conversion matches # the spoof-detector blink_analyzer contract. lm = np.array( - [[l.x * w, l.y * h, l.z] for l in face_landmarks[0]] + [[pt.x * w, pt.y * h, pt.z] for pt in face_landmarks[0]] ) if len(lm) < 468: return None
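
Note on the hunk above: the pixel-space `lm` array is what the single-frame EAR check consumes. The following is a minimal sketch of that computation, not the spoof-detector library's actual `compute_ear`. It uses the standard six-point Eye Aspect Ratio, EAR = (|p2-p6| + |p3-p5|) / (2 * |p1-p4|), and the 0.18 threshold quoted in the commit message. The eye landmark indices, the helper names `eye_aspect_ratio` and `ear_summary`, and the NaN handling for a degenerate eye are assumptions made for illustration; the result keys mirror the `fake_ear_result` dicts used in the tests.

```python
from __future__ import annotations

import numpy as np

# Assumption: commonly used MediaPipe Face Mesh indices, ordered p1..p6
# (outer corner, two upper-lid points, inner corner, two lower-lid points).
LEFT_EYE = (362, 385, 387, 263, 373, 380)
RIGHT_EYE = (33, 160, 158, 133, 153, 144)
EAR_THRESHOLD = 0.18  # value quoted in the commit message


def eye_aspect_ratio(lm: np.ndarray, idx: tuple[int, ...]) -> float:
    """Six-point EAR on the x/y columns of a pixel-space landmark array."""
    p1, p2, p3, p4, p5, p6 = (lm[i, :2] for i in idx)
    vertical = np.linalg.norm(p2 - p6) + np.linalg.norm(p3 - p5)
    horizontal = 2.0 * np.linalg.norm(p1 - p4)
    if horizontal == 0.0:
        # Degenerate eye width: NaN compares False against the threshold,
        # so this sketch never vetoes on unusable geometry.
        return float("nan")
    return float(vertical / horizontal)


def ear_summary(lm: np.ndarray, threshold: float = EAR_THRESHOLD) -> dict | None:
    """Return an EAR report, or None when the mesh is incomplete."""
    if len(lm) < 468:
        return None
    left = eye_aspect_ratio(lm, LEFT_EYE)
    right = eye_aspect_ratio(lm, RIGHT_EYE)
    return {
        # Veto only when BOTH eyes fall below the threshold.
        "eyes_closed": bool(left < threshold and right < threshold),
        "left_ear": left,
        "right_ear": right,
        "avg_ear": (left + right) / 2.0,
        "threshold": threshold,
    }
```

Requiring both eyes to be below the threshold matches the config comment ("if both eyes are clearly closed") and keeps a wink or a partially occluded eye from triggering the veto on its own.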