fix(auth): P0/P1 bundle from senior reviews 2026-05-12 (aud claim, MFA fail-open, OTP exhausted, hs-2026-04 revoked)#100
Conversation
SECURITY_REVIEW / BACKEND_REVIEW 2026-05-12 §aud-claim. Memory has missed the same fix three times (2026-04-20, 2026-05-02, 2026-05-12): only `iss` was bound on the parser, `aud` was never stamped or required. - New `app.security.jwt.audience` property (single string for now; multi-audience parser already supported via JJWT requireAudience). - JwtService stamps the audience claim on every minted access token (refresh tokens are opaque UUIDs in DB, not JWTs). - Parser requires the configured audience on extractAllClaims; empty config leaves it permissive (dev/test parity with the issuer config). - application-prod.yml pins aud=fivucsas-api; JwtService.assertProdAudienceIsConfigured fails fast at boot if the prod profile loads with a blank audience. - New JwtServiceAudienceTest pins the contract on both sides so this fix cannot disappear a fourth time. 9 tests cover mint, parse, prod boot assertion, and the empty-config permissive path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
BACKEND_REVIEW 2026-05-12 §P0. PR #83 removed the 0.7 cosine confidence fallback in FaceAuthHandler and AuthController.verify2FAMethod but missed this N-step MFA handler — which is the one hosted login actually uses. INVESTIGATION_FAILOPEN_2026-05-07.md F3 documents this as a fail-open vector: a handler-side fallback silently overrides the bio processor's adaptive aging threshold (VERIFICATION_THRESHOLD_AGED_*). Mirrors the exact pattern PR #83 used in FaceAuthHandler: - Trust ONLY the bio processor's `verified` field. - Missing/null `verified` is a hard reject + ERROR log. - Cache user-id once to keep the entity.User boundary surface at a single call-site per method (UserDomainBoundary ratchet). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SECURITY_REVIEW / BACKEND_REVIEW 2026-05-12 §OTP-exhausted. NIST 800-63B
§5.1.1.2 5-strike lockout is correct at primitive level (OtpService.
validateWithResult returns exhausted=true on the 5th wrong guess) but all
5 MFA OTP call-sites used the boolean validate() overload and threw the
flag away — users waited 5min for TTL instead of getting an immediate
"send another OTP" prompt.
Call-sites switched to validateWithResult:
- EmailOtpAuthHandler (N-step flow)
- SmsOtpAuthHandler (N-step flow, local Redis fallback path)
- EmailOtpVerifyMfaStepHandler (N-step MFA, hosted-login path)
- AuthController.verify2FAMethod SMS_OTP arm
- AuthController.verify2FAMethod EMAIL_OTP arm
On exhausted=true each call-site throws OtpAttemptsExhaustedException.
GlobalExceptionHandler now maps the exception to HTTP 429 with
{errorCode:"OTP_ATTEMPTS_EXHAUSTED", action:"resend", remainingAttempts:0}
plus Retry-After: 300 (RFC 6585 §4). Mirrors the existing contract from
OtpController. verify2FAMethod's catch (Exception) had to be split so the
OtpAttemptsExhaustedException re-throws instead of being collapsed into a
500.
Test updates: existing handler tests stubbed otpService.validate(); switched
to validateWithResult with the structured result (valid/invalid/notFound).
ArchUnit baseline refreshed for shifted line numbers (no net new violations
except +1 in verify2FAMethod for the new catch arm).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
CI note (2026-05-12) The failing Maven unit tests pass: 1271 passed, 0 failed, 58 Testcontainers-gated tests skipped. GitGuardian + scan pass. |
|
Update on #101: my fixture resolves the first V40 failure (the |
VerifyMfaStepService caught the handler's OtpAttemptsExhaustedException in its generic catch(RuntimeException) blocks and collapsed it into a 200 ERROR, defeating the rate-limit contract on the exact path hosted login uses. Re-throw the exception ahead of those catches so GlobalExceptionHandler maps it to 429 + Retry-After. Adds a regression test (now 16 passing). Addresses the senior-review finding on PR #100. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ision) (#101) * test(flyway): V39_5 fixture renames audit_logs_pkey to unblock V40 in CI V40 (partition audit_logs) renames audit_logs to audit_logs_legacy, then creates a new partitioned audit_logs root with a composite PK named audit_logs_pkey. On a fresh Testcontainers DB this fails with: ERROR: relation "audit_logs_pkey" already exists Location: db/migration/V40__partition_audit_logs.sql because PG keeps the V5-declared inline PK index/constraint name attached to the renamed (legacy) table. The new composite-PK ADD collides on the shared name space. Prod does not hit this: V40 was BASELINE-SKIPPED in 2026-04 (see V57 header comment + DB review 2026-04-30 §4), so the migration is recorded as success=t in flyway_schema_history but never ran. Prod's audit_logs is still relkind='r' and V57 (idempotent partman-aware partitioning) is the prod-side conversion path; it's fail-soft when partman is missing (current pgvector/pgvector:pg17 image) so deploys are safe. CI runs every migration end-to-end on a fresh DB, so it hits the V40 PG name collision. The Integration tests (Testcontainers) CI job has been red on every PR (including #99 V61 and #100 P0 bundle) since pre-existing 2026-05-04 baseline; it's continue-on-error so it does not block merges but is permanent noise on every PR's checks page. Fix: test-only fixture V39_5 loaded only by the integration profile via spring.flyway.locations=classpath:db/migration,classpath:db/test-fixtures. It renames audit_logs_pkey -> audit_logs_v5_pkey before V40 runs, freeing the global name. Idempotent: only renames when audit_logs is still a plain heap with the V5-default constraint name. Matches the V28_5__align_default_login_flow_for_v29.sql precedent exactly (test-only fixture pre-fixing state for a checksum-locked production migration). Zero prod surface area: the file lives in src/test/resources/db/test-fixtures/ which is never on the prod classpath. Verified locally against pgvector/pgvector:pg17: - V5-style CREATE TABLE + V40-style RENAME+CREATE+ADD-CONSTRAINT reproduces ERROR exactly. - Same sequence with V39_5 inserted between V5 and V40 succeeds; both the legacy table (audit_logs_v5_pkey) and the new partitioned root (audit_logs_pkey) coexist with distinct constraint names. - Idempotency confirmed: re-running V39_5 against a post-V40 state falls through the IF guards and is a no-op. Why not edit V40 in place: three previous in-place V40 fixes (commits 2b00a7b, a30287b, 81101f0) were each reverted (43d0050, 614a94f, f404d12) because V40's checksum is locked in prod flyway_schema_history. Why not a new V62 production migration: V57 already provides the prod-side path (idempotent heap->partitioned conversion + partman setup, fail-soft when partman absent). A V62 would duplicate V57. The CI-side problem is purely the V40 in-place hard failure, addressed surgically by this test fixture. Unblocks the Integration tests check on PR #99 (V61 audit_logs.tenant_id NOT NULL) and #100 (P0/P1 senior-review bundle). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(gitleaks): bump application-integration.yml line refs after V39_5 yaml edit The previous commit added 6 lines of comment to the spring.flyway section of application-integration.yml, shifting the test-only jwt.secret / totp.enc-key lines from 55,62 to 61,68. Update the .gitleaksignore fingerprints to match so the scan job stays green. Secrets unchanged (both are pre-existing test-only fixtures, allowlisted since the file was added). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Four P0/P1 fixes from
/opt/projects/SECURITY_REVIEW_2026-05-12.mdand/opt/projects/BACKEND_REVIEW_2026-05-12.md.Bug 1 (P0) — JWT
audclaim not shipped (4th time)Memory has missed this fix three times (2026-04-20, 2026-05-02, 2026-05-12). On HEAD only
isswas bound on the parser;audwas never stamped or required.app.security.jwt.audienceproperty (single string for now; multi-audience parser supported via JJWTrequireAudience).JwtServicestamps the audience claim on every minted access token. Refresh tokens are opaque UUIDs in DB (V55/V60), not JWTs — no audience needed there.expectedIssuer).application-prod.ymlpinsaud=fivucsas-api.JwtService.assertProdAudienceIsConfiguredfails fast at boot if the prod profile loads with a blank audience.JwtServiceAudienceTest(9 tests) pins mint, parse, prod boot assertion, and the empty-config permissive path so this fix cannot disappear a fourth time.Bug 2 (P0) —
FaceVerifyMfaStepHandlercosine fallbackPR #83 removed the 0.7 cosine confidence fallback in
FaceAuthHandlerandAuthController.verify2FAMethodbut missed this N-step MFA handler — which is the one hosted login actually uses. INVESTIGATION_FAILOPEN_2026-05-07.md F3.Mirrors PR #83's pattern: trust only the bio processor's
verifiedfield, hard-reject when missing, cacheuser.getId()once to stay inside the UserDomainBoundary ratchet.Bug 3 (P0) — 5 MFA OTP call-sites swallowing
exhaustedNIST 800-63B 5-strike lockout is correct at the primitive level (
OtpService.validateWithResult) but all 5 MFA OTP call-sites used the booleanvalidate()overload and threw the flag away. Users had to wait 5min for TTL instead of getting an immediate "send another OTP."Call-sites switched to
validateWithResult:EmailOtpAuthHandlerSmsOtpAuthHandler(local-Redis fallback only — Twilio Verify path enforces its own counter)EmailOtpVerifyMfaStepHandlerAuthController.verify2FAMethodSMS_OTP armAuthController.verify2FAMethodEMAIL_OTP armOtpAttemptsExhaustedExceptionnow maps to HTTP 429 with{errorCode:"OTP_ATTEMPTS_EXHAUSTED", action:"resend", remainingAttempts:0}+Retry-After: 300(RFC 6585 §4) via a newGlobalExceptionHandlerarm.verify2FAMethod'scatch (Exception)now splits on this exception so it re-throws rather than being collapsed into a 500.Bug 4 (P1) — Revoke leaked HS512 kid
hs-2026-04Bundled into the JWT commit because both fixes touch the same
JwtService+application-prod.ymlpaths:app.security.jwt.allow-hs512: falsepinned explicitly inapplication-prod.yml(was relying on the application.yml default).app.security.jwt.revoked-kidsconfig (comma-separated). Any JWT carrying a kid in this set is rejected during validation regardless ofallow-hs512state. Defence-in-depth even if the rollback flag flips back on.application-prod.ymlrevokeshs-2026-04(the leakedf8ee668:.env.gcpkid).JwtServiceAudienceTest.revokedKidRejectedEvenIfHs512Allowedproves it: HS512 token signed withhs-2026-04is rejected even whenallowHs512=true.Commits
9b58250— fix(jwt): bind and validate aud claim (also revokes hs-2026-04 — same files)6e955c3— fix(mfa): close cosine fallback in FaceVerifyMfaStepHandler9e4e772— fix(otp): propagate exhausted state through 5 MFA call-sites(Per spec the 4th commit would have been
security(jwt): revoke hs-2026-04 kid, but the revoked-kid code logic andapplication-prod.ymllines were so tightly coupled with the aud commit that splitting them would have created a no-op file split. The behavior is fully tested underJwtServiceAudienceTest.revokedKidRejected*.)Test plan
mvn -T 2C test— 1271 passed, 0 failed, 58 skipped (skipped are pre-existing Testcontainers-gated integration tests).permitAllinSecurityConfig.java— no changes (no new anonymous routes).git diff SecurityConfig.javareturns empty.application.yml+application-prod.yml(Python yaml.safe_load).UserDomainBoundaryTest— baseline refreshed for line shifts; net +1 violation inverify2FAMethodfor the newcatch (OtpAttemptsExhaustedException)arm (cached asString exhaustedUidso the arm contributes only onegetId()call).feedback_verify_completion_claims):grep "audience" src/main/java/com/fivucsas/identity/security/JwtService.java— present in mint + parse + boot assert.grep "validateWithResult" src/main/java/com/fivucsas/identity/application/service/{handler,mfa/handler}/*.java src/main/java/com/fivucsas/identity/controller/AuthController.java— all 5 call-sites flipped.Operator notes
APP_SECURITY_JWT_AUDIENCEenv var: empty inapplication.yml,fivucsas-apiinapplication-prod.yml. No.env.prodchange required — the prod profile literal is the boot-time default.audclaim. Frontends will see 401s and silently refresh via the refresh-token rotation flow (V55). No user-visible logout expected, but operator should monitor/refreshtraffic spike for ~15 min post-deploy.Cross-references
/opt/projects/SECURITY_REVIEW_2026-05-12.md— §aud-claim, §P1 revoked kid/opt/projects/BACKEND_REVIEW_2026-05-12.md— §FaceVerifyMfaStepHandler, §OTP-exhausted🤖 Generated with Claude Code