Skip to content

feat(db): V61 audit_logs.tenant_id NOT NULL (lock down P1 from senior reviews 2026-05-12)#99

Merged
ahmetabdullahgultekin merged 1 commit into
mainfrom
fix/2026-05-12-infra-hygiene
May 28, 2026
Merged

feat(db): V61 audit_logs.tenant_id NOT NULL (lock down P1 from senior reviews 2026-05-12)#99
ahmetabdullahgultekin merged 1 commit into
mainfrom
fix/2026-05-12-infra-hygiene

Conversation

@ahmetabdullahgultekin

Copy link
Copy Markdown
Contributor

Summary

  • V61 migration: ALTER TABLE audit_logs ALTER COLUMN tenant_id SET NOT NULL with a pre-flight DO block that fails loudly via RAISE EXCEPTION if any NULL rows slipped in between V59 (2026-05-11) and V61 application.
  • JPA mapping AuditLog.tenantId annotated with @Column(nullable = false) so Hibernate ddl-auto=validate stays consistent with the new DB constraint.

Why

V59 backfilled NULLs (140 -> 0 / 1168 total) and added the writer-side sentinel UUID for anonymous events. V59 deliberately left the column nullable so straggler writers would surface as audit.publish.failure metrics rather than 500s during the soak window. That window has now elapsed (one week clean, verified in the 2026-05-11 session notes). V61 closes the gate.

The migration does NOT set a column DEFAULT — sentinel writes are the writer's responsibility (AuditLogAdapter). A DEFAULT would hide writer regressions by silently substituting the sentinel instead of failing fast on NotNullViolation.

Senior-review trail

  • DB review 2026-05-12 (P1): audit_logs.tenant_id still nullable.
  • Backend review 2026-05-12: writer-side sentinel paired with NULL backfill but no constraint.

Test plan

  • Spin up a clean test DB pointing Flyway at the migration tree, run flyway info and confirm V61 is PENDING.
  • flyway migrate, confirm success=t for V61 in flyway_schema_history.
  • \d+ audit_logs in psql — confirm tenant_id column is NOT NULL.
  • Negative path: on a test DB pre-V61, manually UPDATE audit_logs SET tenant_id = NULL WHERE id = (SELECT id FROM audit_logs LIMIT 1); then run V61 — confirm Flyway fails with the RAISE EXCEPTION message and the column stays nullable.
  • Boot identity-core-api with ddl-auto=validate against a post-V61 DB — confirm no schema-mismatch warnings on AuditLog.tenantId.
  • Re-apply V61 to an already-migrated DB — confirm it is a no-op (constraint already in place).

Operator note

Companion items requiring operator hands are in the parent monorepo PR OPERATOR_ACTIONS_2026-05-12.md.

🤖 Generated with Claude Code

… reviews 2026-05-12)

V59 backfilled audit_logs.tenant_id (140 NULLs -> 0 / 1168 total) and added a
writer-side sentinel UUID for anonymous events. The column was deliberately
left nullable so any straggler writer would surface as a metric rather than a
500 during the soak window.

That window has elapsed (one week clean, verified 2026-05-11 session notes).
V61 closes the gate:

* Pre-flight DO block re-counts NULLs and RAISE EXCEPTION if any are present
  (Flyway fails the deploy rather than silently mutating prod inside a
  constraint migration).
* ALTER COLUMN ... SET NOT NULL. Metadata-only change on PG12+ when zero
  NULLs exist; no full table rewrite.
* No column DEFAULT — sentinel writes are the writer's responsibility
  (AuditLogAdapter). A DEFAULT would hide writer regressions.

JPA mapping AuditLog.tenantId is updated with @column(nullable = false) so
ddl-auto=validate stays consistent with the new DB constraint.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@ahmetabdullahgultekin

Copy link
Copy Markdown
Contributor Author

CI note (2026-05-12)

The failing Integration tests (Testcontainers) job is pre-existing baseline rot, not a regression from this PR. Root cause:

Script V40__partition_audit_logs.sql failed

V40 (audit_logs partman partitioning, 2026-04-20) requires the pg_partman extension. Production uses the bare pgvector/pgvector:pg17 image which lacks partman, so V40's first guard hits RAISE WARNING + RETURN and silently no-ops — that's exactly the silent-no-op the backend reviewer flagged today, and it's item 1 of OPERATOR_ACTIONS_2026-05-12.md (Rollingcat-Software/FIVUCSAS#67).

The Testcontainers postgres image used by CI is also bare, but Flyway in CI runs with stricter settings than prod (the RAISE WARNING path likely surfaces as a hard error in this configuration), so the migration fails the whole chain — including V61.

This is unblocked by either:

  • Operator action item 1 (custom postgres image with partman), or
  • A follow-up PR that wraps V40 in an IF EXISTS pg_extension('pg_partman') precheck

Unit tests (Maven test) pass; GitGuardian + scan pass.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@ahmetabdullahgultekin

Copy link
Copy Markdown
Contributor Author

FYI: #101 fixes the V40 audit_logs_pkey collision that's causing the Integration tests (Testcontainers) job to fail on this PR. Once #101 merges and you rebase, this PR's IT check should go green.

@ahmetabdullahgultekin

Copy link
Copy Markdown
Contributor Author

Update on #101: my fixture resolves the first V40 failure (the audit_logs_pkey collision), but CI then surfaces a SECOND V40 bug — || operator inside COMMENT ON TABLE on line 270, which is invalid PG grammar. See #101 description for details. Fully fixing the IT job for your PR needs operator approval to either (a) edit V40 in place + flyway:repair on prod, or (b) keep accepting continue-on-error. #101 lands progress and clear diagnostics either way.

ahmetabdullahgultekin added a commit that referenced this pull request May 28, 2026
…ision) (#101)

* test(flyway): V39_5 fixture renames audit_logs_pkey to unblock V40 in CI

V40 (partition audit_logs) renames audit_logs to audit_logs_legacy, then
creates a new partitioned audit_logs root with a composite PK named
audit_logs_pkey. On a fresh Testcontainers DB this fails with:

  ERROR: relation "audit_logs_pkey" already exists
  Location: db/migration/V40__partition_audit_logs.sql

because PG keeps the V5-declared inline PK index/constraint name attached
to the renamed (legacy) table. The new composite-PK ADD collides on the
shared name space.

Prod does not hit this: V40 was BASELINE-SKIPPED in 2026-04 (see V57
header comment + DB review 2026-04-30 §4), so the migration is recorded
as success=t in flyway_schema_history but never ran. Prod's audit_logs
is still relkind='r' and V57 (idempotent partman-aware partitioning) is
the prod-side conversion path; it's fail-soft when partman is missing
(current pgvector/pgvector:pg17 image) so deploys are safe.

CI runs every migration end-to-end on a fresh DB, so it hits the V40 PG
name collision. The Integration tests (Testcontainers) CI job has been
red on every PR (including #99 V61 and #100 P0 bundle) since pre-existing
2026-05-04 baseline; it's continue-on-error so it does not block merges
but is permanent noise on every PR's checks page.

Fix: test-only fixture V39_5 loaded only by the integration profile via
spring.flyway.locations=classpath:db/migration,classpath:db/test-fixtures.
It renames audit_logs_pkey -> audit_logs_v5_pkey before V40 runs, freeing
the global name. Idempotent: only renames when audit_logs is still a
plain heap with the V5-default constraint name.

Matches the V28_5__align_default_login_flow_for_v29.sql precedent exactly
(test-only fixture pre-fixing state for a checksum-locked production
migration). Zero prod surface area: the file lives in
src/test/resources/db/test-fixtures/ which is never on the prod classpath.

Verified locally against pgvector/pgvector:pg17:
  - V5-style CREATE TABLE + V40-style RENAME+CREATE+ADD-CONSTRAINT
    reproduces ERROR exactly.
  - Same sequence with V39_5 inserted between V5 and V40 succeeds; both
    the legacy table (audit_logs_v5_pkey) and the new partitioned root
    (audit_logs_pkey) coexist with distinct constraint names.
  - Idempotency confirmed: re-running V39_5 against a post-V40 state
    falls through the IF guards and is a no-op.

Why not edit V40 in place: three previous in-place V40 fixes (commits
2b00a7b, a30287b, 81101f0) were each reverted (43d0050, 614a94f,
f404d12) because V40's checksum is locked in prod flyway_schema_history.

Why not a new V62 production migration: V57 already provides the
prod-side path (idempotent heap->partitioned conversion + partman setup,
fail-soft when partman absent). A V62 would duplicate V57. The
CI-side problem is purely the V40 in-place hard failure, addressed
surgically by this test fixture.

Unblocks the Integration tests check on PR #99 (V61 audit_logs.tenant_id
NOT NULL) and #100 (P0/P1 senior-review bundle).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(gitleaks): bump application-integration.yml line refs after V39_5 yaml edit

The previous commit added 6 lines of comment to the spring.flyway section
of application-integration.yml, shifting the test-only jwt.secret /
totp.enc-key lines from 55,62 to 61,68. Update the .gitleaksignore
fingerprints to match so the scan job stays green. Secrets unchanged
(both are pre-existing test-only fixtures, allowlisted since the file
was added).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ahmetabdullahgultekin ahmetabdullahgultekin merged commit 3430330 into main May 28, 2026
9 of 12 checks passed
ahmetabdullahgultekin added a commit that referenced this pull request May 28, 2026
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants