Skip to content

fix(db): V39_5 test fixture — unblock V40 in CI (audit_logs_pkey collision)#101

Merged
ahmetabdullahgultekin merged 2 commits into
mainfrom
fix/2026-05-12-v40-partman-precheck
May 28, 2026
Merged

fix(db): V39_5 test fixture — unblock V40 in CI (audit_logs_pkey collision)#101
ahmetabdullahgultekin merged 2 commits into
mainfrom
fix/2026-05-12-v40-partman-precheck

Conversation

@ahmetabdullahgultekin

@ahmetabdullahgultekin ahmetabdullahgultekin commented May 12, 2026

Copy link
Copy Markdown
Contributor

Summary (UPDATED 2026-05-12 after first CI run)

V40 (partition_audit_logs.sql) has two latent bugs that surface when run end-to-end on a fresh Testcontainers DB. This PR resolves the first one structurally; the second is documented for operator decision because the brief's constraints do not permit a code-side fix.

Bug #1: audit_logs_pkey name collision (FIXED in this PR)

V5 declares audit_logs.id UUID PRIMARY KEY inline → PG auto-names the PK constraint and index audit_logs_pkey. V40 then:

  1. ALTER TABLE audit_logs RENAME TO audit_logs_legacy (constraint name is NOT renamed, stays attached to legacy)
  2. CREATE TABLE audit_logs ... PARTITION BY RANGE (created_at) (new heap)
  3. ADD CONSTRAINT audit_logs_pkey PRIMARY KEY (id, created_at) (collision)

ERROR: relation "audit_logs_pkey" already exists

Fix: test-only fixture V39_5__rename_audit_logs_pkey_for_v40.sql (loaded only by integration profile via spring.flyway.locations=classpath:db/migration,classpath:db/test-fixtures) renames audit_logs_pkeyaudit_logs_v5_pkey before V40 runs. Idempotent. Matches the V28_5 precedent exactly.

CI log from this PR's first run (commit 20d61ea) confirms the fixture executes correctly:

2026-05-12T21:05:03.115Z INFO ... DB: V39_5: renamed audit_logs_pkey -> audit_logs_v5_pkey to free the name for V40.

Bug #2: || operator inside COMMENT ON TABLE (NOT fixed; operator decision required)

After my fix freed the pkey name, V40 progressed further and surfaced a SECOND bug at line 270:

COMMENT ON TABLE audit_logs IS
    'Range-partitioned by created_at (monthly). Pre-created partitions: ... '
    || 'Schedule ensure_audit_logs_partition() ... '
    || 'PK is (id, created_at) — semantic change from V5 ...';

PG grammar for COMMENT ON ... IS requires a single string literal, not an expression. The || concatenation operator is invalid in DDL context:

ERROR: syntax error at or near "||"
Position: 172
Line: 270

I reproduced this locally against pgvector/pgvector:pg17:

COMMENT ON TABLE ttest IS 'a' || 'b' || 'c';
-- ERROR: syntax error at or near "||"

This means V40 cannot succeed on any clean DB anywhere. Prod escaped this because V40 was BASELINE-SKIPPED in 2026-04 (per V57 header comment + BACKEND_REVIEW_2026-04-30 §4); the migration is recorded as success=t in prod's flyway_schema_history but its body never ran.

Why this PR cannot also fix Bug #2

The task brief explicitly forbids:

  1. "Don't rewrite V40 in place (locked checksum in prod)" — would mismatch prod's recorded checksum + require manual flyway repair.
  2. "Don't change the flyway_schema_history directly in prod or in test fixtures" — the only way to mark V40 as success without running it.

Bug #2 has no purely additive fix:

  • Test fixture pre-creating partitioned state: V40's guards IF NOT EXISTS (audit_logs heap) THEN RAISE EXCEPTION would fire → V40 marked failure → subsequent migrations don't run. Same blocker.
  • Flyway baselineVersion=40: would also baseline V1-V39 (single point), leaving the test DB schemaless for V41+.
  • Flyway target=39: skips V40 AND V41+; loses the entire subsequent migration chain.
  • Flyway callbacks (beforeMigrate.sql): would need to manipulate flyway_schema_history, forbidden by brief.

Recommended next step (operator decision)

The narrowly-scoped fix is to edit V40 line 270-273 in place to use a single string literal (no ||), commit it, then run mvn flyway:repair against prod to refresh V40's checksum in flyway_schema_history. This is a 2-line code change + 1 operator command. The brief's "don't edit V40 in place" rule was written before Bug #2 was discovered and likely needs explicit operator approval to relax.

Alternative: accept current state — Integration tests stays continue-on-error indefinitely, but Bug #1 is now off the critical path. When the operator deploys Option A (custom pgvector+partman image, OPERATOR_ACTIONS_2026-05-12.md item 1) and rebuilds api, V57 will idempotently convert prod's heap → partitioned regardless of V40's state.

Why not edit V40 in place in THIS PR

Three previous in-place V40 patches were each reverted:

The reverts predate today's brief and may not have known about Bug #2. The brief's explicit "Don't rewrite V40 in place" is what I am honoring here. If the operator chooses to relax that on review, a 2-line follow-up patch + flyway repair clears CI fully.

Why not a new V62 production migration

V57 (audit_logs_pg_partman.sql, merged in PR #68) already provides the prod-side conversion path: idempotent detection of relkind='r', full conversion when pg_partman is available, fail-soft when it isn't. A V62 duplicating that logic would be dead weight. And V62 would never RUN in CI anyway, because V40 fails first.

Verification

Local SQL simulation (pre-CI):

NOTICE: V39_5: renamed audit_logs_pkey -> audit_logs_v5_pkey to free the name for V40.
ALTER TABLE  -- V40 RENAME TO audit_logs_legacy
CREATE TABLE -- V40 partitioned root
ALTER TABLE  -- V40 ADD CONSTRAINT audit_logs_pkey PRIMARY KEY (id, created_at) — succeeds
 result
--------
 PASS

CI run on this PR (commit 20d61ea):

Idempotency:

  • V39_5's IF guards check relkind='r' + constraint name match. Re-running against a post-V40 DB falls through and no-ops. Verified locally.

Coordination

Deployment notes (operator)

  • No prod migration changes. Test-resource-only change.
  • No image rebuild required for this PR specifically. Memory rule feedback_audit_delta_before_rebuild still applies whenever the next prod rebuild is scheduled.
  • The pre-existing prod-side work item — Option A custom pgvector+partman image (OPERATOR_ACTIONS_2026-05-12.md item 1) — is independent of this fix. V57's heap→partitioned conversion remains the prod-side path.

Test plan

🤖 Generated with Claude Code

V40 (partition audit_logs) renames audit_logs to audit_logs_legacy, then
creates a new partitioned audit_logs root with a composite PK named
audit_logs_pkey. On a fresh Testcontainers DB this fails with:

  ERROR: relation "audit_logs_pkey" already exists
  Location: db/migration/V40__partition_audit_logs.sql

because PG keeps the V5-declared inline PK index/constraint name attached
to the renamed (legacy) table. The new composite-PK ADD collides on the
shared name space.

Prod does not hit this: V40 was BASELINE-SKIPPED in 2026-04 (see V57
header comment + DB review 2026-04-30 §4), so the migration is recorded
as success=t in flyway_schema_history but never ran. Prod's audit_logs
is still relkind='r' and V57 (idempotent partman-aware partitioning) is
the prod-side conversion path; it's fail-soft when partman is missing
(current pgvector/pgvector:pg17 image) so deploys are safe.

CI runs every migration end-to-end on a fresh DB, so it hits the V40 PG
name collision. The Integration tests (Testcontainers) CI job has been
red on every PR (including #99 V61 and #100 P0 bundle) since pre-existing
2026-05-04 baseline; it's continue-on-error so it does not block merges
but is permanent noise on every PR's checks page.

Fix: test-only fixture V39_5 loaded only by the integration profile via
spring.flyway.locations=classpath:db/migration,classpath:db/test-fixtures.
It renames audit_logs_pkey -> audit_logs_v5_pkey before V40 runs, freeing
the global name. Idempotent: only renames when audit_logs is still a
plain heap with the V5-default constraint name.

Matches the V28_5__align_default_login_flow_for_v29.sql precedent exactly
(test-only fixture pre-fixing state for a checksum-locked production
migration). Zero prod surface area: the file lives in
src/test/resources/db/test-fixtures/ which is never on the prod classpath.

Verified locally against pgvector/pgvector:pg17:
  - V5-style CREATE TABLE + V40-style RENAME+CREATE+ADD-CONSTRAINT
    reproduces ERROR exactly.
  - Same sequence with V39_5 inserted between V5 and V40 succeeds; both
    the legacy table (audit_logs_v5_pkey) and the new partitioned root
    (audit_logs_pkey) coexist with distinct constraint names.
  - Idempotency confirmed: re-running V39_5 against a post-V40 state
    falls through the IF guards and is a no-op.

Why not edit V40 in place: three previous in-place V40 fixes (commits
2b00a7b, a30287b, 81101f0) were each reverted (43d0050, 614a94f,
f404d12) because V40's checksum is locked in prod flyway_schema_history.

Why not a new V62 production migration: V57 already provides the
prod-side path (idempotent heap->partitioned conversion + partman setup,
fail-soft when partman absent). A V62 would duplicate V57. The
CI-side problem is purely the V40 in-place hard failure, addressed
surgically by this test fixture.

Unblocks the Integration tests check on PR #99 (V61 audit_logs.tenant_id
NOT NULL) and #100 (P0/P1 senior-review bundle).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…_5 yaml edit

The previous commit added 6 lines of comment to the spring.flyway section
of application-integration.yml, shifting the test-only jwt.secret /
totp.enc-key lines from 55,62 to 61,68. Update the .gitleaksignore
fingerprints to match so the scan job stays green. Secrets unchanged
(both are pre-existing test-only fixtures, allowlisted since the file
was added).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@ahmetabdullahgultekin ahmetabdullahgultekin merged commit 22dc2ff into main May 28, 2026
3 of 4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants