[Just for sharing] Local Spark-driven repro of concurrent-insert snapshot loss by dushyantk1509 · Pull Request #614 · linkedin/openhouse

dushyantk1509 · 2026-05-28T17:38:37Z

Mirrors TablesConcurrencyTest.concurrentInsertTest from the LinkedIn acceptance suite, but runs fully in-process against the embedded OpenHouse server via OpenHouseSparkITest. Two futures fire 5 INSERT INTO statements each against a fresh table; each returns HTTP 200, so the client-side committedRecord stays at 10. The final SELECT row count must equal committedRecord — any mismatch means a snapshot was silently dropped by the catalog's subtractive-merge commit path during the race.

Reproduces on first run today (rows=9, expected=10), matching the production data-loss signature observed on WAR (BaseTransaction:466 WARN, "Failed to load metadata for a committed snapshot, skipping clean-up") and the chronic failures of TablesConcurrencyTest on WAR DR.

Summary

Issue] Briefly discuss the summary of the changes made in this
pull request in 2-3 lines.

Changes

For all the boxes checked, please include additional details of the changes made in this pull request.

Testing Done

Manually Tested on local docker setup. Please include commands ran, and their output.
Added new tests for the changes made.
Updated existing tests to reflect the changes made.
No tests added or updated. Please explain why. If unsure, please feel free to ask for help.
Some other form of testing like staging or soak time in production. Please explain.

For all the boxes checked, include a detailed description of the testing done for the changes made in this pull request.

Additional Information

Breaking Changes
Deprecations
Large PR broken into smaller PRs, and PR plan linked in the description.

For all the boxes checked, include additional details of the changes made in this pull request.

Mirrors TablesConcurrencyTest.concurrentInsertTest from the LinkedIn acceptance suite, but runs fully in-process against the embedded OpenHouse server via OpenHouseSparkITest. Two futures fire 5 INSERT INTO statements each against a fresh table; each returns HTTP 200, so the client-side committedRecord stays at 10. The final SELECT row count must equal committedRecord — any mismatch means a snapshot was silently dropped by the catalog's subtractive-merge commit path during the race. Reproduces on first run today (rows=9, expected=10), matching the production data-loss signature observed on WAR (BaseTransaction:466 WARN, "Failed to load metadata for a committed snapshot, skipping clean-up") and the chronic failures of TablesConcurrencyTest on WAR DR. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Replace the hand-crafted constructed-metadata unit tests with PR linkedin#614's SparkConcurrentInsertFunctionalTest as the canonical regression target. The test drives concurrent INSERT statements through the embedded OpenHouseSparkITest server and asserts row_count == reported_success. Honest validation against the canonical test: linkedin/main (pre-fix): FAIL — rows=9, expected=10 this branch (with fix): FAIL — rows=9, expected=10 The abortIfWriterBaseDivergedFromCatalog check is invoked 11 times during the test run (one per insert) and never throws — in every invocation, writerClaimedBase (COMMIT_KEY) matches base.metadataFileLocation() modulo URI scheme. The race fires below the doCommit layer: both racing server threads have already loaded the same cached `base` view by the time their doCommits run, so the COMMIT_KEY-vs-base comparison can't detect the race. HTS-layer CAS catches one writer (returns 409) but one row is still permanently dropped — likely in the loser's client-side retry path or in a second-tier race. The fix as written addresses the BaseTransaction.applyUpdates silent- rebase variant of the bug class (the production fingerprint observed on WAR DR on 2026-05-25). It does not address the variant PR linkedin#614 exposes, where both threads' applyUpdates' refresh-and-rebase sees the same v0 and both proceed to a metadata write at the same nominal base. That variant requires a different defense — likely at HouseTableRepositoryImpl .save() or at the metadata file pointer-CAS step. This commit lands what we have so the PR carries both the canonical behavior test and the partial fix; the racing-CREATE / writer-claims- INITIAL paths added in the encapsulated helper are still useful guards for other code paths but do not close PR linkedin#614's window. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

This file used TableMetadataParser to write fake metadata.json bytes to disk and called doCommit directly with constructed metadata — the "direct file manipulation bs" the user explicitly rejected. PR linkedin#614's SparkConcurrentInsertFunctionalTest is the canonical behavior repro; delete this in favor of that. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@Version

PR linkedin#614's SparkConcurrentInsertFunctionalTest passes 4/4 with this change. Root cause identified via end-to-end instrumentation: - Two racing INSERTs reach houseTableRepository.save() with the same cached `tableVersion` (HTS-level CAS input). - Both saves succeed at the JPA layer because HouseTable had no @Version field, so Spring Data JPA's em.merge() did not emit a versioned UPDATE. - Result: last writer overwrites HTS pointer, loser's metadata.json is orphaned, loser's data file is unreachable. 10 inserts report success but only 9 rows visible. Adding @Version makes em.merge produce UPDATE house_table SET ..., version = ? WHERE id = ? AND version = ? which the H2 backend's MVCC enforces. The second writer's UPDATE matches 0 rows -> ObjectOptimisticLockingFailureException -> wrapped as 409 -> Iceberg client retries with fresh base -> data lands. Validated: Before: test fails (rows=9, expected=10) - reliable repro After: test passes 4 consecutive runs This complements PR linkedin#611's catalog-level abortIfWriterBaseDivergedFromCatalog check (the BaseTransaction.applyUpdates silent-rebase variant of the bug class). Together they close two distinct races in the same bug class: - Silent-rebase variant: covered by the doCommit-level CAS - HTS-save variant: covered by this @Version field Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…inkedin#614

mkuchenbecker · 2026-05-28T22:10:45Z

Why this test isn't useful for confirming a prod bug

The test's plumbing

OpenHouseSparkITest boots SpringH2TestApplication which does NOT scan com.linkedin.openhouse.housetables.* — the real HTS service is absent. In its place,
HouseTablesH2Repository (in tables-test-fixtures, @primary) extends HouseTableRepository directly and delegates to JPA. So houseTableRepository.save() in the test goes:

doCommit → HouseTablesH2Repository.save() → JPA.merge() on H2 → done

Production goes:

doCommit → HouseTableRepositoryImpl.save() → HTTP PUT → HTS service →
UserTablesServiceImpl.putUserTable() → UserTableVersionMapper.toVersion() (string-CAS on metadataLocation) →
UserTableRow JPA.merge() on MySQL (with @Version optimistic lock) → done

Two completely different code paths past houseTableRepository.save(). The test skips the HTS service, the version mapper, the HTTP boundary, the real DB.

What the race in the test actually proves

Two threads both reach HouseTablesH2Repository.save() at the same (tableId, databaseId). JPA merges both rows. No CAS, last writer wins. The first writer's metadata.json is
orphaned. That's the 8-of-10 you see.

This proves: the H2 fixture is missing the CAS that production's HTS service has. It does not prove that the production path has a race — because the production path has
UserTableVersionMapper.toVersion() plus @Version on UserTableRow, both of which are in the call chain only when the real HTS service runs.

@Version

…remove dead @Version code PR linkedin#614's test reproduces a race against the H2 test fixture (HouseTablesH2Repository, @primary in tests). The fixture short-circuits the production HTS service: no UserTableVersionMapper, no UserTableRow @Version. The race it catches doesn't exist on the prod MySQL+HTS path. Removing. Restore testDoCommitAbortsOnStaleClaimedBase from PR linkedin#611 (production-realistic shape, writes a real metadata.json on disk so the URI-normalized path comparison runs). Remove the commented-out @Version block on HouseTable; that path was the wrong layer.

## Summary Adds a catalog-level CAS in `OpenHouseInternalTableOperations.doCommit` that aborts when the writer's declared base (`COMMIT_KEY`) does not match the catalog's current persisted base — closing the `BaseTransaction.applyUpdates` silent-rebase variant of the stale-base lost-update bug (incident-12185). ## Mechanism 1. A writer's transaction stages `COMMIT_KEY = T_X` and a `SNAPSHOTS_JSON_KEY` payload computed against `T_X`. 2. A racing commit advances the catalog `T_X → T_Y`, adding a snapshot. 3. `BaseTransaction.applyUpdates` silently refreshes the in-flight base to `T_Y` and re-applies the staged update — re-stamping `COMMIT_KEY = T_X` on top of `T_Y` while leaving the stale `SNAPSHOTS_JSON_KEY`. 4. `doCommit` runs with `base = T_Y` but `COMMIT_KEY = T_X`. Without the check, the subtractive snapshot merge computes `toRemove = T_Y.snapshots() − stale payload = {racing snapshot}` and silently drops it. The fix reads `COMMIT_KEY` before `failIfRetryUpdate` strips it, URI-normalizes both paths via Hadoop `Path`, and throws `CommitFailedException` on mismatch so Iceberg retries against the fresh base. ## Scope of the check - **Aborts** when `COMMIT_KEY` is a concrete location differing from the catalog base, or is `INITIAL_VERSION`. - **Does not defend** commits that leave `COMMIT_KEY` unset — wholesale replace/create (`replaceTable`, stage-create, stage-replace) are authoritative over the snapshot set, so there is no stale base to compare against. ## Changes Bug fix + unit test, internal catalog only (2 files). `doCommit` may now throw `CommitFailedException` on a stale-base commit that previously silently dropped a racing snapshot. ## Testing Unit test `testDoCommitMustAbortStaleBaseRebaseToPreventSnapshotLoss` in `OpenHouseInternalTableOperationsTest` round-trips the post-refresh `TableMetadata` through `TableMetadataParser` so `base.metadataFileLocation()` is non-null (matching a loaded-from-disk base) and the URI-normalized comparison runs. Asserts `CommitFailedException` is thrown and `houseTableRepository.save` is never invoked. ``` ./gradlew :iceberg:openhouse:internalcatalog:test --tests "*OpenHouseInternalTableOperationsTest" ``` ### Not covered A Spark concurrent-insert behavior test (`SparkConcurrentInsertFunctionalTest`, PR #614) was explored and removed: it only reproduces against the H2 test fixture, not production MySQL+HTS. A prod-realistic black-box repro would need the real HTS app or a deployed instance. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: mkuchenbecker <mkuchenbecker@users.noreply.github.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

mkuchenbecker mentioned this pull request May 28, 2026

fix(catalog): abort doCommit on stale-base divergence #612

Merged

mkuchenbecker added a commit to mkuchenbecker/openhouse that referenced this pull request May 28, 2026

remove abortIfWriterBaseDivergedFromCatalog; @Version alone closes PR l…

adb2b0d

…inkedin#614

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Just for sharing] Local Spark-driven repro of concurrent-insert snapshot loss#614

[Just for sharing] Local Spark-driven repro of concurrent-insert snapshot loss#614
dushyantk1509 wants to merge 1 commit into
linkedin:mainfrom
dushyantk1509:dushyantk1509/local-repro-concurrent-insert-test

dushyantk1509 commented May 28, 2026

Uh oh!

mkuchenbecker commented May 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

dushyantk1509 commented May 28, 2026

Summary

Changes

Testing Done

Additional Information

Uh oh!

mkuchenbecker commented May 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants