[Just for sharing] Local Spark-driven repro of concurrent-insert snapshot loss#614
Conversation
Mirrors TablesConcurrencyTest.concurrentInsertTest from the LinkedIn acceptance suite, but runs fully in-process against the embedded OpenHouse server via OpenHouseSparkITest. Two futures fire 5 INSERT INTO statements each against a fresh table; each returns HTTP 200, so the client-side committedRecord stays at 10. The final SELECT row count must equal committedRecord — any mismatch means a snapshot was silently dropped by the catalog's subtractive-merge commit path during the race. Reproduces on first run today (rows=9, expected=10), matching the production data-loss signature observed on WAR (BaseTransaction:466 WARN, "Failed to load metadata for a committed snapshot, skipping clean-up") and the chronic failures of TablesConcurrencyTest on WAR DR. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replace the hand-crafted constructed-metadata unit tests with PR linkedin#614's SparkConcurrentInsertFunctionalTest as the canonical regression target. The test drives concurrent INSERT statements through the embedded OpenHouseSparkITest server and asserts row_count == reported_success. Honest validation against the canonical test: linkedin/main (pre-fix): FAIL — rows=9, expected=10 this branch (with fix): FAIL — rows=9, expected=10 The abortIfWriterBaseDivergedFromCatalog check is invoked 11 times during the test run (one per insert) and never throws — in every invocation, writerClaimedBase (COMMIT_KEY) matches base.metadataFileLocation() modulo URI scheme. The race fires below the doCommit layer: both racing server threads have already loaded the same cached `base` view by the time their doCommits run, so the COMMIT_KEY-vs-base comparison can't detect the race. HTS-layer CAS catches one writer (returns 409) but one row is still permanently dropped — likely in the loser's client-side retry path or in a second-tier race. The fix as written addresses the BaseTransaction.applyUpdates silent- rebase variant of the bug class (the production fingerprint observed on WAR DR on 2026-05-25). It does not address the variant PR linkedin#614 exposes, where both threads' applyUpdates' refresh-and-rebase sees the same v0 and both proceed to a metadata write at the same nominal base. That variant requires a different defense — likely at HouseTableRepositoryImpl .save() or at the metadata file pointer-CAS step. This commit lands what we have so the PR carries both the canonical behavior test and the partial fix; the racing-CREATE / writer-claims- INITIAL paths added in the encapsulated helper are still useful guards for other code paths but do not close PR linkedin#614's window. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file used TableMetadataParser to write fake metadata.json bytes to disk and called doCommit directly with constructed metadata — the "direct file manipulation bs" the user explicitly rejected. PR linkedin#614's SparkConcurrentInsertFunctionalTest is the canonical behavior repro; delete this in favor of that. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR linkedin#614's SparkConcurrentInsertFunctionalTest passes 4/4 with this change. Root cause identified via end-to-end instrumentation: - Two racing INSERTs reach houseTableRepository.save() with the same cached `tableVersion` (HTS-level CAS input). - Both saves succeed at the JPA layer because HouseTable had no @Version field, so Spring Data JPA's em.merge() did not emit a versioned UPDATE. - Result: last writer overwrites HTS pointer, loser's metadata.json is orphaned, loser's data file is unreachable. 10 inserts report success but only 9 rows visible. Adding @Version makes em.merge produce UPDATE house_table SET ..., version = ? WHERE id = ? AND version = ? which the H2 backend's MVCC enforces. The second writer's UPDATE matches 0 rows -> ObjectOptimisticLockingFailureException -> wrapped as 409 -> Iceberg client retries with fresh base -> data lands. Validated: Before: test fails (rows=9, expected=10) - reliable repro After: test passes 4 consecutive runs This complements PR linkedin#611's catalog-level abortIfWriterBaseDivergedFromCatalog check (the BaseTransaction.applyUpdates silent-rebase variant of the bug class). Together they close two distinct races in the same bug class: - Silent-rebase variant: covered by the doCommit-level CAS - HTS-save variant: covered by this @Version field Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Why this test isn't useful for confirming a prod bug The test's plumbing OpenHouseSparkITest boots SpringH2TestApplication which does NOT scan com.linkedin.openhouse.housetables.* — the real HTS service is absent. In its place, doCommit → HouseTablesH2Repository.save() → JPA.merge() on H2 → done Production goes: doCommit → HouseTableRepositoryImpl.save() → HTTP PUT → HTS service → Two completely different code paths past houseTableRepository.save(). The test skips the HTS service, the version mapper, the HTTP boundary, the real DB. What the race in the test actually proves Two threads both reach HouseTablesH2Repository.save() at the same (tableId, databaseId). JPA merges both rows. No CAS, last writer wins. The first writer's metadata.json is This proves: the H2 fixture is missing the CAS that production's HTS service has. It does not prove that the production path has a race — because the production path has |
…remove dead @Version code PR linkedin#614's test reproduces a race against the H2 test fixture (HouseTablesH2Repository, @primary in tests). The fixture short-circuits the production HTS service: no UserTableVersionMapper, no UserTableRow @Version. The race it catches doesn't exist on the prod MySQL+HTS path. Removing. Restore testDoCommitAbortsOnStaleClaimedBase from PR linkedin#611 (production-realistic shape, writes a real metadata.json on disk so the URI-normalized path comparison runs). Remove the commented-out @Version block on HouseTable; that path was the wrong layer.
## Summary
Adds a catalog-level CAS in `OpenHouseInternalTableOperations.doCommit`
that aborts when the writer's declared base (`COMMIT_KEY`) does not
match the catalog's current persisted base — closing the
`BaseTransaction.applyUpdates` silent-rebase variant of the stale-base
lost-update bug (incident-12185).
## Mechanism
1. A writer's transaction stages `COMMIT_KEY = T_X` and a
`SNAPSHOTS_JSON_KEY` payload computed against `T_X`.
2. A racing commit advances the catalog `T_X → T_Y`, adding a snapshot.
3. `BaseTransaction.applyUpdates` silently refreshes the in-flight base
to `T_Y` and re-applies the staged update — re-stamping `COMMIT_KEY =
T_X` on top of `T_Y` while leaving the stale `SNAPSHOTS_JSON_KEY`.
4. `doCommit` runs with `base = T_Y` but `COMMIT_KEY = T_X`. Without the
check, the subtractive snapshot merge computes `toRemove =
T_Y.snapshots() − stale payload = {racing snapshot}` and silently drops
it.
The fix reads `COMMIT_KEY` before `failIfRetryUpdate` strips it,
URI-normalizes both paths via Hadoop `Path`, and throws
`CommitFailedException` on mismatch so Iceberg retries against the fresh
base.
## Scope of the check
- **Aborts** when `COMMIT_KEY` is a concrete location differing from the
catalog base, or is `INITIAL_VERSION`.
- **Does not defend** commits that leave `COMMIT_KEY` unset — wholesale
replace/create (`replaceTable`, stage-create, stage-replace) are
authoritative over the snapshot set, so there is no stale base to
compare against.
## Changes
Bug fix + unit test, internal catalog only (2 files). `doCommit` may now
throw `CommitFailedException` on a stale-base commit that previously
silently dropped a racing snapshot.
## Testing
Unit test `testDoCommitMustAbortStaleBaseRebaseToPreventSnapshotLoss` in
`OpenHouseInternalTableOperationsTest` round-trips the post-refresh
`TableMetadata` through `TableMetadataParser` so
`base.metadataFileLocation()` is non-null (matching a loaded-from-disk
base) and the URI-normalized comparison runs. Asserts
`CommitFailedException` is thrown and `houseTableRepository.save` is
never invoked.
```
./gradlew :iceberg:openhouse:internalcatalog:test --tests "*OpenHouseInternalTableOperationsTest"
```
### Not covered
A Spark concurrent-insert behavior test
(`SparkConcurrentInsertFunctionalTest`, PR #614) was explored and
removed: it only reproduces against the H2 test fixture, not production
MySQL+HTS. A prod-realistic black-box repro would need the real HTS app
or a deployed instance.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: mkuchenbecker <mkuchenbecker@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Mirrors TablesConcurrencyTest.concurrentInsertTest from the LinkedIn acceptance suite, but runs fully in-process against the embedded OpenHouse server via OpenHouseSparkITest. Two futures fire 5 INSERT INTO statements each against a fresh table; each returns HTTP 200, so the client-side committedRecord stays at 10. The final SELECT row count must equal committedRecord — any mismatch means a snapshot was silently dropped by the catalog's subtractive-merge commit path during the race.
Reproduces on first run today (rows=9, expected=10), matching the production data-loss signature observed on WAR (BaseTransaction:466 WARN, "Failed to load metadata for a committed snapshot, skipping clean-up") and the chronic failures of TablesConcurrencyTest on WAR DR.
Summary
Issue] Briefly discuss the summary of the changes made in this
pull request in 2-3 lines.
Changes
For all the boxes checked, please include additional details of the changes made in this pull request.
Testing Done
For all the boxes checked, include a detailed description of the testing done for the changes made in this pull request.
Additional Information
For all the boxes checked, include additional details of the changes made in this pull request.