[BDP-102028] feat(optimizer): [1/N] Optimizer Database#530
Merged
Conversation
Introduces the optimizer service module with: - MySQL/H2 schema for table_operations, table_stats, table_stats_history, and table_operations_history - JPA entities with JSON column support (vladmihalcea hibernate-types) - All model/DTO/enum types: OperationType, OperationStatus, TableStats, CompleteOperationRequest, JobResult, OperationMetrics, etc. - JPA AttributeConverters for JobResult and OperationMetrics JSON columns - MapStruct mapper (OptimizerMapper) for entity→DTO conversion - Spring Boot application shell and build wiring (settings.gradle, build.gradle dockerPrereqs) No repositories, controllers, or service layer yet — those follow in subsequent PRs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove OperationMetrics class and converter; stats are read directly from table_stats instead of duplicating into operations - Remove orphanFilesDeleted/orphanBytesDeleted from history entity, DTO, and schema; operation-specific data belongs in the result JSON - Add addedSizeBytes to CommitDelta for tracking write volume - Fix OperationType javadoc to describe current state, not roadmap - Fix TableOperationsHistoryRow javadoc: written on operation complete, not by Spark app directly - Add field comments to all DTOs and request objects Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Spring Data JPA repositories for all four optimizer tables with filtered query support. Includes tests exercising save/find, filtered queries, upsert semantics, and append-only history. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
17 tasks
17 tasks
mkuchenbecker
commented
Apr 6, 2026
mkuchenbecker
commented
Apr 6, 2026
mkuchenbecker
commented
Apr 6, 2026
mkuchenbecker
commented
Apr 6, 2026
mkuchenbecker
commented
Apr 6, 2026
Address PR review comments: rename findFiltered → find across all repos, remove redundant findByTableUuid/findByTableUuidSince from history repos, add explicit assertion to context test. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Shared JPA entities and repositories for optimizer apps (analyzer, scheduler). All repos expose a single find method with optional filters. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
These fields never belonged in the data model — remove them at the source rather than adding then deleting in a later PR. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Propagate CompleteOperationRequest orphan field removal. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This was referenced Apr 7, 2026
- Widen-to-tighten: VARCHAR(255) -> VARCHAR(128) for database_name and table_name across all entities and the schema, aligning with prod conventions (can always be widened later, not tightened). - Rename databaseId -> databaseName in TableStatsRow, TableStatsHistoryRow, TableStatsDto, TableStatsHistoryDto, and UpsertTableStatsRequest for consistency with the operations entities and DTOs. - Drop the unused metrics field from TableOperationsRow, TableOperationsDto, and the schema. Add a TODO note in the schema that per-operation metric columns will be added as operations are onboarded. - Rename submittedAt -> completedAt in TableOperationsHistoryRow, TableOperationsHistoryDto, and the schema (column submitted_at -> completed_at, index idx_submitted_at -> idx_completed_at). The history row is written when the complete endpoint is called, so the timestamp captures completion; submission time is already on table_operations.scheduled_at. - Change TableStatsHistoryRow.id from BIGINT auto-increment to VARCHAR(36) UUID, set by the caller, matching the other id-bearing entities. - Add @JsonIgnoreProperties(ignoreUnknown = true) to CommitDelta for consistency with TableStats and SnapshotMetrics. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tion markSchedulingBatch returns only a count of rows transitioned; callers that need to know *which* rows they own must re-query. findClaimedIds takes the same id list + scheduledAt watermark passed to the UPDATE and returns the subset whose SCHEDULING transition matches that watermark — i.e. the rows this caller actually claimed in this call. Used by the scheduler to subset its bin to actually-claimed operations before submitting the Spark job; without this the scheduler can launch a job for ids another instance already owns and then incorrectly mark all of them SCHEDULED. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…review) Per @abhisheknath2011 review comment 3262776356: > "We could change all the internal model add Dto suffix something like > TableOperationsDto. This aligns with the existing services codebase." Renames (suffix added): - CompleteOperationRequest -> CompleteOperationRequestDto - UpsertTableStatsRequest -> UpsertTableStatsRequestDto - OperationType (enum) -> OperationTypeDto - OperationStatus (enum) -> OperationStatusDto - HistoryStatus (enum) -> HistoryStatusDto - TableStats (inner payload) -> TableStatsPayloadDto - TableStats.SnapshotMetrics -> TableStatsPayloadDto.SnapshotMetricsDto - TableStats.CommitDelta -> TableStatsPayloadDto.CommitDeltaDto Cross-reference updates inside api/model. Internal model layer (services/optimizer/.../model/) is intentionally unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…view) Per @abhisheknath2011 review comment 3262769497: > "Can we change the client side API to api.spec instead of api.model? > This also aligns with existing services." Mechanical package rename. The 10 api wire types move from services/optimizer/.../api/model/ to services/optimizer/.../api/spec/. No type or signature changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reversal of an earlier inconsistency surfaced by abhisheknath2011 in the PR #527 review thread on api/spec/HistoryStatusDto.java. The api wire types are the canonical contract; they should carry the canonical name. The internal-model types are transfer objects between layers and now carry the Dto suffix. api/spec/ — Dto stripped from class + filename (10 files): CompleteOperationRequestDto -> CompleteOperationRequest HistoryStatusDto -> HistoryStatus OperationStatusDto -> OperationStatus OperationTypeDto -> OperationType TableOperationsDto -> TableOperations TableOperationsHistoryDto -> TableOperationsHistory TableStatsDto -> TableStats TableStatsHistoryDto -> TableStatsHistory TableStatsPayloadDto -> TableStatsPayload UpsertTableStatsRequestDto -> UpsertTableStatsRequest model/ — Dto added to class + filename (8 files): HistoryStatus -> HistoryStatusDto OperationStatus -> OperationStatusDto OperationType -> OperationTypeDto Table -> TableDto TableOperation -> TableOperationDto TableOperationsHistory -> TableOperationsHistoryDto TableStats -> TableStatsDto TableStatsHistory -> TableStatsHistoryDto Both renames land on opt-0 because opt-0 owns api/spec/ and model/. Cascade up the stack in follow-up commits. Out of scope here: HistoryStatus enum value additions (CANCELED, QUEUED) also raised in the same review thread; separate semantic change. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
# Conflicts: # services/optimizer/src/main/java/com/linkedin/openhouse/optimizer/model/TableOperationsHistoryDto.java
model.TableOperationDto grows a jobId field; api.TableOperations conversions copy it across the api ↔ model boundary. The api DTO already had the field; the model side was missing it. Relocated from opt-5 to its proper owner per the model-layer rule. Model ↔ db plumbing for the same field lands on opt-1 in a follow-up. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Companion to the opt-0 jobId field addition: now that model.TableOperationDto carries jobId, wire it through toRow/fromRow so the db row's job_id column round-trips through the model layer. Relocated from opt-5. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…nRequest Symbol rename only. The HistoryStatus enum (SUCCESS/FAILED) and the once-terminal semantics are unchanged; the endpoint's behavior is the same. Future broadening (CANCELED/QUEUED, idempotency, mid-lifecycle status changes) is a separate concern. Method names + URL path will follow on opt-2; Spark-app caller + docs follow on opt-5. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
mkuchenbecker
added a commit
that referenced
this pull request
May 20, 2026
…527) ## Optimizer Stack | PR | Content | |---|---| | #527 **(this)** | API and internal models | | #530 | Database Repos | | #531 | REST service | | #533 | Analyzer app | | #534 | Scheduler app | | #tbd | Spark BatchedOFD app | | #tbd | Infra, docker-compose, smoke test | ## Summary PR 0 of N in the optimizer stack. [Overall Project](https://docs.google.com/document/d/1oGQWkmlVw0HG-D4Nx37q0oUEQd53Ni4q5Q7h1voaZIQ/edit?tab=t.0#heading=h.vtl818e4m9f7) [Service Design doc](https://docs.google.com/document/d/1xYOD7iuPjZO05UWfT1Dkmf4lYNw4iOjqY10UT2X1FAg/edit?tab=t.0). Introduces the optimizer service API and internal model <img width="631" height="410" alt="image" src="https://github.com/user-attachments/assets/8833471d-069a-49a2-9a10-5eb7f3b96a72" /> ## Changes - [ ] Client-facing API Changes - [x] Internal API Changes - [ ] Bug Fixes - [x] New Features - [ ] Performance Improvements - [ ] Code Style - [ ] Refactoring - [ ] Documentation - [ ] Tests ## Testing Done - [ ] Manually Tested on local docker setup. Please include commands ran, and their output. - [ ] Added new tests for the changes made. - [ ] Updated existing tests to reflect the changes made. - [x] No tests added or updated. Please explain why. If unsure, please feel free to ask for help. - [ ] Some other form of testing like staging or soak time in production. Please explain. This PR contains only the data model (entities, DTOs, converters). Repository tests follow in PR 1. Verified: - `./gradlew :services:optimizer:compileJava` passes - `./gradlew compileJava` (full project) passes with no regressions - Spotless formatting passes # Additional Information - [ ] Breaking Changes - [ ] Deprecations - [x] Large PR broken into smaller PRs, and PR plan linked in the description. --------- Co-authored-by: mkuchenbecker <mkuchenbecker@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
abhisheknath2011
previously approved these changes
May 20, 2026
Member
abhisheknath2011
left a comment
There was a problem hiding this comment.
We need to create the new my schemas in all the DBs before before deploying this version internally.
Repo public API now:
- find(...) with Optional<T> filters + required Pageable, on all four repos
- updateBatch(ids, fromStatus, toStatus, Optional<Instant> scheduledAt,
Optional<String> jobId) — replaces markSchedulingBatch,
markScheduledBatch, markPendingBatch
- cancel(ids) — replaces cancelDuplicatePendingBatch; deletes by-id with
a defensive PENDING-only gate
- findLatest(opType, Pageable) — was findLatestPerTable
- history.find(tableUuid, Pageable) — was findByTableUuidOrderByCompletedAtDesc
Side-effect columns on updateBatch use COALESCE with Optional.empty() →
leave-unchanged. scheduledAt is not cleared on SCHEDULING → PENDING revert;
status is the source of truth and the watermark is overwritten on the next
claim.
@Modifying queries get flushAutomatically + clearAutomatically so the L1
cache reflects the change immediately (caught by the unit tests).
Spring Data @query can't share an "IS NULL OR IN :list" pattern (Hibernate
expands the list inline and the IS NULL check turns ungrammatical). The
find path uses two internal queries dispatched by the default method —
one with the ids predicate, one without.
Callers (service, analyzer, scheduler) update on opt-2..opt-4 in
follow-up commits.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Instant.now() on Linux CI carries nanoseconds; MySQL TIMESTAMP(6) and H2 in MySQL mode store microseconds. The scheduledAt = :scheduledAt predicate in find(...) compared nano-resolution param against micro-resolution stored value and missed. Local (macOS, micro-only) hid the bug. Truncate to ChronoUnit.MICROS at write time in the one repo test that exercises the watermark round-trip. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
abhisheknath2011
approved these changes
May 21, 2026
Member
Discussed offline. We should be good for now as we don't have k8s deployment spec for the new service and newly added code does not touch existing codepath. |
Collaborator
Author
|
We talked offline. Because there is no k8s deployment spec for the optimizer the mysql changes will not be deployed. This assumption would break deployments to test clusters if it is wrong, which will be detected automatically post-commit. This is strictly preferable because we can get to a working state before deploying and impacting production. |
mkuchenbecker
added a commit
that referenced
this pull request
May 22, 2026
## Optimizer Stack | PR | Content | |---|---| | #527 | Data Model | | #530 | Database Repos | | #531 **(this)** | REST service | | #533 | Analyzer app | | #534 | Scheduler app | | #tbd | Spark BatchedOFD app | | #tbd | Infra, docker-compose, smoke test | ## Summary PR 2 of N in the optimizer stack. Service layer and REST controllers for the optimizer service, plus the `apps/optimizer` shared module providing lightweight entity/repo copies for the analyzer and scheduler apps. ## Changes - [ ] Client-facing API Changes - [x] Internal API Changes - [ ] Bug Fixes - [x] New Features - [ ] Performance Improvements - [ ] Code Style - [ ] Refactoring - [ ] Documentation - [x] Tests **Service layer**: `OptimizerDataService` interface and `OptimizerDataServiceImpl` — CRUD operations, complete-operation lifecycle, stats upsert with history double-write, filtered queries. **Controllers**: `TableOperationsController`, `TableOperationsHistoryController`, `TableStatsController` — REST endpoints per the design doc API spec. **Shared module** (`apps/optimizer`): Lightweight entity and repository copies used by the analyzer and scheduler apps to read optimizer state directly from MySQL. ## Testing Done - [ ] Manually Tested on local docker setup. Please include commands ran, and their output. - [x] Added new tests for the changes made. - [ ] Updated existing tests to reflect the changes made. - [ ] No tests added or updated. Please explain why. If unsure, please feel free to ask for help. - [ ] Some other form of testing like staging or soak time in production. Please explain. H2 integration tests in `OptimizerDataServiceImplTest` (5 tests): - `completeOperation_writesHistoryFromOperationRow` — saves SCHEDULED row, completes it, asserts history DTO fields - `completeOperation_notFound_returnsEmpty` — completes nonexistent ID, asserts empty - `upsertTableStats_createsNewRow` — upserts new table, asserts DTO and repo row - `upsertTableStats_updatesExistingRow` — upserts twice, asserts overwrite with single row - `upsertTableStats_appendsHistoryOnEveryCall` — upserts twice, asserts 2 history rows ``` ./gradlew :services:optimizer:test # BUILD SUCCESSFUL — all 25 tests pass (repo tests from PR 1 + 5 new service tests) ``` # Additional Information - [ ] Breaking Changes - [ ] Deprecations - [x] Large PR broken into smaller PRs, and PR plan linked in the description. --------- Co-authored-by: mkuchenbecker <mkuchenbecker@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
mkuchenbecker
added a commit
that referenced
this pull request
May 27, 2026
## Optimizer Stack | PR | Content | |---|---| | #527 | Data Model | | #530 | Database Repos | | #531 | REST service | | #533 **(this)** | Analyzer app | | #534 | Scheduler app | | #599 | Spark BatchedOFD app | | #tbd | Infra, docker-compose, smoke test | ## Summary PR 3 of N in the optimizer stack. Introduces `apps/optimizer-analyzer`, a Spring Boot CommandLineRunner that evaluates every table in `table_stats` against pluggable `OperationAnalyzer` strategies. The first strategy, `OrphanFilesDeletionAnalyzer`, schedules OFD operations with 24h success / 1h failure retry cadence, a 6h SCHEDULED timeout, and a 5-strike circuit breaker. Key design choices: - Bulk-loads operations and history into maps (one query per type), then iterates the stats list — O(types) queries, not O(tables). - Uses the existing generic `find()` repository methods with null params. - Pure unit tests with Mockito — no Spring context needed. ## Changes - [ ] Client-facing API Changes - [ ] Internal API Changes - [ ] Bug Fixes - [x] New Features - [ ] Performance Improvements - [ ] Code Style - [ ] Refactoring - [ ] Documentation - [x] Tests **Core**: `AnalyzerRunner` — loads table_stats, pre-loads operations and history into maps, evaluates each table against all analyzers, circuit breaker logic. **Strategy interface**: `OperationAnalyzer` — `isEnabled(table)`, `shouldSchedule(table, currentOp, latestHistory)`, `getCircuitBreakerThreshold()`. **Cadence policy**: `CadencePolicy` — encapsulates time-based retry logic shared across operation types. **OFD analyzer**: `OrphanFilesDeletionAnalyzer` — enabled via `maintenance.optimizer.ofd.enabled` table property. ## Testing Done - [ ] Manually Tested on local docker setup. Please include commands ran, and their output. - [x] Added new tests for the changes made. - [ ] Updated existing tests to reflect the changes made. - [ ] No tests added or updated. Please explain why. If unsure, please feel free to ask for help. - [ ] Some other form of testing like staging or soak time in production. Please explain. 25 unit tests: - `AnalyzerRunnerTest` (7 tests) — eligible table insertion, cadence skip, disabled table, shouldSchedule=false, null UUID, circuit breaker trip, below-threshold pass - `OrphanFilesDeletionAnalyzerTest` (18 tests) — isEnabled variants, shouldSchedule for no-op/PENDING/SCHEDULING/SCHEDULED with history combinations ``` ./gradlew :apps:optimizer-analyzer:test # BUILD SUCCESSFUL — 25 tests pass ``` # Additional Information - [ ] Breaking Changes - [ ] Deprecations - [x] Large PR broken into smaller PRs, and PR plan linked in the description. --------- Co-authored-by: mkuchenbecker <mkuchenbecker@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Abhishek Nath <anath1@linkedin.com>
17 tasks
maluchari
added a commit
that referenced
this pull request
May 29, 2026
Reverts the 17 commits that landed on main after v0.5.417, bringing the tree back to exactly the v0.5.417 state. Squashed into a single revert commit for reviewability and to allow reinstating everything as one unit (revert this commit to bring all 17 changes back). Reverted commits (v0.5.417..main, newest first): - Revert #579 (HTS fields in table list api) (#610) - feat(optimizer): [3/N] Analyzer (#533) - [DataLoader] Handle Cast(Literal, TIMESTAMP/DATE/TIME) in scan optimizer (#569 follow-up) (#583) - Skip metadata.json parse in drop path (#589) - feat(optimizer): [2/N] Optimizer REST Service and Controller (#531) - [BDP-102028] feat(optimizer): [1/N] Optimizer Database (#530) - [RTAS]: Fix bug - remove fs scheme from tableLocation in commit (cont) (#594) - Trigger ELR process (#593) - [BDP-102028] feat(optimizer): [0/N] Optimizer API and internal model (#527) - Fail retention app when the columnPattern mismatch partition spec (#552) - [DataLoader] Drop OpenTelemetry minimum version to 1.38.0 (#590) - [DataLoader] Emit OpenTelemetry metrics for read operations (#582) - Cache iceberg metadata to reduce redundant requests to storage (#509) - bump iceberg 1.2 version to 1.2.0.17 (#587) - Support returning HTS fields in table list api (#579) - [DataLoader] Add unique id property to OpenHouseDataLoader (#580) - [DataLoader] Add OpenTelemetry metrics support (#575) ## Summary <!--- HINT: Replace #nnn with corresponding Issue number, if you are fixing an existing issue --> [Issue](https://github.com/linkedin/openhouse/issues/#nnn)] Briefly discuss the summary of the changes made in this pull request in 2-3 lines. ## Changes - [ ] Client-facing API Changes - [ ] Internal API Changes - [ ] Bug Fixes - [ ] New Features - [ ] Performance Improvements - [ ] Code Style - [ ] Refactoring - [ ] Documentation - [ ] Tests For all the boxes checked, please include additional details of the changes made in this pull request. ## Testing Done <!--- Check any relevant boxes with "x" --> - [ ] Manually Tested on local docker setup. Please include commands ran, and their output. - [ ] Added new tests for the changes made. - [ ] Updated existing tests to reflect the changes made. - [ ] No tests added or updated. Please explain why. If unsure, please feel free to ask for help. - [ ] Some other form of testing like staging or soak time in production. Please explain. For all the boxes checked, include a detailed description of the testing done for the changes made in this pull request. # Additional Information - [ ] Breaking Changes - [ ] Deprecations - [ ] Large PR broken into smaller PRs, and PR plan linked in the description. For all the boxes checked, include additional details of the changes made in this pull request. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Optimizer Stack
Summary
PR 1 of N in the optimizer stack.
Overall Project
Service Design doc.
Spring Data JPA repositories for all four optimizer tables with filtered query support, plus tests exercising save/find, filtered queries, upsert semantics, and append-only history.
Changes
Repositories:
TableOperationsRepository,TableOperationsHistoryRepository,TableStatsRepository,TableStatsHistoryRepository— each with JPQL filtered query methods.Tests: Repository tests for all four tables plus
OptimizerServiceContextTestverifying the Spring context loads.Testing Done
./gradlew :services:optimizer:test— all tests pass (H2 in MySQL mode).Additional Information