(wip) feat(tables): in-process Iceberg REST Catalog adapter #607
Draft
mkuchenbecker wants to merge 1 commit into
Adds an in-process Iceberg REST Catalog facade in front of the Tables Service.
The new `com.linkedin.openhouse.tables.rest` package is picked up by the
existing `TablesSpringApplication` component scan; no Spring-app wiring
changes are required.
Endpoints (all under `/iceberg/v1/...`):
GET /v1/config
GET /v1/{prefix}/namespaces
POST /v1/{prefix}/namespaces
GET /v1/{prefix}/namespaces/{namespace}
HEAD /v1/{prefix}/namespaces/{namespace}
DELETE /v1/{prefix}/namespaces/{namespace}
GET /v1/{prefix}/namespaces/{namespace}/tables
POST /v1/{prefix}/namespaces/{namespace}/tables
GET /v1/{prefix}/namespaces/{namespace}/tables/{table}
HEAD /v1/{prefix}/namespaces/{namespace}/tables/{table}
POST /v1/{prefix}/namespaces/{namespace}/tables/{table}
DELETE /v1/{prefix}/namespaces/{namespace}/tables/{table}
The commit endpoint replays the Iceberg `requirements + updates` payload via
`MetadataUpdate.applyTo(TableMetadata.Builder)`, then discriminates between
snapshot commits (route to `IcebergSnapshotsApiHandler.putIcebergSnapshots`)
and metadata-only commits (route to `TablesApiHandler.updateTable`).
Server-side metadata authorship and the existing two-stage CAS (path-string
version check plus HouseTables `@Version` JPA lock) are preserved unchanged:
REST clients reach the same `OpenHouseInternalTableOperations.doCommit` path
that OpenHouse's Java client already uses.
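
The routing decision described above can be pictured with a small, dependency-free sketch. It is illustrative only: `CommitRouterSketch` and `Route` are hypothetical names, the real adapter works on Iceberg's `MetadataUpdate` objects rather than strings, and treating exactly these four `action` values from the Iceberg REST spec as snapshot changes is an assumption of the sketch, not something the PR states.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the snapshot vs. metadata-only discrimination; not the PR's code. */
final class CommitRouterSketch {

  /** Where a commit would be delegated after its updates have been replayed. */
  enum Route { SNAPSHOTS, METADATA_ONLY }

  // "action" values from the Iceberg REST spec that touch snapshots or refs.
  // Assumption: any one of these makes the commit a snapshot commit.
  private static final Set<String> SNAPSHOT_ACTIONS =
      Set.of("add-snapshot", "remove-snapshots", "set-snapshot-ref", "remove-snapshot-ref");

  /** Returns SNAPSHOTS if any update action affects snapshots, otherwise METADATA_ONLY. */
  static Route route(List<String> updateActions) {
    for (String action : updateActions) {
      if (SNAPSHOT_ACTIONS.contains(action)) {
        return Route.SNAPSHOTS; // would go to IcebergSnapshotsApiHandler.putIcebergSnapshots
      }
    }
    return Route.METADATA_ONLY; // would go to TablesApiHandler.updateTable
  }

  public static void main(String[] args) {
    System.out.println(route(List.of("add-snapshot", "set-snapshot-ref"))); // SNAPSHOTS
    System.out.println(route(List.of("set-properties")));                   // METADATA_ONLY
  }
}
```

A typical Spark INSERT sends `add-snapshot` plus `set-snapshot-ref`, so it would take the snapshot route in this sketch; a property-only ALTER TABLE would take the metadata-only route.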
MVP scope:
- single-level namespaces (Iceberg-spec depth > 1 -> 400 BadRequest); rejection
chosen over flatten-encoding so a future multi-level migration is purely
additive and does not require HDFS path rewrites.
- no views, no multi-table transactions, no scan planning, no credential
vending, no remote signing. Out-of-spec features are simply not advertised.
- depends on iceberg-core 1.5.2 wire types (`UpdateTableRequest`,
`LoadTableResponse`, `ConfigResponse`, `ErrorResponse`, ...) already on the
Tables Service classpath; no new external dependencies.
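
As a rough illustration of the single-level namespace rule, the sketch below rejects any namespace path segment that decodes to more than one level. It assumes the Iceberg REST convention of joining namespace levels with the unit separator (`0x1F`, URL-encoded as `%1F`); `NamespaceDepthSketch`, `BadRequestException`, and `singleLevel` are hypothetical names, not classes from this PR.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of single-level namespace validation; not the PR's code. */
final class NamespaceDepthSketch {

  // Iceberg REST clients join multi-level namespace parts with the 0x1F unit separator.
  private static final String LEVEL_SEPARATOR = String.valueOf((char) 0x1F);

  /** Stand-in for whatever exception the adapter maps to HTTP 400. */
  static final class BadRequestException extends RuntimeException {
    BadRequestException(String message) {
      super(message);
    }
  }

  /** Decodes the raw path segment and returns it only if it names exactly one level. */
  static String singleLevel(String rawPathSegment) {
    String decoded = URLDecoder.decode(rawPathSegment, StandardCharsets.UTF_8);
    if (decoded.isEmpty()) {
      throw new BadRequestException("Namespace must not be empty");
    }
    if (decoded.contains(LEVEL_SEPARATOR)) {
      throw new BadRequestException("Multi-level namespaces are not supported");
    }
    return decoded;
  }

  public static void main(String[] args) {
    System.out.println(singleLevel("smoke")); // smoke
    try {
      singleLevel("a%1Fb"); // two levels: a, b
    } catch (BadRequestException e) {
      System.out.println("400: " + e.getMessage());
    }
  }
}
```

Rejecting rather than flattening means a database name never has to encode a separator, which is what keeps a later multi-level migration additive.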
A `@RestControllerAdvice(basePackages = "com.linkedin.openhouse.tables.rest")`
maps OpenHouse exceptions (`NoSuchUserTableException`, `AlreadyExistsException`,
`EntityConcurrentModificationException`, ...) to Iceberg's wire-format
`ErrorResponse`. The advice is scoped to the new package so OpenHouse's
existing exception handler for the native `/v1/databases/...` surface is
unaffected.
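
A minimal sketch of what such a mapping might look like, without Spring or Iceberg on the classpath. Everything here is an assumption for illustration: the nested exception classes are stand-ins for the OpenHouse exceptions named above, the status codes and Iceberg `type` strings are a plausible choice rather than the PR's actual mapping, and the JSON is built by hand instead of through Iceberg's `ErrorResponse` type.

```java
/** Hypothetical sketch of exception-to-ErrorResponse mapping; not the PR's code. */
final class IcebergErrorSketch {

  // Stand-ins for the OpenHouse exceptions named in the description.
  static class NoSuchUserTableException extends RuntimeException {
    NoSuchUserTableException(String m) { super(m); }
  }
  static class AlreadyExistsException extends RuntimeException {
    AlreadyExistsException(String m) { super(m); }
  }
  static class EntityConcurrentModificationException extends RuntimeException {
    EntityConcurrentModificationException(String m) { super(m); }
  }

  /** The three fields of the Iceberg REST error model. */
  static final class ErrorBody {
    final int code;
    final String type;
    final String message;

    ErrorBody(int code, String type, String message) {
      this.code = code;
      this.type = type;
      this.message = message;
    }

    /** Renders the {"error": {...}} envelope used by the Iceberg REST wire format. */
    String toJson() {
      String escaped = message.replace("\\", "\\\\").replace("\"", "\\\"");
      return "{\"error\":{\"message\":\"" + escaped + "\",\"type\":\"" + type
          + "\",\"code\":" + code + "}}";
    }
  }

  /** Assumed mapping: the codes and type names below are illustrative choices. */
  static ErrorBody toError(RuntimeException e) {
    if (e instanceof NoSuchUserTableException) {
      return new ErrorBody(404, "NoSuchTableException", e.getMessage());
    }
    if (e instanceof AlreadyExistsException) {
      return new ErrorBody(409, "AlreadyExistsException", e.getMessage());
    }
    if (e instanceof EntityConcurrentModificationException) {
      return new ErrorBody(409, "CommitFailedException", e.getMessage());
    }
    return new ErrorBody(500, "InternalServerError", String.valueOf(e.getMessage()));
  }

  public static void main(String[] args) {
    System.out.println(toError(new NoSuchUserTableException("table db.t1 not found")).toJson());
  }
}
```

In the real adapter the equivalent logic would live in `@ExceptionHandler` methods on the package-scoped advice class.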
Smoke-tested end-to-end against the `oh-hadoop-spark` docker recipe: a Spark
3.1 spark-sql session configured with stock `org.apache.iceberg.spark.SparkCatalog`
+ `catalog-impl = org.apache.iceberg.rest.RESTCatalog` (no OpenHouse plugin
activated) successfully completes a CREATE NAMESPACE, CREATE TABLE, INSERT,
SELECT, DROP TABLE round trip against the Tables Service.
cbb330 (Collaborator) previously requested changes on May 27, 2026 and left a comment:
Let's integrate with the existing work, since it has been reviewed and has a known-good architecture. It is read-side only; the next set of changes should be the write side, then database.
Summary
Adds an in-process Iceberg REST Catalog facade in front of the existing Tables Service. The new `com.linkedin.openhouse.tables.rest` package is picked up by the existing `TablesSpringApplication` component scan; no Spring-app wiring changes are required. Any client that speaks the Apache Iceberg REST wire protocol (Spark, Trino, PyIceberg, Flink, …) can now read and write OpenHouse tables without an OpenHouse-specific plugin.

The new package contributes ~960 lines of new Java and makes zero changes to existing files. It reuses `TablesApiHandler`, `IcebergSnapshotsApiHandler`, `DatabasesApiHandler`, and `OpenHouseInternalCatalog` via constructor injection. Server-side metadata authorship and the existing two-stage CAS (path-string version check + HouseTables `@Version` JPA lock) are preserved unchanged: REST clients reach the same `OpenHouseInternalTableOperations.doCommit` path the existing OpenHouse Java client already uses.

Changes
Client-facing API: new endpoints under `/iceberg/v1/...`

All endpoints accept and return Iceberg's standard REST wire format.

| Method | Path | Handler / notes |
| --- | --- | --- |
| GET | `/v1/config` | `?warehouse=` back as `overrides.prefix` |
| GET | `/v1/{prefix}/namespaces` | `DatabasesApiHandler.getAllDatabases` |
| POST | `/v1/{prefix}/namespaces` | |
| GET | `/v1/{prefix}/namespaces/{namespace}` | `DatabasesApiHandler` |
| HEAD | `/v1/{prefix}/namespaces/{namespace}` | |
| DELETE | `/v1/{prefix}/namespaces/{namespace}` | |
| GET | `/v1/{prefix}/namespaces/{namespace}/tables` | `TablesApiHandler.searchTables` |
| POST | `/v1/{prefix}/namespaces/{namespace}/tables` | `TablesApiHandler.createTable` |
| GET | `/v1/{prefix}/namespaces/{namespace}/tables/{table}` | `TablesApiHandler.getTable` + `OpenHouseInternalCatalog.loadTable` |
| HEAD | `/v1/{prefix}/namespaces/{namespace}/tables/{table}` | |
| POST | `/v1/{prefix}/namespaces/{namespace}/tables/{table}` (commit) | `IcebergSnapshotsApiHandler.putIcebergSnapshots` or `TablesApiHandler.updateTable` |
| DELETE | `/v1/{prefix}/namespaces/{namespace}/tables/{table}` | `TablesApiHandler.deleteTable` |

The commit endpoint replays the Iceberg `requirements + updates` payload via `MetadataUpdate.applyTo(TableMetadata.Builder)`, pre-checks each `UpdateRequirement`, then discriminates: snapshot changes route to `IcebergSnapshotsApiHandler.putIcebergSnapshots`; metadata-only commits route to `TablesApiHandler.updateTable`.

New Features
Lets any stock Iceberg REST client (Spark `org.apache.iceberg.rest.RESTCatalog`, PyIceberg `RestCatalog`, Trino `iceberg-rest` connector, Flink) talk to OpenHouse without per-engine catalog code.

A `@RestControllerAdvice(basePackages = "com.linkedin.openhouse.tables.rest")` maps OpenHouse internal exceptions to Iceberg's wire-format `ErrorResponse` JSON. The advice is package-scoped so OpenHouse's existing exception handler for the native `/v1/databases/...` surface is unaffected.

MVP scope notes (intentional)
- Wire types (`UpdateTableRequest`, `LoadTableResponse`, `ConfigResponse`, `ErrorResponse`, …) come from `iceberg-core` 1.5.2 already on the Tables Service classpath.

Testing Done
End-to-end smoke test against the `oh-hadoop-spark` docker recipe. Stock Iceberg `RESTCatalog` client; no OpenHouse plugin activated.

```bash
./gradlew :services:tables:bootJar
cd infra/recipes/docker-compose/oh-hadoop-spark
docker compose build openhouse-tables
docker compose up -d
```
Spark session config (catalog `oh` is stock Iceberg):

```
spark.sql.catalog.oh = org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.oh.catalog-impl = org.apache.iceberg.rest.RESTCatalog
spark.sql.catalog.oh.uri = http://openhouse-tables:8080/iceberg/
spark.sql.catalog.oh.token =
spark.sql.catalog.oh.warehouse = oh
```
SQL script executed:
```sql
SHOW NAMESPACES IN oh;
CREATE NAMESPACE IF NOT EXISTS oh.smoke;
DROP TABLE IF EXISTS oh.smoke.t1;
CREATE TABLE oh.smoke.t1 (id bigint, name string) USING iceberg;
SHOW TABLES IN oh.smoke;
INSERT INTO oh.smoke.t1 VALUES (1,'alice'),(2,'bob'),(3,'carol');
SELECT count(*) FROM oh.smoke.t1;
SELECT * FROM oh.smoke.t1 ORDER BY id;
INSERT INTO oh.smoke.t1 VALUES (4,'dave');
SELECT count(*) FROM oh.smoke.t1;
SELECT * FROM oh.smoke.t1 ORDER BY id;
DROP TABLE oh.smoke.t1;
SHOW TABLES IN oh.smoke;
```
Result (trimmed):
```
smoke t1
Time taken: 0.595 seconds, Fetched 1 row(s)
Time taken: 6.497 seconds
3
1 alice
2 bob
3 carol
Time taken: 1.093 seconds, Fetched 3 row(s)
Time taken: 2.57 seconds
4
1 alice
2 bob
3 carol
4 dave
Time taken: 0.351 seconds, Fetched 4 row(s)
```
All commands succeed end-to-end. Spark uses the stock Iceberg `RESTCatalog`; the adapter translates each request, delegates to existing OpenHouse handlers, and translates the response back. The new `metadata.json` files are written server-side by `OpenHouseInternalTableOperations.doCommit` (unchanged), and Spark reads them back via the standard Iceberg path.

Follow-up work intentionally not in this PR:
- `@WebMvcTest` coverage for each controller
- `@SpringBootTest` with the upstream `RESTCatalog` Java client against an H2 Tables Service
- credential vending (`LoadTableResponse.storage-credentials`)

Additional Information
No breaking changes. The new package is additive; the existing `/v1/databases/...` surface is untouched. The new `@RestControllerAdvice` is scoped via `basePackages = "com.linkedin.openhouse.tables.rest"` so OpenHouse's existing exception handler keeps owning everything else.