Federated remote-table support over Flight SQL #284
Merged
Let an admin register a table that points at another Beacon instance and
push query work (filters, projection, limit, joins, aggregates) down to
the remote so only the reduced result crosses the network:
CREATE EXTERNAL TABLE remote_obs STORED AS REMOTE
LOCATION 'beacon://other-host:50051/obs'
OPTIONS ('username' 'admin', 'password' 'secret');
Built on datafusion-federation (pinned =0.5.3, the release targeting
datafusion ^53): its optimizer rule federates the largest sub-plan rooted
at remote tables and runs it on the remote via a SQLExecutor backed by an
Arrow Flight SQL client.
- beacon-datafusion-ext/src/remote: RemoteConnection (Flight SQL client +
handshake), BeaconFlightSqlExecutor (SQLExecutor), RemoteTableDefinition
(typetag-serde) + provider with schema pinned at registration.
- runtime: register default_optimizer_rules() and add FederatedPlanner to
BeaconQueryPlanner's extension planners.
- actions: route STORED AS REMOTE to the federated builder.
- schema_persistence: recover the definition from the registered provider
so it round-trips to table.json; reload uses the pinned schema (no remote
needed at startup).
Credentials are stored inline in table.json (admin-gated DDL).
Tests: end-to-end loopback federation (filter+aggregate pushdown, auth,
streaming, federated plan node), RemoteTableDefinition serde round-trip,
and parse_remote_location. Existing Flight SQL and beacon-core suites green.
Add a Remote Tables (Federation) page covering STORED AS REMOTE: the beacon:// LOCATION format, OPTIONS (username/password/tls), how filter/projection/limit/join/aggregate pushdown works over Flight SQL, schema pinning at creation, restart behavior, and limitations. Wire it into the data-lake sidebar and cross-link from the external-tables page.
Pull request overview
Adds federated “remote tables” backed by Arrow Flight SQL, allowing DataFusion plans rooted at remote Beacon tables to be pushed down and executed on a remote Beacon instance via datafusion-federation.
Changes:
- Introduces a new `beacon-datafusion-ext::remote` module (connection, executor, table definition/provider adaptor) to run pushed SQL over Flight SQL and stream results back.
- Wires `datafusion-federation` into runtime planning (optimizer rules + `FederatedPlanner`) and adds DDL routing for `CREATE EXTERNAL TABLE … STORED AS REMOTE`.
- Adds persistence support (recovering remote definitions from registered providers) and an end-to-end loopback federation test over Flight SQL.
Reviewed changes
Copilot reviewed 16 out of 17 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| Cargo.toml | Adds workspace dependencies for federation + tonic transport. |
| Cargo.lock | Locks new dependency graph entries for federation/tonic/flight-sql usage. |
| beacon-datafusion-ext/src/remote/mod.rs | Remote-table module entrypoint + definition recovery from federated providers. |
| beacon-datafusion-ext/src/remote/executor.rs | Implements federation SQLExecutor that executes pushed SQL remotely over Flight SQL. |
| beacon-datafusion-ext/src/remote/definition.rs | Adds persisted RemoteTableDefinition and builds federated providers with pinned schema. |
| beacon-datafusion-ext/src/remote/connection.rs | Adds Flight SQL client connection + handshake logic for remote Beacon instances. |
| beacon-datafusion-ext/src/lib.rs | Exposes the new remote module publicly. |
| beacon-datafusion-ext/Cargo.toml | Adds crate-level deps for federation + flight-sql client support. |
| beacon-data-lake/src/table_runtime/schema_persistence.rs | Persists remote table definitions by recovering them from registered providers. |
| beacon-core/src/statement_plan/query_planner.rs | Registers FederatedPlanner to lower federation extension nodes. |
| beacon-core/src/statement_plan/actions.rs | Routes STORED AS REMOTE DDL and parses beacon://… remote locations/options. |
| beacon-core/src/runtime.rs | Enables federation optimizer rules in the DataFusion session. |
| beacon-core/Cargo.toml | Adds datafusion-federation dependency to core. |
| beacon-api/src/flight_sql/tests.rs | Adds loopback end-to-end federation test validating pushdown + streaming. |
Comment on lines +22 to +25

    /// Maps any displayable error into a DataFusion external error.
    fn remote_err<E: std::fmt::Display>(error: E) -> DataFusionError {
        DataFusionError::External(format!("remote beacon: {error}").into())
    }
Comment on lines +11 to +18

    #[derive(Clone, Debug)]
    pub struct RemoteConnection {
        /// gRPC endpoint of the remote Flight SQL server, e.g. `http://host:50051`.
        pub url: String,
        pub username: Option<String>,
        pub password: Option<String>,
    }
Comment on lines +34 to +53

        pub async fn connect(&self) -> anyhow::Result<FlightSqlServiceClient<Channel>> {
            let channel = Endpoint::from_shared(self.url.clone())
                .with_context(|| format!("invalid remote beacon endpoint '{}'", self.url))?
                .connect()
                .await
                .with_context(|| format!("failed to connect to remote beacon at '{}'", self.url))?;

            let mut client = FlightSqlServiceClient::new(channel);

            if let Some(username) = &self.username {
                let password = self.password.as_deref().unwrap_or_default();
                client
                    .handshake(username, password)
                    .await
                    .with_context(|| format!("Flight SQL handshake with '{}' failed", self.url))?;
            }

            Ok(client)
        }
    }
Comment on lines +23 to +39

    #[derive(Clone, Debug, serde::Serialize, serde::Deserialize)]
    pub struct RemoteTableDefinition {
        /// Local logical table name.
        pub name: String,
        /// gRPC endpoint of the remote Flight SQL server, e.g. `http://host:50051`.
        pub url: String,
        /// Table name on the remote instance.
        pub remote_table: String,
        #[serde(default)]
        pub username: Option<String>,
        #[serde(default)]
        pub password: Option<String>,
        /// Pinned output schema. An empty schema means "fetch from the remote when
        /// building the provider" (and the resolved schema is then pinned).
        pub schema: SchemaRef,
    }
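The doc comment on `schema` describes a two-way rule: an empty schema triggers a fetch from the remote, and a non-empty one is used as-is. A minimal sketch of that rule follows, using a simplified stand-in for `SchemaRef` (a list of column names) and a hypothetical `resolve_schema` helper. The `fetch` closure stands in for the remote schema query and is an assumption, not the PR's code.

```rust
/// Simplified stand-in for Arrow's `SchemaRef`: just the column names.
pub type Columns = Vec<String>;

/// Hypothetical helper: returns the pinned schema when present, otherwise
/// calls `fetch` once and returns its result so the caller can pin it.
pub fn resolve_schema<F>(pinned: &Columns, fetch: F) -> Result<Columns, String>
where
    F: FnOnce() -> Result<Columns, String>,
{
    if pinned.is_empty() {
        // First registration: ask the remote for the schema.
        fetch()
    } else {
        // Reload after restart: the remote is never contacted.
        Ok(pinned.clone())
    }
}
```

Because the non-empty branch never invokes `fetch`, a reload with a stored schema succeeds even when the closure would fail, which is the restart behavior the description relies on.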
Comment on lines +161 to +164

    let tls = tls_option
        .map(|v| v.eq_ignore_ascii_case("true"))
        .unwrap_or(false);
    let scheme = if tls { "https" } else { "http" };
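The scheme selection above is one step in turning a `beacon://host:port/table` LOCATION into a Flight SQL endpoint plus a remote table name. A minimal sketch of that parsing follows, assuming a hypothetical `parse_location` helper and `ParsedLocation` type; the PR's `parse_remote_location` may differ in signature and error handling.

```rust
/// Endpoint URL and remote table name derived from a `beacon://` LOCATION.
#[derive(Debug, PartialEq)]
pub struct ParsedLocation {
    pub url: String,
    pub remote_table: String,
}

/// Hypothetical helper: splits `beacon://host:port/table` into an endpoint
/// and a table name. `tls_option` is the raw value of the `tls` OPTION.
pub fn parse_location(location: &str, tls_option: Option<&str>) -> Result<ParsedLocation, String> {
    let rest = location
        .strip_prefix("beacon://")
        .ok_or_else(|| format!("expected beacon:// location, got '{location}'"))?;

    // Everything before the first '/' is the authority; the remainder is the table.
    let (authority, table) = rest
        .split_once('/')
        .ok_or_else(|| format!("missing table name in '{location}'"))?;
    if authority.is_empty() || table.is_empty() {
        return Err(format!("incomplete location '{location}'"));
    }

    // Same rule as the reviewed snippet: only a case-insensitive "true" enables TLS.
    let tls = tls_option
        .map(|v| v.eq_ignore_ascii_case("true"))
        .unwrap_or(false);
    let scheme = if tls { "https" } else { "http" };

    Ok(ParsedLocation {
        url: format!("{scheme}://{authority}"),
        remote_table: table.to_string(),
    })
}
```

Under this rule, `parse_location("beacon://other-host:50051/obs", None)` yields the endpoint `http://other-host:50051` and the table `obs`. Any `tls` value other than `true`, such as `yes` or `1`, falls back to plaintext `http` without an error.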
The /api/table-config endpoint serializes the full table definition and is
reachable unauthenticated, so a remote table's inline username/password
would leak. Add TableDefinition::sensitive_keys() (default none), declare
["username", "password"] for RemoteTableDefinition, and mask those fields
in TableConfigView. Also hand-write RemoteTableDefinition's Debug so creds
can't reach logs via {:?}. Persistence (table.json) still keeps the real
values so the table can reconnect.
Test: TableConfigView masks credentials while keeping non-secret fields.
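The masking described in this commit can be sketched as follows, with a plain string map standing in for the serialized definition. The `mask_sensitive` function, the `"***"` placeholder, and the `Creds` type are illustrative assumptions; only the key names `username` and `password` come from the commit message.

```rust
use std::collections::BTreeMap;
use std::fmt;

/// Hypothetical view-side masking: replaces the value of every key listed in
/// `sensitive_keys` with a fixed placeholder, leaving other fields untouched.
pub fn mask_sensitive(
    fields: &BTreeMap<String, String>,
    sensitive_keys: &[&str],
) -> BTreeMap<String, String> {
    fields
        .iter()
        .map(|(k, v)| {
            let shown = if sensitive_keys.contains(&k.as_str()) {
                "***".to_string()
            } else {
                v.clone()
            };
            (k.clone(), shown)
        })
        .collect()
}

/// Illustrative credential holder with a hand-written `Debug` so that
/// `{:?}` never prints the password.
pub struct Creds {
    pub username: String,
    pub password: String,
}

impl fmt::Debug for Creds {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("Creds")
            .field("username", &self.username)
            .field("password", &"<redacted>")
            .finish()
    }
}
```

The function returns a masked copy rather than mutating its input, which mirrors the split in the commit message: the view is redacted while the persisted values stay intact for reconnecting.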
Instead of storing username/password in table.json, remote tables now connect to the remote Flight SQL server anonymously (no handshake, no token). The remote must allow anonymous Flight SQL access (BEACON_FLIGHT_SQL_ALLOW_ANONYMOUS=true), which is read-only, exactly what federation needs. This removes the secret-at-rest entirely, so the earlier credential-redaction machinery is no longer needed and is reverted:
- RemoteConnection/RemoteTableDefinition: drop username/password.
- actions: drop the username/password OPTIONS (keep tls).
- Revert TableDefinition::sensitive_keys() and TableConfigView redaction.
- Federation loopback test now runs against an anonymous remote with no creds.
- Docs updated: anonymous-access requirement, no credential OPTIONS.
Summary
Adds federated remote tables: an admin registers a table that points at a table on another Beacon instance, and queries push as much work as possible (filters, projection, limit, joins, aggregates) down to the remote so only the reduced result crosses the network.

Built on `datafusion-federation` (pinned `=0.5.3`, the release targeting `datafusion ^53`, matching our `53.1.0`). Its optimizer rule federates the largest sub-plan rooted at remote tables and runs it on the remote via a `SQLExecutor` backed by an Arrow Flight SQL client, the same transport Beacon already serves.

Changes
- `beacon-datafusion-ext/src/remote/`
  - `RemoteConnection`: Flight SQL client + Basic→Bearer handshake
  - `BeaconFlightSqlExecutor`: implements federation's `SQLExecutor`; runs pushed SQL on the remote and streams Arrow batches back (async→sync bridge)
  - `RemoteTableDefinition` (typetag-serde) + `build_provider`: pins the schema via a `LIMIT 0` fetch and builds the federated provider
- runtime: register `default_optimizer_rules()` and add federation's `FederatedPlanner` to `BeaconQueryPlanner`'s extension planners
- actions: `STORED AS REMOTE` branches to the federated builder in `create_external_table`
- schema_persistence: the definition round-trips to `table.json`; reload pins the stored schema, so a down remote doesn't block startup

Design decisions
- `CREATE EXTERNAL TABLE … STORED AS REMOTE` (no new endpoint/parser)
- Credentials are stored inline in `table.json` (plaintext). Creation is admin-gated DDL; flagged here for visibility
- `datafusion-federation`, not hand-rolled filter pushdown

Testing
- End-to-end loopback federation: a `remote_obs` table federates back to `obs`; `SELECT count(*), sum(val) WHERE id>1` returns the correct result, and `EXPLAIN` confirms a federated/virtual scan node. Exercises auth, schema fetch, pushdown, and streaming.
- `RemoteTableDefinition` serde round-trip, `parse_remote_location`.
- Existing Flight SQL and `beacon-core` lib tests pass.

Notes
- DataFusion prefixes `OPTIONS` keys lacking a `.` with `format.`, so option lookups check both forms.
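
A minimal sketch of the two-form lookup this note describes, assuming the options arrive as a string map. The `lookup_option` helper is hypothetical and not taken from the PR.

```rust
use std::collections::HashMap;

/// Hypothetical helper: looks up `key` as written, then falls back to the
/// `format.`-prefixed form that un-dotted OPTIONS keys may arrive under.
pub fn lookup_option<'a>(options: &'a HashMap<String, String>, key: &str) -> Option<&'a str> {
    options
        .get(key)
        .or_else(|| options.get(&format!("format.{key}")))
        .map(String::as_str)
}
```

The bare key is checked first, so an explicitly supplied `tls` entry takes precedence over a `format.tls` entry if both happen to be present.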