Split per-format crates and remove beacon-formats #275
Merged
Extract the GeoParquet write integration into beacon-arrow-geoparquet and the BBF DataFusion integration into beacon-arrow-bbf, following the established beacon-arrow-* crate pattern (types under a `datafusion` submodule). beacon-formats keeps thin re-export shims for geo_parquet and bbf so existing consumers compile unchanged. The shared rlimit helpers (max_open_fd/file_open_parallelism) move to beacon-common::file_descriptors so the new crates and the remaining arrow/csv/parquet wrappers can share them.
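A minimal sketch of the re-export shim idea described above. To keep it compilable as a single file, the two crates are modeled as modules, and `GeoParquetFormatFactory` and its `extension()` method are hypothetical names chosen for illustration, not the types this PR actually exports.

```rust
// Stand-in for the new `beacon-arrow-geoparquet` crate. In the workspace this is
// a separate crate; a module is used here so the sketch compiles on its own.
mod beacon_arrow_geoparquet {
    // Types live under a `datafusion` submodule, as in the other beacon-arrow-* crates.
    pub mod datafusion {
        /// Hypothetical type name, used only for illustration.
        #[derive(Debug, Default, PartialEq)]
        pub struct GeoParquetFormatFactory;

        impl GeoParquetFormatFactory {
            /// Hypothetical accessor so the example has something to call.
            pub fn extension(&self) -> &'static str {
                "geoparquet"
            }
        }
    }
}

// Stand-in for the `beacon-formats` aggregator during the transition phase.
mod beacon_formats {
    /// Thin shim: no logic of its own, only a re-export of the extracted items.
    pub mod geo_parquet {
        pub use crate::beacon_arrow_geoparquet::datafusion::*;
    }
}

fn main() {
    // An existing consumer keeps its old import path...
    let via_shim = beacon_formats::geo_parquet::GeoParquetFormatFactory::default();
    // ...while new code can name the per-format crate directly.
    let direct = beacon_arrow_geoparquet::datafusion::GeoParquetFormatFactory::default();
    assert_eq!(via_shim, direct);
    println!("{}", via_shim.extension());
}
```

Because the shim is a glob re-export, both paths name the same type, which is what lets existing consumers compile unchanged while the extraction lands.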
Move the remaining arrow/csv/parquet wrappers into dedicated crates (beacon-arrow-ipc, beacon-arrow-csv, beacon-arrow-parquet) and relocate the file_formats() registration into beacon-data-lake (its only caller). Re-point all consumers in beacon-core and beacon-functions at the individual format crates and delete the beacon-formats aggregator crate. Also update the Dockerfile COPY list, the beacon-api tracing filter, and a stale doc comment in beacon-datafusion-ext.
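The relocated registration could look roughly like the sketch below. It assumes a simplified `FormatFactory` trait and an extension-keyed registry in place of the real DataFusion session types; the trait, the three factory structs, and `register_all` are hypothetical stand-ins, and only the `file_formats()` name comes from this PR.

```rust
use std::collections::BTreeMap;
use std::sync::Arc;

/// Hypothetical stand-in for a DataFusion file-format factory.
trait FormatFactory: Send + Sync {
    fn extension(&self) -> &'static str;
}

// Hypothetical factories, one per format crate.
struct IpcFactory;
struct CsvFactory;
struct ParquetFactory;

impl FormatFactory for IpcFactory {
    fn extension(&self) -> &'static str {
        "arrow"
    }
}
impl FormatFactory for CsvFactory {
    fn extension(&self) -> &'static str {
        "csv"
    }
}
impl FormatFactory for ParquetFactory {
    fn extension(&self) -> &'static str {
        "parquet"
    }
}

/// One function in the calling crate lists every factory it wants registered,
/// so no aggregator crate is needed in between.
fn file_formats() -> Vec<Arc<dyn FormatFactory>> {
    vec![
        Arc::new(IpcFactory),
        Arc::new(CsvFactory),
        Arc::new(ParquetFactory),
    ]
}

/// Hypothetical session-side registry keyed by file extension.
fn register_all(registry: &mut BTreeMap<&'static str, Arc<dyn FormatFactory>>) {
    for factory in file_formats() {
        registry.insert(factory.extension(), factory);
    }
}

fn main() {
    let mut registry = BTreeMap::new();
    register_all(&mut registry);
    let names: Vec<&str> = registry.keys().copied().collect();
    println!("{names:?}");
}
```

Keeping the list next to its only caller means adding a format touches one dependency list and one function, rather than an intermediate crate as well.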
Pull request overview
This PR completes the workspace-wide split of the former beacon-formats aggregator into dedicated per-format beacon-arrow-* crates, and updates Beacon’s consumers to depend on/route through those new crates. It also relocates shared helpers and centralizes file-format registration in beacon-data-lake.
Changes:
- Removes `beacon-formats` and introduces new per-format crates (`beacon-arrow-{ipc,csv,parquet,geoparquet,bbf}`), updating workspace membership and dependency wiring.
- Moves file-format registration (`file_formats()`) into `beacon-data-lake` and repoints `beacon-core`/`beacon-functions` imports accordingly.
- Extracts FD-limit helpers into `beacon_common::file_descriptors` for reuse across format crates; updates Docker build context and tracing filters.
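As an illustration of what an FD-budget helper can do, the sketch below derives a file-open concurrency from an open-file limit. The signature, the `reserved` and `per_file` parameters, and the clamping policy are assumptions made for this example; the helpers in `beacon_common::file_descriptors` presumably read the limit through the `rlimit` crate, whereas here it is passed in so that only `std` is required.

```rust
/// Hypothetical sketch: how many files may be opened concurrently given the
/// process's soft open-file limit.
///
/// * `max_open_fd`: the soft limit (passed in; not queried from the OS here)
/// * `reserved`: descriptors kept back for sockets, logs, and so on
/// * `per_file`: descriptors assumed to be consumed per opened file
/// * `cap`: upper bound on the returned parallelism
fn file_open_parallelism(max_open_fd: u64, reserved: u64, per_file: u64, cap: usize) -> usize {
    // Never underflow when the limit is smaller than the reserve.
    let budget = max_open_fd.saturating_sub(reserved);
    // Guard against a zero divisor.
    let by_fd = budget / per_file.max(1);
    let by_fd = usize::try_from(by_fd).unwrap_or(usize::MAX);
    // Always allow at least one open file, and never exceed the cap.
    by_fd.clamp(1, cap.max(1))
}

fn main() {
    for limit in [32_u64, 256, 1024] {
        println!("limit {limit}: parallelism {}", file_open_parallelism(limit, 64, 2, 128));
    }
}
```

Sharing one such function across the format crates keeps every reader on the same budget instead of each crate guessing its own.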
Reviewed changes
Copilot reviewed 38 out of 44 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| Dockerfile | Updates build context COPY list to include new per-format crates and remove beacon-formats. |
| Cargo.toml | Updates workspace members to drop beacon-formats and include new beacon-arrow-* crates. |
| Cargo.lock | Removes beacon-formats and adds lock entries for new per-format crates. |
| beacon-functions/src/file_formats/read_zarr.rs | Switches Zarr format import to beacon_arrow_zarr::datafusion. |
| beacon-functions/src/file_formats/read_schema.rs | Switches schema-reader imports from beacon-formats to per-format crates. |
| beacon-functions/src/file_formats/read_parquet.rs | Switches Parquet format import to beacon_arrow_parquet::datafusion. |
| beacon-functions/src/file_formats/read_csv.rs | Switches CSV format import to beacon_arrow_csv::datafusion. |
| beacon-functions/src/file_formats/read_bbf.rs | Switches BBF format import to beacon_arrow_bbf::datafusion. |
| beacon-functions/src/file_formats/read_arrow.rs | Switches Arrow IPC format import to beacon_arrow_ipc::datafusion. |
| beacon-functions/Cargo.toml | Removes beacon-formats dependency and adds per-format crate deps. |
| beacon-file-formats/beacon-formats/src/lib.rs | Deletes the former aggregator crate implementation. |
| beacon-file-formats/beacon-formats/Cargo.toml | Deletes the former aggregator crate manifest. |
| beacon-file-formats/beacon-arrow-parquet/src/lib.rs | Adds new crate root exposing the datafusion module for Parquet. |
| beacon-file-formats/beacon-arrow-parquet/src/datafusion/mod.rs | Updates imports to use beacon_common::file_descriptors and beacon_datafusion_ext::FileFormatFactoryExt. |
| beacon-file-formats/beacon-arrow-parquet/Cargo.toml | Adds new crate manifest for Parquet format integration. |
| beacon-file-formats/beacon-arrow-ipc/src/lib.rs | Adds new crate root exposing the datafusion module for Arrow IPC. |
| beacon-file-formats/beacon-arrow-ipc/src/datafusion/mod.rs | Updates imports to use beacon_common::file_descriptors and beacon_datafusion_ext::FileFormatFactoryExt. |
| beacon-file-formats/beacon-arrow-ipc/Cargo.toml | Adds new crate manifest for Arrow IPC format integration. |
| beacon-file-formats/beacon-arrow-geoparquet/src/lib.rs | Adds new crate root exposing the datafusion module for GeoParquet output. |
| beacon-file-formats/beacon-arrow-geoparquet/src/datafusion/sink.rs | Adds GeoParquet sink implementation (lon/lat → geometry mapping + write path). |
| beacon-file-formats/beacon-arrow-geoparquet/src/datafusion/mod.rs | Adds GeoParquet FileFormat / factory implementation for write integration. |
| beacon-file-formats/beacon-arrow-geoparquet/Cargo.toml | Adds new crate manifest for GeoParquet integration. |
| beacon-file-formats/beacon-arrow-csv/src/lib.rs | Adds new crate root exposing the datafusion module for CSV. |
| beacon-file-formats/beacon-arrow-csv/src/datafusion/mod.rs | Updates imports to use beacon_common::file_descriptors and beacon_datafusion_ext::FileFormatFactoryExt. |
| beacon-file-formats/beacon-arrow-csv/Cargo.toml | Adds new crate manifest for CSV format integration. |
| beacon-file-formats/beacon-arrow-bbf/src/lib.rs | Adds new crate root exposing the datafusion module for BBF. |
| beacon-file-formats/beacon-arrow-bbf/src/datafusion/stream_share.rs | Adds OnceCell-based stream sharing helper for BBF reader integration. |
| beacon-file-formats/beacon-arrow-bbf/src/datafusion/source.rs | Updates module paths for BBF source implementation under the new crate layout. |
| beacon-file-formats/beacon-arrow-bbf/src/datafusion/opener.rs | Updates module paths for BBF opener implementation under the new crate layout. |
| beacon-file-formats/beacon-arrow-bbf/src/datafusion/mod.rs | Updates BBF DataFusion integration wiring and imports for new crate boundaries. |
| beacon-file-formats/beacon-arrow-bbf/src/datafusion/metrics.rs | Adds BBF global metrics wrapper. |
| beacon-file-formats/beacon-arrow-bbf/Cargo.toml | Adds new crate manifest for BBF format integration. |
| beacon-datafusion-ext/src/table_ext.rs | Updates doc comment to reflect removal of beacon-formats dependency. |
| beacon-data-lake/src/lib.rs | Exposes new file_formats module and re-exports file_formats() from beacon-data-lake. |
| beacon-data-lake/src/file_formats.rs | Adds centralized registration of per-format factories into a DataFusion session. |
| beacon-data-lake/Cargo.toml | Removes beacon-formats dependency and adds per-format crate deps required for registration. |
| beacon-core/src/query/output.rs | Updates output format factory imports to the new per-format crates. |
| beacon-core/src/query/from.rs | Updates format implementations/imports (Arrow/CSV/Parquet/Zarr/BBF) to new per-format crates. |
| beacon-core/Cargo.toml | Removes beacon-formats dependency and adds per-format crate deps. |
| beacon-common/src/lib.rs | Exposes the new file_descriptors module. |
| beacon-common/src/file_descriptors.rs | Adds shared FD-budget helpers (max_open_fd, file_open_parallelism). |
| beacon-common/Cargo.toml | Adds rlimit dependency needed by file_descriptors. |
| beacon-api/src/main.rs | Updates default tracing filter targets to include new per-format crate names. |
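
The tracing-filter change in `beacon-api/src/main.rs` amounts to listing the new crates as filter targets. The sketch below builds such a string in the comma-separated `target=level` form that `tracing_subscriber::EnvFilter` accepts. The `default_filter` helper and the `info` level are assumptions for illustration; the PR does not show the actual filter string.

```rust
/// Builds a comma-separated `target=level` filter string.
///
/// Cargo package names use hyphens, but default tracing targets are module
/// paths, which use underscores, so the names are converted first.
fn default_filter(crates: &[&str], level: &str) -> String {
    crates
        .iter()
        .map(|name| format!("{}={}", name.replace('-', "_"), level))
        .collect::<Vec<_>>()
        .join(",")
}

fn main() {
    // Crate names taken from this PR; the level is an assumption.
    let crates = [
        "beacon-arrow-ipc",
        "beacon-arrow-csv",
        "beacon-arrow-parquet",
        "beacon-arrow-geoparquet",
        "beacon-arrow-bbf",
    ];
    println!("{}", default_filter(&crates, "info"));
}
```

A filter that still names `beacon_formats` would silently stop matching once the crate is gone, which is why the target list has to follow the rename.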
Comment on lines 23 to +25
 futures-util = { workspace=true}
-tracing = { workspace = true }
+tracing = { workspace = true }
+rlimit = "0.11.0"
Summary
Finishes the per-format crate split: `beacon-formats` is removed and its contents are extracted into dedicated `beacon-arrow-*` crates, following the pattern established by `beacon-arrow-tiff/netcdf/atlas/zarr` (each exposes its types under a `datafusion` submodule).

| New crate | Extracted from |
|---|---|
| `beacon-arrow-geoparquet` | `beacon-formats::geo_parquet` |
| `beacon-arrow-bbf` | `beacon-formats::bbf` |
| `beacon-arrow-ipc` | `beacon-formats::arrow` |
| `beacon-arrow-csv` | `beacon-formats::csv` |
| `beacon-arrow-parquet` | `beacon-formats::parquet` |

Supporting moves:
- Shared rlimit helpers (`max_open_fd`/`file_open_parallelism`) → `beacon_common::file_descriptors`, so every format crate can share them.
- `file_formats()` registration → `beacon-data-lake` (its only caller), now pulling factories from the individual crates.
- `beacon-core` and `beacon-functions` re-pointed; `Cargo.toml` deps, `Dockerfile` COPY list, the `beacon-api` tracing filter, and a stale doc comment in `beacon-datafusion-ext` updated.

Commits (phased)
1. Extract `beacon-arrow-geoparquet` and `beacon-arrow-bbf`; `beacon-formats` kept thin re-export shims so consumers compiled unchanged (independently shippable).
2. Move the remaining arrow/csv/parquet wrappers into dedicated crates, relocate `file_formats()`, delete the aggregator crate.

Verification
- `cargo check --workspace`: clean
- `cargo test --workspace --no-run`: all test targets compile
- `cargo tree -d`: no duplicate `arrow`/`object_store` versions (the `arrow-58`/`object-store-13` feature flags carried over)
- No `beacon_formats`/`beacon-formats` references remain in source; removed from `Cargo.lock`

Notes
- The `beacon-binary-format` git submodule must be initialized for the `beacon-arrow-bbf` crate to build (pre-existing requirement).
- `beacon-formats/test-files/gridded-example.nc` was removed with the crate; no surviving test referenced it.