Split per-format crates and remove beacon-formats #275

Merged: robinskil merged 3 commits into main from features/cleanup-formats-crate on Jun 18, 2026

@robinskil

Collaborator

Summary

Finishes the per-format crate split: beacon-formats is removed and its contents are extracted into dedicated beacon-arrow-* crates, following the pattern established by beacon-arrow-tiff/netcdf/atlas/zarr (each exposes its types under a datafusion submodule).

| New crate | Was |
| --- | --- |
| beacon-arrow-geoparquet | beacon-formats::geo_parquet |
| beacon-arrow-bbf | beacon-formats::bbf |
| beacon-arrow-ipc | beacon-formats::arrow |
| beacon-arrow-csv | beacon-formats::csv |
| beacon-arrow-parquet | beacon-formats::parquet |
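
For a consumer, the mapping above is mostly an import-path change. A minimal sketch of the layout, assuming each new crate exposes its types under a datafusion submodule as the summary describes. The inline modules stand in for the real crates so the snippet compiles on its own, and CsvFormatFactory / ParquetFormatFactory are hypothetical type names, not taken from this PR:

```rust
// Inline modules stand in for the real crates so this compiles standalone.
// The type names and the `extension` method are hypothetical placeholders.
mod beacon_arrow_csv {
    pub mod datafusion {
        pub struct CsvFormatFactory;
        impl CsvFormatFactory {
            pub fn extension(&self) -> &'static str {
                "csv"
            }
        }
    }
}

mod beacon_arrow_parquet {
    pub mod datafusion {
        pub struct ParquetFormatFactory;
        impl ParquetFormatFactory {
            pub fn extension(&self) -> &'static str {
                "parquet"
            }
        }
    }
}

// Before the split a consumer would have reached into the aggregator
// (something under `beacon_formats::csv` / `beacon_formats::parquet`).
// After it, each format is imported from its own crate:
use beacon_arrow_csv::datafusion::CsvFormatFactory;
use beacon_arrow_parquet::datafusion::ParquetFormatFactory;

/// Collects the extensions of the two stand-in factories.
pub fn known_extensions() -> Vec<&'static str> {
    vec![CsvFormatFactory.extension(), ParquetFormatFactory.extension()]
}
```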

Supporting moves:

  • Shared rlimit helpers (max_open_fd / file_open_parallelism) → beacon_common::file_descriptors, so every format crate can share them.
  • file_formats() registration → beacon-data-lake (its only caller), now pulling factories from the individual crates.
  • All consumers in beacon-core and beacon-functions re-pointed; Cargo.toml deps, Dockerfile COPY list, the beacon-api tracing filter, and a stale doc comment in beacon-datafusion-ext updated.
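
To illustrate why the rlimit helpers are worth sharing across format crates, here is a minimal sketch of how a file-open parallelism budget might be derived from the process's open-file limit. The PR does not show the helpers' signatures or formula, so everything below is an assumption: the soft limit is passed in as a parameter instead of being read through the rlimit crate, and the function name, headroom, and per-file cost are illustrative only.

```rust
/// Hypothetical budget calculation; NOT the actual
/// `beacon_common::file_descriptors` implementation.
///
/// * `soft_limit`   - max open descriptors for the process (the kind of value
///                    a helper like `max_open_fd` would report).
/// * `reserved`     - descriptors held back for sockets, logs, and so on.
/// * `fds_per_file` - descriptors one open file is assumed to consume.
pub fn open_parallelism_budget(soft_limit: u64, reserved: u64, fds_per_file: u64) -> usize {
    // Never underflow if the limit is smaller than the reserved headroom.
    let usable = soft_limit.saturating_sub(reserved);
    // Guard against division by zero for a nonsensical per-file cost.
    let per_file = fds_per_file.max(1);
    // Always allow at least one concurrent open so readers can make progress.
    (usable / per_file).max(1) as usize
}
```

With a soft limit of 1024, 64 reserved descriptors, and 2 descriptors per file, this sketch allows 480 concurrent opens. Keeping such logic in one shared module means every format crate throttles against the same budget.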

Commits (phased)

  1. Split geoparquet and bbf into dedicated file-format crates: beacon-formats kept thin re-export shims so consumers compiled unchanged (independently shippable).
  2. Remove beacon-formats; finish per-format crate split: split arrow/csv/parquet, relocate file_formats(), delete the aggregator crate.
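
The thin re-export shim used in the first phase can be sketched as follows. This is an illustration of the general pattern, not the PR's code: the inline modules stand in for the real crates, and BbfFormat with its name method is a hypothetical type.

```rust
// Stand-in for the new dedicated crate (hypothetical contents).
mod beacon_arrow_bbf {
    pub mod datafusion {
        pub struct BbfFormat;
        impl BbfFormat {
            pub fn name(&self) -> &'static str {
                "bbf"
            }
        }
    }
}

// Stand-in for the old aggregator during phase 1: it no longer owns the
// code, it only re-exports the new location under the old path.
mod beacon_formats {
    pub mod bbf {
        // `super::super` walks up from `bbf` to the module that contains
        // both stand-in crates; in a real workspace this would simply be
        // `pub use beacon_arrow_bbf::datafusion::*;`.
        pub use super::super::beacon_arrow_bbf::datafusion::*;
    }
}

// An existing consumer keeps its old import path and still compiles.
use beacon_formats::bbf::BbfFormat;

pub fn consumer_format_name() -> &'static str {
    BbfFormat.name()
}
```

Because the shim only forwards names, the first commit can ship on its own, and the second commit can delete the shim once every consumer imports from the dedicated crates.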

Verification

  • cargo check --workspace — clean
  • cargo test --workspace --no-run — all test targets compile
  • cargo tree -d — no duplicate arrow/object_store versions (the arrow-58/object-store-13 feature flags carried over)
  • No remaining beacon_formats/beacon-formats references in source; removed from Cargo.lock

Notes

  • The beacon-binary-format git submodule must be initialized for the beacon-arrow-bbf crate to build (pre-existing requirement).
  • A stray fixture beacon-formats/test-files/gridded-example.nc was removed with the crate; no surviving test referenced it.

Extract the GeoParquet write integration into beacon-arrow-geoparquet and the BBF DataFusion integration into beacon-arrow-bbf, following the established beacon-arrow-* crate pattern (types under a `datafusion` submodule).

beacon-formats keeps thin re-export shims for geo_parquet and bbf so existing consumers compile unchanged. The shared rlimit helpers (max_open_fd/file_open_parallelism) move to beacon_common::file_descriptors so the new crates and the remaining arrow/csv/parquet wrappers can share them.

Move the remaining arrow/csv/parquet wrappers into dedicated crates
(beacon-arrow-ipc, beacon-arrow-csv, beacon-arrow-parquet) and relocate the
file_formats() registration into beacon-data-lake (its only caller). Re-point
all consumers in beacon-core and beacon-functions at the individual format
crates and delete the beacon-formats aggregator crate.

Also update the Dockerfile COPY list, the beacon-api tracing filter, and a
stale doc comment in beacon-datafusion-ext.
Copilot AI review requested due to automatic review settings June 17, 2026 14:19

Copilot AI left a comment

Contributor

Pull request overview

This PR completes the workspace-wide split of the former beacon-formats aggregator into dedicated per-format beacon-arrow-* crates, and updates Beacon’s consumers to depend on/route through those new crates. It also relocates shared helpers and centralizes file-format registration in beacon-data-lake.

Changes:

  • Removes beacon-formats and introduces new per-format crates (beacon-arrow-{ipc,csv,parquet,geoparquet,bbf}), updating workspace membership and dependency wiring.
  • Moves file-format registration (file_formats()) into beacon-data-lake and repoints beacon-core / beacon-functions imports accordingly.
  • Extracts FD-limit helpers into beacon_common::file_descriptors for reuse across format crates; updates Docker build context and tracing filters.
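
The centralized registration described above can be sketched roughly as below. This is a self-contained illustration under stated assumptions: FormatFactory and Registry are hypothetical stand-ins for the DataFusion factory trait and session state, the three factory structs are placeholders for what the per-format crates export, and only the file_formats() name comes from the PR.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Stand-in for a DataFusion-style format factory trait (hypothetical).
pub trait FormatFactory: Send + Sync {
    fn extension(&self) -> &'static str;
}

// Placeholders for the factories exported by the individual format crates.
pub struct CsvFactory;
pub struct ParquetFactory;
pub struct IpcFactory;

impl FormatFactory for CsvFactory {
    fn extension(&self) -> &'static str { "csv" }
}
impl FormatFactory for ParquetFactory {
    fn extension(&self) -> &'static str { "parquet" }
}
impl FormatFactory for IpcFactory {
    fn extension(&self) -> &'static str { "arrow" }
}

/// The single place that knows about every format crate.
pub fn file_formats() -> Vec<Arc<dyn FormatFactory>> {
    vec![
        Arc::new(CsvFactory) as Arc<dyn FormatFactory>,
        Arc::new(ParquetFactory),
        Arc::new(IpcFactory),
    ]
}

/// Stand-in for the session state that factories are registered into.
#[derive(Default)]
pub struct Registry {
    by_extension: HashMap<&'static str, Arc<dyn FormatFactory>>,
}

impl Registry {
    /// Registers every factory under its file extension.
    pub fn register_all(&mut self, factories: Vec<Arc<dyn FormatFactory>>) {
        for factory in factories {
            self.by_extension.insert(factory.extension(), factory);
        }
    }

    pub fn supports(&self, extension: &str) -> bool {
        self.by_extension.contains_key(extension)
    }
}
```

Placing this function next to its only caller keeps the format crates independent of one another: none of them needs to know which other formats exist.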

Reviewed changes

Copilot reviewed 38 out of 44 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| Dockerfile | Updates build context COPY list to include new per-format crates and remove beacon-formats. |
| Cargo.toml | Updates workspace members to drop beacon-formats and include new beacon-arrow-* crates. |
| Cargo.lock | Removes beacon-formats and adds lock entries for new per-format crates. |
| beacon-functions/src/file_formats/read_zarr.rs | Switches Zarr format import to beacon_arrow_zarr::datafusion. |
| beacon-functions/src/file_formats/read_schema.rs | Switches schema-reader imports from beacon-formats to per-format crates. |
| beacon-functions/src/file_formats/read_parquet.rs | Switches Parquet format import to beacon_arrow_parquet::datafusion. |
| beacon-functions/src/file_formats/read_csv.rs | Switches CSV format import to beacon_arrow_csv::datafusion. |
| beacon-functions/src/file_formats/read_bbf.rs | Switches BBF format import to beacon_arrow_bbf::datafusion. |
| beacon-functions/src/file_formats/read_arrow.rs | Switches Arrow IPC format import to beacon_arrow_ipc::datafusion. |
| beacon-functions/Cargo.toml | Removes beacon-formats dependency and adds per-format crate deps. |
| beacon-file-formats/beacon-formats/src/lib.rs | Deletes the former aggregator crate implementation. |
| beacon-file-formats/beacon-formats/Cargo.toml | Deletes the former aggregator crate manifest. |
| beacon-file-formats/beacon-arrow-parquet/src/lib.rs | Adds new crate root exposing the datafusion module for Parquet. |
| beacon-file-formats/beacon-arrow-parquet/src/datafusion/mod.rs | Updates imports to use beacon_common::file_descriptors and beacon_datafusion_ext::FileFormatFactoryExt. |
| beacon-file-formats/beacon-arrow-parquet/Cargo.toml | Adds new crate manifest for Parquet format integration. |
| beacon-file-formats/beacon-arrow-ipc/src/lib.rs | Adds new crate root exposing the datafusion module for Arrow IPC. |
| beacon-file-formats/beacon-arrow-ipc/src/datafusion/mod.rs | Updates imports to use beacon_common::file_descriptors and beacon_datafusion_ext::FileFormatFactoryExt. |
| beacon-file-formats/beacon-arrow-ipc/Cargo.toml | Adds new crate manifest for Arrow IPC format integration. |
| beacon-file-formats/beacon-arrow-geoparquet/src/lib.rs | Adds new crate root exposing the datafusion module for GeoParquet output. |
| beacon-file-formats/beacon-arrow-geoparquet/src/datafusion/sink.rs | Adds GeoParquet sink implementation (lon/lat → geometry mapping + write path). |
| beacon-file-formats/beacon-arrow-geoparquet/src/datafusion/mod.rs | Adds GeoParquet FileFormat / factory implementation for write integration. |
| beacon-file-formats/beacon-arrow-geoparquet/Cargo.toml | Adds new crate manifest for GeoParquet integration. |
| beacon-file-formats/beacon-arrow-csv/src/lib.rs | Adds new crate root exposing the datafusion module for CSV. |
| beacon-file-formats/beacon-arrow-csv/src/datafusion/mod.rs | Updates imports to use beacon_common::file_descriptors and beacon_datafusion_ext::FileFormatFactoryExt. |
| beacon-file-formats/beacon-arrow-csv/Cargo.toml | Adds new crate manifest for CSV format integration. |
| beacon-file-formats/beacon-arrow-bbf/src/lib.rs | Adds new crate root exposing the datafusion module for BBF. |
| beacon-file-formats/beacon-arrow-bbf/src/datafusion/stream_share.rs | Adds OnceCell-based stream sharing helper for BBF reader integration. |
| beacon-file-formats/beacon-arrow-bbf/src/datafusion/source.rs | Updates module paths for BBF source implementation under the new crate layout. |
| beacon-file-formats/beacon-arrow-bbf/src/datafusion/opener.rs | Updates module paths for BBF opener implementation under the new crate layout. |
| beacon-file-formats/beacon-arrow-bbf/src/datafusion/mod.rs | Updates BBF DataFusion integration wiring and imports for new crate boundaries. |
| beacon-file-formats/beacon-arrow-bbf/src/datafusion/metrics.rs | Adds BBF global metrics wrapper. |
| beacon-file-formats/beacon-arrow-bbf/Cargo.toml | Adds new crate manifest for BBF format integration. |
| beacon-datafusion-ext/src/table_ext.rs | Updates doc comment to reflect removal of beacon-formats dependency. |
| beacon-data-lake/src/lib.rs | Exposes new file_formats module and re-exports file_formats() from beacon-data-lake. |
| beacon-data-lake/src/file_formats.rs | Adds centralized registration of per-format factories into a DataFusion session. |
| beacon-data-lake/Cargo.toml | Removes beacon-formats dependency and adds per-format crate deps required for registration. |
| beacon-core/src/query/output.rs | Updates output format factory imports to the new per-format crates. |
| beacon-core/src/query/from.rs | Updates format implementations/imports (Arrow/CSV/Parquet/Zarr/BBF) to new per-format crates. |
| beacon-core/Cargo.toml | Removes beacon-formats dependency and adds per-format crate deps. |
| beacon-common/src/lib.rs | Exposes the new file_descriptors module. |
| beacon-common/src/file_descriptors.rs | Adds shared FD-budget helpers (max_open_fd, file_open_parallelism). |
| beacon-common/Cargo.toml | Adds rlimit dependency needed by file_descriptors. |
| beacon-api/src/main.rs | Updates default tracing filter targets to include new per-format crate names. |
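
The tracing-filter row above matters because log targets follow crate names, so a filter that still names beacon_formats would silently drop logs from the new crates. A minimal sketch of building such a filter string, assuming comma-separated target=level directives and that targets are the crate names with underscores. The function name and the choice of level are hypothetical, and the actual filter in beacon-api is not shown in this PR.

```rust
/// Builds a comma-separated `target=level` directive string covering the
/// five new per-format crates. Hypothetical helper, for illustration only.
pub fn format_crate_filter(level: &str) -> String {
    // Rust log targets use the crate name with `-` replaced by `_`.
    const FORMAT_CRATES: [&str; 5] = [
        "beacon_arrow_ipc",
        "beacon_arrow_csv",
        "beacon_arrow_parquet",
        "beacon_arrow_geoparquet",
        "beacon_arrow_bbf",
    ];
    FORMAT_CRATES
        .iter()
        .map(|target| format!("{target}={level}"))
        .collect::<Vec<_>>()
        .join(",")
}
```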

Comment thread: beacon-common/Cargo.toml
Comment on lines 23 to +25

 futures-util = { workspace=true}
-tracing = { workspace = true }
\ No newline at end of file
+tracing = { workspace = true }
+rlimit = "0.11.0"
\ No newline at end of file

robinskil merged commit 80b5967 into main on Jun 18, 2026
2 checks passed