
Explore optional GPU acceleration via libcudf-rs #231

@robinskil

Summary

Investigate integrating libcudf-rs as an optional GPU execution backend for Beacon queries.

Beacon already focuses on high-performance scientific and tabular data access with Arrow + DataFusion interoperability across formats such as Zarr, NetCDF, Parquet, Arrow IPC, CSV, and BBF. libcudf-rs may provide a useful path to accelerate eligible DataFusion physical plans using NVIDIA GPUs through RAPIDS cuDF.

This issue proposes a prototype to determine whether libcudf-rs can be integrated cleanly, safely, and optionally without changing Beacon’s default CPU execution path.

Motivation

Some Beacon workloads may be GPU-friendly, especially:

  • large tabular scans from Parquet / Arrow IPC / CSV
  • projection-heavy queries
  • filter-heavy queries
  • aggregation workloads
  • sort / group-by workloads
  • repeated analytical queries over large datasets

libcudf-rs includes a libcudf-datafusion crate that integrates with Apache DataFusion by applying physical optimizer rules that replace eligible DataFusion execution nodes with cuDF-backed GPU variants.

For Beacon, this could provide an optional acceleration path for supported query plans while retaining the existing CPU/DataFusion path as the default and fallback.
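
The rewrite idea described above can be illustrated with a toy model. This is only a sketch: `Node`, `offload`, `gpu_node_count`, and `sample_plan` are hypothetical stand-ins written for this issue and are not types or functions from DataFusion or libcudf-rs. It assumes that only filter and aggregate nodes are eligible and that any other operator stays on the CPU.

```rust
// Toy model of a physical-plan rewrite. All names here are hypothetical;
// nothing below is DataFusion or libcudf-rs API.

#[derive(Debug, Clone, PartialEq)]
enum Node {
    /// Leaf scan over some file format.
    Scan(&'static str),
    /// Eligible operators carry a flag saying where they run.
    Filter { gpu: bool, input: Box<Node> },
    Aggregate { gpu: bool, input: Box<Node> },
    /// An operator the GPU backend does not support (assumption).
    Custom { name: &'static str, input: Box<Node> },
}

/// Walk the plan and mark every eligible node as GPU-backed.
/// Unsupported nodes are kept as-is, so the plan stays executable.
fn offload(node: Node) -> Node {
    match node {
        Node::Filter { input, .. } => Node::Filter {
            gpu: true,
            input: Box::new(offload(*input)),
        },
        Node::Aggregate { input, .. } => Node::Aggregate {
            gpu: true,
            input: Box::new(offload(*input)),
        },
        Node::Custom { name, input } => Node::Custom {
            name,
            input: Box::new(offload(*input)),
        },
        scan @ Node::Scan(_) => scan,
    }
}

/// Count how many nodes in the plan are marked as GPU-backed.
fn gpu_node_count(node: &Node) -> usize {
    match node {
        Node::Scan(_) => 0,
        Node::Filter { gpu, input } | Node::Aggregate { gpu, input } => {
            usize::from(*gpu) + gpu_node_count(input)
        }
        Node::Custom { input, .. } => gpu_node_count(input),
    }
}

/// Aggregate <- Custom <- Filter <- Scan, all on the CPU.
fn sample_plan() -> Node {
    Node::Aggregate {
        gpu: false,
        input: Box::new(Node::Custom {
            name: "custom_decode",
            input: Box::new(Node::Filter {
                gpu: false,
                input: Box::new(Node::Scan("parquet")),
            }),
        }),
    }
}

fn main() {
    let rewritten = offload(sample_plan());
    println!("GPU nodes after rewrite: {}", gpu_node_count(&rewritten));
    println!("{rewritten:#?}");
}
```

In this model a plan can end up mixed, with GPU and CPU nodes interleaved. Whether that is acceptable in practice (for example because of host/device transfer cost at each boundary) is one of the things the prototype would need to find out.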

Goals

  • Evaluate whether libcudf-rs can be used as an optional GPU backend for Beacon.
  • Identify which Beacon query paths are compatible with libcudf-datafusion.
  • Prototype GPU acceleration for a minimal subset of query plans.
  • Preserve current CPU behavior when GPU support is disabled or unavailable.
  • Define a clean feature flag / runtime configuration model.
  • Measure performance and correctness against the existing execution engine.
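
One possible shape for the runtime configuration and CPU fallback goals is sketched below. Everything in it is an assumption made for discussion: the `GpuMode` and `Backend` enums, the `parse_mode` and `select_backend` helpers, and the `BEACON_GPU_MODE` environment variable do not exist in Beacon today. GPU availability is passed in as a plain boolean because how to detect a usable device is itself an open question.

```rust
// Hypothetical runtime configuration model. None of these names exist
// in Beacon; they are placeholders for discussion.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum GpuMode {
    /// Never use the GPU (proposed default).
    Off,
    /// Use the GPU when one is available, otherwise fall back to CPU.
    Auto,
    /// Fail instead of silently falling back.
    Required,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Cpu,
    Gpu,
}

/// Parse a user-supplied setting. A missing or empty value means `Off`,
/// so existing deployments keep their current CPU behavior.
fn parse_mode(raw: Option<&str>) -> Result<GpuMode, String> {
    match raw.map(|s| s.trim().to_ascii_lowercase()).as_deref() {
        None | Some("") | Some("off") => Ok(GpuMode::Off),
        Some("auto") => Ok(GpuMode::Auto),
        Some("required") => Ok(GpuMode::Required),
        Some(other) => Err(format!("unknown GPU mode: {other}")),
    }
}

/// Decide which backend runs a query, given the mode and whether a
/// usable GPU was detected.
fn select_backend(mode: GpuMode, gpu_available: bool) -> Result<Backend, String> {
    match (mode, gpu_available) {
        (GpuMode::Off, _) => Ok(Backend::Cpu),
        (GpuMode::Auto, true) | (GpuMode::Required, true) => Ok(Backend::Gpu),
        (GpuMode::Auto, false) => Ok(Backend::Cpu),
        (GpuMode::Required, false) => {
            Err("GPU mode is 'required' but no GPU is available".to_string())
        }
    }
}

fn main() {
    // BEACON_GPU_MODE is a hypothetical variable name.
    let raw = std::env::var("BEACON_GPU_MODE").ok();
    let mode = match parse_mode(raw.as_deref()) {
        Ok(mode) => mode,
        Err(err) => {
            eprintln!("{err}; falling back to CPU");
            GpuMode::Off
        }
    };
    // Device detection is out of scope here, so assume no GPU.
    let gpu_available = false;
    match select_backend(mode, gpu_available) {
        Ok(backend) => println!("selected backend: {backend:?}"),
        Err(err) => eprintln!("cannot run query: {err}"),
    }
}
```

A compile-time Cargo feature could sit on top of this, so that builds without the feature never link the GPU dependencies and always resolve to the CPU backend. The three-way mode is a design choice rather than a requirement: `Auto` gives transparent fallback, while `Required` makes benchmark runs fail loudly instead of quietly measuring the CPU path.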

Labels

enhancement (New feature or request)
