
Consider Arrow Schema for data #4141

@v1gnesh

Description


Feature

Firstly, thank you for building this as an open source product!

One of the key things that (in my simplistic opinion) helps Buf adoption is the Kafka API compatibility.
You bring Kafka compatibility to a protobuf-based data movement system.

Have you had a look at the Apache Arrow ecosystem, and the Arrow in-memory format?
The Arrow-verse is quite vast now, with wide adoption and Arrow compatibility across dataframes, query engines, databases, and other data systems.

Arrow has something called Flight. It's based on gRPC and is used for high-throughput, ad-hoc data movement between dataframes and query engines / databases.
On the other hand, I think it's fair to say that systems like Kafka / Buf are (kind of) for sustained data movement... at least more so than the ad-hoc data movement that Flight is built for.
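
For concreteness, here's a minimal sketch of an ad-hoc Flight fetch with pyarrow; the endpoint URI and ticket bytes are placeholders, not any real service:

```python
import pyarrow.flight as flight

# Connect to a hypothetical Flight endpoint (the URI is a placeholder).
client = flight.FlightClient("grpc://localhost:8815")

# A Ticket identifies the dataset to fetch; these bytes are made up.
ticket = flight.Ticket(b"example-dataset")

# do_get streams Arrow record batches over gRPC; read_all() collects
# them into a single in-memory Table.
reader = client.do_get(ticket)
table = reader.read_all()
print(table.schema)
```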

If Buf were to offer schema, validation, and data typing based on the Arrow spec, that would make it an even more compelling option, both for data interconnect and as the interconnect for RPC-style systems.
The sending & receiving ends would both have easy interop with Arrow schemas, just like they do with protobuf.
Doing this would vastly expand Buf's playing field, from protobuf-using sites to basically any site that has data movement needs.
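
To illustrate the interop point: an Arrow schema is itself a first-class, serializable object, so a registry could store and distribute it much like a protobuf descriptor. A minimal pyarrow sketch (the field names are invented for illustration):

```python
import pyarrow as pa

# A schema defined on the sending side (field names are made up).
schema = pa.schema([
    pa.field("user_id", pa.int64()),
    pa.field("event", pa.string()),
    pa.field("ts", pa.timestamp("us", tz="UTC")),
])

# Serialize the schema to Arrow IPC bytes: the form a registry
# could store and serve, analogous to a protobuf descriptor.
buf = schema.serialize()

# The receiving side reconstructs the identical schema from the bytes.
received = pa.ipc.read_schema(buf)
assert received.equals(schema)
```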

This would let Buf benefit from, and boost, all the hard work that has already gone into (and continues to go into) the Arrow spec and Arrow readiness across a whole variety of data systems.

What do you think?
Tagging (for comments) @alamb and @mbrobbel as they're awesome Arrow folks.
