
Consider Arrow Schema for data #4141

@v1gnesh

Description


Feature

Firstly, thank you for building this as an open source product!

One of the key things that (in my simplistic opinion) helps Buf adoption is the Kafka API compatibility.
You bring Kafka compatibility to a protobuf-based data movement system.

Have you had a look at the Apache Arrow ecosystem, and the Arrow in-memory format?
The Arrow-verse is quite vast now, with wide adoption and Arrow compatibility across dataframes, query engines, databases, and other data systems.

Arrow has something called Flight. It's based on gRPC and is used for high-throughput, ad-hoc data movement between dataframes and query engines / databases.
On the other hand, I think it's fair to say that systems like Kafka / Buf are (kind of) for sustained data movement... at least more so than the ad-hoc data movement that Flight is built for.
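
For concreteness, here's a minimal sketch of an ad-hoc Flight fetch with pyarrow; the endpoint URI and ticket bytes are placeholders, not any real service:

```python
import pyarrow.flight as flight

# Connect to a hypothetical Flight endpoint (the URI is a placeholder).
client = flight.FlightClient("grpc://localhost:8815")

# A Ticket identifies the dataset to fetch; these bytes are made up.
ticket = flight.Ticket(b"example-dataset")

# do_get streams Arrow record batches over gRPC; read_all() collects
# them into a single in-memory Table.
reader = client.do_get(ticket)
table = reader.read_all()
print(table.schema)
```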

If Buf were to offer schema, validation, and data typing based on the Arrow spec, that would make it an even more compelling option, both for data interconnect and as the interconnect for RPC-style systems.
The sending & receiving ends would both have easy interop with Arrow schemas, just like they do with protobuf.
Doing this would vastly expand Buf's playing field, from protobuf-using sites to basically any site that has data movement needs.
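
To illustrate the interop point: an Arrow schema is itself a first-class, serializable object, so a registry could store and distribute it much like a protobuf descriptor. A minimal pyarrow sketch (the field names are invented for illustration):

```python
import pyarrow as pa

# A schema defined on the sending side (field names are made up).
schema = pa.schema([
    pa.field("user_id", pa.int64()),
    pa.field("event", pa.string()),
    pa.field("ts", pa.timestamp("us", tz="UTC")),
])

# Serialize the schema to Arrow IPC bytes: the form a registry
# could store and serve, analogous to a protobuf descriptor.
buf = schema.serialize()

# The receiving side reconstructs the identical schema from the bytes.
received = pa.ipc.read_schema(buf)
assert received.equals(schema)
```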

This would let Buf benefit from, and boost, all the hard work that has already gone into (and continues to go into) the Arrow spec and Arrow readiness across a whole variety of data systems.

What do you think?
Tagging (for comments) @alamb and @mbrobbel as they're awesome Arrow folks.
