spike(performance): zero-copy Arrow C-Data export via FFM #72
Open
dfa1 wants to merge 1 commit into
Demonstrates ADR 0016 Option B: export a Vortex column as an Apache Arrow array through the Arrow C-Data Interface using only java.lang.foreign, with no arrow-vector on the producer side and zero copies. The ArrowSchema/ArrowArray ABI structs are hand-built off-heap, with FFM upcall stubs as their release callbacks, and the Arrow values buffer pointer is the Vortex MemorySegment's own address.

main() round-trips it: it imports the FFM-built structs back through arrow-c-data (Data.importVector adopts the buffer by address) and asserts that the imported BigIntVector's data-buffer address equals the original Vortex segment address, confirming that no copy occurs.

Lives in the performance module (the only one allowed to depend on Arrow / Unsafe); the producer half is Unsafe-free FFM and could move to a future vortex-arrow module.

Scope (spike): non-nullable Int64 only. Not handled: validity bitmap, var-len offsets, nested/dictionary children, other ptypes, and the release/arena-pin lifetime contract that a long-lived consumer would require.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
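To make the struct-building step concrete, here is a minimal sketch, assuming JDK 22+ (where `java.lang.foreign` is final) and a 64-bit target. It is not the PR's `ArrowCDataExport`; the class `SchemaExportSketch` and its methods are hypothetical. It lays out `struct ArrowSchema` from the Arrow C Data Interface as an FFM `StructLayout`, fills it for a non-nullable Int64 column (format string `"l"`, nullable flag unset), and installs an upcall stub as `release` that, like the spike, only nulls its own slot. `reinterpret`, `upcallStub` and `downcallHandle` are restricted methods, so the JVM prints a warning unless run with `--enable-native-access=ALL-UNNAMED`.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.StructLayout;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

import static java.lang.foreign.MemoryLayout.PathElement.groupElement;
import static java.lang.foreign.ValueLayout.ADDRESS;
import static java.lang.foreign.ValueLayout.JAVA_LONG;

/** Hypothetical sketch: an ArrowSchema for a non-nullable Int64 column, built with FFM only. */
public final class SchemaExportSketch {
    // struct ArrowSchema from the Arrow C Data Interface (72 bytes on 64-bit targets).
    static final StructLayout ARROW_SCHEMA = MemoryLayout.structLayout(
            ADDRESS.withName("format"), ADDRESS.withName("name"), ADDRESS.withName("metadata"),
            JAVA_LONG.withName("flags"), JAVA_LONG.withName("n_children"),
            ADDRESS.withName("children"), ADDRESS.withName("dictionary"),
            ADDRESS.withName("release"), ADDRESS.withName("private_data"));

    static final long FORMAT = ARROW_SCHEMA.byteOffset(groupElement("format"));
    static final long FLAGS = ARROW_SCHEMA.byteOffset(groupElement("flags"));
    static final long RELEASE = ARROW_SCHEMA.byteOffset(groupElement("release"));

    // C signature of the callback: void (*release)(struct ArrowSchema*)
    static final FunctionDescriptor RELEASE_DESC = FunctionDescriptor.ofVoid(ADDRESS);
    static final MemorySegment RELEASE_STUB; // native function pointer that calls release() below

    static {
        try {
            MethodHandle target = MethodHandles.lookup().findStatic(SchemaExportSketch.class,
                    "release", MethodType.methodType(void.class, MemorySegment.class));
            // Global arena: the stub must stay callable for as long as any exported struct exists.
            RELEASE_STUB = Linker.nativeLinker().upcallStub(target, RELEASE_DESC, Arena.global());
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    /** Upcall target. Like the spike, it only marks the struct released by nulling its own slot. */
    static void release(MemorySegment schema) {
        // The pointer arrives as a zero-length segment; widen it to the struct size before writing.
        schema.reinterpret(ARROW_SCHEMA.byteSize()).set(ADDRESS, RELEASE, MemorySegment.NULL);
    }

    /** Allocates and fills an ArrowSchema describing a non-nullable Int64 column. */
    static MemorySegment exportInt64Schema(Arena arena) {
        MemorySegment schema = arena.allocate(ARROW_SCHEMA);
        schema.fill((byte) 0);                                // NULL name/metadata/children/dictionary/private_data
        schema.set(ADDRESS, FORMAT, arena.allocateFrom("l")); // "l" is the Arrow format string for int64
        schema.set(JAVA_LONG, FLAGS, 0L);                     // ARROW_FLAG_NULLABLE unset: non-nullable
        schema.set(ADDRESS, RELEASE, RELEASE_STUB);
        return schema;
    }

    /** Reads the NUL-terminated format string back out of the struct. */
    static String format(MemorySegment schema) {
        return schema.get(ADDRESS, FORMAT).reinterpret(2).getString(0); // "l" plus its NUL terminator
    }

    /** Calls the struct's release pointer through a downcall, as a native consumer would. */
    static void consumerRelease(MemorySegment schema) {
        try {
            MethodHandle releaseFn = Linker.nativeLinker()
                    .downcallHandle(schema.get(ADDRESS, RELEASE), RELEASE_DESC);
            releaseFn.invokeExact(schema);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment schema = exportInt64Schema(arena);
            System.out.println("format = " + format(schema));
            consumerRelease(schema);
            System.out.println("released = " + (schema.get(ADDRESS, RELEASE).address() == 0L));
        }
    }
}
```

`consumerRelease` only stands in for the importing side; a real consumer (such as `arrow-c-data` after import) would make that call instead.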
What
A runnable spike proving ADR 0016 Option B: export a Vortex column as an Apache Arrow array through the Arrow C-Data Interface using only `java.lang.foreign`, with no `arrow-vector` on the producer side and zero copies.

`ArrowCDataExport`:

- builds the `ArrowSchema`/`ArrowArray` ABI structs off-heap with FFM, installing FFM upcall stubs as their `release` callbacks;
- points the Arrow values buffer at the Vortex `MemorySegment`'s own address, so no element is copied.

`main()` round-trips it: it imports the FFM-built structs back through `arrow-c-data` (`Data.importVector`, which adopts the buffer by address) and checks that the imported `BigIntVector`'s data-buffer address equals the original Vortex segment address.

Output:
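The zero-copy step of `ArrowCDataExport` can be sketched in plain FFM, again assuming JDK 22+ and a 64-bit target. `ArrayExportSketch` is hypothetical, not the PR's code, and leaves `release` NULL for brevity. The `values` segment plays the role of the Vortex segment (a stand-in for `materialize()` output, written as little-endian i64); the export stores only its address in `buffers[1]`, and `valuesAddress` reads it back the way a consumer would, so the equality check mirrors the one `main()` performs through `arrow-c-data`.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.StructLayout;
import java.lang.foreign.ValueLayout;
import java.nio.ByteOrder;

import static java.lang.foreign.MemoryLayout.PathElement.groupElement;
import static java.lang.foreign.ValueLayout.ADDRESS;
import static java.lang.foreign.ValueLayout.JAVA_LONG;

/** Hypothetical sketch: point an FFM-built ArrowArray at an existing Int64 segment without copying. */
public final class ArrayExportSketch {
    // struct ArrowArray from the Arrow C Data Interface (80 bytes on 64-bit targets).
    static final StructLayout ARROW_ARRAY = MemoryLayout.structLayout(
            JAVA_LONG.withName("length"), JAVA_LONG.withName("null_count"),
            JAVA_LONG.withName("offset"), JAVA_LONG.withName("n_buffers"),
            JAVA_LONG.withName("n_children"), ADDRESS.withName("buffers"),
            ADDRESS.withName("children"), ADDRESS.withName("dictionary"),
            ADDRESS.withName("release"), ADDRESS.withName("private_data"));

    // Arrow buffers are little-endian; JAVA_LONG uses native order, so pin the order explicitly.
    static final ValueLayout.OfLong LE_I64 = JAVA_LONG.withOrder(ByteOrder.LITTLE_ENDIAN);

    static long off(String field) {
        return ARROW_ARRAY.byteOffset(groupElement(field));
    }

    /** Exports a non-nullable Int64 column; `values` is referenced by address, never copied. */
    static MemorySegment exportInt64(Arena arena, MemorySegment values) {
        // Primitive layout: buffers[0] = validity (NULL is allowed when null_count == 0), buffers[1] = values.
        MemorySegment buffers = arena.allocate(ADDRESS, 2);
        buffers.setAtIndex(ADDRESS, 0, MemorySegment.NULL);
        buffers.setAtIndex(ADDRESS, 1, values); // writes values.address(): the zero-copy step

        MemorySegment array = arena.allocate(ARROW_ARRAY);
        array.fill((byte) 0); // offset 0, NULL children/dictionary/release/private_data
        array.set(JAVA_LONG, off("length"), values.byteSize() / Long.BYTES);
        array.set(JAVA_LONG, off("null_count"), 0L);
        array.set(JAVA_LONG, off("n_buffers"), 2L);
        array.set(ADDRESS, off("buffers"), buffers);
        return array;
    }

    /** Follows array->buffers[1], the way a consumer locates the data. */
    static long valuesAddress(MemorySegment array) {
        MemorySegment buffers = array.get(ADDRESS, off("buffers")).reinterpret(2 * ADDRESS.byteSize());
        return buffers.getAtIndex(ADDRESS, 1).address();
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            // Hypothetical stand-in for materialize(): four contiguous LE i64 values.
            MemorySegment values = arena.allocate(LE_I64, 4);
            for (long i = 0; i < 4; i++) values.setAtIndex(LE_I64, i, 10 * i);

            MemorySegment array = exportInt64(arena, values);
            System.out.println("zero copy: " + (valuesAddress(array) == values.address()));
            System.out.println("length: " + array.get(JAVA_LONG, off("length")));
        }
    }
}
```

Because the validity pointer may only be NULL when `null_count` is 0, this shape covers exactly the spike's non-nullable Int64 scope; nullable columns would need a real bitmap in `buffers[0]`.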
The `materialize()` from the prior PR is the producer: its contiguous little-endian i64 output serves directly as Arrow's primitive values buffer.

Why a spike (not a feature)
Lives in `performance/` (the only module allowed to depend on Arrow / `sun.misc.Unsafe`). The producer half is Unsafe-free FFM and could move to a future `vortex-arrow` module without touching core/reader.

Not handled (deliberately out of scope)
`release` just nulls its slot. A real export to a longer-lived consumer must pin the reader's `Arena` via `private_data` and drop it in `release`; otherwise the mmap is unmapped under the consumer, causing a native segfault. This is the main thing a productionized version must solve.

Intended for discussion and as the seed for a real `vortex-arrow` module, not for merge as-is.

🤖 Generated with Claude Code
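For the lifetime gap listed under "Not handled", one possible shape is sketched here, assuming JDK 22+ and a 64-bit target; `ArenaPinSketch`, its token registry and its hard-coded offsets are hypothetical, not part of the PR. Because `private_data` is a `void*` and cannot hold a Java reference, the sketch stores a numeric token there that keys a `ConcurrentHashMap` of pinned arenas. `release` nulls its own slot first (as the C Data Interface requires), then removes the token and closes the arena. A shared arena is used because a native consumer may call `release` from any thread, and the struct itself lives in a separate consumer-owned arena so that closing the pinned one never frees it.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

import static java.lang.foreign.ValueLayout.ADDRESS;

/** Hypothetical sketch: an exported ArrowArray keeps the data's Arena alive until release(). */
public final class ArenaPinSketch {
    // Offsets in struct ArrowArray on 64-bit targets: five int64 fields, then
    // buffers(40), children(48), dictionary(56), release(64), private_data(72).
    static final long RELEASE = 64, PRIVATE_DATA = 72, ARROW_ARRAY_SIZE = 80;

    // private_data cannot hold a Java reference, so it carries a token keying this registry.
    static final Map<Long, Arena> PINNED = new ConcurrentHashMap<>();
    static final AtomicLong NEXT_TOKEN = new AtomicLong(1);

    static final FunctionDescriptor RELEASE_DESC = FunctionDescriptor.ofVoid(ADDRESS);
    static final MemorySegment RELEASE_STUB;

    static {
        try {
            MethodHandle target = MethodHandles.lookup().findStatic(ArenaPinSketch.class,
                    "release", MethodType.methodType(void.class, MemorySegment.class));
            RELEASE_STUB = Linker.nativeLinker().upcallStub(target, RELEASE_DESC, Arena.global());
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    /** Pins dataArena for as long as the consumer holds `array`. */
    static void pin(MemorySegment array, Arena dataArena) {
        long token = NEXT_TOKEN.getAndIncrement();
        PINNED.put(token, dataArena);
        array.set(ADDRESS, PRIVATE_DATA, MemorySegment.ofAddress(token));
        array.set(ADDRESS, RELEASE, RELEASE_STUB);
    }

    /** Upcall target: mark the struct released, then unpin (close) the data arena. Must not throw. */
    static void release(MemorySegment p) {
        MemorySegment array = p.reinterpret(ARROW_ARRAY_SIZE);
        long token = array.get(ADDRESS, PRIVATE_DATA).address();
        array.set(ADDRESS, RELEASE, MemorySegment.NULL);
        array.set(ADDRESS, PRIVATE_DATA, MemorySegment.NULL);
        Arena pinned = PINNED.remove(token);
        if (pinned != null) pinned.close(); // only now may the mapped data go away
    }

    /** Invokes the release pointer through a downcall, as a native consumer would. */
    static void consumerRelease(MemorySegment array) {
        try {
            MethodHandle releaseFn = Linker.nativeLinker()
                    .downcallHandle(array.get(ADDRESS, RELEASE), RELEASE_DESC);
            releaseFn.invokeExact(array);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        Arena data = Arena.ofShared(); // stand-in for the reader's Arena that owns the mmap
        MemorySegment column = data.allocate(4 * Long.BYTES);
        try (Arena consumer = Arena.ofConfined()) {
            MemorySegment array = consumer.allocate(ARROW_ARRAY_SIZE, 8);
            pin(array, data);
            System.out.println("alive before release: " + column.scope().isAlive());
            consumerRelease(array);
            System.out.println("alive after release: " + column.scope().isAlive());
        }
    }
}
```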