spike(performance): zero-copy Arrow C-Data export via FFM #72

Open
dfa1 wants to merge 1 commit into main from worktree-arrow-cdata-export

dfa1 (Owner) commented Jun 19, 2026

What

A runnable spike proving ADR 0016 Option B: export a Vortex column as an Apache Arrow array through the Arrow C-Data Interface using only java.lang.foreign, with no arrow-vector on the producer side and zero copy.

ArrowCDataExport:

  • Hand-builds the ArrowSchema / ArrowArray ABI structs off-heap with FFM, installing FFM upcall stubs as their release callbacks.
  • Points the Arrow values-buffer pointer at the Vortex MemorySegment's own address — no element copied.
  • main() round-trips it: imports the FFM-built structs back through arrow-c-data (Data.importVector, which adopts the buffer by address) and checks the imported BigIntVector's data-buffer address equals the original Vortex segment address.
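For reference, here is a minimal sketch of how the two ABI structs can be declared as FFM layouts. It assumes JDK 22+ (final java.lang.foreign API) and a 64-bit platform, and follows the struct definitions in the Arrow C Data Interface spec. The class and method names (ArrowAbiLayouts, offsetOf, int64Schema) are hypothetical and not the spike's actual code; the release slot is deliberately left NULL here because installing the upcall stub is a separate concern.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.StructLayout;
import static java.lang.foreign.ValueLayout.ADDRESS;
import static java.lang.foreign.ValueLayout.JAVA_LONG;

// Sketch: FFM layouts mirroring the Arrow C Data Interface structs (64-bit platform assumed).
final class ArrowAbiLayouts {
    // struct ArrowSchema: 9 fields of 8 bytes each, 72 bytes total
    static final StructLayout ARROW_SCHEMA = MemoryLayout.structLayout(
            ADDRESS.withName("format"),
            ADDRESS.withName("name"),
            ADDRESS.withName("metadata"),
            JAVA_LONG.withName("flags"),
            JAVA_LONG.withName("n_children"),
            ADDRESS.withName("children"),
            ADDRESS.withName("dictionary"),
            ADDRESS.withName("release"),
            ADDRESS.withName("private_data"));

    // struct ArrowArray: 10 fields of 8 bytes each, 80 bytes total
    static final StructLayout ARROW_ARRAY = MemoryLayout.structLayout(
            JAVA_LONG.withName("length"),
            JAVA_LONG.withName("null_count"),
            JAVA_LONG.withName("offset"),
            JAVA_LONG.withName("n_buffers"),
            JAVA_LONG.withName("n_children"),
            ADDRESS.withName("buffers"),
            ADDRESS.withName("children"),
            ADDRESS.withName("dictionary"),
            ADDRESS.withName("release"),
            ADDRESS.withName("private_data"));

    // Byte offset of a named field inside a struct layout.
    static long offsetOf(StructLayout layout, String field) {
        return layout.byteOffset(MemoryLayout.PathElement.groupElement(field));
    }

    // Fills an ArrowSchema for a non-nullable int64 column ("l" is Arrow's int64 format string).
    static MemorySegment int64Schema(Arena arena, String columnName) {
        MemorySegment schema = arena.allocate(ARROW_SCHEMA);
        schema.set(ADDRESS, offsetOf(ARROW_SCHEMA, "format"), arena.allocateFrom("l"));
        schema.set(ADDRESS, offsetOf(ARROW_SCHEMA, "name"), arena.allocateFrom(columnName));
        schema.set(ADDRESS, offsetOf(ARROW_SCHEMA, "metadata"), MemorySegment.NULL);
        schema.set(JAVA_LONG, offsetOf(ARROW_SCHEMA, "flags"), 0L); // ARROW_FLAG_NULLABLE (2) not set
        schema.set(JAVA_LONG, offsetOf(ARROW_SCHEMA, "n_children"), 0L);
        schema.set(ADDRESS, offsetOf(ARROW_SCHEMA, "children"), MemorySegment.NULL);
        schema.set(ADDRESS, offsetOf(ARROW_SCHEMA, "dictionary"), MemorySegment.NULL);
        // A NULL release marks the struct as released; a live export installs a callback here.
        schema.set(ADDRESS, offsetOf(ARROW_SCHEMA, "release"), MemorySegment.NULL);
        schema.set(ADDRESS, offsetOf(ARROW_SCHEMA, "private_data"), MemorySegment.NULL);
        return schema;
    }

    public static void main(String[] args) {
        System.out.println("ArrowSchema bytes = " + ARROW_SCHEMA.byteSize());
        System.out.println("ArrowArray bytes  = " + ARROW_ARRAY.byteSize());
        System.out.println("ArrowArray.release offset = " + offsetOf(ARROW_ARRAY, "release"));
    }
}
```

Declaring the structs as named layouts keeps the field offsets derived from the spec's field order instead of hard-coded, which matters once the remaining fields (children, dictionary) are actually populated.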

Output:

Vortex values segment address = 0xca0e26000
Imported Arrow vector  = [100, 200, 300, 400, 500, 600, 700, 800]
Arrow data buffer addr = 0xca0e26000
zero-copy (addresses equal) = true

The materialize() from the prior PR is the producer: its contiguous little-endian i64 output already has the layout of Arrow's primitive values buffer, so it can be exported as-is.
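A minimal sketch of the zero-copy wiring for the non-nullable Int64 case, assuming JDK 22+ on a little-endian 64-bit host. materializeStandIn is a hypothetical stand-in for materialize(), not its real signature. The point is that buffers[1] stores only the values segment's address, so no element is copied; buffers[0] (validity) is NULL, which the C Data Interface allows when null_count is 0.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.StructLayout;
import java.lang.foreign.ValueLayout;
import java.nio.ByteOrder;
import static java.lang.foreign.ValueLayout.ADDRESS;
import static java.lang.foreign.ValueLayout.JAVA_LONG;

final class Int64ArrayExport {
    // Contiguous little-endian int64: Arrow's primitive values buffer on a little-endian host.
    static final ValueLayout.OfLong LE_I64 = JAVA_LONG.withOrder(ByteOrder.LITTLE_ENDIAN);

    // struct ArrowArray on a 64-bit platform (10 fields of 8 bytes each).
    static final StructLayout ARROW_ARRAY = MemoryLayout.structLayout(
            JAVA_LONG.withName("length"), JAVA_LONG.withName("null_count"),
            JAVA_LONG.withName("offset"), JAVA_LONG.withName("n_buffers"),
            JAVA_LONG.withName("n_children"), ADDRESS.withName("buffers"),
            ADDRESS.withName("children"), ADDRESS.withName("dictionary"),
            ADDRESS.withName("release"), ADDRESS.withName("private_data"));

    static long off(String field) {
        return ARROW_ARRAY.byteOffset(MemoryLayout.PathElement.groupElement(field));
    }

    // Hypothetical stand-in for materialize(): values written contiguously, 64-byte aligned as Arrow recommends.
    static MemorySegment materializeStandIn(Arena arena, long... values) {
        MemorySegment seg = arena.allocate(LE_I64.byteSize() * values.length, 64);
        for (int i = 0; i < values.length; i++) {
            seg.setAtIndex(LE_I64, i, values[i]);
        }
        return seg;
    }

    // Wires an existing values segment into an ArrowArray without copying any element.
    static MemorySegment exportInt64(Arena arena, MemorySegment values, long length) {
        if (values.byteSize() < length * LE_I64.byteSize()) {
            throw new IllegalArgumentException("values segment shorter than length");
        }
        MemorySegment buffers = arena.allocate(ADDRESS, 2);
        buffers.setAtIndex(ADDRESS, 0, MemorySegment.NULL); // validity: absent, no nulls
        buffers.setAtIndex(ADDRESS, 1, values);             // stores values.address() only
        MemorySegment array = arena.allocate(ARROW_ARRAY);
        array.set(JAVA_LONG, off("length"), length);
        array.set(JAVA_LONG, off("null_count"), 0L);
        array.set(JAVA_LONG, off("offset"), 0L);
        array.set(JAVA_LONG, off("n_buffers"), 2L);
        array.set(JAVA_LONG, off("n_children"), 0L);
        array.set(ADDRESS, off("buffers"), buffers);
        array.set(ADDRESS, off("children"), MemorySegment.NULL);
        array.set(ADDRESS, off("dictionary"), MemorySegment.NULL);
        array.set(ADDRESS, off("release"), MemorySegment.NULL); // placeholder: a live export installs a callback
        array.set(ADDRESS, off("private_data"), MemorySegment.NULL);
        return array;
    }

    // Follows array->buffers[1] back to a raw address (reinterpret is a restricted FFM method).
    static long valuesAddressOf(MemorySegment array) {
        MemorySegment buffers = array.get(ADDRESS, off("buffers")).reinterpret(2 * ADDRESS.byteSize());
        return buffers.getAtIndex(ADDRESS, 1).address();
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment values = materializeStandIn(arena, 100, 200, 300, 400);
            MemorySegment array = exportInt64(arena, values, 4);
            System.out.printf("values segment = 0x%x%n", values.address());
            System.out.printf("buffers[1]     = 0x%x%n", valuesAddressOf(array));
        }
    }
}
```

The two printed addresses should match, which is the same property the spike's round-trip checks on the Arrow side.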

Why a spike (not a feature)

Lives in performance/ (the only module allowed to depend on Arrow / sun.misc.Unsafe). The producer half is Unsafe-free FFM and could move to a future vortex-arrow module without touching core/reader.

Not handled (deliberately out of scope)

  • Non-nullable Int64 only — no validity bitmap, var-len offsets, nested/dictionary children, or other ptypes (the full ADR 0016 buffer table).
  • Lifetime/pin contract. Memory is Arena-owned and the round-trip is synchronous, so the release callback just nulls its own release slot. A real export to a longer-lived consumer must pin the reader's Arena via private_data and drop that pin in release; otherwise the mmap can be unmapped while the consumer still holds the buffer pointer, causing a native segfault. This is the main thing a productionized version must solve.
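One possible shape for that pin contract, sketched under assumptions: JDK 22+, a 64-bit ArrowArray (release at byte offset 64, private_data at 72), and a reader Arena created with Arena.ofShared() so that whichever thread the consumer releases on may close it. Since private_data is a raw pointer, it carries an opaque ticket into a Java-side registry rather than the Arena itself. PinnedRelease, installRelease, simulateConsumerRelease and the registry are hypothetical names, not part of the spike.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import static java.lang.foreign.ValueLayout.ADDRESS;

final class PinnedRelease {
    // struct ArrowArray on a 64-bit platform: release at byte 64, private_data at 72, 80 bytes total.
    static final long RELEASE_OFFSET = 64, PRIVATE_DATA_OFFSET = 72, ARROW_ARRAY_SIZE = 80;

    // Hypothetical pin registry: ticket stored in private_data -> pinned reader Arena.
    static final ConcurrentHashMap<Long, Arena> PINNED = new ConcurrentHashMap<>();
    static final AtomicLong NEXT_TICKET = new AtomicLong(1);

    // Native function pointer for void (*release)(struct ArrowArray*), alive for the JVM's lifetime.
    static final MemorySegment RELEASE_STUB;
    static {
        try {
            MethodHandle target = MethodHandles.lookup().findStatic(PinnedRelease.class, "release",
                    MethodType.methodType(void.class, MemorySegment.class));
            RELEASE_STUB = Linker.nativeLinker().upcallStub(
                    target, FunctionDescriptor.ofVoid(ADDRESS), Arena.global());
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Upcall body. It must not throw: an exception escaping an upcall terminates the JVM.
    static void release(MemorySegment arrayPtr) {
        MemorySegment array = arrayPtr.reinterpret(ARROW_ARRAY_SIZE);
        long ticket = array.get(ADDRESS, PRIVATE_DATA_OFFSET).address();
        array.set(ADDRESS, RELEASE_OFFSET, MemorySegment.NULL); // spec: NULL release means released
        Arena pinned = PINNED.remove(ticket);
        if (pinned != null) {
            try {
                pinned.close(); // drop the pin only once the consumer is done with the buffers
            } catch (RuntimeException e) {
                // e.g. arena still acquired elsewhere; nothing safe to do inside release
            }
        }
    }

    // Pins readerArena for the lifetime of the exported struct and installs the release callback.
    static void installRelease(MemorySegment array, Arena readerArena) {
        long ticket = NEXT_TICKET.getAndIncrement();
        PINNED.put(ticket, readerArena);
        array.set(ADDRESS, PRIVATE_DATA_OFFSET, MemorySegment.ofAddress(ticket));
        array.set(ADDRESS, RELEASE_OFFSET, RELEASE_STUB);
    }

    // Calls release through its native function pointer, as a C consumer would.
    static void simulateConsumerRelease(MemorySegment array) {
        MethodHandle call = Linker.nativeLinker().downcallHandle(
                array.get(ADDRESS, RELEASE_OFFSET), FunctionDescriptor.ofVoid(ADDRESS));
        try {
            call.invokeExact(array);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```

Reading private_data from the pointer passed to release (rather than remembering the original struct address) matters because the spec lets a consumer move an ArrowArray by bitwise copy. The struct itself should live outside the pinned Arena, so that closing the Arena does not free the memory the NULL release slot was just written to.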

Intended for discussion / as the seed for a real vortex-arrow module — not for merge as-is.

🤖 Generated with Claude Code

Demonstrates ADR 0016 Option B: export a Vortex column as an Apache Arrow array
through the Arrow C-Data Interface using only java.lang.foreign — no arrow-vector
on the producer side, and zero copy. ArrowSchema/ArrowArray ABI structs are
hand-built off-heap with FFM upcall stubs as their release callbacks, and the
Arrow values buffer pointer is the Vortex MemorySegment's own address.

main() round-trips it: imports the FFM-built structs back through arrow-c-data
(Data.importVector adopts the buffer by address) and asserts the imported
BigIntVector's data-buffer address equals the original Vortex segment address —
confirming no copy occurs.

Lives in the performance module (the only one allowed to depend on Arrow / Unsafe);
the producer half is Unsafe-free FFM and could move to a future vortex-arrow module.

Scope (spike): non-nullable Int64 only. Not handled — validity bitmap, var-len
offsets, nested/dictionary children, other ptypes, and the release/arena-pin
lifetime contract a long-lived consumer would require.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>