Fix Arrow FFI hash-to-string resolution for fingerprint terms aggregations#126
Fix Arrow FFI hash-to-string resolution for fingerprint terms aggregations#126
Conversation
…t terms aggregations
The Arrow FFI aggregation path discarded the hash_resolution_map built by
Phase 3 touchup, causing terms aggregations on string fingerprint fields
to return raw U64 hash values (e.g. "18264002321125465436") instead of
the original string values ("alice", "bob").
Thread hash_resolution_map through aggregation_result_to_record_batch and
all nested bucket flattening functions. Apply resolution in all three
Arrow FFI JNI methods: single-split, multi-split, and multi-split-multi-agg.
Bump version to 0.32.4.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Claude finished @schenksj's task —— View job Code Review Complete ✅Summary: This PR correctly fixes the Arrow FFI hash resolution regression. The implementation properly threads the Review Results:
Key Changes Reviewed:1. Enhanced Arrow FFI functions (
2. JNI Arrow FFI entry points (
3. Hash resolution flow verification:
No Issues Found:
The fix correctly addresses the root cause where Arrow FFI paths were discarding the hash resolution map that JNI paths were correctly using. |
Two new tests in AggregationArrowFfiTest: - testArrowFfiTermsAggHashResolution: terms agg on "category" field with companion hash opt, verifies keys are "cat_X" strings not U64 hashes - testArrowFfiTermsSubAggHashResolution: terms agg on "name" field with avg(score) sub-aggregation, verifies keys are "item_X" strings Both create a companion split with withStringHashOptimization(true) and exercise the Arrow FFI path (aggregateArrowFfi + nativeReadAggArrowColumnsAsJson) to catch any future regression in hash resolution. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Claude finished @schenksj's task —— View job PR Code Review ✅Reviewing the PR for:
Review Results: ✅ No Issues FoundSummary: This PR correctly fixes the Arrow FFI hash resolution regression. The implementation properly threads the Key Changes Reviewed:1. Enhanced Arrow FFI functions (
2. JNI Arrow FFI entry points (
3. Hash resolution flow verification:
4. Comprehensive test coverage (
Specific Findings:✅ Correctness:
✅ Security:
✅ Build Issues:
✅ Error Propagation:
✅ API Consistency:
The fix correctly addresses the root cause where Arrow FFI paths were discarding the hash resolution map that JNI paths were correctly using. Hash-to-string resolution now works consistently across all aggregation export paths. |
Summary
hash_resolution_mapfrom Phase 3 touchup through all Arrow FFI conversion pathsRoot Cause
perform_search_async_impl_leaf_response_with_aggregationscorrectly rewrites terms aggs to use_phash_<field>and builds a hash→string resolution map (Phase 3). The JNI object path used this map, but the Arrow FFI path discarded it —key_to_string(Key::U64(v))just didv.to_string(), producing raw hashes like"18264002321125465436"instead of"alice".Test plan
🤖 Generated with Claude Code