feat(c++): add LRU chunk cache to Arrow chunk readers to avoid redundant file IO #861
SYaoJun wants to merge 1 commit into apache:main
Codecov Report❌ Patch coverage is
Coverage Diff
| | main | #861 | +/- |
|---|---|---|---|
| Coverage | 80.60% | 80.73% | +0.13% |
| Complexity | 615 | 615 | |
| Files | 94 | 95 | +1 |
| Lines | 10707 | 10792 | +85 |
| Branches | 1055 | 1059 | +4 |
| Hits | 8630 | 8713 | +83 |
| Misses | 1837 | 1839 | +2 |
| Partials | 240 | 240 | |
Before: benchmark release
Run ./graph_info_benchmark
2026-02-26T17:27:12+00:00
Running ./graph_info_benchmark
Run on (4 X 3493.21 MHz CPU s)
CPU Caches:
L1 Data 48 KiB (x2)
L1 Instruction 32 KiB (x2)
L2 Unified 1280 KiB (x2)
L3 Unified 49152 KiB (x1)
Load Average: 3.38, 2.01, 0.90
----------------------------------------------------------------------------
Benchmark Time CPU Iterations
----------------------------------------------------------------------------
BenchmarkFixture/InitialGraphInfo 202838 ns 202816 ns 3188
2026-02-26T17:27:13+00:00
Running ./arrow_chunk_reader_benchmark
Run on (4 X 3490.97 MHz CPU s)
CPU Caches:
L1 Data 48 KiB (x2)
L1 Instruction 32 KiB (x2)
L2 Unified 1280 KiB (x2)
L3 Unified 49152 KiB (x1)
Load Average: 3.38, 2.01, 0.90
-----------------------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------------------------------------------------------------------------------
BenchmarkFixture/CreateVertexPropertyArrowChunkReader 6966 ns 6965 ns 100206
BenchmarkFixture/CreateAdjListArrowChunkReader 3578 ns 3577 ns 196996
BenchmarkFixture/CreateAdjListOffsetArrowChunkReader 3551 ns 3550 ns 197172
BenchmarkFixture/AdjListPropertyArrowChunkReaderReadChunk 325547 ns 166224 ns 3893
BenchmarkFixture/AdjListArrowChunkReaderReadChunk 271021 ns 152848 ns 4235
BenchmarkFixture/AdjListOffsetArrowChunkReaderReadChunk 228655 ns 136690 ns 5239
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_firstGraph_AllColumns_V1 357058 ns 189591 ns 3335
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_firstGraph_TwoColumns_V1 361736 ns 201011 ns 3760
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_firstGraph_OneColumns_V1 308693 ns 179371 ns 4042
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_firstGraph_AllColumns_V2 152241 ns 133497 ns 5182
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_firstGraph_TwoColumns_V2 125877 ns 108899 ns 6275
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_firstGraph_OneColumns_V2 97200 ns 82066 ns 8745
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_secondGraph_AllColumns_V1 1234745 ns 396302 ns 1797
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_secondGraph_TwoColumns_V1 821340 ns 301181 ns 2272
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_secondGraph_OneColumns_V1 438876 ns 191486 ns 3756
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_secondGraph_AllColumns_V2 805788 ns 775914 ns 890
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_secondGraph_TwoColumns_V2 502274 ns 462362 ns 1506
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_secondGraph_OneColumns_V2 218477 ns 188513 ns 3677
2026-02-26T17:27:40+00:00
Running ./label_filter_benchmark
Run on (4 X 2820.27 MHz CPU s)
CPU Caches:
L1 Data 48 KiB (x2)
L1 Instruction 32 KiB (x2)
L2 Unified 1280 KiB (x2)
L3 Unified 49152 KiB (x1)
Load Average: 2.52, 1.93, 0.91
--------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------------------------------------------------------
BenchmarkFixture/SingleLabelFilter/iterations:10 111665 ns 111611 ns 10
BenchmarkFixture/SingleLabelFilterbyAcero/iterations:10 1237938 ns 658114 ns 10
BenchmarkFixture/MultiLabelFilter/iterations:10 92731 ns 92712 ns 10
BenchmarkFixture/MultiLabelFilterbyAcero/iterations:10 719147 ns 465075 ns 10
BenchmarkFixture/LabelFilterFromSet/iterations:10 49065 ns 48985 ns 10
After: benchmark release
Run ./graph_info_benchmark
2026-02-27T00:34:22+00:00
Running ./graph_info_benchmark
Run on (4 X 3243.65 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x2)
L1 Instruction 32 KiB (x2)
L2 Unified 512 KiB (x2)
L3 Unified 32768 KiB (x1)
Load Average: 3.64, 2.29, 1.06
----------------------------------------------------------------------------
Benchmark Time CPU Iterations
----------------------------------------------------------------------------
BenchmarkFixture/InitialGraphInfo 231562 ns 231549 ns 2534
2026-02-27T00:34:23+00:00
Running ./arrow_chunk_reader_benchmark
Run on (4 X 3241.99 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x2)
L1 Instruction 32 KiB (x2)
L2 Unified 512 KiB (x2)
L3 Unified 32768 KiB (x1)
Load Average: 3.64, 2.29, 1.06
-----------------------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------------------------------------------------------------------------------
BenchmarkFixture/CreateVertexPropertyArrowChunkReader 13837 ns 13836 ns 50724
BenchmarkFixture/CreateAdjListArrowChunkReader 7077 ns 7076 ns 98705
BenchmarkFixture/CreateAdjListOffsetArrowChunkReader 7103 ns 7103 ns 98224
BenchmarkFixture/AdjListPropertyArrowChunkReaderReadChunk 256 ns 255 ns 2743767
BenchmarkFixture/AdjListArrowChunkReaderReadChunk 448 ns 448 ns 1645296
BenchmarkFixture/AdjListOffsetArrowChunkReaderReadChunk 248 ns 248 ns 2853157
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_firstGraph_AllColumns_V1 900 ns 899 ns 788047
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_firstGraph_TwoColumns_V1 617 ns 617 ns 1135186
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_firstGraph_OneColumns_V1 434 ns 434 ns 1567110
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_firstGraph_AllColumns_V2 868 ns 868 ns 789657
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_firstGraph_TwoColumns_V2 600 ns 600 ns 1149521
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_firstGraph_OneColumns_V2 418 ns 418 ns 1691556
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_secondGraph_AllColumns_V1 892 ns 890 ns 802783
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_secondGraph_TwoColumns_V1 600 ns 600 ns 1160415
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_secondGraph_OneColumns_V1 430 ns 430 ns 1683496
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_secondGraph_AllColumns_V2 819 ns 819 ns 845382
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_secondGraph_TwoColumns_V2 588 ns 588 ns 1166540
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_secondGraph_OneColumns_V2 420 ns 420 ns 1695935
2026-02-27T00:34:39+00:00
Running ./label_filter_benchmark
Run on (4 X 3263.15 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x2)
L1 Instruction 32 KiB (x2)
L2 Unified 512 KiB (x2)
L3 Unified 32768 KiB (x1)
Load Average: 2.89, 2.21, 1.05
--------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------------------------------------------------------
BenchmarkFixture/SingleLabelFilter/iterations:10 148347 ns 148227 ns 10
BenchmarkFixture/SingleLabelFilterbyAcero/iterations:10 1292426 ns 766637 ns 10
BenchmarkFixture/MultiLabelFilter/iterations:10 109273 ns 109286 ns 10
BenchmarkFixture/MultiLabelFilterbyAcero/iterations:10 818002 ns 539672 ns 10
BenchmarkFixture/LabelFilterFromSet/iterations:10 65665 ns 65433 ns 10
Summary
Performance Improvements
Minor Performance Changes
Performance Regressions
@Sober7135 @yangxk1 I have implemented a basic version of the LRU Cache and summarized the benchmark comparison results in the comments above. Do you have any suggestions or feedback on the code implementation?
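For reading the two logs side by side, a rough ratio can be computed from the reported wall-clock times. The sketch below only restates numbers from the logs above; the helper name `Speedup` and the constant names are illustrative. Note that the two runs report different CPU cache sizes (48 KiB vs 32 KiB L1 data), so they probably ran on different hosts and the ratios are indicative at best.

```cpp
// Ratio of the "before" time to the "after" time; values above 1 mean faster.
constexpr double Speedup(double before_ns, double after_ns) {
  return before_ns / after_ns;
}

// Wall-clock times (ns) copied from the benchmark logs above.
constexpr double kAdjListPropertyReadBefore = 325547.0;
constexpr double kAdjListPropertyReadAfter = 256.0;
constexpr double kCreateVertexReaderBefore = 6966.0;
constexpr double kCreateVertexReaderAfter = 13837.0;

// AdjListPropertyArrowChunkReaderReadChunk: roughly 1270x faster.
constexpr double kReadChunkSpeedup =
    Speedup(kAdjListPropertyReadBefore, kAdjListPropertyReadAfter);

// CreateVertexPropertyArrowChunkReader: a ratio below 1, i.e. about 2x slower.
constexpr double kCreateReaderSpeedup =
    Speedup(kCreateVertexReaderBefore, kCreateVertexReaderAfter);
```

The very large ReadChunk ratio is what one would expect if the benchmark loop keeps requesting a chunk that is already cached, so it likely measures the cache-hit path rather than typical mixed workloads.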
Pull request overview
Adds an in-memory LRU cache layer to C++ Arrow chunk readers to reduce redundant Parquet reads during seek/backtracking workloads (issue #860), plus introduces a generic LRUCache utility and unit tests.
Changes:
- Introduce `graphar::LRUCache` (and `PairHash`) with Catch2 unit tests.
- Wire chunk-level caching into all four Arrow chunk reader implementations to reuse previously loaded `arrow::Table`s.
- Register the new cache unit test in the C++ CMake test suite.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| cpp/src/graphar/lru_cache.h | New generic LRU cache + pair hash utility used by chunk readers. |
| cpp/src/graphar/arrow/chunk_reader.h | Adds per-reader cache members for chunk tables. |
| cpp/src/graphar/arrow/chunk_reader.cc | Uses the cache on seek/next_chunk and populates it after reading tables. |
| cpp/test/test_lru_cache.cc | New unit tests covering LRU cache behavior (eviction, update, edge cases). |
| cpp/CMakeLists.txt | Adds test_lru_cache to the test suite. |
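For readers unfamiliar with the data structure, here is a minimal sketch of an LRU cache of the kind described above. It is not the `graphar::LRUCache` from this PR: the class name `SimpleLRUCache` and its `Get`/`Put`/`Clear`/`Size` methods are assumptions made for illustration, using the common list-plus-hash-map layout.

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <optional>
#include <unordered_map>
#include <utility>

// Hypothetical minimal LRU cache (illustrative only, not the PR's class).
template <typename K, typename V, typename Hash = std::hash<K>>
class SimpleLRUCache {
 public:
  explicit SimpleLRUCache(std::size_t capacity) : capacity_(capacity) {}

  // Returns the cached value, if any, and marks it most recently used.
  std::optional<V> Get(const K& key) {
    auto it = index_.find(key);
    if (it == index_.end()) return std::nullopt;
    // splice moves the node to the front without invalidating iterators.
    items_.splice(items_.begin(), items_, it->second);
    return it->second->second;
  }

  // Inserts or updates an entry; evicts the least recently used when full.
  void Put(const K& key, V value) {
    if (capacity_ == 0) return;
    auto it = index_.find(key);
    if (it != index_.end()) {
      it->second->second = std::move(value);
      items_.splice(items_.begin(), items_, it->second);
      return;
    }
    if (items_.size() == capacity_) {
      index_.erase(items_.back().first);
      items_.pop_back();
    }
    items_.emplace_front(key, std::move(value));
    index_[key] = items_.begin();
  }

  void Clear() {
    items_.clear();
    index_.clear();
  }

  std::size_t Size() const { return items_.size(); }

 private:
  using Item = std::pair<K, V>;
  std::size_t capacity_;
  std::list<Item> items_;  // front = most recently used
  std::unordered_map<K, typename std::list<Item>::iterator, Hash> index_;
};
```

With a capacity of 4, as in the reader members shown in this PR, a reader that seeks back and forth among a handful of chunks would keep hitting entries that are still resident.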
Comments suppressed due to low confidence (2)
cpp/src/graphar/arrow/chunk_reader.h:394
- New `chunk_cache_` state is introduced, but `AdjListArrowChunkReader` defines a copy assignment operator. In `chunk_reader.cc`, `operator=` currently does not clear or copy `chunk_cache_`, so an instance can retain stale cached tables after assignment (potentially returning data from the previous reader configuration). Please update the assignment operator to reset the cache (and consider whether copy-ctor/assignment should copy or clear cached entries consistently).
std::shared_ptr<arrow::Table> chunk_table_;
LRUCache<std::pair<IdType, IdType>, std::shared_ptr<arrow::Table>, PairHash>
chunk_cache_{4};
IdType vertex_chunk_num_, chunk_num_;
std::string base_dir_;
std::shared_ptr<FileSystem> fs_;
};
cpp/src/graphar/arrow/chunk_reader.h:647
- New `chunk_cache_` state is introduced, but `AdjListPropertyArrowChunkReader` defines a copy assignment operator. In `chunk_reader.cc`, `operator=` currently does not clear or copy `chunk_cache_`, so an instance can retain stale cached tables after assignment. Please update the assignment operator to reset the cache (and ensure copy semantics are consistent across ctor/assignment).
std::shared_ptr<arrow::Schema> schema_;
std::shared_ptr<arrow::Table> chunk_table_;
LRUCache<std::pair<IdType, IdType>, std::shared_ptr<arrow::Table>, PairHash>
chunk_cache_{4};
util::FilterOptions filter_options_;
IdType vertex_chunk_num_, chunk_num_;
std::string base_dir_;
std::shared_ptr<FileSystem> fs_;
};
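One way to address both suppressed comments is to have the copy constructor and copy assignment start from an empty cache. The sketch below is an assumption about how that could look, not the PR's code: `ReaderSketch` is a hypothetical stand-in for a chunk reader, a `std::map` stands in for the LRU cache, and a `std::string` stands in for `arrow::Table`.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a chunk reader (illustrative names throughout).
class ReaderSketch {
 public:
  explicit ReaderSketch(std::string base_dir)
      : base_dir_(std::move(base_dir)) {}

  // Copies the configuration only; the new instance starts with no cache.
  ReaderSketch(const ReaderSketch& other) : base_dir_(other.base_dir_) {}

  ReaderSketch& operator=(const ReaderSketch& other) {
    if (this != &other) {
      base_dir_ = other.base_dir_;
      // Tables cached under the previous configuration are stale now.
      chunk_cache_.clear();
    }
    return *this;
  }

  void CacheChunk(long chunk_index, std::shared_ptr<std::string> table) {
    chunk_cache_[chunk_index] = std::move(table);
  }

  std::size_t CachedChunks() const { return chunk_cache_.size(); }
  const std::string& base_dir() const { return base_dir_; }

 private:
  std::string base_dir_;
  std::map<long, std::shared_ptr<std::string>> chunk_cache_;
};
```

Clearing in both places keeps copy construction and assignment consistent; copying the cached entries instead would also be consistent, at the cost of sharing table pointers between readers.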
cpp/src/graphar/lru_cache.h
Outdated
auto h1 = std::hash<T1>{}(p.first);
auto h2 = std::hash<T2>{}(p.second);
return h1 ^ (h2 << 32);
`PairHash` combines hashes via `h1 ^ (h2 << 32)`. Shifting by 32 is undefined behavior when `size_t` is 32-bit, and even on 64-bit this is a weak combiner (high collision risk). Please switch to a portable hash-combine that doesn't assume `size_t` width (e.g., `h1 ^ (h2 + 0x9e3779b97f4a7c15ULL + (h1<<6) + (h1>>2))` or an equivalent width-agnostic approach).
Suggested change:
- auto h1 = std::hash<T1>{}(p.first);
- auto h2 = std::hash<T2>{}(p.second);
- return h1 ^ (h2 << 32);
+ size_t h1 = std::hash<T1>{}(p.first);
+ size_t h2 = std::hash<T2>{}(p.second);
+ // Width-agnostic hash combine (inspired by boost::hash_combine)
+ constexpr size_t kMul =
+ static_cast<size_t>(0x9e3779b97f4a7c15ULL);
+ h1 ^= h2 + kMul + (h1 << 6) + (h1 >> 2);
+ return h1;
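To show the suggestion in context, here is a self-contained sketch of a pair hasher built around the same combine step and used as the hasher of a `std::unordered_map`. The names `PairHashSketch`, `ChunkKey`, and `ChunkIndex` are hypothetical, and `std::int64_t` is only assumed as a plausible stand-in for `IdType`.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical pair hasher using the width-agnostic combine suggested above.
struct PairHashSketch {
  template <typename T1, typename T2>
  std::size_t operator()(const std::pair<T1, T2>& p) const {
    std::size_t h1 = std::hash<T1>{}(p.first);
    const std::size_t h2 = std::hash<T2>{}(p.second);
    // The cast truncates the constant on platforms with a 32-bit size_t,
    // which is well defined for unsigned types.
    constexpr std::size_t kMul =
        static_cast<std::size_t>(0x9e3779b97f4a7c15ULL);
    h1 ^= h2 + kMul + (h1 << 6) + (h1 >> 2);
    return h1;
  }
};

// Assumed key shape: (vertex chunk index, chunk index).
using ChunkKey = std::pair<std::int64_t, std::int64_t>;
using ChunkIndex = std::unordered_map<ChunkKey, int, PairHashSketch>;
```

Because the shifts are by 6 and 2 rather than 32, the expression stays well defined regardless of the width of `size_t`.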
yangxk1 left a comment
Looking forward to a benchmark
void VertexPropertyArrowChunkReader::Filter(util::Filter filter) {
filter_options_.filter = filter;
chunk_table_ = nullptr;
Should it be initialized in this xxxFilter function?
Yes. After the filter changes, the cache is outdated.
void VertexPropertyArrowChunkReader::Select(util::ColumnNames column_names) {
filter_options_.columns = column_names;
chunk_table_ = nullptr;
Should it be initialized in this xxxSelect function?
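The invalidation discussed in these two threads can be sketched as follows. This is an illustrative stand-in rather than the PR's implementation: `SelectableReaderSketch`, `TableSketch`, `GetChunk`, and `load_count` are hypothetical names, a `std::map` plays the role of the chunk cache, and a "table" is just the list of selected column names.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in: a "table" is only the list of selected columns.
using TableSketch = std::vector<std::string>;

class SelectableReaderSketch {
 public:
  // Returns the chunk from the cache, or simulates loading it on a miss.
  std::shared_ptr<TableSketch> GetChunk(long chunk_index) {
    auto it = chunk_cache_.find(chunk_index);
    if (it != chunk_cache_.end()) {
      chunk_table_ = it->second;
    } else {
      ++load_count_;  // counts simulated file reads
      chunk_table_ = std::make_shared<TableSketch>(columns_);
      chunk_cache_[chunk_index] = chunk_table_;
    }
    return chunk_table_;
  }

  // A new projection makes every cached table stale, so drop them all.
  void Select(std::vector<std::string> column_names) {
    columns_ = std::move(column_names);
    chunk_table_ = nullptr;
    chunk_cache_.clear();
  }

  std::size_t load_count() const { return load_count_; }

 private:
  std::vector<std::string> columns_;
  std::shared_ptr<TableSketch> chunk_table_;
  std::map<long, std::shared_ptr<TableSketch>> chunk_cache_;
  std::size_t load_count_ = 0;
};
```

A `Filter` setter would follow the same pattern. Resetting only `chunk_table_` and leaving the cache untouched would let a later read return a table built with the old column set.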
cpp/src/graphar/arrow/chunk_reader.h
Outdated
IdType vertex_num_;
std::shared_ptr<arrow::Schema> schema_;
std::shared_ptr<arrow::Table> chunk_table_;
LRUCache<IdType, std::shared_ptr<arrow::Table>> chunk_cache_{4};
Would it be better to size chunk_cache based on available memory, or to let the user control it?
> Would it be better to size chunk_cache based on available memory, or to let the user control it?
Actually, I'm not sure where this size parameter should live. Do you have any suggestions?
Putting it in the options and having a default value is a good temporary solution. Maybe we can open a new issue to track it until we can come up with a solution that balances memory, user experience, and efficiency.
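The "options with a default value" idea could look roughly like this. Everything named here is hypothetical (`ChunkReaderOptionsSketch`, `chunk_cache_capacity`, `ConfigurableReaderSketch`); the only value taken from the PR is the current hard-coded capacity of 4.

```cpp
#include <cstddef>

// Hypothetical options struct; field and type names are illustrative only.
struct ChunkReaderOptionsSketch {
  // Mirrors the capacity currently hard-coded as chunk_cache_{4}.
  static constexpr std::size_t kDefaultChunkCacheCapacity = 4;
  std::size_t chunk_cache_capacity = kDefaultChunkCacheCapacity;
};

// Hypothetical reader that takes its cache capacity from the options.
class ConfigurableReaderSketch {
 public:
  explicit ConfigurableReaderSketch(ChunkReaderOptionsSketch options = {})
      : cache_capacity_(options.chunk_cache_capacity) {}

  std::size_t cache_capacity() const { return cache_capacity_; }

 private:
  std::size_t cache_capacity_;
};
```

Callers who never touch the option keep today's behavior, while memory-constrained users can lower the capacity and scan-heavy users can raise it.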
933738f to f719ebe
Added LRU Cache
Summary
After adding the cache, the initialization and creation times have increased. Since filtering and selection operations affect the cache, I currently invalidate all caches aggressively. This has caused performance regressions in filter operations.
Performance Improvements
Minor Performance Changes
Performance Regressions
issue: #860
Reason for this PR
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?