feat(c++): add LRU chunk cache to Arrow chunk readers to avoid redundant file IO #861

Open
SYaoJun wants to merge 1 commit into apache:main from SYaoJun:0214_lru

Conversation

@SYaoJun (Contributor) commented Feb 14, 2026

issue: #860

Reason for this PR

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@codecov-commenter commented Feb 14, 2026

Codecov Report

❌ Patch coverage is 91.07143% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.73%. Comparing base (d1fe7f9) to head (306679d).
⚠️ Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| cpp/src/graphar/arrow/chunk_reader.cc | 92.20% | 6 Missing ⚠️ |
| cpp/src/graphar/lru_cache.h | 87.09% | 4 Missing ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##               main     #861      +/-   ##
============================================
+ Coverage     80.60%   80.73%   +0.13%     
  Complexity      615      615              
============================================
  Files            94       95       +1     
  Lines         10707    10792      +85     
  Branches       1055     1059       +4     
============================================
+ Hits           8630     8713      +83     
- Misses         1837     1839       +2     
  Partials        240      240              
Flag Coverage Δ
cpp 71.32% <91.07%> (+0.42%) ⬆️


@SYaoJun SYaoJun marked this pull request as ready for review February 27, 2026 00:27
@SYaoJun (Contributor, Author) commented Feb 27, 2026

Benchmark results before this change (release build)

Run ./graph_info_benchmark
2026-02-26T17:27:12+00:00
Running ./graph_info_benchmark
Run on (4 X 3493.21 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x2)
  L1 Instruction 32 KiB (x2)
  L2 Unified 1280 KiB (x2)
  L3 Unified 49152 KiB (x1)
Load Average: 3.38, 2.01, 0.90
----------------------------------------------------------------------------
Benchmark                                  Time             CPU   Iterations
----------------------------------------------------------------------------
BenchmarkFixture/InitialGraphInfo     202838 ns       202816 ns         3188
2026-02-26T17:27:13+00:00
Running ./arrow_chunk_reader_benchmark
Run on (4 X 3490.97 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x2)
  L1 Instruction 32 KiB (x2)
  L2 Unified 1280 KiB (x2)
  L3 Unified 49152 KiB (x1)
Load Average: 3.38, 2.01, 0.90
-----------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                                   Time             CPU   Iterations
-----------------------------------------------------------------------------------------------------------------------------
BenchmarkFixture/CreateVertexPropertyArrowChunkReader                                    6966 ns         6965 ns       100206
BenchmarkFixture/CreateAdjListArrowChunkReader                                           3578 ns         3577 ns       196996
BenchmarkFixture/CreateAdjListOffsetArrowChunkReader                                     3551 ns         3550 ns       197172
BenchmarkFixture/AdjListPropertyArrowChunkReaderReadChunk                              325547 ns       166224 ns         3893
BenchmarkFixture/AdjListArrowChunkReaderReadChunk                                      271021 ns       152848 ns         4235
BenchmarkFixture/AdjListOffsetArrowChunkReaderReadChunk                                228655 ns       136690 ns         5239
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_firstGraph_AllColumns_V1      357058 ns       189591 ns         3335
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_firstGraph_TwoColumns_V1      361736 ns       201011 ns         3760
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_firstGraph_OneColumns_V1      308693 ns       179371 ns         4042
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_firstGraph_AllColumns_V2      152241 ns       133497 ns         5182
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_firstGraph_TwoColumns_V2      125877 ns       108899 ns         6275
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_firstGraph_OneColumns_V2       97200 ns        82066 ns         8745
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_secondGraph_AllColumns_V1    1234745 ns       396302 ns         1797
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_secondGraph_TwoColumns_V1     821340 ns       301181 ns         2272
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_secondGraph_OneColumns_V1     438876 ns       191486 ns         3756
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_secondGraph_AllColumns_V2     805788 ns       775914 ns          890
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_secondGraph_TwoColumns_V2     502274 ns       462362 ns         1506
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_secondGraph_OneColumns_V2     218477 ns       188513 ns         3677
2026-02-26T17:27:40+00:00
Running ./label_filter_benchmark
Run on (4 X 2820.27 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x2)
  L1 Instruction 32 KiB (x2)
  L2 Unified 1280 KiB (x2)
  L3 Unified 49152 KiB (x1)
Load Average: 2.52, 1.93, 0.91
--------------------------------------------------------------------------------------------------
Benchmark                                                        Time             CPU   Iterations
--------------------------------------------------------------------------------------------------
BenchmarkFixture/SingleLabelFilter/iterations:10            111665 ns       111611 ns           10
BenchmarkFixture/SingleLabelFilterbyAcero/iterations:10    1237938 ns       658114 ns           10
BenchmarkFixture/MultiLabelFilter/iterations:10              92731 ns        92712 ns           10
BenchmarkFixture/MultiLabelFilterbyAcero/iterations:10      719147 ns       465075 ns           10
BenchmarkFixture/LabelFilterFromSet/iterations:10            49065 ns        48985 ns           10

Benchmark results after this change (release build)

Run ./graph_info_benchmark
2026-02-27T00:34:22+00:00
Running ./graph_info_benchmark
Run on (4 X 3243.65 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x2)
  L1 Instruction 32 KiB (x2)
  L2 Unified 512 KiB (x2)
  L3 Unified 32768 KiB (x1)
Load Average: 3.64, 2.29, 1.06
----------------------------------------------------------------------------
Benchmark                                  Time             CPU   Iterations
----------------------------------------------------------------------------
BenchmarkFixture/InitialGraphInfo     231562 ns       231549 ns         2534
2026-02-27T00:34:23+00:00
Running ./arrow_chunk_reader_benchmark
Run on (4 X 3241.99 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x2)
  L1 Instruction 32 KiB (x2)
  L2 Unified 512 KiB (x2)
  L3 Unified 32768 KiB (x1)
Load Average: 3.64, 2.29, 1.06
-----------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                                   Time             CPU   Iterations
-----------------------------------------------------------------------------------------------------------------------------
BenchmarkFixture/CreateVertexPropertyArrowChunkReader                                   13837 ns        13836 ns        50724
BenchmarkFixture/CreateAdjListArrowChunkReader                                           7077 ns         7076 ns        98705
BenchmarkFixture/CreateAdjListOffsetArrowChunkReader                                     7103 ns         7103 ns        98224
BenchmarkFixture/AdjListPropertyArrowChunkReaderReadChunk                                 256 ns          255 ns      2743767
BenchmarkFixture/AdjListArrowChunkReaderReadChunk                                         448 ns          448 ns      1645296
BenchmarkFixture/AdjListOffsetArrowChunkReaderReadChunk                                   248 ns          248 ns      2853157
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_firstGraph_AllColumns_V1         900 ns          899 ns       788047
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_firstGraph_TwoColumns_V1         617 ns          617 ns      1135186
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_firstGraph_OneColumns_V1         434 ns          434 ns      1567110
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_firstGraph_AllColumns_V2         868 ns          868 ns       789657
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_firstGraph_TwoColumns_V2         600 ns          600 ns      1149521
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_firstGraph_OneColumns_V2         418 ns          418 ns      1691556
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_secondGraph_AllColumns_V1        892 ns          890 ns       802783
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_secondGraph_TwoColumns_V1        600 ns          600 ns      1160415
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_secondGraph_OneColumns_V1        430 ns          430 ns      1683496
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_secondGraph_AllColumns_V2        819 ns          819 ns       845382
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_secondGraph_TwoColumns_V2        588 ns          588 ns      1166540
BenchmarkFixture/VertexPropertyArrowChunkReaderReadChunk_secondGraph_OneColumns_V2        420 ns          420 ns      1695935
2026-02-27T00:34:39+00:00
Running ./label_filter_benchmark
Run on (4 X 3263.15 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x2)
  L1 Instruction 32 KiB (x2)
  L2 Unified 512 KiB (x2)
  L3 Unified 32768 KiB (x1)
Load Average: 2.89, 2.21, 1.05
--------------------------------------------------------------------------------------------------
Benchmark                                                        Time             CPU   Iterations
--------------------------------------------------------------------------------------------------
BenchmarkFixture/SingleLabelFilter/iterations:10            148347 ns       148227 ns           10
BenchmarkFixture/SingleLabelFilterbyAcero/iterations:10    1292426 ns       766637 ns           10
BenchmarkFixture/MultiLabelFilter/iterations:10             109273 ns       109286 ns           10
BenchmarkFixture/MultiLabelFilterbyAcero/iterations:10      818002 ns       539672 ns           10
BenchmarkFixture/LabelFilterFromSet/iterations:10            65665 ns        65433 ns           10

Summary

Performance Improvements

| Benchmark | Before (CPU ns) | After (CPU ns) | Improvement |
|---|---|---|---|
| Vertex_second_All_V2 | 775,914 | 819 | 947.39x |
| Vertex_second_Two_V2 | 462,362 | 588 | 786.33x |
| AdjListPropertyArrowChunkReaderReadChunk | 166,224 | 255 | 651.86x |
| AdjListOffsetArrowChunkReaderReadChunk | 136,690 | 248 | 551.17x |
| Vertex_second_Two_V1 | 301,181 | 600 | 501.97x |
| Vertex_second_One_V2 | 188,513 | 420 | 448.84x |
| Vertex_second_One_V1 | 191,486 | 430 | 445.32x |
| Vertex_second_All_V1 | 396,302 | 890 | 445.28x |
| Vertex_first_One_V1 | 179,371 | 434 | 413.30x |
| AdjListArrowChunkReaderReadChunk | 152,848 | 448 | 341.18x |
| Vertex_first_Two_V1 | 201,011 | 617 | 325.79x |
| Vertex_first_All_V1 | 189,591 | 899 | 210.89x |
| Vertex_first_One_V2 | 82,066 | 418 | 196.33x |
| Vertex_first_Two_V2 | 108,899 | 600 | 181.50x |
| Vertex_first_All_V2 | 133,497 | 868 | 153.80x |

Minor Performance Changes

| Benchmark | Before (CPU ns) | After (CPU ns) | Ratio |
|---|---|---|---|
| InitialGraphInfo | 202,816 | 231,549 | 0.88x |
| MultiLabelFilterbyAcero | 465,075 | 539,672 | 0.86x |
| SingleLabelFilterbyAcero | 658,114 | 766,637 | 0.86x |
| MultiLabelFilter | 92,712 | 109,286 | 0.85x |

Performance Regressions

| Benchmark | Before (CPU ns) | After (CPU ns) | Regression |
|---|---|---|---|
| SingleLabelFilter | 111,611 | 148,227 | 0.75x |
| LabelFilterFromSet | 48,985 | 65,433 | 0.75x |
| CreateAdjListArrowChunkReader | 3,577 | 7,076 | 0.51x |
| CreateVertexPropertyArrowChunkReader | 6,965 | 13,836 | 0.50x |
| CreateAdjListOffsetArrowChunkReader | 3,550 | 7,103 | 0.50x |

@SYaoJun (Contributor, Author) commented Feb 27, 2026

@Sober7135 @yangxk1 I have implemented a basic version of the LRU Cache and summarized the benchmark comparison results in the comments above. Do you have any suggestions or feedback on the code implementation?

Copilot AI (Contributor) left a comment

Pull request overview

Adds an in-memory LRU cache layer to C++ Arrow chunk readers to reduce redundant Parquet reads during seek/backtracking workloads (issue #860), plus introduces a generic LRUCache utility and unit tests.

Changes:

  • Introduce graphar::LRUCache (and PairHash) with Catch2 unit tests.
  • Wire chunk-level caching into all four Arrow chunk reader implementations to reuse previously loaded arrow::Tables.
  • Register the new cache unit test in the C++ CMake test suite.
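
For readers unfamiliar with the data structure, the following is a minimal sketch of an LRU cache of the kind described above, assuming a doubly linked list for recency order plus a hash map for lookup. The class name `SimpleLruCache` and its `Get`/`Put`/`Clear`/`Size` methods are hypothetical; they are not taken from the PR's `graphar::LRUCache`, whose actual interface may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <optional>
#include <unordered_map>
#include <utility>

// Hypothetical illustration only; not the graphar::LRUCache from this PR.
template <typename Key, typename Value, typename Hash = std::hash<Key>>
class SimpleLruCache {
 public:
  explicit SimpleLruCache(std::size_t capacity) : capacity_(capacity) {}

  // Returns the cached value (if any) and marks it as most recently used.
  std::optional<Value> Get(const Key& key) {
    auto it = index_.find(key);
    if (it == index_.end()) return std::nullopt;
    // splice moves the node to the front without invalidating iterators.
    entries_.splice(entries_.begin(), entries_, it->second);
    return it->second->second;
  }

  // Inserts or updates an entry; evicts the least recently used one if full.
  void Put(const Key& key, Value value) {
    if (capacity_ == 0) return;
    auto it = index_.find(key);
    if (it != index_.end()) {
      it->second->second = std::move(value);
      entries_.splice(entries_.begin(), entries_, it->second);
      return;
    }
    if (entries_.size() == capacity_) {
      index_.erase(entries_.back().first);
      entries_.pop_back();
    }
    entries_.emplace_front(key, std::move(value));
    index_[key] = entries_.begin();
  }

  void Clear() {
    entries_.clear();
    index_.clear();
  }

  std::size_t Size() const { return entries_.size(); }

 private:
  using Entry = std::pair<Key, Value>;
  std::size_t capacity_;
  std::list<Entry> entries_;  // front = most recently used
  std::unordered_map<Key, typename std::list<Entry>::iterator, Hash> index_;
};
```

With a capacity of 2, inserting keys 1 and 2, reading key 1, and then inserting key 3 evicts key 2, because key 2 is then the least recently used entry.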

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Summary per file

| File | Description |
|---|---|
| cpp/src/graphar/lru_cache.h | New generic LRU cache + pair hash utility used by chunk readers. |
| cpp/src/graphar/arrow/chunk_reader.h | Adds per-reader cache members for chunk tables. |
| cpp/src/graphar/arrow/chunk_reader.cc | Uses the cache on seek/next_chunk and populates it after reading tables. |
| cpp/test/test_lru_cache.cc | New unit tests covering LRU cache behavior (eviction, update, edge cases). |
| cpp/CMakeLists.txt | Adds test_lru_cache to the test suite. |

Comments suppressed due to low confidence (2)

cpp/src/graphar/arrow/chunk_reader.h:394

  • New chunk_cache_ state is introduced, but AdjListArrowChunkReader defines a copy assignment operator. In chunk_reader.cc, operator= currently does not clear or copy chunk_cache_, so an instance can retain stale cached tables after assignment (potentially returning data from the previous reader configuration). Please update the assignment operator to reset the cache (and consider whether copy-ctor/assignment should copy or clear cached entries consistently).
  std::shared_ptr<arrow::Table> chunk_table_;
  LRUCache<std::pair<IdType, IdType>, std::shared_ptr<arrow::Table>, PairHash>
      chunk_cache_{4};
  IdType vertex_chunk_num_, chunk_num_;
  std::string base_dir_;
  std::shared_ptr<FileSystem> fs_;
};

cpp/src/graphar/arrow/chunk_reader.h:647

  • New chunk_cache_ state is introduced, but AdjListPropertyArrowChunkReader defines a copy assignment operator. In chunk_reader.cc, operator= currently does not clear or copy chunk_cache_, so an instance can retain stale cached tables after assignment. Please update the assignment operator to reset the cache (and ensure copy semantics are consistent across ctor/assignment).
  std::shared_ptr<arrow::Schema> schema_;
  std::shared_ptr<arrow::Table> chunk_table_;
  LRUCache<std::pair<IdType, IdType>, std::shared_ptr<arrow::Table>, PairHash>
      chunk_cache_{4};
  util::FilterOptions filter_options_;
  IdType vertex_chunk_num_, chunk_num_;
  std::string base_dir_;
  std::shared_ptr<FileSystem> fs_;
};
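
The stale-cache concern raised in the two suppressed comments can be illustrated with a small stand-alone sketch. It assumes a simplified reader whose cache is a plain `std::map`; `DemoReader`, `FakeTable`, `GetChunk`, and `CachedChunks` are hypothetical names, not the real chunk reader API. The point is only that copy assignment resets the cache, so entries loaded under the previous configuration cannot be returned afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for arrow::Table; records where it was "read" from.
struct FakeTable {
  std::string source;
};

// Hypothetical, simplified reader; not the GraphAr chunk reader classes.
class DemoReader {
 public:
  explicit DemoReader(std::string base_dir) : base_dir_(std::move(base_dir)) {}

  // Copies start with an empty cache, matching the assignment operator below.
  DemoReader(const DemoReader& other) : base_dir_(other.base_dir_) {}

  DemoReader& operator=(const DemoReader& other) {
    if (this != &other) {
      base_dir_ = other.base_dir_;
      chunk_table_.reset();
      chunk_cache_.clear();  // drop tables loaded for the old configuration
    }
    return *this;
  }

  // Returns the cached table for a chunk, "loading" it on a cache miss.
  std::shared_ptr<FakeTable> GetChunk(long chunk_index) {
    auto it = chunk_cache_.find(chunk_index);
    if (it == chunk_cache_.end()) {
      auto table = std::make_shared<FakeTable>();
      table->source = base_dir_ + "/chunk" + std::to_string(chunk_index);
      it = chunk_cache_.emplace(chunk_index, std::move(table)).first;
    }
    chunk_table_ = it->second;
    return chunk_table_;
  }

  std::size_t CachedChunks() const { return chunk_cache_.size(); }

 private:
  std::string base_dir_;
  std::shared_ptr<FakeTable> chunk_table_;
  std::map<long, std::shared_ptr<FakeTable>> chunk_cache_;
};
```

Whether the real readers should clear the cache or copy it on assignment is a design decision for the PR; clearing is simply the more conservative of the two options mentioned in the review.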


Comment on lines +80 to +82
auto h1 = std::hash<T1>{}(p.first);
auto h2 = std::hash<T2>{}(p.second);
return h1 ^ (h2 << 32);
Copilot AI commented Feb 28, 2026


PairHash combines hashes via h1 ^ (h2 << 32). Shifting by 32 is undefined behavior when size_t is 32-bit, and even on 64-bit this is a weak combiner (high collision risk). Please switch to a portable hash-combine that doesn't assume size_t width (e.g., h1 ^ (h2 + 0x9e3779b97f4a7c15ULL + (h1<<6) + (h1>>2)) or an equivalent width-agnostic approach).

Suggested change
-    auto h1 = std::hash<T1>{}(p.first);
-    auto h2 = std::hash<T2>{}(p.second);
-    return h1 ^ (h2 << 32);
+    size_t h1 = std::hash<T1>{}(p.first);
+    size_t h2 = std::hash<T2>{}(p.second);
+    // Width-agnostic hash combine (inspired by boost::hash_combine)
+    constexpr size_t kMul =
+        static_cast<size_t>(0x9e3779b97f4a7c15ULL);
+    h1 ^= h2 + kMul + (h1 << 6) + (h1 >> 2);
+    return h1;
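
As a self-contained illustration of the suggestion, the sketch below wraps the same combine step in a hash functor and uses it as the hasher of a `std::unordered_map` keyed by a pair. `DemoPairHash` is a hypothetical name chosen for this example; it is not the `PairHash` from the PR. The shifts by 6 and 2 are valid for any `size_t` width, and the cast of the 64-bit constant simply truncates on a 32-bit platform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical example functor; mirrors the combine step suggested above.
struct DemoPairHash {
  template <typename T1, typename T2>
  std::size_t operator()(const std::pair<T1, T2>& p) const {
    std::size_t h1 = std::hash<T1>{}(p.first);
    const std::size_t h2 = std::hash<T2>{}(p.second);
    // Unsigned conversion truncates the constant if size_t is 32 bits wide.
    constexpr std::size_t kMul =
        static_cast<std::size_t>(0x9e3779b97f4a7c15ULL);
    // No shift count depends on the width of size_t.
    h1 ^= h2 + kMul + (h1 << 6) + (h1 >> 2);
    return h1;
  }
};

// Example key type: a (vertex chunk index, chunk index) pair.
using DemoChunkKey = std::pair<std::int64_t, std::int64_t>;
using DemoChunkMap = std::unordered_map<DemoChunkKey, int, DemoPairHash>;
```

Because the first hash also feeds the shifted terms, swapping the two elements of the pair generally yields a different hash value, which a plain XOR of the two hashes would not.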

@yangxk1 (Contributor) left a comment

Looking forward to a benchmark


void VertexPropertyArrowChunkReader::Filter(util::Filter filter) {
  filter_options_.filter = filter;
  chunk_table_ = nullptr;
Contributor:
Should it be initialized in this xxxFilter function?

Contributor Author:
Yes. After a filter is applied, the cached chunks are outdated.


void VertexPropertyArrowChunkReader::Select(util::ColumnNames column_names) {
  filter_options_.columns = column_names;
  chunk_table_ = nullptr;
Contributor:
Should it be initialized in this xxxSelect function?

Contributor Author:
Same as above.
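
The reasoning in this thread (a cached table was read under the old filter or column selection, so it must not be served after the selection changes) can be sketched as follows. This is a simplified model under stated assumptions: `DemoSelectingReader`, `Columns`, and `load_count` are hypothetical, and a "chunk" here is just the list of selected column names rather than an `arrow::Table`.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

using Columns = std::vector<std::string>;

// Hypothetical, simplified reader; not the VertexPropertyArrowChunkReader.
class DemoSelectingReader {
 public:
  // Changing the selection invalidates every cached chunk.
  void Select(Columns columns) {
    columns_ = std::move(columns);
    chunk_cache_.clear();
  }

  // Returns the chunk from the cache, "reading" it only on a miss.
  std::shared_ptr<const Columns> GetChunk(long chunk_index) {
    auto it = chunk_cache_.find(chunk_index);
    if (it == chunk_cache_.end()) {
      ++load_count_;  // stands in for a Parquet read
      it = chunk_cache_
               .emplace(chunk_index, std::make_shared<Columns>(columns_))
               .first;
    }
    return it->second;
  }

  int load_count() const { return load_count_; }

 private:
  Columns columns_;
  std::map<long, std::shared_ptr<const Columns>> chunk_cache_;
  int load_count_ = 0;
};
```

Repeated reads of the same chunk hit the cache, while a read after `Select` triggers a fresh load that reflects the new column list.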

IdType vertex_num_;
std::shared_ptr<arrow::Schema> schema_;
std::shared_ptr<arrow::Table> chunk_table_;
LRUCache<IdType, std::shared_ptr<arrow::Table>> chunk_cache_{4};
Contributor:
Would it be better to choose the chunk_cache size based on available memory, or to let the user control it?

Contributor Author:
> Would it be better to choose the chunk_cache size based on available memory, or to let the user control it?

Actually, I don't know where this parameter (the cache size) should live. Do you have any suggestions?

Contributor:
Putting it in the options and having a default value is a good temporary solution. Maybe we can open a new issue to track it until we can come up with a solution that balances memory, user experience, and efficiency.
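
One possible shape for the "options with a default value" idea is sketched below. Everything here is hypothetical: `DemoReaderOptions`, `chunk_cache_capacity`, and `DemoConfigurableReader` are illustrative names and do not correspond to an existing GraphAr options type. The default of 4 mirrors the hard-coded `chunk_cache_{4}` shown in the diff.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical options struct; field name and location are assumptions.
struct DemoReaderOptions {
  static constexpr std::size_t kDefaultChunkCacheCapacity = 4;
  // Number of chunk tables each reader keeps in its LRU cache.
  std::size_t chunk_cache_capacity = kDefaultChunkCacheCapacity;
};

// Hypothetical reader that takes its cache capacity from the options.
class DemoConfigurableReader {
 public:
  explicit DemoConfigurableReader(DemoReaderOptions options = {})
      : cache_capacity_(options.chunk_cache_capacity) {}

  std::size_t cache_capacity() const { return cache_capacity_; }

 private:
  std::size_t cache_capacity_;
};
```

Callers that do nothing keep the current behavior, while memory-constrained or scan-heavy workloads could lower or raise the capacity explicitly.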

@SYaoJun SYaoJun force-pushed the 0214_lru branch 2 times, most recently from 933738f to f719ebe Compare March 2, 2026 12:56
@SYaoJun (Contributor, Author) commented Mar 2, 2026

Added LRU Cache

benchmark

Summary

After adding the cache, the initialization and creation times have increased. Since filtering and selection operations affect the cache, I currently invalidate all caches aggressively. This has caused performance regressions in filter operations.
I believe this issue requires a more careful design; there is still significant room for improvement.

Performance Improvements

| Benchmark | Before (ns) | After (ns) | Improvement |
|---|---|---|---|
| AdjListPropertyArrowChunkReaderReadChunk | 166224 | 3927 | 42.33x |
| AdjListArrowChunkReaderReadChunk | 152848 | 6095 | 25.08x |
| AdjListOffsetArrowChunkReaderReadChunk | 136690 | 3808 | 35.90x |
| VertexPropertyArrowChunkReaderReadChunk_firstGraph_AllColumns_V1 | 189591 | 10307 | 18.39x |
| VertexPropertyArrowChunkReaderReadChunk_firstGraph_TwoColumns_V1 | 201011 | 7756 | 25.92x |
| VertexPropertyArrowChunkReaderReadChunk_firstGraph_OneColumns_V1 | 179371 | 6139 | 29.22x |
| VertexPropertyArrowChunkReaderReadChunk_firstGraph_AllColumns_V2 | 133497 | 9883 | 13.51x |
| VertexPropertyArrowChunkReaderReadChunk_firstGraph_TwoColumns_V2 | 108899 | 7876 | 13.83x |
| VertexPropertyArrowChunkReaderReadChunk_firstGraph_OneColumns_V2 | 82066 | 5830 | 14.08x |
| VertexPropertyArrowChunkReaderReadChunk_secondGraph_AllColumns_V1 | 396302 | 10402 | 38.10x |
| VertexPropertyArrowChunkReaderReadChunk_secondGraph_TwoColumns_V1 | 301181 | 8218 | 36.65x |
| VertexPropertyArrowChunkReaderReadChunk_secondGraph_OneColumns_V1 | 191486 | 5875 | 32.59x |
| VertexPropertyArrowChunkReaderReadChunk_secondGraph_AllColumns_V2 | 775914 | 9723 | 79.80x |
| VertexPropertyArrowChunkReaderReadChunk_secondGraph_TwoColumns_V2 | 462362 | 8328 | 55.52x |
| VertexPropertyArrowChunkReaderReadChunk_secondGraph_OneColumns_V2 | 188513 | 5899 | 31.96x |

Minor Performance Changes

| Benchmark | Before (ns) | After (ns) | Ratio |
|---|---|---|---|
| CreateAdjListArrowChunkReader | 3577 | 38150 | 0.09x |
| CreateAdjListOffsetArrowChunkReader | 3550 | 38338 | 0.09x |

Performance Regressions

| Benchmark | Before (ns) | After (ns) | Regression |
|---|---|---|---|
| InitialGraphInfo | 202816 | 1824996 | 0.11x |
| CreateVertexPropertyArrowChunkReader | 6965 | 79185 | 0.09x |
| SingleLabelFilter | 111611 | 1710060 | 0.07x |
| SingleLabelFilterbyAcero | 658114 | 2487584 | 0.26x |
| MultiLabelFilter | 92712 | 1271051 | 0.07x |
| MultiLabelFilterbyAcero | 465075 | 1909993 | 0.24x |
| LabelFilterFromSet | 48985 | 824111 | 0.06x |
