Skip to content

refactor: Remove idIndexer::insert_safe, add param bool insert_safe for insert#175

Open
zhanglei1949 wants to merge 1 commit intoalibaba:mainfrom
zhanglei1949:zl/fix-str-pk
Open

refactor: Remove idIndexer::insert_safe, add param bool insert_safe for insert#175
zhanglei1949 wants to merge 1 commit intoalibaba:mainfrom
zhanglei1949:zl/fix-str-pk

Conversation

@zhanglei1949
Copy link
Copy Markdown
Member

Try to unify insert method.

Copy link
Copy Markdown
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

@qodo-code-review
Copy link
Copy Markdown

Review Summary by Qodo

Unify insert API and fix varchar buffer allocation bug

✨ Enhancement 🐞 Bug fix

Grey Divider

Walkthroughs

Description
• Unified insert() method by removing insert_safe() and adding insert_safe parameter
• Fixed varchar key buffer allocation bug in StringColumn::resize()
• Added comprehensive test coverage for varchar reserve-then-insert scenarios
• Updated all call sites to use new unified insert() API
Diagram
flowchart LR
  A["idIndexer::insert_safe<br/>removed"] -->|"consolidate into"| B["insert with<br/>insert_safe param"]
  C["StringColumn::resize<br/>bug: data_buffer not allocated"] -->|"fix: pre-allocate<br/>data_buffer"| D["reserve-then-insert<br/>now works safely"]
  B -->|"propagate param"| E["VertexTable::insert_vertex_pk"]
  D -->|"enables"| F["New test cases<br/>for varchar edge cases"]
Loading

Grey Divider

File Changes

1. include/neug/utils/id_indexer.h ✨ Enhancement +12/-16

Consolidate insert_safe into insert method

• Removed insert_safe() method entirely
• Modified insert() to accept optional bool insert_safe = false parameter
• When insert_safe=true, automatically reserves capacity if needed
• When insert_safe=false, throws exception if capacity exceeded
• Moved capacity check before incrementing num_elements_ to prevent race conditions
• Pass insert_safe parameter to keys_->set_any() call

include/neug/utils/id_indexer.h


2. include/neug/storages/graph/vertex_table.h ✨ Enhancement +2/-1

Add insert_safe parameter to insert_vertex_pk

• Updated insert_vertex_pk() signature to accept bool insert_safe = false parameter
• Default parameter value maintains backward compatibility

include/neug/storages/graph/vertex_table.h


3. src/storages/graph/vertex_table.cc ✨ Enhancement +4/-3

Forward insert_safe parameter through call chain

• Updated AddVertex() to pass insert_safe parameter to insert_vertex_pk()
• Updated insert_vertex_pk() implementation to accept and forward insert_safe parameter
• Pass insert_safe to indexer_.insert() call

src/storages/graph/vertex_table.cc


View more (2)
4. tests/storage/test_edge_table.cc 🧪 Tests +8/-8

Update edge table tests to use new insert API

• Replaced 8 calls to insert_safe() with insert(..., true) in test methods
• Updated TestAddEdgeAndDelete, TestAddEdgeDeleteUnbundled, TestEdgeTableCompaction, and
 TestUpdateEdgeData test cases
• Maintains same test behavior with new unified API

tests/storage/test_edge_table.cc


5. tests/unittest/test_indexer.cc 🧪 Tests +348/-4

Add comprehensive varchar buffer allocation tests

• Replaced 4 existing insert_safe() calls with insert(..., true) in core tests
• Added 6 new comprehensive test cases for varchar key buffer allocation bug fix
• Tests cover reserve-then-insert, max-width strings, multiple reserves, shrinking reserves, rehash,
 and dump/reload scenarios
• Added 3 additional tests for short-string dump/reopen/insert-long-string paths (in-memory and
 SyncToFile backends)
• Total of 9 new test functions validating the varchar buffer allocation fix

tests/unittest/test_indexer.cc


Grey Divider

Qodo Logo

@qodo-code-review
Copy link
Copy Markdown

qodo-code-review bot commented Apr 7, 2026

Code Review by Qodo

🐞 Bugs (1) 📘 Rule violations (0) 📎 Requirement gaps (0) 🎨 UX Issues (0)

Grey Divider


Action required

1. Insert capacity race 🐞 Bug ≡ Correctness
Description
LFIndexer::insert() checks capacity using num_elements_.load() but then allocates the real slot with
fetch_add(1) without validating that returned index against capacity(). Under concurrent inserts
near capacity, insert_safe=true can still reach keys_->set_any() with an out-of-range index and
throw, while num_elements_ has already been incremented (leaving holes / inconsistent state).
Code

include/neug/utils/id_indexer.h[R343-356]

+  INDEX_T insert(const Property& oid, bool insert_safe = false) {
    assert(oid.type() == get_type());
-    INDEX_T ind = static_cast<INDEX_T>(num_elements_.fetch_add(1));
-    if (!NEUG_LIKELY(ind >= 0 && ind < capacity())) {
-      THROW_INTERNAL_EXCEPTION(
-          "Reserved size is not enough: " + std::to_string(capacity()) +
-          " vs " + std::to_string(ind));
+    INDEX_T ind = static_cast<INDEX_T>(num_elements_.load());
+    if (NEUG_UNLIKELY(ind >= capacity())) {
+      if (!insert_safe) {
+        THROW_INTERNAL_EXCEPTION(
+            "Reserved size is not enough: " + std::to_string(capacity()) +
+            " vs " + std::to_string(ind));
+      } else {
+        reserve(capacity() + (capacity() >> 2));
+      }
    }
-    keys_->set_any(ind, oid);
+    ind = static_cast<INDEX_T>(num_elements_.fetch_add(1));
+    keys_->set_any(ind, oid, insert_safe);
Evidence
The storage docs explicitly state LFIndexer “supports concurrent, lock-free insertion”, so
correctness under concurrent writers is an expected contract. The new insert implementation makes
the capacity decision based on a potentially stale load(), and then uses fetch_add() to obtain the
actual index; another thread can consume remaining capacity in between, producing an index >=
keys_->size(). Column implementations reject out-of-range indices (throw), so insert_safe=true is
not actually safe under concurrency and num_elements_ is still advanced.

include/neug/utils/id_indexer.h[343-368]
include/neug/storages/README.md[23-30]
include/neug/utils/property/column.h[139-145]
include/neug/utils/property/column.h[376-395]
src/utils/property/column.cc[84-114]
include/neug/transaction/README.md[35-39]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`LFIndexer::insert()` decides whether to `reserve()`/throw based on `num_elements_.load()`, but then allocates the slot with `num_elements_.fetch_add(1)` and never checks whether the returned index is still within `capacity()`. With concurrent inserts near capacity, `insert_safe=true` can still produce an out-of-range index and throw inside `keys_->set_any()`, while `num_elements_` has already advanced.

### Issue Context
The docs state LFIndexer supports concurrent, lock-free insertion, and the implementation uses atomic `fetch_add` and CAS, so multi-writer correctness matters.

### Fix Focus Areas
- include/neug/utils/id_indexer.h[343-368]

### Suggested fix approach
Implement slot acquisition in a way that is consistent with capacity under concurrency:
- Prefer a `compare_exchange` loop that increments `num_elements_` only if `current < capacity()` (and if `insert_safe==true`, grow first and retry).
- Alternatively, validate the *actual* `ind` returned by `fetch_add` against `capacity()` and, for `insert_safe==true`, `reserve()` to at least `ind + 1` before calling `keys_->set_any(ind, ...)` (and document/handle the fact that `num_elements_` may advance even on failure).

Also add/adjust a unit test that runs concurrent inserts near capacity and asserts no out-of-range exceptions when `insert_safe==true`.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

ⓘ The new review experience is currently in Beta. Learn more

Grey Divider

Qodo Logo

Comment on lines +343 to +356
INDEX_T insert(const Property& oid, bool insert_safe = false) {
assert(oid.type() == get_type());
INDEX_T ind = static_cast<INDEX_T>(num_elements_.fetch_add(1));
if (!NEUG_LIKELY(ind >= 0 && ind < capacity())) {
THROW_INTERNAL_EXCEPTION(
"Reserved size is not enough: " + std::to_string(capacity()) +
" vs " + std::to_string(ind));
INDEX_T ind = static_cast<INDEX_T>(num_elements_.load());
if (NEUG_UNLIKELY(ind >= capacity())) {
if (!insert_safe) {
THROW_INTERNAL_EXCEPTION(
"Reserved size is not enough: " + std::to_string(capacity()) +
" vs " + std::to_string(ind));
} else {
reserve(capacity() + (capacity() >> 2));
}
}
keys_->set_any(ind, oid);
ind = static_cast<INDEX_T>(num_elements_.fetch_add(1));
keys_->set_any(ind, oid, insert_safe);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action required

1. Insert capacity race 🐞 Bug ≡ Correctness

LFIndexer::insert() checks capacity using num_elements_.load() but then allocates the real slot with
fetch_add(1) without validating that returned index against capacity(). Under concurrent inserts
near capacity, insert_safe=true can still reach keys_->set_any() with an out-of-range index and
throw, while num_elements_ has already been incremented (leaving holes / inconsistent state).
Agent Prompt
### Issue description
`LFIndexer::insert()` decides whether to `reserve()`/throw based on `num_elements_.load()`, but then allocates the slot with `num_elements_.fetch_add(1)` and never checks whether the returned index is still within `capacity()`. With concurrent inserts near capacity, `insert_safe=true` can still produce an out-of-range index and throw inside `keys_->set_any()`, while `num_elements_` has already advanced.

### Issue Context
The docs state LFIndexer supports concurrent, lock-free insertion, and the implementation uses atomic `fetch_add` and CAS, so multi-writer correctness matters.

### Fix Focus Areas
- include/neug/utils/id_indexer.h[343-368]

### Suggested fix approach
Implement slot acquisition in a way that is consistent with capacity under concurrency:
- Prefer a `compare_exchange` loop that increments `num_elements_` only if `current < capacity()` (and if `insert_safe==true`, grow first and retry).
- Alternatively, validate the *actual* `ind` returned by `fetch_add` against `capacity()` and, for `insert_safe==true`, `reserve()` to at least `ind + 1` before calling `keys_->set_any(ind, ...)` (and document/handle the fact that `num_elements_` may advance even on failure).

Also add/adjust a unit test that runs concurrent inserts near capacity and asserts no out-of-range exceptions when `insert_safe==true`.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

@zhanglei1949 zhanglei1949 changed the title refacotr: Remove idIndexer::insert_safe, add param bool insert_safe for insert refactor: Remove idIndexer::insert_safe, add param bool insert_safe for insert Apr 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant