refactor: Remove idIndexer::insert_safe, add param bool insert_safe for insert#175
refactor: Remove idIndexer::insert_safe, add param bool insert_safe for insert#175zhanglei1949 wants to merge 1 commit intoalibaba:mainfrom
Conversation
There was a problem hiding this comment.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
Review Summary by QodoUnify insert API and fix varchar buffer allocation bug
WalkthroughsDescription• Unified insert() method by removing insert_safe() and adding insert_safe parameter • Fixed varchar key buffer allocation bug in StringColumn::resize() • Added comprehensive test coverage for varchar reserve-then-insert scenarios • Updated all call sites to use new unified insert() API Diagramflowchart LR
A["idIndexer::insert_safe<br/>removed"] -->|"consolidate into"| B["insert with<br/>insert_safe param"]
C["StringColumn::resize<br/>bug: data_buffer not allocated"] -->|"fix: pre-allocate<br/>data_buffer"| D["reserve-then-insert<br/>now works safely"]
B -->|"propagate param"| E["VertexTable::insert_vertex_pk"]
D -->|"enables"| F["New test cases<br/>for varchar edge cases"]
File Changes1. include/neug/utils/id_indexer.h
|
Code Review by Qodo
1. Insert capacity race
|
| INDEX_T insert(const Property& oid, bool insert_safe = false) { | ||
| assert(oid.type() == get_type()); | ||
| INDEX_T ind = static_cast<INDEX_T>(num_elements_.fetch_add(1)); | ||
| if (!NEUG_LIKELY(ind >= 0 && ind < capacity())) { | ||
| THROW_INTERNAL_EXCEPTION( | ||
| "Reserved size is not enough: " + std::to_string(capacity()) + | ||
| " vs " + std::to_string(ind)); | ||
| INDEX_T ind = static_cast<INDEX_T>(num_elements_.load()); | ||
| if (NEUG_UNLIKELY(ind >= capacity())) { | ||
| if (!insert_safe) { | ||
| THROW_INTERNAL_EXCEPTION( | ||
| "Reserved size is not enough: " + std::to_string(capacity()) + | ||
| " vs " + std::to_string(ind)); | ||
| } else { | ||
| reserve(capacity() + (capacity() >> 2)); | ||
| } | ||
| } | ||
| keys_->set_any(ind, oid); | ||
| ind = static_cast<INDEX_T>(num_elements_.fetch_add(1)); | ||
| keys_->set_any(ind, oid, insert_safe); |
There was a problem hiding this comment.
1. Insert capacity race 🐞 Bug ≡ Correctness
LFIndexer::insert() checks capacity using num_elements_.load() but then allocates the real slot with fetch_add(1) without validating that returned index against capacity(). Under concurrent inserts near capacity, insert_safe=true can still reach keys_->set_any() with an out-of-range index and throw, while num_elements_ has already been incremented (leaving holes / inconsistent state).
Agent Prompt
### Issue description
`LFIndexer::insert()` decides whether to `reserve()`/throw based on `num_elements_.load()`, but then allocates the slot with `num_elements_.fetch_add(1)` and never checks whether the returned index is still within `capacity()`. With concurrent inserts near capacity, `insert_safe=true` can still produce an out-of-range index and throw inside `keys_->set_any()`, while `num_elements_` has already advanced.
### Issue Context
The docs state LFIndexer supports concurrent, lock-free insertion, and the implementation uses atomic `fetch_add` and CAS, so multi-writer correctness matters.
### Fix Focus Areas
- include/neug/utils/id_indexer.h[343-368]
### Suggested fix approach
Implement slot acquisition in a way that is consistent with capacity under concurrency:
- Prefer a `compare_exchange` loop that increments `num_elements_` only if `current < capacity()` (and if `insert_safe==true`, grow first and retry).
- Alternatively, validate the *actual* `ind` returned by `fetch_add` against `capacity()` and, for `insert_safe==true`, `reserve()` to at least `ind + 1` before calling `keys_->set_any(ind, ...)` (and document/handle the fact that `num_elements_` may advance even on failure).
Also add/adjust a unit test that runs concurrent inserts near capacity and asserts no out-of-range exceptions when `insert_safe==true`.
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
Try to unify insert method.