Fix HNSW index crashes and incorrect query results#181
Open
grparry wants to merge 1 commit intoruvnet:mainfrom
Open
Fix HNSW index crashes and incorrect query results#181grparry wants to merge 1 commit intoruvnet:mainfrom
grparry wants to merge 1 commit intoruvnet:mainfrom
Conversation
Six bugs fixed in the HNSW access method: 1. SIGSEGV on repeated queries: beginscan must allocate xs_orderbyvals and xs_orderbynulls arrays (like GiST/SP-GiST do). Without this, the executor writes distance values to stale palloc memory, causing segfaults on the second+ query in the same backend. 2. Empty HNSW graph: connect_node_to_neighbors was a no-op TODO stub. Implemented bidirectional connections (new→neighbors, neighbor→new) with pruning at M/M0 capacity limits. 3. Wrong distance metric: hnsw_build hardcoded DistanceMetric::Euclidean regardless of the operator class used (e.g. ruvector_cosine_ops). Added metric_from_index() to read the metric from the opclass support function via index_getprocid + get_func_name. 4. Wrong result ordering: BinaryHeap::into_iter().take(k) returns k arbitrary items, not the k closest. Removed .take(k) before sort. 5. xs_recheckorderby must be false: setting it to true triggers PG17's IndexNextWithReorder distance comparison, which errors on harmless floating-point differences between index-stored and heap vectors. 6. Use-after-free in endscan: added null check and null-out of scan->opaque to prevent double-free across rescans. Also applied the same xs_orderbyvals fix to ivfflat_ambeginscan. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Hi! I've been evaluating ruvector as a pgvector alternative for a project and ran into a few issues with the HNSW index access method on PostgreSQL 17. I traced them down and put together fixes — hope these are helpful.
Summary
Six bugs fixed in the HNSW access method (
hnsw_am.rs), plus one shared fix inivfflat_am.rs:SIGSEGV on repeated queries —
beginscanmust allocatexs_orderbyvalsandxs_orderbynullsarrays (like GiST and SP-GiST do).RelationGetIndexScanallocates the scan descriptor but not these arrays, so the executor was writing distance values to stale/uninitialized memory. First query in a backend often worked; second+ crashed at addresses like0x1.Empty HNSW graph —
connect_node_to_neighborswas a TODO stub. Implemented bidirectional connections (new node → neighbors, each neighbor → new node) with pruning when neighbor count exceeds M (upper layers) or M0 (layer 0).Wrong distance metric —
hnsw_buildusedHnswConfig::default()which hardcodesDistanceMetric::Euclidean, regardless of the operator class (e.g.ruvector_cosine_ops). Addedmetric_from_index()that reads the opclass support function name viaindex_getprocid+get_func_name.Wrong result ordering —
BinaryHeap::into_iter().take(k)returns k arbitrary items from the heap's backing array, not the k smallest. Removed.take(k)before the sort+truncate.xs_recheckorderbymust be false — Setting it totruetriggers PG17'sIndexNextWithReorderdistance comparison, which raises "index returned tuples in wrong order" on harmless floating-point differences between index-computed and operator-computed distances.Use-after-free in
endscan— Added null check and null-out ofscan->opaqueto prevent double-free ifendscanis called after a rescan that resets the state.Testing
Test plan
cargo pgrx install --releaseruvector(N)column and HNSW index usingruvector_cosine_opsSELECT ... ORDER BY embedding <=> query::ruvector LIMIT kmultiple times in the same psql sessionSET enable_indexscan = off)🤖 Generated with Claude Code