fix: reliability, security, and test coverage improvements by jaywyawhare · Pull Request #29 · jaywyawhare/GigaVector

jaywyawhare · 2026-04-27T19:02:32Z

Summary

Fix NULL deref in HNSW filter, WAL CRC missing on insert, PKCS7 padding truncation
Fix JWT weak RNG, search result vector leaks, batch query size uncapped, snprintf truncation
WAL replay now applies DELETE/UPDATE records; auth key lookup O(log n) via bsearch
Multi-key filter search uses filter expression; CRC-32 deduplicated into utils.h
CMake sanitizer flags moved from global to per-target INTERFACE library
Add concurrent HNSW insert+search test; shell corruption tests registered in CTest
Ship py.typed in wheel for PEP 561 type checker support

Test plan

cmake -DBUILD_TESTS=ON .. && ctest all C tests pass
ctest -R corrupt_wal skips cleanly (no WAL file)
mypy gigavector resolves types via py.typed

…meta_count

…uery size cap

…bject

…er buffer size

…f 16

…ng with stdlib

…fields

… state

…ntly skipping

…ssion

…of linear scan

…event leaks

…ectness assertions

…ACE library

…tations

greptile-apps · 2026-04-27T19:05:28Z

Greptile Summary

This PR addresses a broad set of reliability and security issues — NULL deref in HNSW filter, missing WAL CRC on insert, PKCS7 padding truncation, weak JWT RNG, and batch query size cap — and extends WAL replay to apply DELETE/UPDATE records. Two critical bugs were introduced alongside the fixes:

src/storage/wal.c: buf, keys, and values are freed before being passed to the new on_update callback (use-after-free + double-free on every WAL with UPDATE records).
src/api/rest_handlers.c: unclamped snprintf return value causes a stack buffer overflow in the multi-key filter expression builder; filter values are also interpolated without escaping.

Confidence Score: 2/5

Not safe to merge — two P0 bugs (use-after-free in WAL replay and stack overflow in REST handler) were introduced in this PR.

Two P0 issues are introduced: (1) freed pointers passed to on_update during WAL replay cause UB/memory corruption on any database with UPDATE records; (2) unclamped snprintf return value overflows a stack buffer in the multi-key filter path. A P1 filter-injection issue also exists in the same code path.

src/storage/wal.c (update callback use-after-free) and src/api/rest_handlers.c (snprintf overflow + expression injection)

Security Review

Use-after-free / dangling-pointer exposure (src/storage/wal.c): freed buf, keys, and values are passed into the on_update callback during WAL replay.
Stack buffer overflow (src/api/rest_handlers.c): snprintf return value added to pos without clamping; on truncation pos exceeds the 2048-byte expr stack array.
Filter expression injection (src/api/rest_handlers.c): metadata values from the request body are interpolated into a filter expression string without escaping \" or \\.

Important Files Changed

Filename	Overview
src/storage/wal.c	WAL replay extended to apply DELETE/UPDATE records; contains a critical use-after-free: buf/keys/values are freed before being passed to on_update callback
src/api/rest_handlers.c	Multi-key filter support added and vector leaks fixed; snprintf truncation in filter expression builder can overflow stack buffer; filter values are not sanitized before interpolation into expression
src/security/auth.c	JWT subject escaping added, getrandom used on Linux, and bsearch replaces linear scan; escaped_subject buffer may silently truncate long subjects
src/security/crypto.c	PKCS7 padding truncation fixed: loop now iterates over total_len (including padding block) instead of stopping at plaintext_len
src/index/hnsw.c	NULL deref in filter search fixed; per-index rand_r seed added for thread-safe level calculation on non-Windows platforms
include/core/utils.h	CRC-32 implementation deduplicated into shared inline helpers, removing duplication from database.c and wal.c
tests/index/test_hnsw_concurrent.c	New concurrent insert+search test using pthreads to exercise HNSW under parallel load
CMakeLists.txt	Sanitizer/coverage flags moved from global variables to a per-target INTERFACE library; shell corruption tests registered in CTest with SKIP_RETURN_CODE 77
src/storage/database.c	CRC renamed to shared utils helpers; WAL replay now passes delete/update callbacks; snprintf truncation guarded in db_build_wal_path

Comments Outside Diff (1)

src/security/auth.c, line 619-624 (link)

JWT subject silently truncated when escaped form exceeds 255 bytes

The escaped_subject buffer is 256 bytes. The loop guard ei + 2 < sizeof(escaped_subject) stops appending characters once ei reaches 253, silently emitting a truncated subject without any error. A subject that is 240+ characters long (or shorter with many escapable chars) will produce a JWT whose sub claim differs from the requested subject.

_{Reviews (1): Last reviewed commit: "build: ship py.typed marker in wheel so ..." | Re-trigger Greptile}

greptile-apps · 2026-04-27T19:05:32Z

            free(keys);
            free(values);
            free(buf);
-            /* Skip update records in replay - they modify already-inserted vectors */
+            if (on_update != NULL) {
+                int cb_res = on_update(ctx, (size_t)index_u64, buf, dim,
+                    (const char *const *)keys, (const char *const *)values, meta_count);
+                if (cb_res != 0) {
+                    free(buf); free(keys); free(values); fclose(f); return -1;
+                }


Use-after-free in WAL on_update callback

buf, keys, and values are freed on lines 772–774, then immediately passed to on_update on line 776–777. The callback receives dangling pointers, causing undefined behavior on every WAL replay containing an UPDATE record. The error path on line 779 then attempts to free them a second time, compounding the issue with a double-free.

The three free() calls must be moved to after the on_update callback (and its error path) completes.

greptile-apps · 2026-04-27T19:05:33Z

+                int n = snprintf(expr + pos, sizeof(expr) - pos,
+                                 "%s%s == \"%s\"",
+                                 fi > 0 ? " AND " : "",
+                                 entries[fi].key, val);
+                if (n > 0) pos += (size_t)n;
+            }
+            expr[pos] = '\0';


Stack buffer overflow when filter expression is truncated

snprintf returns the number of characters that would have been written (not what was actually written) when the buffer is too small. Adding that unclamped value to pos on line 620 will push pos beyond the 2048-byte expr array, and the unconditional expr[pos] = '\0' on line 622 then writes out of bounds on the stack.

The fix is to cap pos at the actual number of bytes written: size_t written = (size_t)n < sizeof(expr) - pos ? (size_t)n : sizeof(expr) - pos - 1; pos += written;

greptile-apps · 2026-04-27T19:05:33Z

+            /* Build "key == "val" AND key2 == "val2" ..." expression */
+            char expr[2048];
+            size_t pos = 0;
+            for (size_t fi = 0; fi < nfilter && pos < sizeof(expr) - 1; fi++) {
+                const char *val = json_get_string(entries[fi].value);
+                if (!val) continue;
+                int n = snprintf(expr + pos, sizeof(expr) - pos,
+                                 "%s%s == \"%s\"",
+                                 fi > 0 ? " AND " : "",
+                                 entries[fi].key, val);
+                if (n > 0) pos += (size_t)n;
+            }
+            expr[pos] = '\0';
+            found = db_search_with_filter_expr(ctx->db, query, k, results, distance, expr);


Filter expression injection via unescaped metadata values

Metadata values from the JSON request body are interpolated directly into the filter expression string with == "%s" without escaping embedded " or \ characters. A value like x" OR 1==1 OR tag == "y would produce a malformed or manipulated filter expression, potentially bypassing the intended filter constraint in db_search_with_filter_expr.

…T truncation

jaywyawhare added 22 commits April 28, 2026 00:11

fix: WAL CRC missing in simple insert path, wal_dump ENOENT, bounded …

f68d037

…meta_count

fix: null deref when filter_value is NULL in hnsw search

d607401

fix: search result vector leaks, range response missing data, batch q…

a3be5ab

…uery size cap

fix: use getrandom on Linux, warn on weak RNG fallback, escape JWT su…

8192971

…bject

fix: check fwrite return values in backup header write, fix placehold…

fb78af3

…er buffer size

fix: PKCS7 padding missing final block when input is exact multiple o…

32de511

…f 16

fix: return NULL on snprintf truncation in db_build_wal_path

7ba377c

fix: remove erroneous free() declaration from public header conflicti…

9d4b2ac

…ng with stdlib

fix: use db_open_with_ivfpq_config instead of poking internal struct …

a5d68e7

…fields

fix: gitignore missing build dirs, remove stray scratch file

4a3a163

fix: remove duplicate doc comment on db_add_vectors

6c3531d

fix: use per-index rand_r seed in hnsw to avoid race on shared rand()…

3ea9bf6

… state

refactor: consolidate duplicate CRC-32 impls into utils.h

3fdd805

fix: WAL replay now applies DELETE and UPDATE records instead of sile…

d53310a

…ntly skipping

fix: multi-key filter in search now applies all keys via filter expre…

db06c7e

…ssion

fix: auth key lookup now O(log n) via sorted array + bsearch instead …

ba7a248

…of linear scan

feat: add gv_search_results_free() to clarify result ownership and pr…

522922d

…event leaks

test: use temp dir paths in WAL/HNSW tests, add nearest-neighbor corr…

20f00be

…ectness assertions

test: add concurrent insert+search test for HNSW thread safety

f62e239

test: register shell corruption tests in CTest with skip-if-absent

2f324fa

build: replace global sanitizer/coverage flags with per-target INTERF…

64387e3

…ACE library

build: ship py.typed marker in wheel so type checkers see inline anno…

bd7cf1f

…tations

greptile-apps Bot reviewed Apr 27, 2026

View reviewed changes

jaywyawhare added 2 commits April 28, 2026 00:40

fix: WAL use-after-free, REST snprintf overflow, filter injection, JW…

6e6965d

…T truncation

fix: return -1 not NULL in auth_generate_jwt, zero-init bsearch key

4c9ffcb

jaywyawhare merged commit a186fc3 into master Apr 27, 2026
5 checks passed

jaywyawhare deleted the fix/reliability-security-tests branch April 27, 2026 19:37

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: reliability, security, and test coverage improvements#29

fix: reliability, security, and test coverage improvements#29
jaywyawhare merged 24 commits into
masterfrom
fix/reliability-security-tests

jaywyawhare commented Apr 27, 2026

Uh oh!

greptile-apps Bot commented Apr 27, 2026 •

edited

Loading

Greptile Summary

Confidence Score: 2/5

Security Review

Important Files Changed

Comments Outside Diff (1)

Uh oh!

greptile-apps Bot Apr 27, 2026

Uh oh!

greptile-apps Bot Apr 27, 2026

Uh oh!

greptile-apps Bot Apr 27, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jaywyawhare commented Apr 27, 2026

Summary

Test plan

Uh oh!

greptile-apps Bot commented Apr 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 2/5

Security Review

Important Files Changed

Comments Outside Diff (1)

Uh oh!

greptile-apps Bot Apr 27, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Apr 27, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Apr 27, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

greptile-apps Bot commented Apr 27, 2026 •

edited

Loading