fix: reliability, security, and test coverage improvements#29
Conversation
…ectness assertions
|
| Filename | Overview |
|---|---|
| src/storage/wal.c | WAL replay extended to apply DELETE/UPDATE records; contains a critical use-after-free: buf/keys/values are freed before being passed to on_update callback |
| src/api/rest_handlers.c | Multi-key filter support added and vector leaks fixed; snprintf truncation in filter expression builder can overflow stack buffer; filter values are not sanitized before interpolation into expression |
| src/security/auth.c | JWT subject escaping added, getrandom used on Linux, and bsearch replaces linear scan; escaped_subject buffer may silently truncate long subjects |
| src/security/crypto.c | PKCS7 padding truncation fixed: loop now iterates over total_len (including padding block) instead of stopping at plaintext_len |
| src/index/hnsw.c | NULL deref in filter search fixed; per-index rand_r seed added for thread-safe level calculation on non-Windows platforms |
| include/core/utils.h | CRC-32 implementation deduplicated into shared inline helpers, removing duplication from database.c and wal.c |
| tests/index/test_hnsw_concurrent.c | New concurrent insert+search test using pthreads to exercise HNSW under parallel load |
| CMakeLists.txt | Sanitizer/coverage flags moved from global variables to a per-target INTERFACE library; shell corruption tests registered in CTest with SKIP_RETURN_CODE 77 |
| src/storage/database.c | CRC renamed to shared utils helpers; WAL replay now passes delete/update callbacks; snprintf truncation guarded in db_build_wal_path |
Comments Outside Diff (1)
-
src/security/auth.c, line 619-624 (link)JWT subject silently truncated when escaped form exceeds 255 bytes
The
escaped_subjectbuffer is 256 bytes. The loop guardei + 2 < sizeof(escaped_subject)stops appending characters onceeireaches 253, silently emitting a truncated subject without any error. A subject that is 240+ characters long (or shorter with many escapable chars) will produce a JWT whosesubclaim differs from the requested subject.
Reviews (1): Last reviewed commit: "build: ship py.typed marker in wheel so ..." | Re-trigger Greptile
| free(keys); | ||
| free(values); | ||
| free(buf); | ||
| /* Skip update records in replay - they modify already-inserted vectors */ | ||
| if (on_update != NULL) { | ||
| int cb_res = on_update(ctx, (size_t)index_u64, buf, dim, | ||
| (const char *const *)keys, (const char *const *)values, meta_count); | ||
| if (cb_res != 0) { | ||
| free(buf); free(keys); free(values); fclose(f); return -1; | ||
| } |
There was a problem hiding this comment.
Use-after-free in WAL
on_update callback
buf, keys, and values are freed on lines 772–774, then immediately passed to on_update on line 776–777. The callback receives dangling pointers, causing undefined behavior on every WAL replay containing an UPDATE record. The error path on line 779 then attempts to free them a second time, compounding the issue with a double-free.
The three free() calls must be moved to after the on_update callback (and its error path) completes.
| int n = snprintf(expr + pos, sizeof(expr) - pos, | ||
| "%s%s == \"%s\"", | ||
| fi > 0 ? " AND " : "", | ||
| entries[fi].key, val); | ||
| if (n > 0) pos += (size_t)n; | ||
| } | ||
| expr[pos] = '\0'; |
There was a problem hiding this comment.
Stack buffer overflow when filter expression is truncated
snprintf returns the number of characters that would have been written (not what was actually written) when the buffer is too small. Adding that unclamped value to pos on line 620 will push pos beyond the 2048-byte expr array, and the unconditional expr[pos] = '\0' on line 622 then writes out of bounds on the stack.
The fix is to cap pos at the actual number of bytes written: size_t written = (size_t)n < sizeof(expr) - pos ? (size_t)n : sizeof(expr) - pos - 1; pos += written;
| /* Build "key == "val" AND key2 == "val2" ..." expression */ | ||
| char expr[2048]; | ||
| size_t pos = 0; | ||
| for (size_t fi = 0; fi < nfilter && pos < sizeof(expr) - 1; fi++) { | ||
| const char *val = json_get_string(entries[fi].value); | ||
| if (!val) continue; | ||
| int n = snprintf(expr + pos, sizeof(expr) - pos, | ||
| "%s%s == \"%s\"", | ||
| fi > 0 ? " AND " : "", | ||
| entries[fi].key, val); | ||
| if (n > 0) pos += (size_t)n; | ||
| } | ||
| expr[pos] = '\0'; | ||
| found = db_search_with_filter_expr(ctx->db, query, k, results, distance, expr); |
There was a problem hiding this comment.
Filter expression injection via unescaped metadata values
Metadata values from the JSON request body are interpolated directly into the filter expression string with == "%s" without escaping embedded " or \ characters. A value like x" OR 1==1 OR tag == "y would produce a malformed or manipulated filter expression, potentially bypassing the intended filter constraint in db_search_with_filter_expr.
Summary
py.typedin wheel for PEP 561 type checker supportTest plan
cmake -DBUILD_TESTS=ON .. && ctestall C tests passctest -R corrupt_walskips cleanly (no WAL file)mypy gigavectorresolves types viapy.typed