[Detail Bug] Vector store retries don’t re-issue Qdrant requests after transient failures #61

Description

@detail-app

Detail Bug Report

https://app.detail.dev/org_befd6425-a158-4e24-9d4d-1e5c08769515/bugs/bug_10c7569c-4758-4853-973b-fe24be09a0b2

Introduced in #10 by @WilliamAGH on Feb 7, 2026

Summary

  • Context: HybridVectorService handles Qdrant vector store operations (upsert, delete, count) with retry support via RetrySupport.executeWithRetry.
  • Bug: The Qdrant ListenableFuture is created outside the retry lambda in doUpsert, doDeleteByUrl, and doCountPointsForUrl, so retries await the same already-failed future instead of issuing a new gRPC request.
  • Actual vs. expected: On transient failures, retries should send fresh gRPC requests; instead, every retry re-throws the original failure from the same failed future, making retries completely ineffective.
  • Impact: Any transient Qdrant failure (connection reset, gRPC UNAVAILABLE, timeout) makes the operation fail permanently instead of recovering on retry. This affects document upserts, deletions, and URL-count queries.

Code with Bug

doUpsert:

var upsertFuture = qdrantClient.upsertAsync(Objects.requireNonNull(collectionName), points);
RetrySupport.executeWithRetry(
    () -> QdrantFutureAwaiter.awaitFuture(upsertFuture, UPSERT_TIMEOUT_SECONDS), // <-- BUG 🔴 retries await same failed future; no new RPC
    "Qdrant hybrid upsert");

doDeleteByUrl:

var deleteFuture =
    qdrantClient.deleteAsync(Objects.requireNonNull(collectionName), Objects.requireNonNull(filter));
RetrySupport.executeWithRetry(
    () -> QdrantFutureAwaiter.awaitFuture(deleteFuture, DELETE_TIMEOUT_SECONDS), // <-- BUG 🔴 retries await same failed future; no new RPC
    "Qdrant delete by URL");

doCountPointsForUrl:

var countFuture =
    qdrantClient.countAsync(Objects.requireNonNull(collectionName), Objects.requireNonNull(filter), true);
RetrySupport.executeWithRetry(
    () -> QdrantFutureAwaiter.awaitFuture(countFuture, COUNT_TIMEOUT_SECONDS), // <-- BUG 🔴 retries await same failed future; no new RPC
    "Qdrant count by URL");

Correct pattern elsewhere in the same class (async call created inside the retry operation):

ScrollResponse scrollResponse = RetrySupport.executeWithRetry(
    () -> QdrantFutureAwaiter.awaitFuture(
        qdrantClient.scrollAsync(Objects.requireNonNull(scrollRequest)), SCROLL_TIMEOUT_SECONDS),
    "Qdrant scroll URLs");

Explanation

RetrySupport.executeWithRetry re-invokes the provided lambda on each attempt. However, these methods capture a single ListenableFuture created before retry begins. If the first RPC fails and the future completes exceptionally, subsequent attempts simply call get() on the same already-completed failed future and deterministically rethrow the same underlying exception.

Because GrpcFuture extends Guava AbstractFuture (write-once completion state), a failed future cannot be “refreshed”; calling get() again does not trigger a new RPC. Result: retries log as if they are happening, but no new gRPC requests are sent.
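
The write-once behavior described above can be reproduced in isolation. The following is a minimal sketch, not the project's code: it uses java.util.concurrent.CompletableFuture as a stand-in for the Guava-based future (both keep their first completion state), and fakeRpcAsync, executeWithRetry, and await are hypothetical simplifications of the Qdrant client call, RetrySupport.executeWithRetry, and QdrantFutureAwaiter.awaitFuture.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class CapturedFutureRetryDemo {
    /** Counts how many simulated RPCs were actually issued. */
    static final AtomicInteger rpcCount = new AtomicInteger();

    /** Hypothetical async call: the first RPC fails, any later RPC would succeed. */
    static CompletableFuture<String> fakeRpcAsync() {
        int rpcNumber = rpcCount.incrementAndGet();
        CompletableFuture<String> future = new CompletableFuture<>();
        if (rpcNumber == 1) {
            future.completeExceptionally(new IllegalStateException("UNAVAILABLE (simulated)"));
        } else {
            future.complete("ok");
        }
        return future;
    }

    /** Simplified stand-in for a retry helper: re-invokes the operation on failure. */
    static <T> T executeWithRetry(Supplier<T> operation, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return operation.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }

    /** Simplified awaiter: join() throws an unchecked exception if the future failed. */
    static <T> T await(CompletableFuture<T> future) {
        return future.join();
    }

    /** Buggy shape: the future is created once, outside the retried lambda. */
    static int runBuggy() {
        rpcCount.set(0);
        CompletableFuture<String> captured = fakeRpcAsync();
        try {
            executeWithRetry(() -> await(captured), 3);
        } catch (RuntimeException expected) {
            // All three attempts rethrow the failure stored in the same future.
        }
        return rpcCount.get();
    }

    public static void main(String[] args) {
        System.out.println("RPCs issued by buggy shape: " + runBuggy()); // prints 1
    }
}
```

Although the second simulated RPC would have succeeded, it is never sent: three attempts are made, but only one request is issued.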

Codebase Inconsistency

Within HybridVectorService, other Qdrant operations (scrollAllUrlsInCollection, updatePayloadByFilter) already use the correct retry pattern by creating the async request inside the retry lambda. The inconsistency suggests that these three methods bypass retry semantics unintentionally.

Recommended Fix

Move the Qdrant async call inside the retry lambda in all three methods (doUpsert, doDeleteByUrl, doCountPointsForUrl) so each retry attempt issues a fresh gRPC request.
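
As a rough illustration of the recommended shape, the sketch below creates the future inside the retried lambda and checks that a second request is really sent. FlakyClient, its upsertAsync method, and this executeWithRetry are hypothetical stand-ins written for the example (the real client returns a ListenableFuture and the real helper's retry policy is not shown in this report), so treat it as a sketch under those assumptions.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class FreshFuturePerAttemptDemo {
    /** Hypothetical client: fails the first `failures` calls, then succeeds. */
    static final class FlakyClient {
        final AtomicInteger calls = new AtomicInteger();
        private final int failures;

        FlakyClient(int failures) {
            this.failures = failures;
        }

        CompletableFuture<String> upsertAsync(String collectionName) {
            CompletableFuture<String> future = new CompletableFuture<>();
            if (calls.incrementAndGet() <= failures) {
                future.completeExceptionally(new IllegalStateException("UNAVAILABLE (simulated)"));
            } else {
                future.complete("upserted into " + collectionName);
            }
            return future;
        }
    }

    /** Simplified stand-in for a retry helper with a fixed attempt budget. */
    static <T> T executeWithRetry(Supplier<T> operation, String label, int maxAttempts) {
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return operation.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
            }
        }
        throw new IllegalStateException(label + " failed after " + maxAttempts + " attempts", last);
    }

    /** Fixed shape: the async call sits inside the lambda, so each attempt issues a new request. */
    static String upsertWithRetry(FlakyClient client, String collectionName) {
        return executeWithRetry(
            () -> client.upsertAsync(collectionName).join(),
            "demo upsert",
            3);
    }

    public static void main(String[] args) {
        FlakyClient client = new FlakyClient(1);
        System.out.println(upsertWithRetry(client, "docs")); // prints "upserted into docs"
        System.out.println("requests sent: " + client.calls.get()); // prints "requests sent: 2"
    }
}
```

Applied to doUpsert, doDeleteByUrl, and doCountPointsForUrl, this means passing the qdrantClient.upsertAsync(...), deleteAsync(...), or countAsync(...) call directly as the first argument of QdrantFutureAwaiter.awaitFuture inside the lambda, mirroring the scroll example above. The Objects.requireNonNull checks can stay where they are, since they validate arguments and do not need to capture the future.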

History

This bug was introduced in commit fbc2a64, which refactored these methods to extract the ListenableFuture into a local variable (to add Objects.requireNonNull boundaries) while claiming to keep behavior unchanged; the extraction inadvertently captured the future outside the retry lambda and broke retries.

Labels: bug (Something isn't working)
