Merged
4 changes: 2 additions & 2 deletions docs/faq/faq-cloud.mdx
@@ -75,7 +75,7 @@ indexed and unindexed rows.
It is strongly recommended to create scalar indices on the filter columns. Scalar indices
will reduce the amount of data that needs to be scanned and thus speed up the filter.
LanceDB supports `BITMAP`, `BTREE`, and `LABEL_LIST` as our scalar index types. You
can see more details [here](/indexing#scalar-index).
can see more details [here](/indexing/scalar-index).

### Does LanceDB always recreate the full index or incrementally update existing centroids?
LanceDB implements an optimization algorithm to decide whether a delta index will be
@@ -102,7 +102,7 @@ following:
It is recommended to run queries from an EC2 instance that is in the same region.
- Create scalar indices: If you are filtering on metadata, it is recommended to
create scalar indices on those columns. This will speed up searches with metadata filtering.
See [here](/indexing#scalar-index) for more details on creating a scalar index.
See [here](/indexing/scalar-index) for more details on creating a scalar index.

### Will I always query the latest data?
- For LanceDB Cloud users, yes, strong consistency is guaranteed.
97 changes: 87 additions & 10 deletions docs/search/multivector-search.mdx
@@ -15,14 +15,11 @@ In this tutorial, you'll create a table with multiple vector embeddings per document

Each item in your dataset can have a column containing multiple vectors, which LanceDB can efficiently index and search. When performing a search, you can query using either a single vector embedding or multiple vector embeddings.

LanceDB also integrates with [ConteXtualized Token Retriever (XTR)](https://arxiv.org/abs/2304.01982), an advanced retrieval model that prioritizes the most semantically important document tokens during search. This integration enhances the quality of search results by focusing on the most relevant token matches.

<Tip>
- Currently, only the `cosine` metric is supported for multivector search.
- The vector value type can be `float16`, `float32`, or `float64`.
</Tip>
<Warning>
Currently, only the `cosine` metric is supported for multivector search. The vector value type can be `float16`, `float32`, or `float64`.
</Warning>

### Computing Similarity
## Computing Similarity

MaxSim (Maximum Similarity) is a key concept in late interaction models that:

@@ -49,7 +46,7 @@ For now, you should use only the `cosine` metric for multivector search.
The vector value type can be `float16`, `float32` or `float64`.
</Warning>
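
As a concrete sketch, MaxSim can be written in a few lines of NumPy: for each query token, take its best cosine similarity against all document tokens, then sum those maxima (assumed shapes are `(num_tokens, dim)`):

```python
import numpy as np

def maxsim(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    """Sum over query tokens of the best cosine similarity to any doc token."""
    q = query_tokens / np.linalg.norm(query_tokens, axis=1, keepdims=True)
    d = doc_tokens / np.linalg.norm(doc_tokens, axis=1, keepdims=True)
    sims = q @ d.T  # (num_query_tokens, num_doc_tokens) cosine similarities
    return float(sims.max(axis=1).sum())

q = np.array([[1.0, 0.0], [0.0, 1.0]])                 # 2 query token vectors
doc = np.array([[1.0, 0.0], [0.7, 0.7], [0.0, -1.0]])  # 3 doc token vectors
print(maxsim(q, doc))  # ≈ 1.707: each query token matches its best doc token
```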

## Example: Multivector Search
## Using Multivector Search

### 1. Setup

@@ -153,9 +150,89 @@ results_multi = tbl.search(query_multi).limit(5).to_pandas()
```
</CodeGroup>

## What's Next?

If you still need more guidance, you can try the complete [Multivector Search Notebook](https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/saas_examples/python_notebook/Multivector_on_LanceDB_Cloud.ipynb).
Visit the [Hugging Face embedding integration](/integrations/embedding/huggingface/) page for info on embedding models.

## Simple Example: ColBERT Embeddings

[ColBERT](https://arxiv.org/abs/2004.12832) is the best-known late-interaction retrieval model: it represents
each document and query as multiple token embeddings and scores matches by taking the best
token-to-token similarity (MaxSim) for each query token.

Install the dependencies before running this example:

```bash
pip install pylate lancedb pandas
```

<CodeGroup>
```python Python icon="python"
import numpy as np
import pyarrow as pa
import lancedb
from pylate import models

# 1) Load a late-interaction model via PyLate
# PyLate exposes ColBERT(...) and encode(..., is_query=...)
model = models.ColBERT(model_name_or_path="lightonai/GTE-ModernColBERT-v1")

# You can discover dim from one embedding (avoid guessing)
dim = model.encode(["hello"], is_query=True)[0].shape[1]

# 2) Create a LanceDB table with a multivector column
db = lancedb.connect("./pylate_lancedb")
schema = pa.schema([
    pa.field("doc_id", pa.string()),
    pa.field("text", pa.string()),
    # multivector column: list<list<float32, dim>>
    pa.field("mv", pa.list_(pa.list_(pa.float32(), dim))),
])

docs = [
    {"doc_id": "1", "text": "The train to Tokyo leaves at 5pm."},
    {"doc_id": "2", "text": "That Pho restaurant in Hanoi is highly rated."},
    {"doc_id": "3", "text": "This is a noodle bar in Osaka, Japan."},
]

# 3) Encode documents with PyLate (token vectors per doc)
doc_texts = [d["text"] for d in docs]
doc_embs = model.encode(doc_texts, is_query=False)  # one (T, dim) array per doc

rows = []
for d, emb in zip(docs, doc_embs):
    emb = np.asarray(emb, dtype=np.float32)
    rows.append({**d, "mv": emb.tolist()})

tbl = db.create_table("docs", data=rows, schema=schema, mode="overwrite")

# 4) If your dataset is large, build an index + query using a query matrix
# For small datasets < 100k records, you can skip indexing
# tbl.create_index(vector_column_name="mv", metric="cosine")

query = "Tell me about ramen in Japan"
q_emb = np.asarray(model.encode([query], is_query=True)[0], dtype=np.float32)  # (Tq, dim)

out = tbl.search(q_emb).limit(5).to_pandas()  # multivector search accepts a query matrix
print(out[["doc_id", "text"]])
```
</CodeGroup>

Late-interaction model implementations evolve rapidly, so it's recommended to check the latest popular models
when trying out multivector search.

## Advanced Example: XTR Embeddings

[ConteXtualized Token Retriever (XTR)](https://arxiv.org/abs/2304.01982) is a late-interaction retrieval model that represents text as token-level vectors instead of a single embedding.
This lets search score token-to-token matches (MaxSim), which can improve fine-grained relevance.

The notebook linked below shows how to integrate XTR, which prioritizes the most semantically salient document
tokens during the initial retrieval stage and removes the separate gathering stage. This reduces computational
cost while improving recall, so candidate documents are identified quickly.

<Card
icon="book"
title="Multivector search with ColPali + XTR embeddings and LanceDB"
href="https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/multivector_xtr/main.ipynb"
>
</Card>
68 changes: 68 additions & 0 deletions docs/search/vector-search.mdx
@@ -60,9 +60,77 @@ The trade-off is that the results are not guaranteed to be the true nearest neighbors.

Use ANN search for large-scale applications where speed matters more than perfect recall. LanceDB uses approximate nearest neighbor algorithms to deliver fast results without examining every vector in your dataset.

<Warning>
When a vector index is used, `_distance` is not always the true distance between full vectors. In ANN mode without refinement, LanceDB computes `_distance` using compressed vectors for speed.
</Warning>

### Exact vs Approximate Distances

When doing vector search, the meaning of "distance" depends on whether you are using an index and whether `refine_factor` is specified as part of your query.
`nprobes` controls how many partitions are searched to find candidates, while `refine_factor` controls how many candidates are rescored on full vectors for better distance fidelity and reranking quality.

The table below summarizes the behavior of `_distance` in search results based on your query configuration:

| Query mode | Neighbor quality | `_distance` in results |
| :--- | :--- | :--- |
| No index or `.bypass_vector_index()` | Exact kNN (100% recall) | True distance on full vectors |
| Indexed ANN, no `refine_factor` | Approximate neighbors | Approximate distance on compressed/quantized vectors |
| Indexed ANN + `refine_factor(1)` | Approximate neighbors (same candidate set) | Distances recomputed on full vectors for reranked candidates |
| Indexed ANN + `refine_factor(>1)` | Better recall than no refine (usually) | Distances recomputed on full vectors for reranked candidates |

<CodeGroup>
```python Python icon="python"
# Indexed ANN search without refinement (fast, approximate `_distance`)
fast_results = (
    table.search(embedding)
    .limit(10)
    .to_pandas()
)

# Recompute distances on full vectors for reranked candidates
exact_distance_results = (
    table.search(embedding)
    .limit(10)
    .refine_factor(1)
    .to_pandas()
)

# Rerank a larger candidate set for better recall (higher latency)
higher_recall_results = (
    table.search(embedding)
    .limit(10)
    .refine_factor(20)
    .to_pandas()
)
```

```typescript TypeScript icon="square-js"
// Indexed ANN search without refinement (fast, approximate `_distance`)
// Indexed ANN search without refinement (fast, approximate `_distance`)
const fastResults = await (table.search(embedding) as lancedb.VectorQuery)
    .limit(10)
    .toArray();

// Recompute distances on full vectors for reranked candidates
const exactDistanceResults = await (table.search(embedding) as lancedb.VectorQuery)
    .limit(10)
    .refineFactor(1)
    .toArray();

// Rerank a larger candidate set for better recall (higher latency)
const higherRecallResults = await (table.search(embedding) as lancedb.VectorQuery)
    .limit(10)
    .refineFactor(20)
    .toArray();
```
</CodeGroup>

For deeper tuning guidance on indexing and performance estimation, see the [vector indexes](/indexing/vector-index/#search-configuration) page.
For tuning `nprobes`, see the next section.

### Tuning `nprobes`

- `nprobes` controls how many partitions are searched at query time.
- Increasing `nprobes` improves candidate recall, but does not by itself make `_distance` exact.
- By default, LanceDB automatically tunes `nprobes` to achieve the best performance without noticeably sacrificing accuracy.
- In most cases, leave `nprobes` unset and use the auto-tuned value.
- Only tune `nprobes` manually when recall is below your target, or when you need even higher performance for your workload.
2 changes: 1 addition & 1 deletion docs/tutorials/agents/time-travel-rag/index.mdx
@@ -32,7 +32,7 @@ vector databases are ill-equipped to handle.

4. "We need to A/B test a new chunking strategy, but we can't disrupt the production system or duplicate the entire dataset." Experimentation is vital for improvement, but it can't come at the cost of production stability or a massive infrastructure bill.

LanceDB's [zero-cost data evolution](/tables/schema) and [time-travel capabilities](/tables/versioning#time-travel) directly address these critical enterprise pain points, providing the foundation for a reliable, auditable, and production-ready RAG system.
LanceDB's [zero-cost data evolution](/tables/schema) and [time-travel capabilities](https://docs.lancedb.com/tables/versioning) directly address these critical enterprise pain points, providing the foundation for a reliable, auditable, and production-ready RAG system.

## Dataset: The U.S. Federal Register
