From 912d9a9fba2d74e80bbeb481a9db36e0623b8933 Mon Sep 17 00:00:00 2001 From: prrao87 Date: Tue, 17 Feb 2026 15:11:59 -0500 Subject: [PATCH 1/5] Clarify approx vs. true distance --- docs/search/vector-search.mdx | 68 +++++++++++++++++++++++++++++++++++ 1 file changed, 68 insertions(+) diff --git a/docs/search/vector-search.mdx b/docs/search/vector-search.mdx index 3f182ea..272bf2c 100644 --- a/docs/search/vector-search.mdx +++ b/docs/search/vector-search.mdx @@ -60,9 +60,77 @@ The trade-off is that the results are not guaranteed to be the true nearest neig Use ANN search for large-scale applications where speed matters more than perfect recall. LanceDB uses approximate nearest neighbor algorithms to deliver fast results without examining every vector in your dataset. + +When a vector index is used, `_distance` is not always the true distance between full vectors. In ANN mode without refinement, LanceDB computes `_distance` using compressed vectors for speed. + + +### Exact vs Approximate Distances + +When doing vector search, the meaning of "distance" depends on whether you are using an index and whether `refine_factor` is specified as part of your query. +`nprobes` controls how many partitions are searched to find candidates, while `refine_factor` controls how many candidates are rescored on full vectors for better distance fidelity and reranking quality. 
+ +The table below summarizes the behavior of `_distance` in search results based on your query configuration: + +| Query mode | Neighbor quality | `_distance` in results | +| :--- | :--- | :--- | +| No index or `.bypass_vector_index()` | Exact kNN (100% recall) | True distance on full vectors | +| Indexed ANN, no `refine_factor` | Approximate neighbors | Approximate distance on compressed/quantized vectors | +| Indexed ANN + `refine_factor(1)` | Approximate neighbors (same candidate set) | Distances recomputed on full vectors for reranked candidates | +| Indexed ANN + `refine_factor(>1)` | Better recall than no refine (usually) | Distances recomputed on full vectors for reranked candidates | + + +```python Python icon="python" +# Indexed ANN search without refinement (fast, approximate `_distance`) +fast_results = ( + table.search(embedding) + .limit(10) + .to_pandas() +) + +# Recompute distances on full vectors for reranked candidates +exact_distance_results = ( + table.search(embedding) + .limit(10) + .refine_factor(1) + .to_pandas() +) + +# Rerank a larger candidate set for better recall (higher latency) +higher_recall_results = ( + table.search(embedding) + .limit(10) + .refine_factor(20) + .to_pandas() +) +``` + +```typescript TypeScript icon="square-js" +// Indexed ANN search without refinement (fast, approximate `_distance`) +const fastResults = await (table.search(embedding) as lancedb.VectorQuery) + .limit(10) + .toArray(); + +// Recompute distances on full vectors for reranked candidates +const exactDistanceResults = await (table.search(embedding) as lancedb.VectorQuery) + .limit(10) + .refineFactor(1) + .toArray(); + +// Rerank a larger candidate set for better recall (higher latency) +const higherRecallResults = await (table.search(embedding) as lancedb.VectorQuery) + .limit(10) + .refineFactor(20) + .toArray(); +``` + + +For deeper tuning guidance on indexing and performance estimation, see [Vector 
Indexes](/indexing/vector-index/#search-configuration) and [FAQ: Do I need to set a refine factor when using an index?](/faq/faq-oss/#do-i-need-to-set-a-refine-factor-when-using-an-index), +and for tuning `nprobes`, see below. + ### Tuning `nprobes` - `nprobes` controls how many partitions are searched at query time. +- `nprobes` improves candidate recall, but does not by itself make `_distance` exact. - By default, LanceDB automatically tunes `nprobes` to achieve the best performance without noticeably sacrificing accuracy. - In most cases, leave `nprobes` unset and use the auto-tuned value. - Only tune `nprobes` manually when recall is below your target, or when you need even higher performance for your workload. From 0d37cdaa8ec3bd9d85833a38b01caadc95aaaebf Mon Sep 17 00:00:00 2001 From: prrao87 Date: Tue, 17 Feb 2026 15:25:23 -0500 Subject: [PATCH 2/5] Add XTR embedding example --- docs/search/multivector-search.mdx | 71 ++++++++++++++++++++++++++++-- 1 file changed, 67 insertions(+), 4 deletions(-) diff --git a/docs/search/multivector-search.mdx b/docs/search/multivector-search.mdx index 10c2a16..11adf12 100644 --- a/docs/search/multivector-search.mdx +++ b/docs/search/multivector-search.mdx @@ -15,8 +15,6 @@ In this tutorial, you'll create a table with multiple vector embeddings per docu Each item in your dataset can have a column containing multiple vectors, which LanceDB can efficiently index and search. When performing a search, you can query using either a single vector embedding or multiple vector embeddings. -LanceDB also integrates with [ConteXtualized Token Retriever (XTR)](https://arxiv.org/abs/2304.01982), an advanced retrieval model that prioritizes the most semantically important document tokens during search. This integration enhances the quality of search results by focusing on the most relevant token matches. - - Currently, only the `cosine` metric is supported for multivector search. 
- The vector value type can be `float16`, `float32`, or `float64`. @@ -153,9 +151,74 @@ results_multi = tbl.search(query_multi).limit(5).to_pandas() ``` -## What's Next? +## Example 1: Using XTR Embeddings -If you still need more guidance, you can try the complete [Multivector Search Notebook](https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/saas_examples/python_notebook/Multivector_on_LanceDB_Cloud.ipynb). +[ConteXtualized Token Retriever (XTR)](https://arxiv.org/abs/2304.01982) is a late-interaction retrieval model that represents text as token-level vectors instead of a single embedding. +This lets search score token-to-token matches (MaxSim), which can improve fine-grained relevance. +In LanceDB, you use XTR by generating token vectors with a Hugging Face model and storing them in a multivector column. +1. Generate token-level embeddings (multivectors) with a Hugging Face model. +2. Store them in a `List[List[float]]` vector column. +3. Query with token-level embeddings from the same model. 
+ +```python Python icon="python" +import lancedb +import pyarrow as pa +import torch +import torch.nn.functional as F +from transformers import AutoModel, AutoTokenizer + +# You can also use: "google/xtr-base-multilingual" +model_id = "google/xtr-base-en" +tokenizer = AutoTokenizer.from_pretrained(model_id) +model = AutoModel.from_pretrained(model_id).eval() + +def encode_multivector(text: str) -> list[list[float]]: + """Convert a text string into token-level normalized embeddings.""" + with torch.no_grad(): + inputs = tokenizer( + text, + return_tensors="pt", + truncation=True, + max_length=512, + ) + outputs = model(**inputs).last_hidden_state[0] # [tokens, dim] + token_mask = inputs["attention_mask"][0].bool() + token_vectors = outputs[token_mask] + token_vectors = F.normalize(token_vectors, p=2, dim=1) + return token_vectors.cpu().tolist() + +docs = [ + {"id": 1, "text": "A guide to hiking in the Alps"}, + {"id": 2, "text": "How to optimize IVF_PQ recall in vector search"}, + {"id": 3, "text": "Beginner's recipe for sourdough bread"}, +] + +for row in docs: + row["vector"] = encode_multivector(row["text"]) + +dim = len(docs[0]["vector"][0]) +schema = pa.schema( + [ + pa.field("id", pa.int64()), + pa.field("text", pa.string()), + pa.field("vector", pa.list_(pa.list_(pa.float32(), dim))), + ] +) + +db = lancedb.connect("data/xtr_demo") +table = db.create_table("xtr_docs", data=docs, schema=schema, mode="overwrite") +table.create_index(metric="cosine", vector_column_name="vector") + +query = "tips to improve vector search recall" +query_multivector = encode_multivector(query) +results = table.search(query_multivector).limit(3).to_pandas() +print(results[["id", "text", "_distance"]]) +``` + +For a complete end-to-end example, see +[Multivector search with XTR (Colab)](https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/multivector_xtr/main.ipynb) +or [the full notebook 
source](https://github.com/lancedb/vectordb-recipes/tree/main/examples/multivector_xtr). +You can also review [Hugging Face embedding integration](/integrations/embedding/huggingface/). From e50002c55e0e0b24b50e68468cf2f980f7071b14 Mon Sep 17 00:00:00 2001 From: prrao87 Date: Tue, 17 Feb 2026 16:32:56 -0500 Subject: [PATCH 3/5] Add ColBERT search example with pylate --- docs/search/multivector-search.mdx | 136 ++++++++++++++++------------- 1 file changed, 75 insertions(+), 61 deletions(-) diff --git a/docs/search/multivector-search.mdx b/docs/search/multivector-search.mdx index 11adf12..e4e80ef 100644 --- a/docs/search/multivector-search.mdx +++ b/docs/search/multivector-search.mdx @@ -15,12 +15,11 @@ In this tutorial, you'll create a table with multiple vector embeddings per docu Each item in your dataset can have a column containing multiple vectors, which LanceDB can efficiently index and search. When performing a search, you can query using either a single vector embedding or multiple vector embeddings. - -- Currently, only the `cosine` metric is supported for multivector search. -- The vector value type can be `float16`, `float32`, or `float64`. - + +Currently, only the `cosine` metric is supported for multivector search. The vector value type can be `float16`, `float32`, or `float64`. + -### Computing Similarity +## Computing Similarity MaxSim (Maximum Similarity) is a key concept in late interaction models that: @@ -47,7 +46,7 @@ For now, you should use only the `cosine` metric for multivector search. The vector value type can be `float16`, `float32` or `float64`. -## Example: Multivector Search +## Using Multivector Search ### 1. Setup @@ -151,74 +150,89 @@ results_multi = tbl.search(query_multi).limit(5).to_pandas() ``` -## Example 1: Using XTR Embeddings -[ConteXtualized Token Retriever (XTR)](https://arxiv.org/abs/2304.01982) is a late-interaction retrieval model that represents text as token-level vectors instead of a single embedding. 
-This lets search score token-to-token matches (MaxSim), which can improve fine-grained relevance. -In LanceDB, you use XTR by generating token vectors with a Hugging Face model and storing them in a multivector column. +Visit the [Hugging Face embedding integration](/integrations/embedding/huggingface/) page for info on embedding models. -1. Generate token-level embeddings (multivectors) with a Hugging Face model. -2. Store them in a `List[List[float]]` vector column. -3. Query with token-level embeddings from the same model. +## Simple Example: ColBERT Embeddings + +[ColBERT](https://arxiv.org/abs/2004.12832) is the most well-known late-interaction retrieval model that +represents each document and query as multiple token embeddings and scores matches by taking the best +token-to-token similarities (MaxSim) across them. + +Install the dependencies before running this example: + +```bash +pip install pylate lancedb pandas +``` ```python Python icon="python" -import lancedb +import numpy as np import pyarrow as pa -import torch -import torch.nn.functional as F -from transformers import AutoModel, AutoTokenizer - -# You can also use: "google/xtr-base-multilingual" -model_id = "google/xtr-base-en" -tokenizer = AutoTokenizer.from_pretrained(model_id) -model = AutoModel.from_pretrained(model_id).eval() - -def encode_multivector(text: str) -> list[list[float]]: - """Convert a text string into token-level normalized embeddings.""" - with torch.no_grad(): - inputs = tokenizer( - text, - return_tensors="pt", - truncation=True, - max_length=512, - ) - outputs = model(**inputs).last_hidden_state[0] # [tokens, dim] - token_mask = inputs["attention_mask"][0].bool() - token_vectors = outputs[token_mask] - token_vectors = F.normalize(token_vectors, p=2, dim=1) - return token_vectors.cpu().tolist() +import lancedb +from pylate import models + +# 1) Load a late-interaction model via PyLate +# PyLate docs show ColBERT() + encode(..., is_query=...) 
+model = models.ColBERT(model_name_or_path="lightonai/GTE-ModernColBERT-v1") + +# You can discover dim from one embedding (avoid guessing) +dim = model.encode(["hello"], is_query=True)[0].shape[1] + +# 2) Create a LanceDB table with a multivector column +db = lancedb.connect("./pylate_lancedb") +schema = pa.schema([ + pa.field("doc_id", pa.string()), + pa.field("text", pa.string()), + # multivector: list<list<float32>> + pa.field("mv", pa.list_(pa.list_(pa.float32(), dim))), +]) docs = [ - {"id": 1, "text": "A guide to hiking in the Alps"}, - {"id": 2, "text": "How to optimize IVF_PQ recall in vector search"}, - {"id": 3, "text": "Beginner's recipe for sourdough bread"}, + {"doc_id": "1", "text": "The train to Tokyo leaves at 5pm."}, + {"doc_id": "2", "text": "That Pho restaurant in Hanoi is highly rated."}, + {"doc_id": "3", "text": "This is a noodle bar in Osaka, Japan."}, ] -for row in docs: - row["vector"] = encode_multivector(row["text"]) +# 3) Encode documents with PyLate (token vectors per doc) +doc_texts = [d["text"] for d in docs] +doc_embs = model.encode(doc_texts, is_query=False) # list/array of (T, dim) per doc rows = [] +for d, emb in zip(docs, doc_embs): + emb = np.asarray(emb, dtype=np.float32) + rows.append({**d, "mv": emb.tolist()}) + +tbl = db.create_table("docs", data=rows, schema=schema, mode="overwrite") -dim = len(docs[0]["vector"][0]) -schema = pa.schema( - [ - pa.field("id", pa.int64()), - pa.field("text", pa.string()), - pa.field("vector", pa.list_(pa.list_(pa.float32(), dim))), - ] -) -db = lancedb.connect("data/xtr_demo") -table = db.create_table("xtr_docs", data=docs, schema=schema, mode="overwrite") -table.create_index(metric="cosine", vector_column_name="vector") +# 4) If your dataset is large, build an index + query using a query matrix +# For small datasets < 100k records, you can skip indexing +# 
tbl.create_index(vector_column_name="mv", metric="cosine") -query = "tips to improve vector search recall" -query_multivector = encode_multivector(query) -results = table.search(query_multivector).limit(3).to_pandas() -print(results[["id", "text", "_distance"]]) +query = "Tell me about ramen in Japan" +q_emb = np.asarray(model.encode([query], is_query=True)[0], dtype=np.float32) # (Tq, dim) + +out = tbl.search(q_emb).limit(5).to_pandas() # multivector search accepts a matrix +print(out[["doc_id", "text"]]) ``` -For a complete end-to-end example, see -[Multivector search with XTR (Colab)](https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/multivector_xtr/main.ipynb) -or [the full notebook source](https://github.com/lancedb/vectordb-recipes/tree/main/examples/multivector_xtr). -You can also review [Hugging Face embedding integration](/integrations/embedding/huggingface/). +Late interaction model implementations evolve rapidly, so it's recommended to check the latest popular models +when trying out multivector search. + +## Advanced Example: XTR Embeddings + +[ConteXtualized Token Retriever (XTR)](https://arxiv.org/abs/2304.01982) is a late-interaction retrieval model that represents text as token-level vectors instead of a single embedding. +This lets search score token-to-token matches (MaxSim), which can improve fine-grained relevance. + +The notebook linked below shows how to integrate XTR, which prioritizes critical document +tokens during the initial retrieval stage and removes the gathering stage to significantly improve performance. +By focusing on the most semantically salient tokens early in the process, XTR reduces computational cost +while improving recall, ensuring rapid identification of candidate documents. 
+ + + From 874b767017e023c1a553aa9d3b8843f0464ac580 Mon Sep 17 00:00:00 2001 From: prrao87 Date: Tue, 17 Feb 2026 16:38:11 -0500 Subject: [PATCH 4/5] Fix broken links --- docs/faq/faq-cloud.mdx | 4 ++-- docs/search/vector-search.mdx | 2 +- docs/tutorials/agents/time-travel-rag/index.mdx | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/faq/faq-cloud.mdx b/docs/faq/faq-cloud.mdx index 5d2a8d6..c90bc34 100644 --- a/docs/faq/faq-cloud.mdx +++ b/docs/faq/faq-cloud.mdx @@ -75,7 +75,7 @@ indexed and unindexed rows. It is strongly recommended to create scalar indices on the filter columns. Scalar indices will reduce the amount of data that needs to be scanned and thus speed up the filter. LanceDB supports `BITMAP`, `BTREE`, and `LABEL_LIST` as our scalar index types. You -can see more details [here](/indexing#scalar-index). +can see more details [here](/indexing/scalar-index). ### Does LanceDB always recreate the full index or incrementally update existing centroids? LanceDB implements an optimization algorithm to decide whether a delta index will be @@ -102,7 +102,7 @@ following: It is recommended to run queries from an EC2 instance that is in the same region. - Create scalar indices: If you are filtering on metadata, it is recommended to create scalar indices on those columns. This will speed up searches with metadata filtering. - See [here](/indexing#scalar-index) for more details on creating a scalar index. + See [here](/indexing/scalar-index) for more details on creating a scalar index. ### Will I always query the latest data? - For LanceDB Cloud users, yes, strong consistency is guaranteed. 
diff --git a/docs/search/vector-search.mdx b/docs/search/vector-search.mdx index 272bf2c..5378e74 100644 --- a/docs/search/vector-search.mdx +++ b/docs/search/vector-search.mdx @@ -124,7 +124,7 @@ const higherRecallResults = await (table.search(embedding) as lancedb.VectorQuer ``` -For deeper tuning guidance on indexing and performance estimation, see [Vector Indexes](/indexing/vector-index/#search-configuration) and [FAQ: Do I need to set a refine factor when using an index?](/faq/faq-oss/#do-i-need-to-set-a-refine-factor-when-using-an-index), +For deeper tuning guidance on indexing and performance estimation, see [Vector Indexes](/indexing/vector-index/#search-configuration) and [FAQ: Do I need to set a refine factor when using an index?](/faq/faq-oss#do-i-need-to-build-a-vector-index-to-run-vector-search), and for tuning `nprobes`, see below. ### Tuning `nprobes` diff --git a/docs/tutorials/agents/time-travel-rag/index.mdx b/docs/tutorials/agents/time-travel-rag/index.mdx index 67b149f..2893d03 100644 --- a/docs/tutorials/agents/time-travel-rag/index.mdx +++ b/docs/tutorials/agents/time-travel-rag/index.mdx @@ -32,7 +32,7 @@ vector databases are ill-equipped to handle. 4. "We need to A/B test a new chunking strategy, but we can't disrupt the production system or duplicate the entire dataset." Experimentation is vital for improvement, but it can't come at the cost of production stability or a massive infrastructure bill. -LanceDB's [zero-cost data evolution](/tables/schema) and [time-travel capabilities](/tables/versioning#time-travel) directly address these critical enterprise pain points, providing the foundation for a reliable, auditable, and production-ready RAG system. +LanceDB's [zero-cost data evolution](/tables/schema) and [time-travel capabilities](https://docs.lancedb.com/tables/versioning) directly address these critical enterprise pain points, providing the foundation for a reliable, auditable, and production-ready RAG system. ## Dataset: The U.S. 
Federal Register From 18d7854e0858938229ad4d930e64deb5e63c4976 Mon Sep 17 00:00:00 2001 From: prrao87 Date: Tue, 17 Feb 2026 16:41:24 -0500 Subject: [PATCH 5/5] Fix link --- docs/search/vector-search.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/search/vector-search.mdx b/docs/search/vector-search.mdx index 5378e74..f7118d9 100644 --- a/docs/search/vector-search.mdx +++ b/docs/search/vector-search.mdx @@ -124,8 +124,8 @@ const higherRecallResults = await (table.search(embedding) as lancedb.VectorQuer ``` -For deeper tuning guidance on indexing and performance estimation, see [Vector Indexes](/indexing/vector-index/#search-configuration) and [FAQ: Do I need to set a refine factor when using an index?](/faq/faq-oss#do-i-need-to-build-a-vector-index-to-run-vector-search), -and for tuning `nprobes`, see below. +For deeper tuning guidance on indexing and performance estimation, see the [vector indexes](/indexing/vector-index/#search-configuration) page. +For tuning `nprobes`, see below. ### Tuning `nprobes`
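The MaxSim scoring that the multivector patches above describe can be sketched in a few lines of NumPy: for each query token vector, take its best cosine similarity against any document token vector, then sum. This is a minimal illustrative sketch of the late-interaction idea, not LanceDB's internal implementation; the array shapes and names are assumptions.

```python
import numpy as np

def maxsim(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """Sum over query tokens of the best cosine similarity
    against any document token (late-interaction MaxSim)."""
    # Normalize rows so dot products are cosine similarities
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sim = q @ d.T  # (Tq, Td) token-to-token similarities
    # Best-matching document token per query token, summed
    return float(sim.max(axis=1).sum())

rng = np.random.default_rng(0)
query_tokens = rng.normal(size=(4, 8))  # 4 query tokens, dim 8
doc_tokens = rng.normal(size=(6, 8))    # 6 document tokens, dim 8
score = maxsim(query_tokens, doc_tokens)
```

A document identical to the query scores exactly the number of query tokens (each token's best match is itself, with cosine similarity 1), which is why MaxSim rewards fine-grained token overlap.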
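The refine step documented in the first patch (rank on compressed vectors for an approximate `_distance`, then rescore the top `refine_factor * k` candidates on full vectors) can be illustrated with a toy quantizer. This sketch is purely conceptual: coarse rounding stands in for product quantization, and none of it is LanceDB's actual code.

```python
import numpy as np

def ann_with_refine(query, vectors, k=3, refine_factor=4):
    """Toy two-stage search: rank on 'compressed' vectors, then
    recompute true distances on full vectors for top candidates."""
    compressed = np.round(vectors, 1)  # crude stand-in for quantization
    q_c = np.round(query, 1)
    # Stage 1: approximate distances on compressed vectors
    approx = np.linalg.norm(compressed - q_c, axis=1)
    candidates = np.argsort(approx)[: k * refine_factor]
    # Stage 2 (refine): exact distances on full vectors, rerank, keep top k
    exact = np.linalg.norm(vectors[candidates] - query, axis=1)
    order = np.argsort(exact)
    return [(int(candidates[i]), float(exact[i])) for i in order[:k]]

rng = np.random.default_rng(1)
data = rng.normal(size=(200, 16))
q = rng.normal(size=16)
hits = ann_with_refine(q, data, k=3, refine_factor=4)
```

A larger `refine_factor` widens the rescored pool, trading latency for recall, which mirrors the `refine_factor(20)` example in the patch.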