Merged
4 changes: 2 additions & 2 deletions docs/faq/faq-cloud.mdx
@@ -75,7 +75,7 @@ indexed and unindexed rows.
It is strongly recommended to create scalar indices on the filter columns. Scalar indices
will reduce the amount of data that needs to be scanned and thus speed up the filter.
LanceDB supports `BITMAP`, `BTREE`, and `LABEL_LIST` as our scalar index types. You
can see more details [here](/indexing#scalar-index).
can see more details [here](/indexing/scalar-index).

### Does LanceDB always recreate the full index or incrementally update existing centroids?
LanceDB implements an optimization algorithm to decide whether a delta index will be
@@ -102,7 +102,7 @@ following:
It is recommended to run queries from an EC2 instance that is in the same region.
- Create scalar indices: If you are filtering on metadata, it is recommended to
create scalar indices on those columns. This will speed up searches with metadata filtering.
See [here](/indexing#scalar-index) for more details on creating a scalar index.
See [here](/indexing/scalar-index) for more details on creating a scalar index.

### Will I always query the latest data?
- For LanceDB Cloud users, yes, strong consistency is guaranteed.
97 changes: 87 additions & 10 deletions docs/search/multivector-search.mdx
@@ -15,14 +15,11 @@ In this tutorial, you'll create a table with multiple vector embeddings per document

Each item in your dataset can have a column containing multiple vectors, which LanceDB can efficiently index and search. When performing a search, you can query using either a single vector embedding or multiple vector embeddings.

LanceDB also integrates with [ConteXtualized Token Retriever (XTR)](https://arxiv.org/abs/2304.01982), an advanced retrieval model that prioritizes the most semantically important document tokens during search. This integration enhances the quality of search results by focusing on the most relevant token matches.

<Tip>
- Currently, only the `cosine` metric is supported for multivector search.
- The vector value type can be `float16`, `float32`, or `float64`.
</Tip>
<Warning>
Currently, only the `cosine` metric is supported for multivector search. The vector value type can be `float16`, `float32`, or `float64`.
</Warning>

### Computing Similarity
## Computing Similarity

MaxSim (Maximum Similarity) is a key concept in late interaction models that:

@@ -49,7 +46,7 @@ For now, you should use only the `cosine` metric for multivector search.
The vector value type can be `float16`, `float32` or `float64`.
</Warning>
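
As a concrete sketch, MaxSim can be written in a few lines of NumPy: for each query token, take its best cosine similarity against all document tokens, then sum those maxima (assumed shapes are `(num_tokens, dim)`):

```python
import numpy as np

def maxsim(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    """Sum over query tokens of the best cosine similarity to any doc token."""
    q = query_tokens / np.linalg.norm(query_tokens, axis=1, keepdims=True)
    d = doc_tokens / np.linalg.norm(doc_tokens, axis=1, keepdims=True)
    sims = q @ d.T  # (num_query_tokens, num_doc_tokens) cosine similarities
    return float(sims.max(axis=1).sum())

q = np.array([[1.0, 0.0], [0.0, 1.0]])                 # 2 query token vectors
doc = np.array([[1.0, 0.0], [0.7, 0.7], [0.0, -1.0]])  # 3 doc token vectors
print(maxsim(q, doc))  # ≈ 1.707: each query token matches its best doc token
```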

## Example: Multivector Search
## Using Multivector Search

### 1. Setup

@@ -153,9 +150,89 @@ results_multi = tbl.search(query_multi).limit(5).to_pandas()
```
</CodeGroup>

## What's Next?

If you still need more guidance, you can try the complete [Multivector Search Notebook](https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/saas_examples/python_notebook/Multivector_on_LanceDB_Cloud.ipynb).
Visit the [Hugging Face embedding integration](/integrations/embedding/huggingface/) page for info on embedding models.

## Simple Example: ColBERT Embeddings

[ColBERT](https://arxiv.org/abs/2004.12832) is the best-known late-interaction retrieval model: it represents
each document and query as multiple token embeddings and scores matches by taking the best
token-to-token similarity (MaxSim) for each query token.

Install the dependencies before running this example:

```bash
pip install pylate lancedb pandas
```

<CodeGroup>
```python Python icon="python"
import numpy as np
import pyarrow as pa
import lancedb
from pylate import models

# 1) Load a late-interaction model via PyLate
# PyLate exposes ColBERT(...) and encode(..., is_query=...)
model = models.ColBERT(model_name_or_path="lightonai/GTE-ModernColBERT-v1")

# You can discover dim from one embedding (avoid guessing)
dim = model.encode(["hello"], is_query=True)[0].shape[1]

# 2) Create a LanceDB table with a multivector column
db = lancedb.connect("./pylate_lancedb")
schema = pa.schema([
    pa.field("doc_id", pa.string()),
    pa.field("text", pa.string()),
    # multivector column: list<list<float32, dim>>
    pa.field("mv", pa.list_(pa.list_(pa.float32(), dim))),
])

docs = [
    {"doc_id": "1", "text": "The train to Tokyo leaves at 5pm."},
    {"doc_id": "2", "text": "That Pho restaurant in Hanoi is highly rated."},
    {"doc_id": "3", "text": "This is a noodle bar in Osaka, Japan."},
]

# 3) Encode documents with PyLate (token vectors per doc)
doc_texts = [d["text"] for d in docs]
doc_embs = model.encode(doc_texts, is_query=False)  # one (T, dim) array per doc

rows = []
for d, emb in zip(docs, doc_embs):
    emb = np.asarray(emb, dtype=np.float32)
    rows.append({**d, "mv": emb.tolist()})

tbl = db.create_table("docs", data=rows, schema=schema, mode="overwrite")

# 4) If your dataset is large, build an index + query using a query matrix
# For small datasets < 100k records, you can skip indexing
# tbl.create_index(vector_column_name="mv", metric="cosine")

query = "Tell me about ramen in Japan"
q_emb = np.asarray(model.encode([query], is_query=True)[0], dtype=np.float32)  # (Tq, dim)

out = tbl.search(q_emb).limit(5).to_pandas()  # multivector search accepts a query matrix
print(out[["doc_id", "text"]])
```
</CodeGroup>

Late-interaction model implementations evolve rapidly, so it's recommended to check the latest popular models
when trying out multivector search.

## Advanced Example: XTR Embeddings

[ConteXtualized Token Retriever (XTR)](https://arxiv.org/abs/2304.01982) is a late-interaction retrieval model that represents text as token-level vectors instead of a single embedding.
This lets search score token-to-token matches (MaxSim), which can improve fine-grained relevance.

The notebook linked below shows how to integrate XTR, which prioritizes the most semantically salient document
tokens during the initial retrieval stage and removes the separate gathering stage. This reduces computational
cost while improving recall, so candidate documents are identified quickly.

<Card
icon="book"
title="Multivector search with ColPali + XTR embeddings and LanceDB"
href="https://colab.research.google.com/github/lancedb/vectordb-recipes/blob/main/examples/multivector_xtr/main.ipynb"
>
</Card>
68 changes: 68 additions & 0 deletions docs/search/vector-search.mdx
@@ -60,9 +60,77 @@ The trade-off is that the results are not guaranteed to be the true nearest neighbors.

Use ANN search for large-scale applications where speed matters more than perfect recall. LanceDB uses approximate nearest neighbor algorithms to deliver fast results without examining every vector in your dataset.

<Warning>
When a vector index is used, `_distance` is not always the true distance between full vectors. In ANN mode without refinement, LanceDB computes `_distance` using compressed vectors for speed.
</Warning>

### Exact vs Approximate Distances

When doing vector search, the meaning of "distance" depends on whether you are using an index and whether `refine_factor` is specified as part of your query.
`nprobes` controls how many partitions are searched to find candidates, while `refine_factor` controls how many candidates are rescored on full vectors for better distance fidelity and reranking quality.

The table below summarizes the behavior of `_distance` in search results based on your query configuration:

| Query mode | Neighbor quality | `_distance` in results |
| :--- | :--- | :--- |
| No index or `.bypass_vector_index()` | Exact kNN (100% recall) | True distance on full vectors |
| Indexed ANN, no `refine_factor` | Approximate neighbors | Approximate distance on compressed/quantized vectors |
| Indexed ANN + `refine_factor(1)` | Approximate neighbors (same candidate set) | Distances recomputed on full vectors for reranked candidates |
| Indexed ANN + `refine_factor(>1)` | Better recall than no refine (usually) | Distances recomputed on full vectors for reranked candidates |

<CodeGroup>
```python Python icon="python"
# Indexed ANN search without refinement (fast, approximate `_distance`)
fast_results = (
    table.search(embedding)
    .limit(10)
    .to_pandas()
)

# Recompute distances on full vectors for reranked candidates
exact_distance_results = (
    table.search(embedding)
    .limit(10)
    .refine_factor(1)
    .to_pandas()
)

# Rerank a larger candidate set for better recall (higher latency)
higher_recall_results = (
    table.search(embedding)
    .limit(10)
    .refine_factor(20)
    .to_pandas()
)
```

```typescript TypeScript icon="square-js"
// Indexed ANN search without refinement (fast, approximate `_distance`)
// Indexed ANN search without refinement (fast, approximate `_distance`)
const fastResults = await (table.search(embedding) as lancedb.VectorQuery)
    .limit(10)
    .toArray();

// Recompute distances on full vectors for reranked candidates
const exactDistanceResults = await (table.search(embedding) as lancedb.VectorQuery)
    .limit(10)
    .refineFactor(1)
    .toArray();

// Rerank a larger candidate set for better recall (higher latency)
const higherRecallResults = await (table.search(embedding) as lancedb.VectorQuery)
    .limit(10)
    .refineFactor(20)
    .toArray();
```
</CodeGroup>

For deeper tuning guidance on indexing and performance estimation, see the [vector indexes](/indexing/vector-index/#search-configuration) page.
For tuning `nprobes`, see the next section.

### Tuning `nprobes`

- `nprobes` controls how many partitions are searched at query time.
- Increasing `nprobes` improves candidate recall, but does not by itself make `_distance` exact.
- By default, LanceDB automatically tunes `nprobes` to achieve the best performance without noticeably sacrificing accuracy.
- In most cases, leave `nprobes` unset and use the auto-tuned value.
- Only tune `nprobes` manually when recall is below your target, or when you need even higher performance for your workload.
2 changes: 1 addition & 1 deletion docs/tutorials/agents/time-travel-rag/index.mdx
@@ -32,7 +32,7 @@ vector databases are ill-equipped to handle.

4. "We need to A/B test a new chunking strategy, but we can't disrupt the production system or duplicate the entire dataset." Experimentation is vital for improvement, but it can't come at the cost of production stability or a massive infrastructure bill.

LanceDB's [zero-cost data evolution](/tables/schema) and [time-travel capabilities](/tables/versioning#time-travel) directly address these critical enterprise pain points, providing the foundation for a reliable, auditable, and production-ready RAG system.
LanceDB's [zero-cost data evolution](/tables/schema) and [time-travel capabilities](https://docs.lancedb.com/tables/versioning) directly address these critical enterprise pain points, providing the foundation for a reliable, auditable, and production-ready RAG system.

## Dataset: The U.S. Federal Register
