From 892c7438552db0e8f54f6c58baae4a16005aec7f Mon Sep 17 00:00:00 2001
From: prrao87 <prrao87@gmail.com>
Date: Wed, 18 Feb 2026 15:52:40 -0500
Subject: [PATCH 1/2] Update supported index type

---
 docs/search/vector-search.mdx | 25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)
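
A quick sanity check of the formulas in the new metric table, written with NumPy. These are the plain textbook definitions from the table, not LanceDB's implementation, so the `_distance` values LanceDB returns are not guaranteed to match these numbers exactly.

```python
import numpy as np

def l2(x: np.ndarray, y: np.ndarray) -> float:
    # Euclidean distance: sqrt(sum_i (x_i - y_i)^2)
    return float(np.sqrt(np.sum((x - y) ** 2)))

def cosine(x: np.ndarray, y: np.ndarray) -> float:
    # 1 - cosine similarity, so vector length does not affect the score
    return float(1.0 - np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def dot(x: np.ndarray, y: np.ndarray) -> float:
    # Raw inner product sum_i x_i * y_i (unnormalized, magnitude-sensitive)
    return float(np.dot(x, y))

def hamming(a: np.ndarray, b: np.ndarray) -> int:
    # a, b: binary vectors packed 8 bits per uint8; count the differing bits
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

x = np.array([3.0, 4.0])
y = np.array([6.0, 8.0])  # same direction as x, twice the length
print(l2(x, y))      # 5.0
print(cosine(x, y))  # 0.0
print(dot(x, y))     # 50.0

a = np.array([0b10110000], dtype=np.uint8)
b = np.array([0b10010001], dtype=np.uint8)
print(hamming(a, b))  # 2
```

The `x`/`y` pair shows the point made in the `cosine` row: scaling a vector changes `l2` and `dot` but leaves the cosine distance at zero.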

diff --git a/docs/search/vector-search.mdx b/docs/search/vector-search.mdx
index f7118d9..e330948 100644
--- a/docs/search/vector-search.mdx
+++ b/docs/search/vector-search.mdx
@@ -21,19 +21,30 @@ Ensure you always use the same distance metric that your embedding model was tra
 
 The right metric improves both search accuracy and query performance. Currently, LanceDB supports the following metrics:
 
-| Metric    | Description                                                                                                                                                                                                                                                          | Default |
-| :-------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :------ |
-| `l2`      | [Euclidean distance](https://en.wikipedia.org/wiki/Euclidean_distance) - measures the straight-line distance between two points in vector space. Calculated as the square root of the sum of squared differences between corresponding vector components.            | ✓       |
-| `cosine`  | [Cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity) - measures the cosine of the angle between two vectors, ranging from -1 to 1. Computed as the dot product divided by the product of vector magnitudes. Use for unnormalized vectors.            | x       |
-| `dot`     | [Dot product](https://en.wikipedia.org/wiki/Dot_product) - calculates the sum of products of corresponding vector components. Provides raw similarity scores without normalization, sensitive to vector magnitudes. Use for normalized vectors for best performance. | x       |
-| `hamming` | [Hamming distance](https://en.wikipedia.org/wiki/Hamming_distance) - counts the number of positions where corresponding bits differ between binary vectors. Only applicable to binary vectors stored as packed uint8 arrays.                                         | x       |
+| Distance metric | Mathematical form | Notes |
+|---|---|---|
+| `l2` | $\|x-y\|_2=\sqrt{\sum_i (x_i-y_i)^2}$ | Measures the straight-line distance between two points in vector space. Calculated as the square root of the sum of squared differences between corresponding vector components. |
+| `cosine` | $1-\frac{x\cdot y}{\|x\|_2\|y\|_2}$ | Measures directional difference between vectors. Computed as 1 minus cosine similarity (the dot product normalized by both vector magnitudes), so vector length does not affect the score. Use for unnormalized vectors. |
+| `dot` | $x\cdot y=\sum_i x_i y_i$ | Calculates the sum of products of corresponding vector components. Returns raw similarity scores without normalization, so results are sensitive to vector magnitudes. Best suited to normalized vectors: for unit-length vectors the dot product equals cosine similarity while skipping the per-vector normalization. |
+| `hamming` | $\sum_i \mathbf{1}[x_i\neq y_i]$ | Counts the number of positions where corresponding bits differ between binary vectors. Only applicable to binary vectors stored as packed uint8 arrays. |
+
+For indexed search, supported distance metrics vary by index type:
+
+| Index type | Supported distance metrics |
+|---|---|
+| `IVF_FLAT` | `["l2", "cosine", "dot", "hamming"]` |
+| `IVF_PQ` | `["l2", "cosine", "dot"]` |
+| `IVF_SQ` | `["l2", "cosine", "dot"]` |
+| `IVF_RQ` | `["l2", "cosine", "dot"]` |
+| `IVF_HNSW_PQ` | `["l2", "cosine", "dot"]` |
+| `IVF_HNSW_SQ` | `["l2", "cosine", "dot"]` |
 
 ### Configure Distance Metric
 
 By default, `l2` will be used as metric type. You can specify the metric type as
 `cosine` or `dot` if required.
 
-**Note:** You can configure the distance metric during search only if there’s no vector index. If a vector index exists, the distance metric will always be the one you specified when creating the index.
+**Note:** You can configure the distance metric during search only if there's no vector index. If a vector index exists, the distance metric will always be the one you specified when creating the index.
 
 <CodeGroup>
 ```python Python icon="python"

From 45aa6a0699b02791d030b234948d95b3c4ea4fc7 Mon Sep 17 00:00:00 2001
From: prrao87 <prrao87@gmail.com>
Date: Wed, 18 Feb 2026 17:36:44 -0500
Subject: [PATCH 2/2] More clarity to HNSW index and vector search

---
 docs/indexing/index.mdx        |  3 +--
 docs/indexing/vector-index.mdx | 36 ++++++++++++++++++++--------------
 docs/search/vector-search.mdx  |  2 +-
 3 files changed, 23 insertions(+), 18 deletions(-)
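
The index-type table (and the `IVF_FLAT`-only note for `hamming`) can be encoded as a pre-flight check before building an index. `SUPPORTED_METRICS` and `check_metric` are hypothetical names used for illustration, not part of the LanceDB API, and the mapping has to be kept in sync with the docs table by hand.

```python
# Hypothetical helper (not a LanceDB API): mirrors the
# "Index type -> supported distance metrics" table in vector-search.mdx.
SUPPORTED_METRICS = {
    "IVF_FLAT": {"l2", "cosine", "dot", "hamming"},
    "IVF_PQ": {"l2", "cosine", "dot"},
    "IVF_SQ": {"l2", "cosine", "dot"},
    "IVF_RQ": {"l2", "cosine", "dot"},
    "IVF_HNSW_PQ": {"l2", "cosine", "dot"},
    "IVF_HNSW_SQ": {"l2", "cosine", "dot"},
}

def check_metric(index_type: str, metric: str) -> None:
    """Raise ValueError if the docs table lists `metric` as unsupported for `index_type`."""
    supported = SUPPORTED_METRICS.get(index_type.upper())
    if supported is None:
        raise ValueError(f"unknown index type: {index_type!r}")
    if metric.lower() not in supported:
        raise ValueError(
            f"{metric!r} is not supported by {index_type}; "
            f"choose one of {sorted(supported)}"
        )

check_metric("IVF_FLAT", "hamming")  # passes silently
check_metric("ivf_pq", "cosine")     # passes (names are case-insensitive)
```

Calling `check_metric("IVF_PQ", "hamming")` raises a `ValueError` listing the metrics that index type accepts.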

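The new Info box in vector-index.mdx says vectors are first partitioned by IVF and each probed partition is then searched with an HNSW graph. The toy NumPy sketch below shows that two-step flow under simplifying assumptions: centroids are sampled rows rather than k-means-trained, the within-partition step is an exact scan standing in for the HNSW graph walk, and `build_ivf`/`ivf_search` are hypothetical names, not LanceDB APIs. The `nprobes` argument is named after LanceDB's query-time setting of the same name.

```python
import numpy as np

def build_ivf(vectors: np.ndarray, centroids: np.ndarray) -> dict:
    # Assign each vector to its nearest centroid (squared L2); the row ids
    # assigned to one centroid form one IVF partition.
    d = ((vectors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    assignment = d.argmin(axis=1)
    return {p: np.flatnonzero(assignment == p) for p in range(len(centroids))}

def ivf_search(query, vectors, centroids, partitions, nprobes=2, k=3):
    # Step 1: probe the `nprobes` partitions whose centroids are closest.
    probed = np.argsort(((centroids - query) ** 2).sum(axis=1))[:nprobes]
    # Step 2: search only inside the probed partitions. An IVF_HNSW_* index
    # would traverse each partition's HNSW graph here; this toy does an
    # exact scan of the candidate rows instead.
    candidates = np.concatenate([partitions[p] for p in probed])
    dist = ((vectors[candidates] - query) ** 2).sum(axis=1)
    order = np.argsort(dist)[:k]
    return candidates[order], dist[order]

rng = np.random.default_rng(0)
vectors = rng.normal(size=(1_000, 8)).astype(np.float32)
# Assumption: centroids are sampled rows, not trained with k-means.
centroids = vectors[rng.choice(len(vectors), size=16, replace=False)]
partitions = build_ivf(vectors, centroids)

ids, dists = ivf_search(vectors[0], vectors, centroids, partitions, nprobes=2)
print(ids, dists)  # the query row itself (id 0) comes back first
```

Raising `nprobes` scans more partitions, trading latency for recall; only the candidates in probed partitions are ever compared against the query.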
diff --git a/docs/indexing/index.mdx b/docs/indexing/index.mdx
index d8889eb..39c33b9 100644
--- a/docs/indexing/index.mdx
+++ b/docs/indexing/index.mdx
@@ -28,13 +28,12 @@ LanceDB provides a comprehensive suite of indexing strategies for different data
 
 | Index | Use Case | Description |
 | :--------- | :------- | :---------- |
-| `HNSW` (Vector) | High recall and low latency vector searches. Ideal for applications requiring fast approximate nearest neighbor queries with high accuracy. | Hierarchical Navigable Small World—a graph-based approximate nearest neighbor algorithm.<br />Distance metrics: `l2` `cosine` `dot`<br />Quantizations: `PQ` `SQ`|
 | `IVF` (Vector) | Large-scale vector search with configurable accuracy/speed trade-offs. Supports binary vectors with hamming distance. | Inverted File Index—a partition-based approximate nearest neighbor algorithm that groups similar vectors into partitions for efficient search.<br />Distance metrics: `l2` `cosine` `dot` `hamming`<br />Quantizations: `None/Flat` `PQ` `SQ` `RQ`|
 | `IVF_HNSW` (Vector) | Large-scale vector search requiring both high recall and efficient partitioning. Combines the scalability of IVF with the search quality of HNSW. | Hybrid index combining IVF partitioning with HNSW graphs built within each partition. Provides improved search quality over pure IVF while maintaining scalability.<br />Distance metrics: `l2` `cosine` `dot`<br />Quantizations: `SQ`, `PQ`|
+| `FTS` (Full-text search) | String columns (e.g., title, description, content) requiring keyword-based search with BM25 ranking. | Full-text search index using the BM25 ranking algorithm. Tokenizes text with configurable tokenizers, stemming, stop-word removal, and language-specific processing. |
 | `BTree` (Scalar) | Numeric, temporal, and string columns with mostly distinct values. Best for highly selective queries on columns with many unique values. | Sorted index storing sorted copies of scalar columns with block headers in a btree cache. Header entries map to blocks of rows (4096 rows per block) for efficient disk reads. |
 | `Bitmap` (Scalar) | Low-cardinality columns with few thousand or fewer distinct values. Accelerates equality and range filters. | Stores a bitmap for each distinct value in the column, with one bit per row indicating presence. Memory-efficient for low-cardinality data. |
 | `LabelList` (Scalar) | List columns (e.g., tags, categories, keywords) requiring array containment queries. | Scalar index for `List<T>` columns using an underlying bitmap index structure to enable fast array membership lookups. |
-| `FTS` (Full-text) | String columns (e.g., title, description, content) requiring keyword-based search with BM25 ranking. | Full-text search index using BM25 ranking algorithm. Tokenizes text with configurable tokenization, stemming, stop word removal, and language-specific processing. |
 
 <Note>
 TypeScript currently doesn't support `IvfSq` (IVF with Scalar Quantization).
diff --git a/docs/indexing/vector-index.mdx b/docs/indexing/vector-index.mdx
index 379624d..2dae75c 100644
--- a/docs/indexing/vector-index.mdx
+++ b/docs/indexing/vector-index.mdx
@@ -1,7 +1,7 @@
 ---
 title: "Vector Indexes"
 sidebarTitle: "Vector Index"
-description: "Build and optimize LanceDB vector indexes, including IVF_HNSW_SQ, IVF_RQ, IVF_PQ, and binary indexes."
+description: "Build and optimize LanceDB vector indexes, including IVF, HNSW and binary quantized indexes."
 icon: "arrow-up-right-dots"
 ---
 import {
@@ -18,33 +18,39 @@ import {
     PyVectorIndexCheckStatus as VectorIndexCheckStatus,
 } from '/snippets/indexing.mdx';
 
-LanceDB offers two main vector indexing algorithms: **Inverted File (IVF)** and **Hierarchically Navigable Small Worlds (HNSW)**. You can create multiple vector indexes within a Lance table. This guide walks through common configurations and build patterns.
+You can create and manage multiple vector indexes on any Lance dataset. LanceDB offers two vector indexing algorithms: **Inverted File (IVF)** and **Hierarchical Navigable Small World (HNSW)**.
 
-### Option 1: Self-Hosted Indexing
+<Info>
+**IVF + HNSW**
 
-**Manual, Sync or Async:** If using LanceDB Open Source, you will have to build indexes manually, as well as reindex and tune indexing parameters. The Python SDK lets you do this *synchronously and asynchronously*.
+In LanceDB, HNSW is not exposed as a top-level vector index. Instead, it is available as a sub-index inside IVF partitions: vectors are first partitioned by IVF, and at query time each probed partition is searched using its own HNSW graph, with vectors quantized according to the index type (`IVF_HNSW_PQ` or `IVF_HNSW_SQ`). This combines IVF's scalability with HNSW's high-recall approximate nearest neighbor (ANN) search within each partition.
+</Info>
 
-### Option 2: Automated Indexing
+### Manual Indexing
 
-**Automatic and Async:** Indexing is automatic in LanceDB Cloud/Enterprise. As soon as data is updated, our system automates index optimization. *This is done asynchronously*.
+In LanceDB OSS, you create the vector index manually by calling `table.create_index()`. Updating the index as new data arrives and tuning its parameters are also manual steps.
 
-Here is what happens in the background - when a table contains a single vector column named `vector`, LanceDB automatically:
+### Automatic Indexing
 
-- Infers the vector column from the schema
-- Creates an optimized `IVF_PQ` index without manual configuration
-- The default distance is `l2` or euclidean
+<Badge color="red">Enterprise-only</Badge>
+Vector indexing is managed **automatically** in LanceDB Cloud/Enterprise. As soon as data is updated, the system updates and optimizes the index. *This is done asynchronously as a background process*.
 
-Finally, LanceDB Cloud/Enterprise will analyze your data distribution to **automatically configure indexing parameters**.
+When you create a table in LanceDB Enterprise, LanceDB automatically:
 
-<Note title="Manual Index Creation">
-You can create a new index with different parameters using `create_index` - this replaces any existing index
+- Infers the vector columns from the schema
+- Creates an optimized `IVF_PQ` index without manual configuration
+- Configures indexing parameters based on your data distribution
 
+The default distance metric is `l2` (Euclidean).
+
+<Note>
+You can call `create_index()` with different parameters to create a new index; this replaces any existing index.
 Although the `create_index` API returns immediately, the building of the vector index is asynchronous. To wait until all data is fully indexed, you can specify the `wait_timeout` parameter.
 </Note>
 
 ## Choose the Right Index
 
-Use this table as a quick starting point:
+Use this table as a quick starting point for choosing the right index type and quantization method for your use case:
 
 | If your top priority is... | Use this index | Why | Typical compressed size vs. raw vectors |
 | :--- | :--- | :--- | :--- |
@@ -59,7 +65,7 @@ If your vector search frequently includes metadata filters (`where(...)`), prefe
 Compression ratios are practical rules of thumb and can vary with vector distribution, metric, and configuration.
 For small dimensions, choose `IVF_PQ` for accuracy, not for guaranteed higher compression than `IVF_RQ`.
 
-### Indexing Tuning by Index Type
+### Index Tuning
 
 Start with these values, then tune for your workload:
 
diff --git a/docs/search/vector-search.mdx b/docs/search/vector-search.mdx
index e330948..82008d3 100644
--- a/docs/search/vector-search.mdx
+++ b/docs/search/vector-search.mdx
@@ -42,7 +42,7 @@ For indexed search, supported distance metrics vary by index type:
 ### Configure Distance Metric
 
 By default, `l2` will be used as metric type. You can specify the metric type as
-`cosine` or `dot` if required.
+`cosine` or `dot` if required. `hamming` applies only to binary vectors and, among index types, is supported only by `IVF_FLAT`.
 
 **Note:** You can configure the distance metric during search only if there's no vector index. If a vector index exists, the distance metric will always be the one you specified when creating the index.