From 1fcd24eb92680e8d208c0ab493557c957ded6986 Mon Sep 17 00:00:00 2001 From: aamijar Date: Sat, 25 Apr 2026 04:18:20 +0000 Subject: [PATCH 1/7] update-readme --- README.md | 87 ++++++++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 79 insertions(+), 8 deletions(-) diff --git a/README.md b/README.md index d88d3401..decac7c6 100644 --- a/README.md +++ b/README.md @@ -1,29 +1,100 @@ -# Lucene cuVS +# cuVS Lucene This is a project for using [cuVS](https://github.com/rapidsai/cuvs), NVIDIA's GPU accelerated vector search library, with [Apache Lucene](https://github.com/apache/lucene). -## Overview +## Contents -This library provides a new [KnnVectorFormat](https://lucene.apache.org/core/10_3_1/core/org/apache/lucene/codecs/KnnVectorsFormat.html) which can be plugged into a Lucene codec. +1. [What is cuvs-lucene?](#what-is-cuvs-lucene) +2. [Installing cuvs-lucene](#installing-cuvs-lucene) +3. [Getting Started](#getting-started) +4. [Contributing](#contributing) +5. [References](#references) -## Building +## What is cuvs-lucene? + +`cuvs-lucene` provides a pluggable [KnnVectorsFormat](https://lucene.apache.org/core/10_3_1/core/org/apache/lucene/codecs/KnnVectorsFormat.html) that uses cuVS to offload vector index build — and optionally search — to NVIDIA GPUs. Because it plugs in through a standard Lucene codec, existing Lucene applications can take advantage of GPU acceleration with minimal code changes and gracefully fall back to the default CPU codec when no GPU is present. + +Four codecs are currently provided: + +- `Lucene101AcceleratedHNSWCodec` — GPU-accelerated HNSW build with CPU HNSW search. The on-disk format is standard Lucene HNSW, so indexes built on the GPU can be read by any stock Lucene 10.x reader. + - `LuceneAcceleratedHNSWScalarQuantizedCodec` — scalar-quantized vectors for a smaller index footprint. + - `LuceneAcceleratedHNSWBinaryQuantizedCodec` — binary-quantized vectors for an even smaller index footprint. 
+- `CuVS2510GPUSearchCodec` — GPU-accelerated HNSW build and GPU search + +## Installing cuvs-lucene ### Prerequisites -- [CUDA 12.0+](https://developer.nvidia.com/cuda-toolkit-archive), -- [Maven 3.9.6+](https://maven.apache.org/download.cgi), +- [CUDA 12.0+](https://developer.nvidia.com/cuda-toolkit-archive) - [JDK 22](https://jdk.java.net/archive/) +- [Maven 3.9.6+](https://maven.apache.org/download.cgi) +- The native `libcuvs_c.so` on the runtime library path. Please see the cuVS [Build and Install Guide](https://docs.rapids.ai/api/cuvs/nightly/build/) for install options (conda, pip, tarball, or build from source). + +### Maven + +To pull `cuvs-lucene` into a Maven project, add the following dependency to your `pom.xml`: + +```xml +<dependency> + <groupId>com.nvidia.cuvs.lucene</groupId> + <artifactId>cuvs-lucene</artifactId> + <version>26.06.0</version> +</dependency> +``` + +### Building from source ```sh +git clone https://github.com/rapidsai/cuvs-lucene.git +cd cuvs-lucene mvn clean compile package ``` -The artifacts would be built and available in the target / folder. -### Running Tests +The resulting artifacts are written to `target/`. To run the tests, point `LD_LIBRARY_PATH` at a local `libcuvs_c.so`: + ```sh export LD_LIBRARY_PATH={ PATH TO YOUR LOCAL libcuvs_c.so }:$LD_LIBRARY_PATH && mvn clean test ``` +## Getting Started + +The snippet below plugs the GPU-accelerated HNSW codec into a standard Lucene `IndexWriter`. 
Once the codec is set on the `IndexWriterConfig`, indexing proceeds exactly as it would with the default Lucene codec, and search uses the stock `KnnFloatVectorQuery`: + +```java +import com.nvidia.cuvs.lucene.AcceleratedHNSWParams; +import com.nvidia.cuvs.lucene.Lucene101AcceleratedHNSWCodec; +import org.apache.lucene.codecs.Codec; +import org.apache.lucene.document.Document; +import org.apache.lucene.document.KnnFloatVectorField; +import org.apache.lucene.index.IndexWriter; +import org.apache.lucene.index.IndexWriterConfig; +import org.apache.lucene.store.Directory; +import org.apache.lucene.store.FSDirectory; + +import static org.apache.lucene.index.VectorSimilarityFunction.EUCLIDEAN; + +AcceleratedHNSWParams params = new AcceleratedHNSWParams.Builder().build(); +Codec codec = new Lucene101AcceleratedHNSWCodec(params); +IndexWriterConfig config = new IndexWriterConfig().setCodec(codec); + +try (Directory dir = FSDirectory.open(indexPath); + IndexWriter writer = new IndexWriter(dir, config)) { + Document doc = new Document(); + doc.add(new KnnFloatVectorField("vector_field", embedding, EUCLIDEAN)); + writer.addDocument(doc); +} +``` + +For fully runnable versions of this example, including one that indexes and searches entirely on the GPU using `CuVS2510GPUSearchCodec`, please refer to the [`examples/`](examples) directory. + ## Contributing +If you are interested in contributing to cuvs-lucene, please read our [Contributing guide](CONTRIBUTING.md). + > [!NOTE] > The code style format is automatically enforced (including the missing license header, if any) using the [Spotless maven plugin](https://github.com/diffplug/spotless/tree/main/plugin-maven). This currently happens in the maven's `validate` stage. 
+ +## References + +- [Bring Massive-Scale Vector Search to the GPU with Apache Lucene](https://www.nvidia.com/en-us/on-demand/session/gtc25-S71286/) — NVIDIA GTC 2025 session video +- [Exploring GPU-accelerated vector search in Elasticsearch with NVIDIA](https://www.elastic.co/search-labs/blog/gpu-accelerated-vector-search-elasticsearch-nvidia) — Blog +- [Apache Lucene Accelerated with the NVIDIA cuVS 25.06 Release](https://searchscale.com/blog/apache-lucene-accelerated-with-nvidia-cuvs-25.06-release/) — Blog From f793f8b11c1a13bed47c564141ab45f0477d2032 Mon Sep 17 00:00:00 2001 From: aamijar Date: Sat, 25 Apr 2026 04:33:10 +0000 Subject: [PATCH 2/7] update references --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index decac7c6..9cde7442 100644 --- a/README.md +++ b/README.md @@ -96,5 +96,6 @@ If you are interested in contributing to cuvs-lucene, please read our [Contribut ## References - [Bring Massive-Scale Vector Search to the GPU with Apache Lucene](https://www.nvidia.com/en-us/on-demand/session/gtc25-S71286/) — NVIDIA GTC 2025 session video +- [cuVS and Lucene: GPU-based Vector Search](https://www.youtube.com/watch?v=qiW7iIDFJC0) - Berlin Buzzwords 2024 session video - [Exploring GPU-accelerated vector search in Elasticsearch with NVIDIA](https://www.elastic.co/search-labs/blog/gpu-accelerated-vector-search-elasticsearch-nvidia) — Blog - [Apache Lucene Accelerated with the NVIDIA cuVS 25.06 Release](https://searchscale.com/blog/apache-lucene-accelerated-with-nvidia-cuvs-25.06-release/) — Blog From 8d81c45a538c9528a76fae64e3abac5a14eb8d9b Mon Sep 17 00:00:00 2001 From: aamijar Date: Sat, 25 Apr 2026 04:34:52 +0000 Subject: [PATCH 3/7] update references --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 9cde7442..c2082631 100644 --- a/README.md +++ b/README.md @@ -97,5 +97,5 @@ If you are interested in contributing to cuvs-lucene, please read our [Contribut - 
[Bring Massive-Scale Vector Search to the GPU with Apache Lucene](https://www.nvidia.com/en-us/on-demand/session/gtc25-S71286/) — NVIDIA GTC 2025 session video - [cuVS and Lucene: GPU-based Vector Search](https://www.youtube.com/watch?v=qiW7iIDFJC0) - Berlin Buzzwords 2024 session video -- [Exploring GPU-accelerated vector search in Elasticsearch with NVIDIA](https://www.elastic.co/search-labs/blog/gpu-accelerated-vector-search-elasticsearch-nvidia) — Blog -- [Apache Lucene Accelerated with the NVIDIA cuVS 25.06 Release](https://searchscale.com/blog/apache-lucene-accelerated-with-nvidia-cuvs-25.06-release/) — Blog +- [Exploring GPU-accelerated vector search in Elasticsearch with NVIDIA](https://www.elastic.co/search-labs/blog/gpu-accelerated-vector-search-elasticsearch-nvidia) — Elasticsearch Blog +- [Apache Lucene Accelerated with the NVIDIA cuVS 25.06 Release](https://searchscale.com/blog/apache-lucene-accelerated-with-nvidia-cuvs-25.06-release/) — SearchScale Blog From 6ca8e669869c3fc2881d5afd73c6c92dfa6e3853 Mon Sep 17 00:00:00 2001 From: aamijar Date: Sat, 25 Apr 2026 04:37:08 +0000 Subject: [PATCH 4/7] em dash --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index c2082631..ded2110f 100644 --- a/README.md +++ b/README.md @@ -96,6 +96,6 @@ If you are interested in contributing to cuvs-lucene, please read our [Contribut ## References - [Bring Massive-Scale Vector Search to the GPU with Apache Lucene](https://www.nvidia.com/en-us/on-demand/session/gtc25-S71286/) — NVIDIA GTC 2025 session video -- [cuVS and Lucene: GPU-based Vector Search](https://www.youtube.com/watch?v=qiW7iIDFJC0) - Berlin Buzzwords 2024 session video +- [cuVS and Lucene: GPU-based Vector Search](https://www.youtube.com/watch?v=qiW7iIDFJC0) — Berlin Buzzwords 2024 session video - [Exploring GPU-accelerated vector search in Elasticsearch with NVIDIA](https://www.elastic.co/search-labs/blog/gpu-accelerated-vector-search-elasticsearch-nvidia) — 
Elasticsearch Blog - [Apache Lucene Accelerated with the NVIDIA cuVS 25.06 Release](https://searchscale.com/blog/apache-lucene-accelerated-with-nvidia-cuvs-25.06-release/) — SearchScale Blog From 3da81d0cc24b7ad21643f962f670b34026794fc2 Mon Sep 17 00:00:00 2001 From: aamijar Date: Fri, 15 May 2026 07:26:05 +0000 Subject: [PATCH 5/7] fix doc version link to 10.2.0 --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index ded2110f..03b5570f 100644 --- a/README.md +++ b/README.md @@ -12,7 +12,7 @@ This is a project for using [cuVS](https://github.com/rapidsai/cuvs), NVIDIA's G ## What is cuvs-lucene? -`cuvs-lucene` provides a pluggable [KnnVectorsFormat](https://lucene.apache.org/core/10_3_1/core/org/apache/lucene/codecs/KnnVectorsFormat.html) that uses cuVS to offload vector index build — and optionally search — to NVIDIA GPUs. Because it plugs in through a standard Lucene codec, existing Lucene applications can take advantage of GPU acceleration with minimal code changes and gracefully fall back to the default CPU codec when no GPU is present. +`cuvs-lucene` provides a pluggable [KnnVectorsFormat](https://lucene.apache.org/core/10_2_0/core/org/apache/lucene/codecs/KnnVectorsFormat.html) that uses cuVS to offload vector index build — and optionally search — to NVIDIA GPUs. Because it plugs in through a standard Lucene codec, existing Lucene applications can take advantage of GPU acceleration with minimal code changes and gracefully fall back to the default CPU codec when no GPU is present. 
Four codecs are currently provided: From 0e068a65415f510778de81c94f189a64b121d3f2 Mon Sep 17 00:00:00 2001 From: aamijar Date: Sat, 16 May 2026 07:25:12 +0000 Subject: [PATCH 6/7] update snippet --- README.md | 34 +++++++++++++++++++++++----------- 1 file changed, 23 insertions(+), 11 deletions(-) diff --git a/README.md b/README.md index 03b5570f..8f107371 100644 --- a/README.md +++ b/README.md @@ -60,8 +60,14 @@ export LD_LIBRARY_PATH={ PATH TO YOUR LOCAL libcuvs_c.so }:$LD_LIBRARY_PATH && m The snippet below plugs the GPU-accelerated HNSW codec into a standard Lucene `IndexWriter`. Once the codec is set on the `IndexWriterConfig`, indexing proceeds exactly as it would with the default Lucene codec, and search uses the stock `KnnFloatVectorQuery`: ```java +package com.nvidia.cuvs.lucene.examples; + +import static org.apache.lucene.index.VectorSimilarityFunction.EUCLIDEAN; + import com.nvidia.cuvs.lucene.AcceleratedHNSWParams; import com.nvidia.cuvs.lucene.Lucene101AcceleratedHNSWCodec; +import java.nio.file.Path; +import java.nio.file.Paths; import org.apache.lucene.codecs.Codec; import org.apache.lucene.document.Document; import org.apache.lucene.document.KnnFloatVectorField; @@ -70,17 +76,23 @@ import org.apache.lucene.index.IndexWriterConfig; import org.apache.lucene.store.Directory; import org.apache.lucene.store.FSDirectory; -import static org.apache.lucene.index.VectorSimilarityFunction.EUCLIDEAN; - -AcceleratedHNSWParams params = new AcceleratedHNSWParams.Builder().build(); -Codec codec = new Lucene101AcceleratedHNSWCodec(params); -IndexWriterConfig config = new IndexWriterConfig().setCodec(codec); - -try (Directory dir = FSDirectory.open(indexPath); - IndexWriter writer = new IndexWriter(dir, config)) { - Document doc = new Document(); - doc.add(new KnnFloatVectorField("vector_field", embedding, EUCLIDEAN)); - writer.addDocument(doc); +public class ReadmeSnippet { + public static void main(String[] args) throws Exception { + AcceleratedHNSWParams params = 
new AcceleratedHNSWParams.Builder().build(); + Codec codec = new Lucene101AcceleratedHNSWCodec(params); + IndexWriterConfig config = new IndexWriterConfig().setCodec(codec); + + Path indexPath = Paths.get("index"); + float[] embedding = new float[] {0.1f, 0.2f, 0.3f, 0.4f}; + + try (Directory dir = FSDirectory.open(indexPath); + IndexWriter writer = new IndexWriter(dir, config)) { + Document doc = new Document(); + doc.add(new KnnFloatVectorField("vector_field", embedding, EUCLIDEAN)); + writer.addDocument(doc); + } + System.out.println("README snippet ran successfully."); + } } ``` From 03e59a9087fa5675b0456bed46253c25c8923d6f Mon Sep 17 00:00:00 2001 From: aamijar Date: Sat, 16 May 2026 09:37:49 +0000 Subject: [PATCH 7/7] update code snippet --- README.md | 26 +++++++++++++++++++------- 1 file changed, 19 insertions(+), 7 deletions(-) diff --git a/README.md b/README.md index 8f107371..8c3a9ee9 100644 --- a/README.md +++ b/README.md @@ -27,7 +27,7 @@ Four codecs are currently provided: - [CUDA 12.0+](https://developer.nvidia.com/cuda-toolkit-archive) - [JDK 22](https://jdk.java.net/archive/) - [Maven 3.9.6+](https://maven.apache.org/download.cgi) -- The native `libcuvs_c.so` on the runtime library path. Please see the cuVS [Build and Install Guide](https://docs.rapids.ai/api/cuvs/nightly/build/) for install options (conda, pip, tarball, or build from source). +- A compatible cuVS installation (26.04 - 26.06). For Maven usage, install the cuVS tarball and add it to your system library load path. See the cuVS [tarball install instructions](https://docs.rapids.ai/api/cuvs/stable/build/#download-extract). ### Maven @@ -49,15 +49,19 @@ cd cuvs-lucene mvn clean compile package ``` -The resulting artifacts are written to `target/`. To run the tests, point `LD_LIBRARY_PATH` at a local `libcuvs_c.so`: +The resulting artifacts are written to `target/`. 
To run the tests, first install cuVS and add it to your system library load path, as described in the cuVS [tarball install instructions](https://docs.rapids.ai/api/cuvs/stable/build/#download-extract), then run: ```sh -export LD_LIBRARY_PATH={ PATH TO YOUR LOCAL libcuvs_c.so }:$LD_LIBRARY_PATH && mvn clean test +mvn clean test ``` ## Getting Started -The snippet below plugs the GPU-accelerated HNSW codec into a standard Lucene `IndexWriter`. Once the codec is set on the `IndexWriterConfig`, indexing proceeds exactly as it would with the default Lucene codec, and search uses the stock `KnnFloatVectorQuery`: +The example below plugs the GPU-accelerated HNSW codec into a standard Lucene `IndexWriter`. Once the codec is set on the `IndexWriterConfig`, indexing proceeds exactly as it would with the default Lucene codec, and search uses the stock `KnnFloatVectorQuery`. + +Before running it, make sure cuVS is installed and available on your system library load path. The cuVS [tarball install instructions](https://docs.rapids.ai/api/cuvs/stable/build/#download-extract) show how to set this up. 
+ +In a Maven project that includes the `cuvs-lucene` dependency shown above, create `src/main/java/com/nvidia/cuvs/lucene/examples/HelloCuvsLucene.java`: ```java package com.nvidia.cuvs.lucene.examples; @@ -76,7 +80,7 @@ import org.apache.lucene.index.IndexWriterConfig; import org.apache.lucene.store.Directory; import org.apache.lucene.store.FSDirectory; -public class ReadmeSnippet { +public class HelloCuvsLucene { public static void main(String[] args) throws Exception { AcceleratedHNSWParams params = new AcceleratedHNSWParams.Builder().build(); Codec codec = new Lucene101AcceleratedHNSWCodec(params); @@ -91,12 +95,20 @@ public class ReadmeSnippet { doc.add(new KnnFloatVectorField("vector_field", embedding, EUCLIDEAN)); writer.addDocument(doc); } - System.out.println("README snippet ran successfully."); + + System.out.println("Hello cuVS Lucene ran successfully."); } } ``` -For fully runnable versions of this example, including one that indexes and searches entirely on the GPU using `CuVS2510GPUSearchCodec`, please refer to the [`examples/`](examples) directory. +Run it: + +```sh +mvn -q compile org.codehaus.mojo:exec-maven-plugin:3.5.1:java \ + -Dexec.mainClass=com.nvidia.cuvs.lucene.examples.HelloCuvsLucene +``` + +For more examples, including one that indexes and searches entirely on the GPU using `CuVS2510GPUSearchCodec`, please refer to the [`examples/`](examples) directory. ## Contributing