Merged
2 changes: 1 addition & 1 deletion docs/docs/extraction/prerequisites.md
@@ -38,7 +38,7 @@ For additional hardware details, refer to [Support Matrix](support-matrix.md).

- **System Memory**: At least 256 GB RAM
- **CPU Cores**: At least 32 CPU cores
- **GPU**: NVIDIA GPU with at least 24 GB VRAM. Use a model [listed in the Support Matrix](support-matrix.md); examples include A10G, A100, H100, L40S, and H200 NVL.
- **GPU**: NVIDIA GPU with at least 24 GB VRAM. See the [Support Matrix](support-matrix.md) for supported GPUs (e.g., A100, A10G, L40S).

!!! note

27 changes: 27 additions & 0 deletions docs/docs/extraction/python-api-reference.md
@@ -516,6 +516,33 @@ results = ingestor.ingest()

For more information about working with infographics and multimodal content, refer to [Use Multimodal Embedding](vlm-embed.md).

### Caption Images and Control Reasoning

The caption task can call a VLM with optional prompt and reasoning overrides:

- `prompt` (user prompt): defaults to `"Caption the content of this image:"`.
- `reasoning` (bool): when `True`, enables reasoning (internally maps to `"/think"`); when `False`, disables reasoning (internally maps to `"/no_think"`). Defaults to `False` per the Nemotron Nano 12B v2 VL model card.
- `context_text_max_chars` (int, optional): Maximum characters of page text to include as context for the VLM.
- `temperature` (float, optional): Sampling temperature for the VLM.
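
As a rough illustration of the `reasoning` and `context_text_max_chars` options listed above, the sketch below maps the boolean flag to the directive strings the text names and clips page text to a character budget. The helper names `reasoning_directive` and `clip_context` are hypothetical and are not part of the NeMo Retriever Library; how the library applies these values internally is not shown in this diff.

```python
DEFAULT_PROMPT = "Caption the content of this image:"  # default quoted in the text above


def reasoning_directive(reasoning: bool = False) -> str:
    """Hypothetical helper: map the boolean flag to the documented directive string."""
    return "/think" if reasoning else "/no_think"


def clip_context(page_text: str, context_text_max_chars: int) -> str:
    """Hypothetical helper: keep at most `context_text_max_chars` characters of page text."""
    return page_text[: max(context_text_max_chars, 0)]


if __name__ == "__main__":
    print(reasoning_directive(True))   # /think
    print(reasoning_directive())       # /no_think
    print(clip_context("abcdef", 3))   # abc
```

Keeping the flag a plain boolean at the call site, as the documented API does, means callers never have to know the directive strings themselves.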

Example:
```python
from nemo_retriever.client.interface import Ingestor

ingestor = (
    Ingestor()
    .files("path/to/doc-with-images.pdf")
    .extract(extract_images=True)
    .caption(
        prompt="Caption the content of this image:",
        reasoning=True,  # or False
    )
    .ingest()
)
Comment on lines +532 to +541
P1: `.ingest()` chained inside builder; result is a list, not an `Ingestor`

The example chains `.ingest()` inside the parenthesized builder expression and assigns the return value to `ingestor`. Because `ingest()` returns a list of document results, `ingestor` will hold a list, not an `Ingestor` object. Users who follow this pattern and then call `ingestor.embed()` or any other method will get `AttributeError`. Separate the `ingest()` call and store its result in a distinct variable.

Suggested change
-ingestor = (
-    Ingestor()
-    .files("path/to/doc-with-images.pdf")
-    .extract(extract_images=True)
-    .caption(
-        prompt="Caption the content of this image:",
-        reasoning=True,  # or False
-    )
-    .ingest()
-)
+ingestor = (
+    Ingestor()
+    .files("path/to/doc-with-images.pdf")
+    .extract(extract_images=True)
+    .caption(
+        prompt="Caption the content of this image:",
+        reasoning=True,  # or False
+    )
+)
+results = ingestor.ingest()
```
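
To make the reviewer's point about the example above concrete, here is a minimal sketch of the pitfall using a stand-in builder. `SketchIngestor` is hypothetical and only mimics the fluent style. It is not the real `Ingestor` class, and the shape of the returned results is an assumption made for illustration.

```python
class SketchIngestor:
    """Hypothetical stand-in for a fluent builder; not the real Ingestor."""

    def __init__(self):
        self.steps = []

    def files(self, path):
        self.steps.append(("files", path))
        return self  # builder methods return the builder itself

    def caption(self, **kwargs):
        self.steps.append(("caption", kwargs))
        return self

    def ingest(self):
        # Terminal call: returns results (a list), not the builder.
        return [{"steps": list(self.steps)}]


# Pattern flagged in the review: the name ends up bound to a list.
chained = SketchIngestor().files("doc.pdf").caption(reasoning=True).ingest()

# Pattern from the suggested change: keep the builder and the results apart.
builder = SketchIngestor().files("doc.pdf").caption(reasoning=True)
results = builder.ingest()

print(type(chained).__name__)       # list
print(hasattr(chained, "caption"))  # False
```

With the second pattern, `builder` can still be extended or reused, while `results` holds the output of the terminal call.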



## Extract Embeddings

The `embed` method in the NeMo Retriever Library generates text embeddings for document content.
26 changes: 26 additions & 0 deletions docs/docs/extraction/quickstart-guide.md
@@ -85,6 +85,22 @@ g. When core services have fully started, `nvidia-smi` should show processes lik
h. Run the command `docker ps`. You should see output similar to the following. Confirm that the status of the containers is `Up`.

```
CONTAINER ID   IMAGE                                     COMMAND   CREATED          STATUS          PORTS                      NAMES
a1b2c3d4e5f6   nvcr.io/.../page-elements:latest          "..."     10 minutes ago   Up 10 minutes   0.0.0.0:8002->8002/tcp     nemo-retriever-page-elements-1
b2c3d4e5f6a1   nvcr.io/.../ocr:latest                    "..."     10 minutes ago   Up 10 minutes   0.0.0.0:8001->8001/tcp     nemo-retriever-ocr-1
c3d4e5f6a1b2   nvcr.io/.../nv-ingest-ms-runtime:latest   "..."     10 minutes ago   Up 10 minutes   0.0.0.0:5000->5000/tcp     nemo-retriever-ms-runtime-1
d4e5f6a1b2c3   nvcr.io/.../milvus:latest                 "..."     12 minutes ago   Up 12 minutes   0.0.0.0:19530->19530/tcp   nemo-retriever-milvus-1
```

## Step 2: Install the Client

Install the NeMo Retriever Library client on your host so you can send requests to the services you started in Step 1. Using a virtual environment is recommended:

```bash
uv venv --python 3.12 nv-ingest-dev
source nv-ingest-dev/bin/activate
uv pip install nv-ingest==26.3.0-RC4 nv-ingest-api==26.3.0-RC4 nv-ingest-client==26.3.0-RC4
```
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
1b885f37c991 nvcr.io/nvidia/nemo-microservices/nv-ingest:... "/usr/bin/tini -- /w…" 7 minutes ago Up 7 minutes (healthy) 0.0.0.0:7670... nv-ingest-nv-ingest-ms-runtime-1
14ef31ed7f49 milvusdb/milvus:v2.5.3-gpu "/tini -- bash -c 's…" 7 minutes ago Up 7 minutes (healthy) 0.0.0.0:9091... milvus-standalone
@@ -103,6 +119,7 @@ h. Run the command `docker ps`. You should see output similar to the following.

i. To run the NeMo Retriever Library Python client from your host machine, Python 3.12 or later is required. Create a virtual environment and install the client packages:

To confirm that you have activated your virtual environment, run `which pip` and `which python`, and confirm that you see `nv-ingest-dev` in the result. You can do this before any pip or python command that you run.
```shell
uv venv --python 3.12 nv-ingest-dev
source nv-ingest-dev/bin/activate
@@ -117,6 +134,15 @@ i. To run the NeMo Retriever Library Python client from your host machine, Pytho

Interaction from the host requires the appropriate port to be exposed from the `nv-ingest` runtime container, as defined in the `docker-compose.yaml` file. If you prefer, you can disable this port and interact directly from within the container.

```bash
docker exec -it nemo-retriever-ms-runtime-1 bash
```
This command opens a shell in the `/workspace` directory, where the `DATASET_ROOT` from your `.env` file is mounted at `./data`. The pre-configured Python environment in the container includes all necessary Python client libraries. You should see a prompt similar to the following.

```bash
root@your-computer-name:/workspace#
```
From this prompt, you can run the `nemo-retriever` CLI and Python examples.
j. To work inside the container, run the following code.
Comment on lines 95 to 146

P1: New "Step 2" section inserted mid-list, causing structural and rendering breakage

An H2 heading (`## Step 2: Install the Client`) was inserted inside an ordered list that continues with lettered sub-steps (h, i, j). This breaks the ordered list structure and causes several downstream issues: the old `docker ps` table (lines 104–118) falls outside any code fence and renders as unstyled text; the activation hint appears twice (plain text at line 122 and as a `!!! tip` at line 129); step i installs `26.1.2` while Step 2 installs `26.3.0-RC4`, presenting conflicting instructions; and the new `docker exec` block (lines 137–145) duplicates step j without proper indentation. The new content should be integrated into the existing ordered-list steps rather than injected as a top-level heading mid-sequence.
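
One of the issues named above, conflicting pinned versions across install commands, is easy to check mechanically. The following is a minimal sketch, not part of this PR or the project's tooling: `conflicting_pins` is a hypothetical helper, the regular expression only handles simple `name==version` pins, and the sample text is an assumed reconstruction (the exact install line in step i is collapsed in this diff).

```python
import re
from collections import defaultdict

# Matches simple pins such as nv-ingest==26.3.0-RC4 (assumption: no extras or markers).
PIN = re.compile(r"\b([A-Za-z0-9][A-Za-z0-9._-]*)==([A-Za-z0-9][A-Za-z0-9.+!-]*)")


def conflicting_pins(markdown_text: str) -> dict[str, set[str]]:
    """Return packages that are pinned to more than one version in pip install lines."""
    pins = defaultdict(set)
    for line in markdown_text.splitlines():
        if "pip install" not in line:
            continue
        for name, version in PIN.findall(line):
            pins[name.lower()].add(version)
    return {name: versions for name, versions in pins.items() if len(versions) > 1}


# Illustrative sample only; the second line is assumed, not copied from the guide.
sample = """\
uv pip install nv-ingest==26.3.0-RC4 nv-ingest-client==26.3.0-RC4
uv pip install nv-ingest==26.1.2 nv-ingest-client==26.1.2
"""

for package, versions in sorted(conflicting_pins(sample).items()):
    print(package, sorted(versions))
```

A check like this could run over each docs page in CI, so that a release bump in one section without the other is caught before review.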

```bash