(retriever) Add VLM image captioning via vLLM by edknv · Pull Request #1660 · NVIDIA/NeMo-Retriever

edknv · 2026-03-19T16:57:03Z

Description

Add a .caption() pipeline stage to both batch and in-process ingestors that generates text descriptions for extracted images using a VLM (Nemotron Nano 12B v2 VL via vLLM locally, or a remote NIM endpoint).
Use nv-ingest-api's extract_image_like_objects_from_pdfium_page during PDF extraction to detect, merge, and crop image-like objects (images, shapes, forms) from each page into the images column.
The caption stage filters out small images (< 32px), sends the remaining images to the VLM, and writes each caption back as images[i]["text"]. Optionally, it prepends surrounding page text to the VLM prompt, truncated to context_text_max_chars.
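
For readers who want a picture of the caption stage's data flow, here is a minimal sketch, not the code in this PR. It assumes each entry in the images column is a dict with hypothetical `width` and `height` keys, that the 32px threshold applies to the smaller dimension, and that the VLM is reached through a hypothetical `captioner` callable (a stand-in for the vLLM or NIM call). The prompt wording is also an assumption; only `images[i]["text"]` and `context_text_max_chars` come from the description above.

```python
from typing import Callable, Dict, List

# Threshold taken from the PR description; applying it to the smaller
# dimension is an assumption made for this sketch.
MIN_IMAGE_DIM_PX = 32


def build_prompt(base_prompt: str, page_text: str, context_text_max_chars: int) -> str:
    """Optionally prepend truncated page text to the prompt (hypothetical layout)."""
    if context_text_max_chars > 0 and page_text:
        context = page_text[:context_text_max_chars]
        return f"Page context:\n{context}\n\n{base_prompt}"
    return base_prompt


def caption_images(
    images: List[Dict],
    captioner: Callable[[List[Dict], str], List[str]],
    page_text: str = "",
    context_text_max_chars: int = 0,
    base_prompt: str = "Describe this image.",
) -> List[Dict]:
    """Caption every sufficiently large image and store the result under "text"."""
    prompt = build_prompt(base_prompt, page_text, context_text_max_chars)

    # Remember original indices so captions land on the right entries.
    eligible = [
        i
        for i, img in enumerate(images)
        if min(img["width"], img["height"]) >= MIN_IMAGE_DIM_PX
    ]
    if not eligible:
        return images

    # One batched call; `captioner` is a hypothetical stand-in for the VLM.
    captions = captioner([images[i] for i in eligible], prompt)
    for i, caption in zip(eligible, captions):
        images[i]["text"] = caption
    return images
```

Keeping the original indices means skipped images stay in place without a "text" key, so downstream stages can tell "too small to caption" apart from "captioned".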

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.
If adjusting docker-compose.yaml environment variables, have you ensured those are mimicked in the Helm values.yaml file?

edknv and others added 22 commits March 19, 2026 09:56

(retriever) Add VLM image captioning via vLLM

74b7125

Merge branch 'main' into edwardk/retriever-image-caption

db95dc5

revert fix pyproject.toml

4c99cca

add batch mode

c601e7f

build endpoint working

cca5001

add context window

1384c6f

update readme

8ba2c81

Merge branch 'main' into edwardk/retriever-image-caption

d09f014

install vllm wheels for cu130 support

06e5d8e

pin vllm to exact match

58fe381

cache model globally

f90de97

set gpu memory utilization

2a3df58

set caption batch size

1306f2b

remove batch size arg

858f7ca

skip loading ocr

5a2e0fd

use fractional gpu

8309207

filter out small images

564d72c

updates

a921382

updates

b0b4475

fix tests

e6cb852

simplify

ae08679

Merge branch 'main' into edwardk/retriever-image-caption

3450347

edknv wants to merge 22 commits into NVIDIA:main from edknv:edwardk/retriever-image-caption
