(lib) Add support for vLLM-backed text/VL embedder #1494
Merged: jperez999 merged 91 commits into NVIDIA:main from charlesbluca:retriever-vllm-for-embeddings-1 on Apr 27, 2026.
Changes from all 91 commits (authored by charlesbluca unless noted otherwise):
57fe183  First pass of reduced vLLM embedder
81b711e  Merge remote-tracking branch 'upstream/main' into retriever-vllm-for-…
2f19148  Merge remote-tracking branch 'upstream/main' into retriever-vllm-for-…
ec5a3e3  Pull vLLM CUDA 13 wheel
c9d4c42  Merge remote-tracking branch 'upstream/main' into retriever-vllm-for-…
a4fb012  Merge remote-tracking branch 'upstream/main' into retriever-vllm-for-…
840a765  feat(beir): plumb use_vllm through BeirConfig to Retriever
11afd40  Plumb vLLM CLI option into inprocess example
e5db4ad  Merge remote-tracking branch 'upstream/main' into retriever-vllm-for-…
804bde6  Linting
d00b4b2  Allow vLLM to be toggled separately for recall
65c224b  Apply ingest/recall vLLM split to fused example
9915ac1  Bump vLLM / torch to allow vLLM-backed VL embedder
c5fbfa5  Merge remote-tracking branch 'upstream/main' into retriever-vllm-for-…
1a803b9  Port vLLM embedding support to new _BatchEmbedGPUActor
715ecc7  Merge branch 'main' into retriever-vllm-for-embeddings-1
da47bb0  Revert vLLM/torch bump; add >=0.17.0 guard for VLM embedder
e8ffd24  Add --embed-use-vllm flag to graph_pipeline example
2c74315  Move vLLM >=0.17.0 guard to VLM embedder only
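The version guard introduced in da47bb0 and scoped down in 2c74315 could be sketched as follows. Only the ">=0.17.0" floor comes from the commit messages; the helper name and its placement are illustrative assumptions, not the PR's actual code.

```python
# Hedged sketch of a minimum-version guard for a vLLM-backed VL embedder.
# Only the ">=0.17.0" floor is taken from the commit messages; the helper
# name `vl_embedder_supported` is hypothetical.
from importlib.metadata import PackageNotFoundError, version

from packaging.version import Version

MIN_VLLM_FOR_VL = Version("0.17.0")


def vl_embedder_supported() -> bool:
    """True when an installed vLLM is new enough for the VL embedder path."""
    try:
        return Version(version("vllm")) >= MIN_VLLM_FOR_VL
    except PackageNotFoundError:
        # vLLM not installed at all: the guarded path is unavailable.
        return False
```

Guarding only the VL path (rather than all embedding) lets the plain text embedder keep working on older vLLM releases.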
f4293d0  Honour device and hf_cache_dir in vLLM embedding path
aa172f7  Refactor: split vLLM-only embedder classes, matching parse/captioner …
8ed5558  Remove use_vllm toggle: always use vLLM for non-VL embedding, HF for VL
b32ae31  Use HF for recall queries; fix vLLM normalization to match HF output …
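The normalization fix in b32ae31 suggests the two backends disagreed on vector scaling; a minimal sketch of row-wise L2 normalization, assuming that is the convention both backends must share (the helper is illustrative, not the PR's code):

```python
# Hedged sketch: L2-normalize each embedding row so vLLM and HF backends
# produce comparable unit-length vectors for retrieval scoring.
import numpy as np


def l2_normalize(embeddings: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Divide each row by its L2 norm, guarding against zero vectors."""
    norms = np.linalg.norm(embeddings, axis=-1, keepdims=True)
    return embeddings / np.maximum(norms, eps)
```

Without a shared normalization, cosine or dot-product scores from one backend are not directly comparable with the other's.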
d57e066  Merge branch 'main' into retriever-vllm-for-embeddings-1
f1fd386  Fix tests broken by use_vllm removal and method rename
60c4cf2  Fix dead code, double-prefix bug, and swallowed PoolerConfig errors
e9a7b8e  Remove stale embed_use_vllm tests from test_embed_params.py
48d5f1a  Address code review: vLLM params, dead try/except, and missing tests
00ae3fd  Merge upstream/main: guard vLLM/flashinfer deps to linux, adopt arche…
28145b9  Address code review: use factory in Retriever, fix ragged tensor, fil…
be8de73  Merge branch 'main' into retriever-vllm-for-embeddings-1
9ead948  Fix stale README vllm flag docs and add return type to create_local_e…
69bb31e  Fix ragged tensor crash and document CUDA_VISIBLE_DEVICES side effect…
b7c3fa3  Forward dimensions param through factory/processor; add revision to e…
f2366c8  Fix silent count mismatch when vLLM returns None embeddings
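The fix in f2366c8 implies the pipeline previously dropped rows when the backend returned None embeddings. A hedged sketch of the kind of validation involved, with a hypothetical helper name and error wording:

```python
# Hedged sketch of validating backend output counts: fail loudly instead of
# silently dropping rows when embeddings are None or too few.
# `validate_embeddings` is an illustrative name, not the PR's actual helper.
def validate_embeddings(texts, embeddings):
    if len(embeddings) != len(texts):
        raise RuntimeError(
            f"expected {len(texts)} embeddings, got {len(embeddings)}"
        )
    missing = [i for i, emb in enumerate(embeddings) if emb is None]
    if missing:
        raise RuntimeError(f"embedder returned None for rows {missing}")
    return embeddings
```

Raising here keeps document rows and embedding rows aligned downstream, where a silent mismatch would corrupt the index.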
512736f  Merge remote-tracking branch 'upstream/main' into retriever-vllm-for-…
6bdc13a  Fix row drop and missing L2 norm in VL vLLM embedder
b0c7b8b  Fix double prefix, remove stale embed-vllm extra, document global sid…
8a1899a  Remove CUDA_VISIBLE_DEVICES global mutation from VLLMEmbedder
de44543  Avoid overwriting compile cache env vars if already set
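Commit de44543's "don't clobber user settings" behavior maps naturally onto `os.environ.setdefault`. A minimal sketch, assuming that idiom; the variable names below are examples, not the cache variables the PR actually touches:

```python
# Hedged sketch: seed environment defaults only when the user has not
# already exported a value. Variable names are illustrative.
import os


def set_env_defaults(defaults):
    """Apply defaults without overwriting values already in the environment."""
    for key, value in defaults.items():
        os.environ.setdefault(key, value)  # no-op if key is already set
```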
2baa908  Merge remote-tracking branch 'upstream/main' into retriever-vllm-for-…
14d5e1b  Fix TypeError when calling VL embedder with prefix kwarg in text_embe…
fbb433f  text-embed: remove custom compile cache override and fix PoolerConfig…
572b8b7  text-embed: remove unused embed_via_vllm helper
8c6ec7f  Unify local text embedder on vLLM in LlamaNemotronEmbed1BV2Embedder
f705de7  Drop unused device from local text vLLM embedder
d959685  Add selectable local query embed backend (HF vs vLLM)
fc5b10b  Merge branch 'main' into retriever-vllm-for-embeddings-1
988ba2f  Deprecate device on LlamaNemotronEmbed1BV2Embedder (vLLM path)
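Deprecating a constructor argument as in 988ba2f (with the `stacklevel` fix that a later commit, 2933433, applies) typically looks like the sketch below. The class body is hypothetical; only the `device` argument and its vLLM-path deprecation come from the commit messages:

```python
# Hedged sketch of deprecating a constructor argument while still accepting
# it. `Embedder` is a stand-in class, not the PR's actual implementation.
import warnings


class Embedder:
    def __init__(self, model_name, device=None):
        if device is not None:
            warnings.warn(
                "'device' is deprecated on the vLLM path and ignored; "
                "vLLM manages device placement itself",
                DeprecationWarning,
                stacklevel=2,  # attribute the warning to the caller's line
            )
        self.model_name = model_name
```

`stacklevel=2` makes the warning point at the caller rather than at the library internals, which is the usual reason such a fix lands as a follow-up.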
cb89300  Merge branch 'main' into retriever-vllm-for-embeddings-1
61d893f  Merge upstream/main into retriever-vllm-for-embeddings-1
f52034e  fix(harness): recursive glob for subdirectory datasets; add jp20 swee…
eed2630  Merge upstream/main into retriever-vllm-for-embeddings-1
32addf1  Implement full multimodal support for LlamaNemotronEmbedVL1BV2VLLMEmb…
65ff1d9  Merge upstream/main into retriever-vllm-for-embeddings-1
c727185  Merge upstream/main into retriever-vllm-for-embeddings-1
172b1b0  style: apply black formatting to vLLM embedder and tests
da0d022  fix: correct vLLM install instruction to include [local] extra
b6a6a52  feat(embed): add HF ingest backend selector and fix image batch size
0b2f511  feat(model): route VL query and ingest through vLLM by default
1d3f360  fix(embed): add missing --embed-local-ingest-backend CLI arg and forw…
38ebcb8  feat(harness): add 18-run embedder × reranker sweep suite
447d52d  fix(harness): remove --rerank-modality flag from graph_pipeline invoc…
371a6d3  fix(ray): propagate HF_HUB_OFFLINE to os.environ before ray.init()
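Commit 371a6d3 works because Ray workers inherit the driver's environment at startup, so `HF_HUB_OFFLINE` must reach `os.environ` before `ray.init()` runs. A minimal sketch of that ordering; the function name and boolean plumbing are illustrative assumptions:

```python
# Hedged sketch: ensure HF_HUB_OFFLINE is present in os.environ before Ray
# is initialized, so spawned workers inherit the offline setting.
# `apply_offline_mode` is a hypothetical helper, not the PR's code.
import os


def apply_offline_mode(offline):
    if offline and "HF_HUB_OFFLINE" not in os.environ:
        os.environ["HF_HUB_OFFLINE"] = "1"
    # This must run before:  import ray; ray.init(...)
    # Setting the variable after ray.init() would not reach the workers.
```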
363d20d  chore(harness): revert harness changes for follow-up PR
a277139  Merge remote-tracking branch 'upstream/main' into retriever-vllm-for-…
ca17fd8  chore(harness): drop sweep YAML and test_configs additions
a66df02  feat(embed): explicit backend selection, HF ingest support, and query…
9db3c35  fix(embed): defer HF model load to first use and harden bool/cache-cl…
3fe5209  fix(embed): map "auto" backend to "hf" in _get_local_embedder
982c4a0  Merge branch 'main' into retriever-vllm-for-embeddings-1
e9f86a6  refactor(embed): drop 'auto' backend — query defaults hf, ingest defa…
2933433  fix(embed): drop stale VL comment; fix DeprecationWarning stacklevel
53f352a  test(beir): drop stale 'auto' backend assertion
6967a5d  fix(embed): restore normalize/max_length in gpu_operator; guard vllm …
ed415a4  feat: VL embedder always uses HF backend; text embedder defaults to v…
64f29bc  Merge branch 'main' into retriever-vllm-for-embeddings-1
246f6e0  fix(embed): VL create_local_embedder respects backend param; default …
556436b  fix(embed): forward gpu_memory_utilization/enforce_eager to VL vLLM e…
c79c17c  fix(embed): respect local_ingest_backend config in text-embed CLI path
cb73da9  fix: address code review comments on local_ingest_backend comment and…
6618eb4  Merge branch 'main' into retriever-vllm-for-embeddings-1
b4eb2b3  Add queries comparison (#1928) (tomer-levin-nv)
235baab  fix(embed): skip prefix kwarg for HF ingest backend in _embed closure
aa863dc  Merge branch 'main' into retriever-vllm-for-embeddings-1
c6c4a47  fix(text_embed): strip whitespace in _to_bool; prefer embed_model_nam…
7f2c248  fix(review): address three code-review comments
0cf1f8c  fix(embed): add unload() to all four embedder classes; pass device kw…
a6d3505  fix(review): add field validator for local_ingest_backend; add LlamaN…
b28115a  fix(embed): defer vLLM GPU allocation to first use via _ensure_loaded()
9b0d15c  fix(embed): defer HF VL embedder GPU load via _ensure_loaded()
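The `_ensure_loaded()` pattern from the final two commits (b28115a, 9b0d15c), together with the `unload()` method added in 0cf1f8c, can be sketched as below. The class and the `loader` callable are stand-ins for the real vLLM/HF model construction, which in the PR allocates GPU memory:

```python
# Hedged sketch of lazy model loading: construct the embedder cheaply and
# defer the expensive (GPU) model load to first use via _ensure_loaded().
# `LazyEmbedder` and `loader` are illustrative stand-ins.
class LazyEmbedder:
    def __init__(self, loader):
        self._loader = loader  # callable that builds the real model
        self._model = None

    def _ensure_loaded(self):
        if self._model is None:
            self._model = self._loader()  # runs once, on first embed call
        return self._model

    def embed(self, texts):
        model = self._ensure_loaded()
        return [model(t) for t in texts]

    def unload(self):
        # Mirrors the unload() from 0cf1f8c: drop the model so the next
        # call reloads it (freeing GPU memory in the real implementation).
        self._model = None
```

Deferring the load means constructing the embedder (e.g. inside a Ray actor or a config factory) costs nothing until the first batch arrives.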