AIDP OpenAI Demo example - demonstrating the NVIDIA AI Data Platform Retrieval API following the OpenAI Vector Store Search specification, exposed via MCP. #12
This example demonstrates the NVIDIA AI Data Platform (AIDP) Retrieval API following the OpenAI Vector Store Search specification, exposed via MCP.

Features:
- OpenAI-compatible `search_vector_store` tool via MCP
- NVIDIA NIMs for embeddings and LLM reasoning
- Milvus vector database integration
- Synthetic data generation using Nemotron LLM
- Full ReAct agent workflow with NeMo Agent Toolkit

Signed-off-by: Patrick Moorhead <pmoorhead@nvidia.com>
- rest_api.py: OpenAI-compatible REST endpoint
- examples.py: Comprehensive API usage examples
- Updated pyproject.toml with fastapi/uvicorn deps
- Updated README with complete file list

Signed-off-by: Patrick Moorhead <pmoorhead@nvidia.com>
@CodeRabbit review full

@PicoNVIDIA: I'll perform a full review of all the changes in this PR.

✅ Actions performed: Full review triggered.
Walkthrough

This pull request introduces a complete example project demonstrating the AIDP Retrieval API with OpenAI compatibility. It includes project configuration, a data loading script, a FastAPI REST service, an MCP server implementation, workflow configuration, and example usage patterns that integrate the Milvus vector database with NVIDIA embedding APIs.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor Client
    participant FastAPI as FastAPI REST Service
    participant Auth as Authentication
    participant Embedding as NVIDIA Embeddings API
    participant Milvus as Milvus Vector DB
    participant Format as Response Formatter

    Client->>FastAPI: POST /v1/vector_stores/{id}/search
    FastAPI->>Auth: verify_api_key(Bearer token)
    Auth-->>FastAPI: ✓ API Key Valid

    rect rgb(200, 220, 240)
        Note over FastAPI,Milvus: Vector Search Flow
        FastAPI->>Embedding: GET embedding for query text
        Embedding-->>FastAPI: embedding vector
        FastAPI->>Milvus: search(vector, limit, filters)
        Milvus-->>FastAPI: hits with distances & metadata
    end

    rect rgb(220, 240, 220)
        Note over FastAPI,Format: Response Formatting
        FastAPI->>Format: Apply score_threshold & filters
        Format->>Format: Convert distances to similarity scores
        Format-->>FastAPI: Filtered results list
    end

    FastAPI-->>Client: SearchResponse (OpenAI-compatible JSON)
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks: ✅ 3 passed

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In @examples/aidp_openai_demo/scripts/load_support_tickets.py:
- Around line 44-61: The generate_embeddings function calls requests.post
without a timeout which can hang; update the call to include a reasonable
timeout (e.g., timeout=10) when posting to NVIDIA_EMBED_URL with headers from
get_nvidia_headers and json=payload, and wrap the request in a try/except to
catch requests.Timeout (and requests.RequestException) so you can log or
re-raise a clear error; ensure the modified code still calls
response.raise_for_status() and processes result["data"] into embeddings as
before.
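One way the requested change could be shaped (a sketch only, not the committed fix; the `post` parameter is injected purely so the wrapper can be exercised without a live NVIDIA endpoint):

```python
import requests

def post_embeddings(url, headers, payload, timeout=10, post=requests.post):
    # Bounded request: an unresponsive endpoint raises instead of hanging.
    try:
        response = post(url, headers=headers, json=payload, timeout=timeout)
        response.raise_for_status()
        return [item["embedding"] for item in response.json()["data"]]
    except requests.Timeout as exc:
        raise RuntimeError(f"Embedding request timed out after {timeout}s") from exc
    except requests.RequestException as exc:
        raise RuntimeError(f"Embedding request failed: {exc}") from exc
```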
- Around line 97-106: The LLM API call using requests.post(NVIDIA_LLM_URL,
headers=headers, json=payload) lacks a timeout and can hang; update the call in
load_support_tickets.py to pass a timeout parameter (e.g., timeout=120) to
requests.post and keep response.raise_for_status() as-is so long-running
generation is allowed but bounded; ensure you modify the line that creates the
response (the requests.post invocation) and test that timeouts propagate
cleanly.
In @examples/aidp_openai_demo/src/nat_aidp_openai_demo/rest_api.py:
- Around line 286-293: The current CORS setup uses app.add_middleware with
CORSMiddleware and allow_origins=["*"] together with allow_credentials=True
which permits credentialed requests from any origin; change this for production
by replacing the wildcard with a configurable list of trusted origins (e.g.,
read ALLOWED_ORIGINS from an environment variable), then pass that list to
app.add_middleware (keeping allow_credentials=True if needed) so only explicitly
allowed origins are permitted; update any related configuration or startup code
to parse the ALLOWED_ORIGINS env var (split on commas) and use that variable in
the CORSMiddleware call.
In @examples/aidp_openai_demo/src/nat_aidp_openai_demo/server.py:
- Around line 139-145: The Milvus search call (client.search) can hang
indefinitely; update the search invocation that uses vector_store_id,
query_embedding and max_num_results to pass a per-operation timeout (e.g.,
timeout=30.0) so the call fails fast on unresponsive Milvus; add the timeout
parameter to the client.search(...) argument list and choose an appropriate
value consistent with other timeouts in the codebase.
🧹 Nitpick comments (11)
examples/aidp_openai_demo/README.md (1)
81-103: Consider minor markdown formatting improvements. The documentation is comprehensive and well-structured. For improved markdown compliance and accessibility:
- Add language specifiers to fenced code blocks (lines 81, 141, 230) - helps with syntax highlighting
- Wrap the bare URL at line 175 in angle brackets: `<http://localhost:3000>`
- Line 177: Consider using a proper heading (`###`) instead of bold text for "Alternative: Command Line"

These changes improve document accessibility and comply with markdown best practices.
📝 Suggested markdown improvements

Line 81 - Add language identifier:

````diff
-```
+```text
 ┌─────────────┐    REST     ┌─────────────────┐
````

Line 141 - Add language identifier:

````diff
-```
+```text
 Creating collection: support_tickets with explicit schema
````

Line 175 - Wrap bare URL:

```diff
-Navigate to: http://localhost:3000
+Navigate to: <http://localhost:3000>
```

Line 177 - Use proper heading:

```diff
-**Alternative: Command Line**
+### Alternative: Command Line
```

Line 230 - Add language identifier:

````diff
-```
+```http
 POST /v1/vector_stores/{vector_store_id}/search
````

Also applies to: 141-146, 175-175, 230-232
examples/aidp_openai_demo/src/nat_aidp_openai_demo/server.py (3)
51-55: Consider binding restrictions for production deployments. The server binds to all network interfaces (`host="0.0.0.0"`), making it accessible from any network the host is connected to. For a demo, this may be intentional for flexibility, but consider:

- For production: bind to `localhost` or a specific interface
- Document the security implications in deployment guidance
- Consider adding authentication/authorization if exposing externally
159-167: Improve filter parameter validation. The filter logic requires ALL three parameters (`filter_key`, `filter_type`, `filter_value`) to be present. If a user provides only some of these parameters, they are silently ignored, which could be confusing. Consider adding explicit validation:

🔍 Proposed improvement for filter validation

```diff
     # Apply filters
-    if filter_key and filter_type and filter_value:
+    # Validate filter parameters - all or none must be provided
+    filter_params_provided = [filter_key, filter_type, filter_value]
+    filter_params_count = sum(1 for p in filter_params_provided if p is not None)
+
+    if filter_params_count > 0 and filter_params_count < 3:
+        # Partial filter parameters provided - this is likely an error
+        logger.warning(f"Partial filter parameters provided (count={filter_params_count}). All three (filter_key, filter_type, filter_value) are required for filtering.")
+
+    if filter_params_count == 3:
         attr_value = entity.get(filter_key, "")
         if filter_type == "eq" and attr_value != filter_value:
             continue
         elif filter_type == "ne" and attr_value == filter_value:
             continue
         elif filter_type == "contains" and filter_value not in str(attr_value):
             continue
```
200-204: Improve exception logging to include traceback. The exception handler should use `logging.exception()` instead of `logging.error()` to automatically include the full traceback, which is valuable for debugging.

🔧 Proposed fix

```diff
     except Exception as e:
-        logger.error(f"Search error: {e}")
+        logger.exception(f"Search error: {e}")
         return json.dumps({
             "error": {"code": 500, "message": str(e)}
         })
```

examples/aidp_openai_demo/scripts/load_support_tickets.py (3)
1-4: Shebang placement issue. The shebang (`#!/usr/bin/env python3`) should be at the very beginning of the file, before the SPDX license headers, to ensure the script is executable.

Suggested fix

```diff
+#!/usr/bin/env python3
 # SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
-
-#!/usr/bin/env python3
```
205-207: Remove redundant import. `DataType` is already imported at line 18; the re-import here is unnecessary.

Proposed fix

```diff
 def create_collection(client: MilvusClient):
     """Create the support_tickets collection in Milvus with explicit schema."""
-    from pymilvus import DataType
-
```
278-290: Add `strict=True` to `zip()` for safety. Using `zip(..., strict=True)` would catch potential mismatches between `tickets` and `embeddings` list lengths, which could indicate a bug. Also, the loop variable `i` is unused.

Proposed fix

```diff
     import uuid
     data = []
-    for i, (ticket, embedding) in enumerate(zip(tickets, embeddings)):
+    for ticket, embedding in zip(tickets, embeddings, strict=True):
         data.append({
             "pk": str(uuid.uuid4()),  # String primary key
             "vector": embedding,
             "title": ticket["title"][:256],  # Truncate to fit schema
             "text": f"{ticket['title']}. {ticket['content']}"[:4096],  # Use 'text' field for retriever
             "category": ticket.get("category", "General")[:64],
             "severity": ticket.get("severity", "medium")[:32],
             "status": ticket.get("status", "open")[:32]
         })
```

examples/aidp_openai_demo/src/nat_aidp_openai_demo/rest_api.py (2)
1-4: Shebang placement issue. Same as in the other file: move the shebang to line 1 for proper script execution.
385-394: Improve error handling and logging. The exception handling can be improved:

- Use `logging.exception()` to capture the full traceback
- Chain the exception with `from e` to preserve context for debugging

Proposed fix

```diff
     try:
         return retrieval_service.search(vector_store_id, request)
     except HTTPException:
         raise
     except Exception as e:
-        logger.error(f"Search error: {e}")
+        logger.exception("Search error")
         raise HTTPException(
             status_code=500,
-            detail=f"Internal system error: {str(e)}"
-        )
+            detail=f"Internal system error: {e!s}"
+        ) from e
```

examples/aidp_openai_demo/src/nat_aidp_openai_demo/examples.py (2)
1-4: Shebang placement issue. Same fix needed: move the shebang to line 1.
333-350: Simplify health check logic. The exception handling can be cleaner. Since `response.raise_for_status()` isn't called, a non-200 status doesn't raise an exception, making the manual check necessary. Consider simplifying:

Proposed fix

```diff
     try:
         # Check if API is available
         response = requests.get(f"{AIDP_API_BASE}/", timeout=5)
-        if response.status_code != 200:
-            raise Exception("API not available")
-
+        response.raise_for_status()
         print(f"✓ AIDP API Server running at {AIDP_API_BASE}")
-    except Exception as e:
+    except requests.exceptions.RequestException:
         print(f"""
 ✗ AIDP API Server not running at {AIDP_API_BASE}
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- examples/aidp_openai_demo/README.md
- examples/aidp_openai_demo/pyproject.toml
- examples/aidp_openai_demo/scripts/load_support_tickets.py
- examples/aidp_openai_demo/src/nat_aidp_openai_demo/__init__.py
- examples/aidp_openai_demo/src/nat_aidp_openai_demo/configs/workflow.yml
- examples/aidp_openai_demo/src/nat_aidp_openai_demo/examples.py
- examples/aidp_openai_demo/src/nat_aidp_openai_demo/rest_api.py
- examples/aidp_openai_demo/src/nat_aidp_openai_demo/server.py
🧰 Additional context used
🧬 Code graph analysis (3)
examples/aidp_openai_demo/src/nat_aidp_openai_demo/server.py (1)
examples/aidp_openai_demo/src/nat_aidp_openai_demo/rest_api.py (1)
search(169-259)
examples/aidp_openai_demo/src/nat_aidp_openai_demo/rest_api.py (1)
examples/aidp_openai_demo/src/nat_aidp_openai_demo/examples.py (1)
search_vector_store(79-130)
examples/aidp_openai_demo/scripts/load_support_tickets.py (1)
examples/aidp_openai_demo/src/nat_aidp_openai_demo/rest_api.py (1)
search(169-259)
🪛 Gitleaks (8.30.0)
examples/aidp_openai_demo/src/nat_aidp_openai_demo/rest_api.py
[high] 413-415: Discovered a potential authorization token provided in a curl command header, which could compromise the curl accessed resource.
(curl-auth-header)
🪛 markdownlint-cli2 (0.18.1)
examples/aidp_openai_demo/README.md
81-81: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
141-141: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
175-175: Bare URL used
(MD034, no-bare-urls)
177-177: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
230-230: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 Ruff (0.14.10)
examples/aidp_openai_demo/src/nat_aidp_openai_demo/server.py
4-4: Shebang should be at the beginning of the file
(EXE005)
53-53: Possible binding to all interfaces
(S104)
73-73: Avoid specifying long messages outside the exception class
(TRY003)
200-200: Do not catch blind exception: Exception
(BLE001)
201-201: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
examples/aidp_openai_demo/src/nat_aidp_openai_demo/rest_api.py
4-4: Shebang should be at the beginning of the file
(EXE005)
149-149: Avoid specifying long messages outside the exception class
(TRY003)
271-271: Unused function argument: app
(ARG001)
345-345: Unused function argument: api_key
(ARG001)
373-373: Unused function argument: api_key
(ARG001)
389-389: Do not catch blind exception: Exception
(BLE001)
390-390: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
391-394: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
393-393: Use explicit conversion flag
Replace with conversion flag
(RUF010)
419-419: Possible binding to all interfaces
(S104)
examples/aidp_openai_demo/src/nat_aidp_openai_demo/examples.py
4-4: Shebang should be at the beginning of the file
(EXE005)
337-337: Abstract raise to an inner function
(TRY301)
337-337: Create your own exception
(TRY002)
337-337: Avoid specifying long messages outside the exception class
(TRY003)
341-341: Do not catch blind exception: Exception
(BLE001)
341-341: Local variable e is assigned to but never used
Remove assignment to unused variable e
(F841)
examples/aidp_openai_demo/scripts/load_support_tickets.py
4-4: Shebang should be at the beginning of the file
(EXE005)
36-37: Avoid specifying long messages outside the exception class
(TRY003)
56-56: Probable use of requests call without timeout
(S113)
105-105: Probable use of requests call without timeout
(S113)
121-121: Consider moving this statement to an else block
(TRY300)
207-207: Redefinition of unused DataType from line 18
Remove definition: DataType
(F811)
280-280: Loop control variable i not used within loop body
Rename unused i to _i
(B007)
280-280: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
346-346: Do not catch blind exception: Exception
(BLE001)
🔇 Additional comments (17)
examples/aidp_openai_demo/src/nat_aidp_openai_demo/__init__.py (1)
1-3: LGTM! Appropriate SPDX license headers for the package initialization file.
examples/aidp_openai_demo/src/nat_aidp_openai_demo/configs/workflow.yml (1)
18-43: LGTM! The workflow configuration is well-structured:
- MCP client correctly points to the server on port 8081 (matches server.py)
- LLM configuration uses appropriate NVIDIA NIM model with conservative temperature (0) for consistent reasoning
- Agent setup includes sensible retry logic (max_retries: 2, retry_parsing_errors: true)
examples/aidp_openai_demo/src/nat_aidp_openai_demo/server.py (2)
70-91: LGTM! Good timeout and error handling. The embedding function includes:
- Appropriate 30-second timeout for the external API call
- Proper authentication header handling
- Error propagation via `raise_for_status()` for HTTP errors
- Network errors (connection issues, timeouts) will propagate to the caller's exception handler
207-221: LGTM! The entry point provides clear server information and correctly starts the MCP server with the streamable-http transport. The banner helpfully displays the endpoint URL, tool name, and Milvus connection details.
examples/aidp_openai_demo/pyproject.toml (1)
11-18: No issues found with dependency versions. All specified dependencies are available on PyPI and appropriately constrained. The `nvidia-nat[langchain,mcp]>=1.4.0a0,<1.5.0` constraint matches multiple actively maintained alpha, beta, and RC builds (the specific version `1.4.0a0` does not exist, but the constraint correctly captures all qualifying versions from 1.4.0a20251008 onward). The `pymilvus~=2.6` constraint is satisfied by stable releases 2.6.0 through 2.6.6. For an example project, using alpha versions is appropriate and expected.

examples/aidp_openai_demo/scripts/load_support_tickets.py (3)
343-356: LGTM with note on exception handling. The broad `Exception` catch on line 346 is intentional to gracefully fall back to static tickets when LLM generation fails for any reason. This is appropriate for a demo/seeding script.
129-202: LGTM! Good fallback data with realistic support ticket examples covering various NVIDIA product categories.

238-266: LGTM! Random vector generation with proper normalization for demo mode is appropriate.
examples/aidp_openai_demo/src/nat_aidp_openai_demo/rest_api.py (6)
300-326: LGTM for demo purposes. The authentication correctly enforces Bearer token format. The comment on lines 317-318 appropriately documents this is demo-only validation. The `api_key` parameters in endpoints are not unused; they trigger the `verify_api_key` dependency for authentication enforcement.
146-167: LGTM! Good implementation with proper timeout (30s) on the external API call and appropriate error handling.
169-259: LGTM with optimization note. The search implementation is correct. The post-filtering approach (lines 214-230) works but could be optimized by pushing filters to Milvus using the `filter` parameter in the search call for better performance with large datasets. The current approach is fine for demo/small scale.
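A sketch of what that push-down could look like, translating the demo's three filter parameters into a Milvus boolean expression for `client.search(filter=...)` (parameter names assumed from the review context; values are not escaped here, so this is illustrative only):

```python
def build_milvus_filter(filter_key, filter_type, filter_value):
    # Map the demo's eq/ne/contains semantics onto Milvus filter syntax.
    if not (filter_key and filter_type and filter_value):
        return ""
    if filter_type == "eq":
        return f'{filter_key} == "{filter_value}"'
    if filter_type == "ne":
        return f'{filter_key} != "{filter_value}"'
    if filter_type == "contains":
        return f'{filter_key} like "%{filter_value}%"'
    return ""
```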
71-127: LGTM! Pydantic models are well-structured with appropriate validation constraints and follow the OpenAI API specification.
401-419: LGTM! Binding to `0.0.0.0` is appropriate for containerized deployment. The Gitleaks warning about the curl example is a false positive: it's documentation showing usage patterns, not an actual secret.
270-277: LGTM! The `app` parameter is required by FastAPI's lifespan context manager protocol; the static analysis warning is a false positive.

examples/aidp_openai_demo/src/nat_aidp_openai_demo/examples.py (3)
79-130: LGTM! Clean implementation with proper timeout handling and correct OpenAI-compatible request format.

133-311: LGTM! Well-structured examples that clearly demonstrate the API usage patterns and OpenAI alignment.

60-62: LGTM! Good use of environment variables with sensible defaults for demo purposes.
```python
def generate_embeddings(texts: List[str]) -> List[List[float]]:
    """Generate embeddings using NVIDIA NIM."""
    headers = get_nvidia_headers()

    payload = {
        "input": texts,
        "model": EMBEDDING_MODEL,
        "input_type": "passage",
        "encoding_format": "float",
        "truncate": "END"
    }

    response = requests.post(NVIDIA_EMBED_URL, headers=headers, json=payload)
    response.raise_for_status()

    result = response.json()
    embeddings = [item["embedding"] for item in result["data"]]
    return embeddings
```
Add timeout to external API call.
The requests.post() call lacks a timeout, which could cause the script to hang indefinitely if the NVIDIA API is unresponsive. This is flagged by static analysis (S113).
Proposed fix

```diff
-    response = requests.post(NVIDIA_EMBED_URL, headers=headers, json=payload)
+    response = requests.post(NVIDIA_EMBED_URL, headers=headers, json=payload, timeout=60)
```

🧰 Tools
🪛 Ruff (0.14.10)
56-56: Probable use of requests call without timeout
(S113)
🤖 Prompt for AI Agents
In @examples/aidp_openai_demo/scripts/load_support_tickets.py around lines 44 -
61, The generate_embeddings function calls requests.post without a timeout which
can hang; update the call to include a reasonable timeout (e.g., timeout=10)
when posting to NVIDIA_EMBED_URL with headers from get_nvidia_headers and
json=payload, and wrap the request in a try/except to catch requests.Timeout
(and requests.RequestException) so you can log or re-raise a clear error; ensure
the modified code still calls response.raise_for_status() and processes
result["data"] into embeddings as before.
```python
    payload = {
        "model": LLM_MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.8,
        "max_tokens": 8000
    }

    print(f"Generating {num_tickets} synthetic support tickets using {LLM_MODEL}...")
    response = requests.post(NVIDIA_LLM_URL, headers=headers, json=payload)
    response.raise_for_status()
```
Add timeout to LLM API call.
Similar to the embeddings call, this requests.post() to the LLM endpoint should include a timeout to prevent indefinite hangs. Given that LLM generation may take longer, a higher timeout (e.g., 120s) is appropriate.
Proposed fix
```diff
 payload = {
     "model": LLM_MODEL,
     "messages": [{"role": "user", "content": prompt}],
     "temperature": 0.8,
     "max_tokens": 8000
 }

 print(f"Generating {num_tickets} synthetic support tickets using {LLM_MODEL}...")
-response = requests.post(NVIDIA_LLM_URL, headers=headers, json=payload)
+response = requests.post(NVIDIA_LLM_URL, headers=headers, json=payload, timeout=120)
 response.raise_for_status()
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
    payload = {
        "model": LLM_MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.8,
        "max_tokens": 8000
    }

    print(f"Generating {num_tickets} synthetic support tickets using {LLM_MODEL}...")
    response = requests.post(NVIDIA_LLM_URL, headers=headers, json=payload, timeout=120)
    response.raise_for_status()
```
🧰 Tools
🪛 Ruff (0.14.10)
105-105: Probable use of requests call without timeout
(S113)
🤖 Prompt for AI Agents
In @examples/aidp_openai_demo/scripts/load_support_tickets.py around lines 97 -
106, The LLM API call using requests.post(NVIDIA_LLM_URL, headers=headers,
json=payload) lacks a timeout and can hang; update the call in
load_support_tickets.py to pass a timeout parameter (e.g., timeout=120) to
requests.post and keep response.raise_for_status() as-is so long-running
generation is allowed but bounded; ensure you modify the line that creates the
response (the requests.post invocation) and test that timeouts propagate
cleanly.
```python
# Add CORS middleware
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
```
Review CORS configuration for production use.
Using allow_origins=["*"] with allow_credentials=True allows any origin to make credentialed requests. While acceptable for local development/demo, this should be restricted in production to specific trusted origins.
Suggested approach for production

```python
# For production, specify allowed origins explicitly
ALLOWED_ORIGINS = os.getenv("ALLOWED_ORIGINS", "http://localhost:3000").split(",")

app.add_middleware(
    CORSMiddleware,
    allow_origins=ALLOWED_ORIGINS,
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
```

🤖 Prompt for AI Agents
In @examples/aidp_openai_demo/src/nat_aidp_openai_demo/rest_api.py around lines
286 - 293, The current CORS setup uses app.add_middleware with CORSMiddleware
and allow_origins=["*"] together with allow_credentials=True which permits
credentialed requests from any origin; change this for production by replacing
the wildcard with a configurable list of trusted origins (e.g., read
ALLOWED_ORIGINS from an environment variable), then pass that list to
app.add_middleware (keeping allow_credentials=True if needed) so only explicitly
allowed origins are permitted; update any related configuration or startup code
to parse the ALLOWED_ORIGINS env var (split on commas) and use that variable in
the CORSMiddleware call.
```python
# Search Milvus
search_results = client.search(
    collection_name=vector_store_id,
    data=[query_embedding],
    limit=max_num_results,
    output_fields=["pk", "title", "text", "category", "severity", "status"]
)
```
🧩 Analysis chain
🏁 Script executed:

```shell
cat -n examples/aidp_openai_demo/src/nat_aidp_openai_demo/server.py | sed -n '130,155p'
```

Repository: NVIDIA/NeMo-Agent-Toolkit-Examples
Length of output: 1180
🌐 Web query:
pymilvus MilvusClient timeout configuration search operations
💡 Result:
Short answer: set timeout per-operation by passing timeout to search(), or set a client connection timeout in MilvusClient(...) (which is only for establishing the connection, not a default for all RPCs). See docs.
Examples

- Per-search timeout:
  `client.search(collection_name="mycol", data=vectors, limit=10, timeout=5.0)` [1]
- Client constructor timeout (connection timeout only):
  `client = MilvusClient(uri="http://localhost:19530", timeout=10)  # used for establishing connection` [2][3]
Notes
- Most MilvusClient methods (search, query, load_collection, list_collections, etc.) accept a timeout: Optional[float] parameter that controls that RPC's deadline (None = wait until response/error). [1][3]
Sources
[1] pymilvus search() API reference. (milvus.io)
[2] Connect/Configure a timeout (client connection) in Milvus docs. (blog.milvus.io)
[3] MilvusClient(...) constructor / timeout parameter. (milvus.io)
Citations:
- 1: https://milvus.io/api-reference/pymilvus/v2.5.x/MilvusClient/Vector/search.md?utm_source=openai
- 2: https://blog.milvus.io/docs/connect-to-milvus-server.md?utm_source=openai
- 3: https://milvus.io/api-reference/pymilvus/v2.5.x/MilvusClient/Client/MilvusClient.md?utm_source=openai
🏁 Script executed:

```shell
# Search for timeout patterns in the codebase
rg -i "timeout" examples/aidp_openai_demo/ -B 2 -A 2
```

Repository: NVIDIA/NeMo-Agent-Toolkit-Examples
Length of output: 1818
🏁 Script executed:

```shell
# Check imports and client initialization
cat -n examples/aidp_openai_demo/src/nat_aidp_openai_demo/server.py | sed -n '1,50p'
```

Repository: NVIDIA/NeMo-Agent-Toolkit-Examples
Length of output: 1752
Add timeout parameter to Milvus search operation to prevent indefinite hangs.
The client.search() call at lines 140-145 lacks a timeout configuration. If the Milvus server becomes unresponsive, the operation will block indefinitely. The pymilvus library supports per-operation timeouts via the timeout parameter.
Consider adding a timeout to the search operation, for example:
```python
search_results = client.search(
    collection_name=vector_store_id,
    data=[query_embedding],
    limit=max_num_results,
    output_fields=["pk", "title", "text", "category", "severity", "status"],
    timeout=30.0
)
```

This aligns with the timeout patterns already used elsewhere in the codebase (e.g., 30s for API calls). Adjust the timeout value based on expected Milvus response time for your dataset and hardware.
🤖 Prompt for AI Agents
In @examples/aidp_openai_demo/src/nat_aidp_openai_demo/server.py around lines
139 - 145, The Milvus search call (client.search) can hang indefinitely; update
the search invocation that uses vector_store_id, query_embedding and
max_num_results to pass a per-operation timeout (e.g., timeout=30.0) so the call
fails fast on unresponsive Milvus; add the timeout parameter to the
client.search(...) argument list and choose an appropriate value consistent with
other timeouts in the codebase.
@AnuradhaKaruppiah, please review when you have time. Thanks for the help.

/ok to test 022d366
Should this live under src/ if it's intended to be run by the user?
Why not expose these tools through NAT? `nat mcp serve --config_file <>`
This is non-blocking, but it would be nice to see tighter integration.
I also recommend opening an Issue+PR on NeMo Agent Toolkit that dynamically adds vector store endpoints to the FastAPI front-end. /v1/vector_stores/{vector_store_id}/search
We can add an optional list of RetrieverRefs to the FastAPI configuration :)
Some partners do not want to use NAT, only the server spec, I used NAT as a reference and recommended it if they didn't already have one.
I understand that, but this is also a NeMo Agent Toolkit Examples repository.
```toml
requires-python = ">=3.11"

dependencies = [
    "nvidia-nat[langchain,mcp]>=1.4.0a0,<1.5.0",
```

We use semantic versioning as best as possible. So what you write in now should work until 2.0.0.

```diff
-    "nvidia-nat[langchain,mcp]>=1.4.0a0,<1.5.0",
+    "nvidia-nat[langchain,mcp]>=1.4.0a0,<2.0.0",
```
```toml
    "fastmcp",
    "fastapi",
    "uvicorn",
    "requests",
```

Leaving all of these unversioned seems odd. Maybe add a comment that says `# versions determined by NeMo Agent Toolkit`.
```toml
version = "0.1.0"
description = "AIDP Retrieval API Demo with NVIDIA NIMs - OpenAI Vector Store Search specification via MCP"
readme = "README.md"
requires-python = ">=3.11"
```

```diff
-requires-python = ">=3.11"
+requires-python = ">=3.11,<3.14"
```
```bash
export NVIDIA_API_KEY=<YOUR_API_KEY>
nat serve --config_file src/nat_aidp_openai_demo/configs/workflow.yml --port 8000
```

It is normal for examples to have a symbolic link from `configs` -> `src/<package>/configs`. That way you can have:

```
--config_file configs/workflow.yml
```
@PicoNVIDIA could you please address the review feedback?
Description
AIDP OpenAI Demo example, demonstrating the NVIDIA AI Data Platform Retrieval API following the OpenAI Vector Store Search specification, exposed via MCP.
- OpenAI-compatible `search_vector_store` tool via MCP
- NVIDIA NIMs for embeddings (`nv-embedqa-e5-v5`) and LLM reasoning (`llama-3.1-70b-instruct`)

Summary by CodeRabbit
New Features
Documentation
Chores
✏️ Tip: You can customize this high-level summary in your review settings.