In RAG v1.0.0, a single server managed both ingestion and retrieval/generation APIs.
In RAG v2.0.0, the architecture has evolved to utilize two separate servers:
- RAG Server - Manages retrieval and generation APIs.
- Ingestion Server - Manages ingestion APIs.
In addition, the pipeline now uses on-prem models by default; earlier versions used NVIDIA cloud-hosted models by default. The minimum hardware requirements for deploying the blueprint in its default settings are specified here. This guide outlines the key changes and steps required for migration.
| Feature | RAG v1.0.0 (Single Server) | RAG v2.0.0 (Separate Servers) |
|---|---|---|
| API Hosting | Single server for all APIs | Two servers: RAG Server and Ingestion Server |
| Retrieval & Generation | Same server as ingestion | Hosted separately in RAG Server |
| Document Ingestion | Same server as retrieval | Hosted separately in Ingestion Server |
Updated OpenAPI schemas are available here.
- Collection Management:
  - Create Collection:
    - New Endpoint: `POST /collections`
    - Description: Allows the creation of document collections. Previously, collections were implicitly created during document uploads.
  - Delete Collection:
    - New Endpoint: `DELETE /collections/{collection_name}`
    - Description: Enables deletion of entire collections.
- Multi-file Document Upload:
  - Enhanced Endpoint: `POST /documents`
  - Description: Supports uploading multiple files in a single request. Previously, only single-file uploads were supported.
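The change from implicit to explicit collection creation affects the order of calls a client makes. The following is a minimal sketch of that order, not the blueprint's own client code. The base URL is an illustrative host, `plan_upload` is a hypothetical helper, and request bodies are omitted because their fields should be taken from the OpenAPI schema.

```python
# Illustrative base URL (assumption); replace with your Ingestion Server address.
INGESTION_SERVER = "http://ingestion-server:8082"


def plan_upload(collection_exists, filenames):
    """Return the ordered (method, url, note) calls for a v2.0.0 upload.

    Hypothetical helper: it only plans the calls and sends nothing.
    """
    if not filenames:
        raise ValueError("at least one file is required")
    calls = []
    if not collection_exists:
        # v2.0.0 no longer creates the collection implicitly during upload.
        calls.append(("POST", INGESTION_SERVER + "/collections", "create collection"))
    # A single request carries every file; v1.0.0 accepted one file per request.
    calls.append(
        ("POST", INGESTION_SERVER + "/documents", f"upload {len(filenames)} file(s)")
    )
    return calls


if __name__ == "__main__":
    for method, url, note in plan_upload(False, ["a.pdf", "b.pdf"]):
        print(method, url, "-", note)
```

When the collection already exists, the plan reduces to the single `POST /documents` call.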
| API Endpoint | RAG v1.0.0 | RAG v2.0.0 |
|---|---|---|
| `/documents` (POST) - Upload Document | Unified Server | Now in Ingestion Server |
| `/documents` (GET) - List Documents | Unified Server | Now in Ingestion Server |
| `/documents` (DELETE) - Delete Document | Unified Server | Now in Ingestion Server |
| `/generate` (POST) - Generate Answer | Unified Server | Now in RAG Server |
| `/search` (POST) - Document Search | Unified Server | Now in RAG Server |
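A client that previously used one base URL can encode the table above as a small lookup. This is a sketch under stated assumptions: the two base URLs are example hosts, `ROUTES` and `resolve_url` are hypothetical names, and the paths are written as they appear in this guide, so check the OpenAPI schema for the exact paths in your deployment.

```python
# Example base URLs (assumptions); replace with your deployment's hosts.
RAG_SERVER = "http://rag-server:8081"
INGESTION_SERVER = "http://ingestion-server:8082"

# Endpoint-to-server mapping, copied from the table above.
ROUTES = {
    "/documents": INGESTION_SERVER,
    "/generate": RAG_SERVER,
    "/search": RAG_SERVER,
}


def resolve_url(path):
    """Return the full v2.0.0 URL for an endpoint path listed in the table."""
    try:
        return ROUTES[path] + path
    except KeyError:
        raise ValueError(f"unknown endpoint: {path}") from None


if __name__ == "__main__":
    for endpoint in ("/documents", "/generate", "/search"):
        print(endpoint, "->", resolve_url(endpoint))
```

Keeping the mapping in one place means the rest of the client code does not need to know which server hosts a given endpoint.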
- Ingestion API Enhancements:
  - `PATCH /documents` is introduced in v2.0.0 for deleting and uploading documents in a single request.
  - `POST /documents` returns an error if the document already exists in the collection.
  - `POST /documents` now accepts multiple files as a list instead of a single file. The payload schema in v2.0.0 is not backward compatible with v1.0.0.
  - A separate `POST /collections` call is now required to create a new collection. In v1.0.0, a new collection was created automatically when `POST /documents` was called.
  - New optional parameters are available on all APIs to improve the runtime configurability of the pipeline.
  - `DELETE /documents` now accepts multiple files (`List[str]`) in the payload instead of a single string. This is also not backward compatible with v1.0.0.
- Document Search and Generate Enhancements:
  - The `search` and `generate` APIs now include additional options to refine retrieval results.
  - Both APIs remain backward compatible with v1.0.0.
- Health API remains unchanged:
  - The `/health` endpoint still exists in both servers and is backward compatible.
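Two of the ingestion changes above are easy to get wrong in existing client code: choosing between `POST` and `PATCH` for `/documents`, and sending a list rather than a single string to `DELETE /documents`. The sketch below illustrates both. The helper names are hypothetical, and serializing the names as a bare JSON list is an assumption based on the `List[str]` description, so confirm the exact body shape against the OpenAPI schema.

```python
import json


def upload_method(document_exists):
    """Pick the HTTP method for /documents under the v2.0.0 rules."""
    # POST returns an error when the document is already in the collection,
    # so PATCH (delete and upload in one request) is used in that case.
    return "PATCH" if document_exists else "POST"


def delete_payload(names):
    """Serialize document names as a JSON list for DELETE /documents.

    Accepts a v1.0.0-style single string or any iterable of strings.
    """
    if isinstance(names, str):
        names = [names]  # wrap the old single-string form in a list
    return json.dumps(list(names))


if __name__ == "__main__":
    print(upload_method(document_exists=True))
    print(delete_payload("report.pdf"))
    print(delete_payload(["report.pdf", "notes.txt"]))
```

Wrapping the single-string case lets callers written against v1.0.0 keep their call sites while the client emits the v2.0.0 list form.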
Ensure that you run two separate containers, one for the RAG Server and one for the Ingestion Server, by following the quickstart guide.
Modify API calls in your client applications:
- For Retrieval & Generation, update requests to point to the RAG Server (e.g., `http://rag-server:8081`).
- For Document Ingestion, update requests to point to the Ingestion Server (e.g., `http://ingestion-server:8082`).

To understand the updated API schemas in v2.0.0, follow the notebooks.