RAG Service is a template service for retrieval-augmented generation (RAG), based on the LangChain examples. See: Build a Retrieval Augmented Generation (RAG) App
This service can be used with curl, but a separate project provides the user interface: L1Blom/ragui
- Clone L1Blom/rag to your project directory
- Move config.py_example to config.py and add your API Keys
- Choose an ID for your instance, like MyDocs
- Copy constants/constants.ini to constants_MyDocs.ini
- Change the contents of this file to reflect your situation
- Create a MyDocs/ directory in data/ and in MyDocs/ a directory vectorstore/ and html/
- Add the files for your RAG context, like text and PDF files, to your MyDocs/ directory
- Create a Python virtual environment for your project (optional)
- Modify the example services/rag.service_template file to point to the right place of your project directory
- Copy the service file to /etc/systemd/system/rag_MyDocs.service
- Enable and start the service
sudo systemctl enable rag_MyDocs
sudo systemctl start rag_MyDocs
sudo systemctl status rag_MyDocs
All calls support POST and GET. For <ID>, use your chosen ID, like MyDocs.
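The calling convention can be sketched in Python as follows. This is only an illustration: the base URL, the helper names build_request and call, and the choice to send GET parameters in the query string are assumptions. Only the /prompt/<ID>/... path layout and the parameter names come from this README.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed base URL; replace with your own server and port.
BASE_URL = "http://localhost:8888"


def build_request(instance_id, endpoint="", params=None, method="POST"):
    """Build a urllib Request for /prompt/<ID>/<endpoint> (hypothetical helper)."""
    path = f"/prompt/{instance_id}"
    if endpoint:
        path += f"/{endpoint}"
    encoded = urlencode(params or {})
    if method == "GET":
        # GET: parameters travel in the query string (assumption).
        url = f"{BASE_URL}{path}" + (f"?{encoded}" if encoded else "")
        return Request(url, method="GET")
    # POST: parameters travel form-encoded in the body, like curl --data-urlencode.
    return Request(f"{BASE_URL}{path}", data=encoded.encode("utf-8"), method="POST")


def call(instance_id, endpoint="", params=None, method="POST", timeout=60):
    """Send the request to a running service and return the body as text."""
    request = build_request(instance_id, endpoint, params, method)
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With a running instance, call("MyDocs", "model", {"model": "gpt-4o"}) would set the model, and call("MyDocs", "", {"prompt": "your question?"}) would send a prompt.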
- /prompt/<ID>/
Parameter: prompt (string)
Your prompt to be sent
- /prompt/<ID>/stream
Parameter: prompt (string)
Your prompt to be sent
Streams the answer token by token (text/plain, chunked transfer encoding). Uses the same RAG chain and session history as /prompt/<ID>/ but via chain.stream() instead of chain.invoke(). Requires the X-Accel-Buffering: no header (set automatically) and nginx proxy_buffering off for the streaming location.
- /prompt/<ID>/full
Parameter: prompt (string)
Your prompt to be sent
Returns all document fragments used for this prompt
- /prompt/<ID>/search
Parameter: prompt (string)
Your prompt to be sent
Similarity search in the local documents; returns fragments and scores
- /prompt/<ID>/model
Parameter: model (string)
The model to be used, like "gpt-4o"
Valid models are checked with the OpenAI client.models.list() call. Can result in an HTTP 500 error (non-fatal)
- /prompt/<ID>/temp
Parameter: temp (string, will be cast to float)
Temperature setting, between 0.0 and 2.0
Settings above 1.0 can cause significant hallucinations and also degrade performance.
A timeout can result in an HTTP 408 error (non-fatal)
- /prompt/<ID>/reload
Parameters: none
Rebuilds the vector store from local files.
Current behavior:
  - Clears existing vector data first.
  - Reloads all configured document sources.
  - Re-initializes the RAG chain after rebuild.
For X posts, reload uses local snapshots (data/<tweet_id>/post.json + post.txt) and does not need a refetch.
- /prompt/<ID>/clear
Parameters: none
Clears the cache (the in-memory history)
- /prompt/<ID>/cache
Parameters: none
Prints the cache contents to the response object
- /prompt/<ID>/modelnames
Parameters: none
Prints the names of the models available through the selected APIs
- /prompt/<ID>/params
Parameters: section (string), param (string)
Prints the settings from the .ini file
- /prompt/<ID>/image
Parameters: prompt (string), image (URL to image)
Uploads the image to OpenAI and uses the prompt to get the desired contents, like: 'What is the mood of the persons?'
Note: only works if the model is set to 'gpt-4o'. Other models result in an HTTP 500 error (non-fatal)
- /prompt/<ID>/upload
Parameters: file (string) (maximum size 16 MB)
Uploads the file to the directory DATA_DIR, but only if its extension is listed in DATA_GLOB_*. If not, the call results in an HTTP 500 error (non-fatal)
- /prompt/<ID>/uploadx
Parameters: url (string) - X (Twitter) post URL
Fetches an X post via the X API v2, stores a local snapshot, and vectorizes text for RAG context.
Supported URL formats:
Requires an X API Bearer Token, set in one of these environment variables:
  - X_API_KEY or TWITTER_BEARER_TOKEN
Get your API key from: https://developer.x.com/
Local storage layout per post:
  - data/<tweet_id>/post.json
  - data/<tweet_id>/post.txt
  - data/<tweet_id>/images/
  - data/<tweet_id>/videos/
  - data/<tweet_id>/audio/
Indexing scope:
  - Text content is indexed.
  - Video and audio are downloaded for later use and are not indexed/transformed yet.
- /prompt/<ID>/uploadx/batch
Parameters: file (JSON array or text file, one URL per line)
Batch version of uploadx. Behavior:
  - Validates and normalizes URLs.
  - Processes each valid URL sequentially.
  - Stores URLs in x.json and writes local per-post snapshots.
  - Returns a summary of successful and failed URLs.
- /prompt/<ID>/xposts/chat
Parameters: prompt (string)
Answers a question using all X posts as context (bypasses vector search).
When the full post set exceeds a single LLM context window, posts are processed via map-reduce:
  - Posts are split into character-budgeted batches (configurable via xposts_batch_chars in the INI file).
  - Each batch is analyzed in parallel (4 workers via ThreadPoolExecutor), extracting facts and per-batch statistics.
  - A deterministic author stats table (post counts, likes, retweets) is computed from all posts and included in the reduce prompt for accurate aggregation queries.
  - A final synthesis step combines all batch findings into one streamed answer.
  - After the answer streams, [[IMG:path]] markers are inserted inline after cited post URLs that have downloaded images. Images are limited to xposts_max_images (default 5).
Response is streamed (text/plain, chunked). Progress markers [Analyzing N/M...] are emitted during batch processing and stripped by the frontend before saving to chat history.
If all posts fit in one batch, a single-batch shortcut streams a direct answer with no map-reduce overhead.
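The map-reduce flow described for xposts/chat can be sketched roughly as below. This is not the service's actual code: the function names, the analyze and synthesize callables (stand-ins for the LLM calls), and the exact batching rule are assumptions. Only the character budget, the 4-worker ThreadPoolExecutor, and the single-batch shortcut are taken from the description.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_batches(posts, max_chars):
    """Group posts into batches whose combined length stays within max_chars.

    A single post longer than max_chars still gets a batch of its own.
    """
    batches, current, size = [], [], 0
    for post in posts:
        # Start a new batch when adding this post would exceed the budget.
        if current and size + len(post) > max_chars:
            batches.append(current)
            current, size = [], 0
        current.append(post)
        size += len(post)
    if current:
        batches.append(current)
    return batches


def map_reduce(posts, question, analyze, synthesize, max_chars=20000, workers=4):
    """Analyze batches in parallel (map), then combine the findings (reduce).

    analyze(batch, question) and synthesize(findings, question) are
    hypothetical callables standing in for the LLM calls.
    """
    batches = split_into_batches(posts, max_chars)
    if len(batches) <= 1:
        # Single-batch shortcut: skip the reduce step entirely.
        return analyze(batches[0] if batches else [], question)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the batches.
        findings = list(pool.map(lambda batch: analyze(batch, question), batches))
    return synthesize(findings, question)
```

Threads suit this step because each batch mostly waits on a remote LLM call, so the work is I/O-bound rather than CPU-bound.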
# change the model
curl -X POST --data-urlencode "model=gpt-4o" http://<your server>:<your port>/prompt/<ID>/model
Model set to: gpt-4o
# prompt to your data
curl -X POST --data-urlencode "prompt=your question?" http://<your server>:<your port>/prompt/<ID>
Your answer based on the context files provided in data/<ID>
Important constants are:
# simple string like "myDocs"
ID = _unittest
# Directory that will be scanned for files to be added to the context
DATA_DIR=data/_unittest
# All the file extensions you want to be part of the context, see LangChain documentation
# Currently text and pdf are supported by RAG Service
DATA_GLOB_TXT = *.txt
DATA_GLOB_PDF = *.pdf
# Persistence directory for vectorstore
PERSISTENCE = data/_unittest/vectorstore
# Where the HTML files reside, also needed for the unit tests
HTML = data/_unittest/html
The following INI keys control the xposts/chat map-reduce behavior:
# Max characters of X posts per LLM batch in map-reduce. Lower this for
# small-context models (e.g. 20000 for 8K-token models).
xposts_batch_chars = 100000
# Max number of images to embed in the answer for cited posts.
xposts_max_images = 5
Both keys are read from the [DEFAULT] section of the project's INI file.
If absent, the code defaults to 20000 chars and 5 images respectively.
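As a small sketch of how these two keys could be read with their fallbacks, the snippet below uses Python's standard configparser. The function name read_xposts_settings is hypothetical and the service's own loading code may differ. The key names, the [DEFAULT] section, and the fallback values (20000 and 5) are the ones stated above.

```python
import configparser

# Fallbacks as stated above: 20000 characters and 5 images.
DEFAULT_BATCH_CHARS = 20000
DEFAULT_MAX_IMAGES = 5


def read_xposts_settings(ini_text):
    """Return (batch_chars, max_images) from the [DEFAULT] section of INI text."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # The DEFAULT section always exists, even for an empty file.
    defaults = parser["DEFAULT"]
    batch_chars = defaults.getint("xposts_batch_chars", fallback=DEFAULT_BATCH_CHARS)
    max_images = defaults.getint("xposts_max_images", fallback=DEFAULT_MAX_IMAGES)
    return batch_chars, max_images
```

To read from a file instead, parser.read(path) can replace parser.read_string(ini_text).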
To run the unit tests, start the program in the project directory with the ID '_unittest'. This starts a local RAG service on port 8888 (see constants__unittest.py for all defaults). Once it is running, the unit tests can be executed. Currently, the tests run smoothly when USE_LLM is set to OPENAI. Other settings, like GROQ, might fail because of too many calls per minute, depending on the licences you have. If so, try running the unit tests one by one. The startup log below lists all possible API calls and their parameters:
<your virtual environment>/bin/python ragservice.py _unittest
INFO:root:Working directory is /home/leen/projects/rag
INFO:httpx:HTTP Request: GET https://api.openai.com/v1/models "HTTP/1.1 200 OK"
INFO:root:path -> /prompt/_unittest prompt
INFO:root:path -> /prompt/_unittest/stream prompt
INFO:root:path -> /prompt/_unittest/full prompt
INFO:root:path -> /prompt/_unittest/search prompt,similar
INFO:root:path -> /prompt/_unittest/documents id
INFO:root:path -> /prompt/_unittest/params section,param
INFO:root:path -> /prompt/_unittest/globals
INFO:root:path -> /prompt/_unittest/modelnames
INFO:root:path -> /prompt/_unittest/embeddingnames
INFO:root:path -> /prompt/_unittest/model model
INFO:root:path -> /prompt/_unittest/embeddings embedding
INFO:root:path -> /prompt/_unittest/chunk chunk_size,chunk_overlap
INFO:root:path -> /prompt/_unittest/temp temp
INFO:root:path -> /prompt/_unittest/reload
INFO:root:path -> /prompt/_unittest/clear
INFO:root:path -> /prompt/_unittest/cache
INFO:root:path -> /prompt/_unittest/file file
INFO:root:path -> /prompt/_unittest/context file,action
INFO:root:path -> /prompt/_unittest/image image,prompt
INFO:root:path -> /prompt/_unittest/upload
INFO:root:path -> /prompt/_unittest/uploadx url
INFO:chromadb.telemetry.product.posthog:Anonymized telemetry enabled. See https://docs.trychroma.com/telemetry for more information.
INFO:root:Loaded 8 chunks from persistent vectorstore
INFO:root:Chain initialized: gpt-4o
* Serving Flask app 'ragservice'
* Debug mode: off
INFO:werkzeug:WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
* Running on all addresses (0.0.0.0)
* Running on http://127.0.0.1:8888
* Running on http://192.168.2.200:8888
INFO:werkzeug:Press CTRL+C to quit
Now you are able to run the unit tests:
<your virtual environment>/bin/python ragservice_unittest.py -v
Testing OPENAI
test_cache (__main__.RagServiceMethods.test_cache)
Test to print the contents of the cache ... User:content='who wrote rag service?' User:content='who wrote rag service?'
AI:content='RAG Service was developed by Leen Blom.' AI:content='RAG Service was developed by L1Blom.'
ok
test_clear (__main__.RagServiceMethods.test_clear)
Test to clear the cache ... ok
test_image (__main__.RagServiceMethods.test_image)
Test image ... ok
test_model (__main__.RagServiceMethods.test_model)
Test model setting, correct or incorrect model according to LLM ... ok
test_prompt (__main__.RagServiceMethods.test_prompt)
Test prompt ... ok
test_reload (__main__.RagServiceMethods.test_reload)
Test reload of the data ... ok
test_temperature (__main__.RagServiceMethods.test_temperature)
Test to set temparature too high, low, within boundaries 0.0 and 2.0 ... ok
----------------------------------------------------------------------
Ran 7 tests in 17.713s
OK
- None at the moment
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.
The image used in the unittest is licensed CC BY-NC-ND 4.0 and was found at Trusted Reviews