RAG Service

RAG Service is a template service for retrieval-augmented generation (RAG), based on the LangChain examples. See: Build a Retrieval Augmented Generation (RAG) App

This service can be used with curl, but a separate project provides the user interface: L1Blom/ragui

Technologies

Installation

Clone L1Blom/rag to your project directory
Move config.py_example to config.py and add your API Keys
Choose an ID for your instance, like MyDocs
Copy constants/constants.ini to constants_MyDocs.ini
Change the contents of this file to reflect your situation
Create a MyDocs/ directory in data/, and inside MyDocs/ create the directories vectorstore/ and html/
Add the files for your RAG context, like text and PDF files, to your MyDocs/ directory
Create a Python virtual environment for your project (optional)
Modify the example services/rag.service_template file to point to the location of your project directory
Copy the service file to /etc/systemd/system/rag_MyDocs.service
Enable and start the service

sudo systemctl enable rag_MyDocs
sudo systemctl start rag_MyDocs
sudo systemctl status rag_MyDocs
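
The directory step of the installation can also be scripted. The following is a minimal sketch, assuming the layout described above (data/<ID>/vectorstore and data/<ID>/html); the function name create_instance_dirs is hypothetical and not part of the repository.

```python
from pathlib import Path


def create_instance_dirs(project_root: Path, instance_id: str) -> list[Path]:
    """Create data/<ID>/vectorstore and data/<ID>/html under project_root."""
    base = project_root / "data" / instance_id
    created = []
    for name in ("vectorstore", "html"):
        target = base / name
        # parents=True also creates data/ and data/<ID>/ when they are missing.
        target.mkdir(parents=True, exist_ok=True)
        created.append(target)
    return created
```

Calling create_instance_dirs(Path("."), "MyDocs") from the project directory creates both directories and is safe to repeat, because existing directories are left untouched.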

Commands

All calls support POST and GET. For <ID> use your chosen ID like MyDocs

/prompt/<ID>/

Parameter: prompt (string)

Your prompt to be sent
/prompt/<ID>/stream

Parameter: prompt (string)

Your prompt to be sent

Streams the answer token-by-token (text/plain, chunked transfer encoding). Uses the same RAG chain and session history as /prompt/<ID>/ but via chain.stream() instead of chain.invoke(). Requires X-Accel-Buffering: no header (set automatically) and nginx proxy_buffering off for the streaming location.
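
A client can consume the streamed answer piece by piece. The sketch below uses only the Python standard library; the function names are hypothetical, and it assumes the answer is UTF-8 encoded. Because a multi-byte character can be split across two network chunks, the chunks are decoded incrementally.

```python
import codecs
import urllib.parse
import urllib.request
from typing import Iterable, Iterator


def build_stream_request(base_url: str, instance_id: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for /prompt/<ID>/stream with a form-encoded prompt."""
    body = urllib.parse.urlencode({"prompt": prompt}).encode("utf-8")
    url = f"{base_url.rstrip('/')}/prompt/{instance_id}/stream"
    return urllib.request.Request(url, data=body, method="POST")


def decode_chunks(chunks: Iterable[bytes]) -> Iterator[str]:
    """Decode byte chunks incrementally so split multi-byte characters survive."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    for chunk in chunks:
        text = decoder.decode(chunk)
        if text:
            yield text
    # Flush the decoder; this raises if the stream ended mid-character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


def stream_answer(base_url: str, instance_id: str, prompt: str) -> Iterator[str]:
    """Yield pieces of the answer as they arrive (needs a running service)."""
    request = build_stream_request(base_url, instance_id, prompt)
    with urllib.request.urlopen(request) as response:
        # read1() returns the data that is available instead of waiting for a full buffer.
        yield from decode_chunks(iter(lambda: response.read1(4096), b""))
```

For example, `for piece in stream_answer("http://localhost:8888", "MyDocs", "your question?"): print(piece, end="", flush=True)` prints the answer while it is being generated.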
/prompt/<ID>/full

Parameter: prompt (string)

Your prompt to be sent

Returns all document fragments used for this prompt
/prompt/<ID>/search

Parameter: prompt (string)

Your prompt to be sent

Similarity search in the local documents; returns fragments and scores
/prompt/<ID>/model

Parameter: model (string)

Your model to be used, like "gpt-4o"

The model is checked against the list returned by the OpenAI client.models.list() call. Can result in an HTTP 500 error (non-fatal)
/prompt/<ID>/temp

Parameter: temp (string, will be cast to float)

Temperature setting, between 0.0 and 2.0

Settings above 1.0 can cause significant hallucinations and degrade performance.

A timeout can result in an HTTP 408 error (non-fatal)
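
A client can check the value before sending it. The following is a hypothetical client-side sketch of the documented rule (cast to float, range 0.0 to 2.0); it does not show how the service itself validates the parameter.

```python
def parse_temperature(raw: str, low: float = 0.0, high: float = 2.0) -> float:
    """Cast the temp parameter to float and check the documented range."""
    try:
        value = float(raw)
    except (TypeError, ValueError) as exc:
        raise ValueError(f"temp is not a number: {raw!r}") from exc
    # The comparison is False for NaN as well, so NaN is rejected here.
    if not (low <= value <= high):
        raise ValueError(f"temp must be between {low} and {high}, got {value}")
    return value
```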
/prompt/<ID>/reload

Parameters: none

Rebuilds the vector store from local files.

Current behavior:
- Clears existing vector data first.
- Reloads all configured document sources.
- Re-initializes the RAG chain after rebuild.
For X posts, reload uses local snapshots (data/<tweet_id>/post.json + post.txt) and does not need a refetch.
/prompt/<ID>/clear

Parameters: none

Clears the cache (the in-memory history)
/prompt/<ID>/cache

Parameters: none

Prints the cache contents to the response object
/prompt/<ID>/modelnames

Parameters: none

Prints the names of the models available in the selected APIs
/prompt/<ID>/params

Parameters: section (string), param (string)

Prints the settings from the .ini file
/prompt/<ID>/image

Parameters: prompt (string), image (URL to image)

Uploads the image to OpenAI and uses the prompt to get the desired content, for example: 'What is the mood of the persons?'

Note: this only works if the model is set to 'gpt-4o'. Other models result in an HTTP 500 error (non-fatal)
/prompt/<ID>/upload

Parameters: file (string) (maximum size 16 MB)

Uploads the file to the directory DATA_DIR, but only if its extension is listed in DATA_GLOB_*. Otherwise the call results in an HTTP 500 error (non-fatal)
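
To avoid the HTTP 500 error, a client can test a file against the configured patterns before uploading. This is a hypothetical pre-check, not the service's own code; it assumes the limit is 16 * 1024 * 1024 bytes and that pattern matching is case-insensitive.

```python
from fnmatch import fnmatch

# Assumption: the documented 16 MB limit counted in binary megabytes.
MAX_UPLOAD_BYTES = 16 * 1024 * 1024


def upload_allowed(filename: str, size_bytes: int, patterns: list[str]) -> bool:
    """Return True if the file is small enough and matches a DATA_GLOB_* pattern."""
    if size_bytes > MAX_UPLOAD_BYTES:
        return False
    name = filename.lower()
    return any(fnmatch(name, pattern.lower()) for pattern in patterns)
```

With the patterns from the constants file, `upload_allowed("notes.txt", 1024, ["*.txt", "*.pdf"])` is accepted and a .png file is rejected.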
/prompt/<ID>/uploadx

Parameters: url (string) - X (Twitter) post URL

Fetches an X post via the X API v2, stores a local snapshot, and vectorizes text for RAG context.

Supported URL formats:
- https://x.com/username/status/1234567890
- https://twitter.com/username/status/1234567890
Requires an X API Bearer Token set in one of these environment variables:
- X_API_KEY or TWITTER_BEARER_TOKEN
Get your API key from: https://developer.x.com/

Local storage layout per post:
- data/<tweet_id>/post.json
- data/<tweet_id>/post.txt
- data/<tweet_id>/images/
- data/<tweet_id>/videos/
- data/<tweet_id>/audio/
Indexing scope:
- Text content is indexed.
- Video and audio are downloaded for later use and are not indexed/transformed yet.
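
The URL formats and the storage layout above can be illustrated as follows. This is a sketch with hypothetical function names; the regular expression is an assumption that covers the two listed formats plus an optional trailing slash or query string, and the service's own validation may differ.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: only https URLs on x.com or twitter.com are accepted.
_STATUS_URL = re.compile(
    r"^https://(?:x|twitter)\.com/(?P<user>[A-Za-z0-9_]+)/status/(?P<id>\d+)/?(?:\?.*)?$"
)


def extract_tweet_id(url: str) -> Optional[str]:
    """Return the numeric post ID, or None if the URL format is not supported."""
    match = _STATUS_URL.match(url.strip())
    return match.group("id") if match else None


def snapshot_paths(data_root: Path, tweet_id: str) -> dict[str, Path]:
    """Map each part of the documented per-post layout to its path."""
    base = data_root / tweet_id
    return {
        "json": base / "post.json",
        "text": base / "post.txt",
        "images": base / "images",
        "videos": base / "videos",
        "audio": base / "audio",
    }
```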
/prompt/<ID>/uploadx/batch

Parameters: file (JSON array or text file, one URL per line)

Batch version of uploadx.

Behavior:
- Validates and normalizes URLs.
- Processes each valid URL sequentially.
- Stores URLs in x.json and writes local per-post snapshots.
- Returns a summary of successful and failed URLs.
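
The batch behavior can be sketched as below. The function names and the shape of the summary are hypothetical; skipping duplicates is an assumption, and `handler` stands in for whatever fetches and stores a single post.

```python
import json
from typing import Callable


def parse_url_file(content: str) -> list[str]:
    """Return the URLs from a JSON array or from text with one URL per line."""
    stripped = content.strip()
    if stripped.startswith("["):
        candidates = [item for item in json.loads(stripped) if isinstance(item, str)]
    else:
        candidates = stripped.splitlines()
    urls, seen = [], set()
    for candidate in candidates:
        url = candidate.strip()
        if url and url not in seen:  # skip blank lines and duplicates
            seen.add(url)
            urls.append(url)
    return urls


def process_batch(urls: list[str], handler: Callable[[str], None]) -> dict:
    """Process URLs one after another and collect successes and failures."""
    summary = {"successful": [], "failed": []}
    for url in urls:
        try:
            handler(url)
            summary["successful"].append(url)
        except Exception as exc:  # one failing URL must not stop the batch
            summary["failed"].append({"url": url, "error": str(exc)})
    return summary
```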
/prompt/<ID>/xposts/chat

Parameters: prompt (string)

Answers a question using all X posts as context (bypasses vector search).

When the full post set exceeds a single LLM context window, posts are processed via map-reduce:
- Posts are split into character-budgeted batches (configurable via xposts_batch_chars in the INI file).
- Each batch is analyzed in parallel (4 workers via ThreadPoolExecutor), extracting facts and per-batch statistics.
- A deterministic author stats table (post counts, likes, retweets) is computed from all posts and included in the reduce prompt for accurate aggregation queries.
- A final synthesis step combines all batch findings into one streamed answer.
- After the answer streams, [[IMG:path]] markers are inserted inline after cited post URLs that have downloaded images. Images are limited to xposts_max_images (default 5).
Response is streamed (text/plain, chunked). Progress markers [Analyzing N/M...] are emitted during batch processing and stripped by the frontend before saving to chat history.

If all posts fit in one batch, a single-batch shortcut streams a direct answer with no map-reduce overhead.
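
The map-reduce flow described above can be sketched as follows. This is an illustration under stated assumptions, not the service's implementation: all names are hypothetical, the three callables stand in for the LLM calls, and streaming and the author stats table are left out.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def split_into_batches(posts: list[str], max_chars: int) -> list[list[str]]:
    """Group posts into batches whose total length stays within max_chars."""
    batches, current, used = [], [], 0
    for post in posts:
        # Start a new batch when adding this post would exceed the budget.
        # A single post longer than max_chars still gets a batch of its own.
        if current and used + len(post) > max_chars:
            batches.append(current)
            current, used = [], 0
        current.append(post)
        used += len(post)
    if current:
        batches.append(current)
    return batches


def answer_from_posts(
    posts: list[str],
    max_chars: int,
    analyze_batch: Callable[[list[str]], object],
    synthesize: Callable[[list], object],
    answer_directly: Callable[[list[str]], object],
    workers: int = 4,
):
    """Use the single-batch shortcut when possible, else map in parallel and reduce."""
    batches = split_into_batches(posts, max_chars)
    if len(batches) <= 1:
        return answer_directly(batches[0] if batches else [])
    with ThreadPoolExecutor(max_workers=workers) as pool:
        findings = list(pool.map(analyze_batch, batches))  # map() keeps batch order
    return synthesize(findings)


_PROGRESS = re.compile(r"\[Analyzing \d+/\d+\.\.\.\]\s*")


def strip_progress_markers(text: str) -> str:
    """Remove [Analyzing N/M...] markers before saving an answer to history."""
    return _PROGRESS.sub("", text)
```

Budgeting by characters rather than tokens keeps the split cheap and independent of the model's tokenizer, at the cost of being only an approximation of the context window.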

Usage

# change the model
curl -X POST --data-urlencode "model=gpt-4o" http://<your server>:<your port>/prompt/<ID>/model

Model set to: gpt-4o
# prompt to your data
curl -X POST --data-urlencode "prompt=your question?" http://<your server>:<your port>/prompt/<ID>

Your answer based on the context files provided in data/<ID>
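
The same calls can be made from Python. The sketch below uses only the standard library; build_request and call are hypothetical helper names, and the server address is a placeholder as in the curl examples.

```python
import urllib.parse
import urllib.request


def build_request(base_url: str, instance_id: str, command: str = "", **params: str) -> urllib.request.Request:
    """Build a POST request for /prompt/<ID> or /prompt/<ID>/<command>."""
    path = f"/prompt/{instance_id}"
    if command:
        path += f"/{command}"
    # Form-encode the parameters, like curl --data-urlencode does.
    body = urllib.parse.urlencode(params).encode("utf-8")
    return urllib.request.Request(base_url.rstrip("/") + path, data=body, method="POST")


def call(base_url: str, instance_id: str, command: str = "", **params: str) -> str:
    """Send the request and return the response body (needs a running service)."""
    with urllib.request.urlopen(build_request(base_url, instance_id, command, **params)) as response:
        return response.read().decode("utf-8")
```

For example, `call("http://localhost:8888", "MyDocs", "model", model="gpt-4o")` corresponds to the first curl command and `call("http://localhost:8888", "MyDocs", prompt="your question?")` to the second.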

Constants file

Important constants are:

# simple string like "myDocs"
ID = _unittest
# Directory that will be scanned for files to be added to the context
DATA_DIR=data/_unittest
# All the file extensions you want to be part of the context, see LangChain documentation
# Currently text and pdf are supported by RAG Service
DATA_GLOB_TXT = *.txt
DATA_GLOB_PDF = *.pdf
# Persistence directory for vectorstore
PERSISTENCE = data/_unittest/vectorstore
# Where the HTML files reside, also needed for the unit tests
HTML = data/_unittest/html
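
To show how DATA_DIR and the DATA_GLOB_* keys work together, the sketch below lists the files that match the configured patterns. It is an illustration only: list_context_files is a hypothetical name, and it assumes the keys sit under a [DEFAULT] section header so that Python's configparser can read them.

```python
import configparser
from pathlib import Path


def list_context_files(ini_text: str, project_root: Path) -> list[Path]:
    """Return the files in DATA_DIR that match any DATA_GLOB_* pattern."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    defaults = parser.defaults()
    # configparser lowercases option names, so DATA_DIR is stored as data_dir.
    data_dir = project_root / defaults["data_dir"]
    files = []
    for key, pattern in defaults.items():
        if key.startswith("data_glob_"):
            files.extend(sorted(data_dir.glob(pattern)))
    return files
```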

X Posts configuration

The following INI keys control the xposts/chat map-reduce behavior:

# Max characters of X posts per LLM batch in map-reduce. Lower this for
# small-context models (e.g. 20000 for 8K-token models).
xposts_batch_chars = 100000

# Max number of images to embed in the answer for cited posts.
xposts_max_images = 5

Both keys are read from the [DEFAULT] section of the project's INI file. If absent, the code defaults to 20000 chars and 5 images respectively.
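
The fallback behavior can be sketched with Python's configparser. The function name is hypothetical and the service may read its settings differently; the fallback values are the ones stated above.

```python
import configparser


def read_xposts_settings(ini_text: str) -> tuple[int, int]:
    """Return (xposts_batch_chars, xposts_max_images) with the documented defaults."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # fallback= is used when the key is missing from the [DEFAULT] section.
    batch_chars = parser.getint("DEFAULT", "xposts_batch_chars", fallback=20000)
    max_images = parser.getint("DEFAULT", "xposts_max_images", fallback=5)
    return batch_chars, max_images
```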

Unit tests

To run the unit tests, first start the program in the project directory with the ID '_unittest'. This starts a local RAG service on port 8888 (see constants__unittest.py for all defaults). Once it is running, the unit tests can be executed. Currently the tests run smoothly when USE_LLM is set to OPENAI. Other settings, like GROQ, might fail because of rate limits (too many calls per minute), depending on your licence. If so, try running the unit tests one by one. The startup log below lists all available API calls and their parameters:

<your virtual environment>/bin/python ragservice.py _unittest
INFO:root:Working directory is /home/leen/projects/rag
INFO:httpx:HTTP Request: GET https://api.openai.com/v1/models "HTTP/1.1 200 OK"
INFO:root:path -> /prompt/_unittest prompt
INFO:root:path -> /prompt/_unittest/stream prompt
INFO:root:path -> /prompt/_unittest/full prompt
INFO:root:path -> /prompt/_unittest/search prompt,similar
INFO:root:path -> /prompt/_unittest/documents id
INFO:root:path -> /prompt/_unittest/params section,param
INFO:root:path -> /prompt/_unittest/globals 
INFO:root:path -> /prompt/_unittest/modelnames 
INFO:root:path -> /prompt/_unittest/embeddingnames 
INFO:root:path -> /prompt/_unittest/model model
INFO:root:path -> /prompt/_unittest/embeddings embedding
INFO:root:path -> /prompt/_unittest/chunk chunk_size,chunk_overlap
INFO:root:path -> /prompt/_unittest/temp temp
INFO:root:path -> /prompt/_unittest/reload 
INFO:root:path -> /prompt/_unittest/clear 
INFO:root:path -> /prompt/_unittest/cache 
INFO:root:path -> /prompt/_unittest/file file
INFO:root:path -> /prompt/_unittest/context file,action
INFO:root:path -> /prompt/_unittest/image image,prompt
INFO:root:path -> /prompt/_unittest/upload 
INFO:root:path -> /prompt/_unittest/uploadx url
INFO:chromadb.telemetry.product.posthog:Anonymized telemetry enabled. See                     https://docs.trychroma.com/telemetry for more information.
INFO:root:Loaded 8 chunks from persistent vectorstore
INFO:root:Chain initialized: gpt-4o
 * Serving Flask app 'ragservice'
 * Debug mode: off
INFO:werkzeug:WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:8888
 * Running on http://192.168.2.200:8888
INFO:werkzeug:Press CTRL+C to quit

Now you are able to run the unit tests:

<your virtual environment>/bin/python ragservice_unittest.py -v
Testing OPENAI
test_cache (__main__.RagServiceMethods.test_cache)
Test to print the contents of the cache ... User:content='who wrote rag service?' User:content='who wrote rag service?'
AI:content='RAG Service was developed by Leen Blom.' AI:content='RAG Service was developed by L1Blom.'
ok
test_clear (__main__.RagServiceMethods.test_clear)
Test to clear the cache ... ok
test_image (__main__.RagServiceMethods.test_image)
Test image ... ok
test_model (__main__.RagServiceMethods.test_model)
Test model setting, correct or incorrect model according to LLM ... ok
test_prompt (__main__.RagServiceMethods.test_prompt)
Test prompt ... ok
test_reload (__main__.RagServiceMethods.test_reload)
Test reload of the data ... ok
test_temperature (__main__.RagServiceMethods.test_temperature)
Test to set temparature too high, low, within boundaries 0.0 and 2.0 ... ok

----------------------------------------------------------------------
Ran 7 tests in 17.713s

OK

TODO's and wishes

None at the moment

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

License

MIT

The image used in the unittest is licensed CC BY-NC-ND 4.0 and was found at Trusted Reviews
