RAG External Storage Configuration Example

This example shows how to configure the RAG tool to keep your personal documents and its vector database in external storage, outside the project directory.

Why Use External Storage?

Privacy: Keep personal CV/resume/publications outside the project directory
Security: Prevent accidental commits of sensitive documents to Git
Organization: Centralize personal documents in one location
Flexibility: Share the vector database across multiple projects

Configuration Steps

1. Choose External Storage Location

Pick a location outside the project directory:

Windows Example:

C:\Users\YourName\Documents\RAG_Data\
├── personal_docs\      # Your CV, resume, publications
└── vector_db\          # ChromaDB storage

Linux/Mac Example (on macOS, home directories are under /Users/username rather than /home/username):

/home/username/rag_data/
├── personal_docs/
└── vector_db/
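To confirm that a candidate location really sits outside the project (the point of the Privacy and Security items above), you can compare resolved paths in Python. This is a minimal sketch, not part of the repository: `rag_data_root` and `is_outside_project` are hypothetical helpers, and the `~/rag_data` default simply mirrors the Linux/Mac example.

```python
from __future__ import annotations

from pathlib import Path


def rag_data_root(base: Path | None = None) -> Path:
    """Hypothetical helper: the external root, defaulting to ~/rag_data."""
    return (base if base is not None else Path.home()) / "rag_data"


def is_outside_project(candidate: Path, project_root: Path) -> bool:
    """True if candidate does not resolve to a location inside project_root."""
    resolved = candidate.expanduser().resolve()
    root = project_root.expanduser().resolve()
    return not resolved.is_relative_to(root)  # is_relative_to needs Python 3.9+


if __name__ == "__main__":
    # Assumes the script is run from the project root
    for sub in ("personal_docs", "vector_db"):
        path = rag_data_root() / sub
        print(path, "outside project:", is_outside_project(path, Path.cwd()))
```

Resolving both paths first means symlinks and `..` segments cannot make a directory inside the project look external.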

2. Update rag/config.yaml

Edit rag/config.yaml and replace both paths with absolute paths pointing at your external location:

# Vector Database Settings
vector_db:
  persist_directory: "C:/Users/YourName/Documents/RAG_Data/vector_db"  # Windows
  # persist_directory: "/home/username/rag_data/vector_db"  # Linux/Mac
  collection_name: "personal_knowledge_base"
  embedding_model: "all-MiniLM-L6-v2"
  chunk_size: 500
  chunk_overlap: 100

# Personal Documents Directory
personal_docs:
  directory: "C:/Users/YourName/Documents/RAG_Data/personal_docs"  # Windows
  # directory: "/home/username/rag_data/personal_docs"  # Linux/Mac

Important Notes:

Use forward slashes (/) even on Windows, or escape each backslash as \\ inside double-quoted YAML strings
Use absolute paths (starting with a drive letter on Windows, or / on Linux/Mac)
Create the directories before running ingestion
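You can also check these notes in code before running ingestion. The sketch below assumes the YAML has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`) containing the `vector_db.persist_directory` and `personal_docs.directory` keys shown above. `check_config_paths` is a hypothetical helper, not part of the project. It accepts either Windows-style (drive letter) or POSIX-style absolute paths, so you can validate a config written on one OS from another.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath


def is_absolute_any_os(raw: str) -> bool:
    """True if raw is absolute as a Windows path (C:/...) or as a POSIX path (/...)."""
    return PureWindowsPath(raw).is_absolute() or PurePosixPath(raw).is_absolute()


def check_config_paths(config: dict) -> list[str]:
    """Return problems with the two storage paths (an empty list means OK)."""
    problems = []
    paths = {
        "vector_db.persist_directory": (config.get("vector_db") or {}).get("persist_directory"),
        "personal_docs.directory": (config.get("personal_docs") or {}).get("directory"),
    }
    for key, raw in paths.items():
        if not raw:
            problems.append(f"{key} is missing")
        elif not is_absolute_any_os(raw):
            problems.append(f"{key} is not absolute: {raw}")
        elif not Path(raw).is_dir():
            # Covers the "create the directories before running ingestion" note
            problems.append(f"{key} does not exist yet: {raw}")
    return problems
```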

3. Create External Directories

# Windows
mkdir "C:\Users\YourName\Documents\RAG_Data\personal_docs"
mkdir "C:\Users\YourName\Documents\RAG_Data\vector_db"

# Linux/Mac
mkdir -p ~/rag_data/personal_docs
mkdir -p ~/rag_data/vector_db
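If you prefer one command that works on every OS, you can create the same two directories from Python with `pathlib.Path.mkdir(parents=True, exist_ok=True)`, which behaves like `mkdir -p`. This is a minimal sketch. The `RAG_DATA_ROOT` default of `~/rag_data` mirrors the Linux/Mac example and is an assumption, so pass your own location as the first argument.

```python
import sys
from pathlib import Path

# Assumed default location; mirrors the Linux/Mac example above
RAG_DATA_ROOT = Path.home() / "rag_data"


def create_external_dirs(root: Path) -> list[Path]:
    """Create personal_docs/ and vector_db/ under root (no error if they already exist)."""
    created = []
    for name in ("personal_docs", "vector_db"):
        path = root / name
        path.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`
        created.append(path)
    return created


if __name__ == "__main__":
    root = Path(sys.argv[1]) if len(sys.argv) > 1 else RAG_DATA_ROOT
    for path in create_external_dirs(root.expanduser()):
        print("ready:", path)
```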

4. Add Your Documents

Copy your CV, resume, and publications into the external personal_docs directory:

# Windows
copy cv.pdf "C:\Users\YourName\Documents\RAG_Data\personal_docs\"
copy resume.docx "C:\Users\YourName\Documents\RAG_Data\personal_docs\"

# Linux/Mac
cp cv.pdf ~/rag_data/personal_docs/
cp resume.docx ~/rag_data/personal_docs/
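To copy a whole folder at once while skipping formats that ingestion cannot read, you can filter by extension before copying. The extension list comes from the Troubleshooting section (PDF, DOCX, TXT, MD). `copy_supported_docs` is a hypothetical helper, and the paths in the usage line below are placeholders.

```python
import shutil
from pathlib import Path

# Formats listed as supported in the Troubleshooting section
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt", ".md"}


def copy_supported_docs(source: Path, dest: Path) -> list[Path]:
    """Copy files with supported extensions from source into dest; return the copies."""
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for file in sorted(source.iterdir()):
        # Compare lowercase suffixes so CV.PDF and cv.pdf are both accepted
        if file.is_file() and file.suffix.lower() in SUPPORTED_EXTENSIONS:
            # copy2 also tries to preserve timestamps and other metadata
            copied.append(Path(shutil.copy2(file, dest / file.name)))
    return copied
```

Example call: `copy_supported_docs(Path("~/Downloads").expanduser(), Path("~/rag_data/personal_docs").expanduser())`.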

5. Run Ingestion

python rag/ingest_documents.py

The script automatically reads both paths from rag/config.yaml.
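The `chunk_size: 500` and `chunk_overlap: 100` settings control how ingestion splits each document before embedding it. This section does not show the splitter that rag/ingest_documents.py actually uses, so the code below is only an illustrative sketch of a character-based sliding window. The real script may split on tokens, sentences, or separators instead. Each chunk starts `chunk_size - chunk_overlap` characters after the previous one, so consecutive chunks share `chunk_overlap` characters of context.

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 100) -> list[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    print([len(c) for c in chunk_text("x" * 1200)])  # → [500, 500, 400]
```

The overlap means a sentence cut at one chunk boundary still appears whole in the neighbouring chunk, which helps retrieval.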

Graceful Fallback

What happens if the vector database is not found?

The RAG tool has a built-in graceful fallback:

During initialization: Logs a warning instead of crashing
During agent execution: Returns an explanatory message instead of raising an error, and the agent continues
Agent behavior: Falls back to ArXiv and web search only

Example log output when the database is not found:

WARNING - Vector database not found at: C:/Users/YourName/Documents/RAG_Data/vector_db
WARNING - RAG functionality will be disabled until you run document ingestion.
INFO - To enable RAG: 1) Add documents to personal_docs, 2) Run 'python rag/ingest_documents.py'

The agent then continues normally, using only the ArXiv and web search tools.
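The fallback behaviour described above can be sketched as follows. This is not the project's actual implementation. `load_rag_tool`, `build_tool_list`, `query_personal_docs`, and the tool names `arxiv_search`, `web_search`, and `rag_search` are hypothetical. The sketch also only checks that the directory exists rather than opening the database. The log messages and format reproduce the example output above.

```python
import logging
from pathlib import Path
from typing import Optional

logger = logging.getLogger("rag")


def load_rag_tool(persist_directory: str) -> Optional[str]:
    """Return a RAG tool handle, or None (after logging warnings) if the DB is missing."""
    if not Path(persist_directory).is_dir():
        logger.warning("Vector database not found at: %s", persist_directory)
        logger.warning("RAG functionality will be disabled until you run document ingestion.")
        logger.info("To enable RAG: 1) Add documents to personal_docs, "
                    "2) Run 'python rag/ingest_documents.py'")
        return None
    # Placeholder: a real tool would open the ChromaDB collection here
    return "personal_knowledge_base"


def build_tool_list(persist_directory: str) -> list[str]:
    """Always include ArXiv and web search; add RAG only when the DB is available."""
    tools = ["arxiv_search", "web_search"]
    if load_rag_tool(persist_directory) is not None:
        tools.append("rag_search")
    return tools


def query_personal_docs(rag_tool: Optional[str], question: str) -> str:
    """At agent run time, return an explanatory message instead of raising."""
    if rag_tool is None:
        return ("Personal documents are not indexed yet; "
                "answering with ArXiv and web search only.")
    return f"[would search {rag_tool} for: {question}]"


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(levelname)s - %(message)s")
    print(build_tool_list("/path/that/does/not/exist"))
```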

Benefits of External Storage

✅ No accidental commits - Personal files stay out of Git
✅ Centralized storage - One location for all personal documents
✅ Reusable database - Share vector DB across multiple projects
✅ Easy backup - Back up a single external directory
✅ Privacy - Sensitive data never lives in the project directory

Troubleshooting

"Vector database not found"

Check that the path in config.yaml is correct and absolute
Ensure the directory exists
Run python rag/ingest_documents.py to create the database

"Permission denied"

Ensure you have write permissions to the external directory
On Linux/Mac, check directory permissions with ls -la

"No documents found"

Verify that your documents are in the personal_docs directory specified in rag/config.yaml
Check that file extensions are supported (PDF, DOCX, TXT, MD)
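You can run the checks above as a single diagnostic. This is a minimal sketch with a hypothetical `diagnose` function. It takes the two paths from rag/config.yaml as command-line arguments and does not parse the YAML itself. It only checks that the vector_db directory exists, not that it contains a valid ChromaDB database. Write access is checked for vector_db and read access for personal_docs.

```python
import os
import sys
from pathlib import Path

SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt", ".md"}


def diagnose(persist_directory: str, docs_directory: str) -> list[str]:
    """Return problems matching the Troubleshooting entries (an empty list means all good)."""
    problems = []
    checks = (
        ("vector_db", persist_directory, os.W_OK, "write"),
        ("personal_docs", docs_directory, os.R_OK, "read"),
    )
    for label, raw, mode, verb in checks:
        path = Path(raw)
        if not path.is_absolute():
            problems.append(f"{label}: path is not absolute: {raw}")
        if not path.is_dir():
            problems.append(f"{label}: directory does not exist: {raw}")
        elif not os.access(path, mode):
            problems.append(f"{label}: no {verb} permission: {raw}")
    docs = Path(docs_directory)
    if docs.is_dir():
        found = [f for f in docs.iterdir()
                 if f.is_file() and f.suffix.lower() in SUPPORTED_EXTENSIONS]
        if not found:
            problems.append("personal_docs: no PDF, DOCX, TXT, or MD files found")
    return problems


if __name__ == "__main__":
    # Usage: python diagnose.py <persist_directory> <personal_docs directory>
    issues = diagnose(sys.argv[1], sys.argv[2])
    print("\n".join(issues) if issues else "No problems found.")
```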
