RAG External Storage Configuration Example

This example shows how to configure RAG to use external storage for your personal documents and vector database.

Why Use External Storage?

  • Privacy: Keep personal CV/resume/publications outside the project directory
  • Security: Prevent accidental commits of sensitive documents to Git
  • Organization: Centralize personal documents in one location
  • Flexibility: Share vector database across multiple projects

Configuration Steps

1. Choose External Storage Location

Pick a location outside the project directory:

Windows Example:

C:\Users\YourName\Documents\RAG_Data\
├── personal_docs\      # Your CV, resume, publications
└── vector_db\          # ChromaDB storage

Linux/Mac Example:

/home/username/rag_data/
├── personal_docs/
└── vector_db/

2. Update rag/config.yaml

Edit rag/config.yaml and change the paths to absolute paths:

# Vector Database Settings
vector_db:
  persist_directory: "C:/Users/YourName/Documents/RAG_Data/vector_db"  # Windows
  # persist_directory: "/home/username/rag_data/vector_db"  # Linux/Mac
  collection_name: "personal_knowledge_base"
  embedding_model: "all-MiniLM-L6-v2"
  chunk_size: 500
  chunk_overlap: 100

# Personal Documents Directory
personal_docs:
  directory: "C:/Users/YourName/Documents/RAG_Data/personal_docs"  # Windows
  # directory: "/home/username/rag_data/personal_docs"  # Linux/Mac

Important Notes:

  • Use forward slashes / even on Windows (or escape each backslash as \\ inside double-quoted YAML strings)
  • Use absolute paths (starting with a drive letter on Windows, or / on Linux/Mac)
  • Create the directories before running ingestion
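
The path rules above can be checked before running ingestion. The following is a minimal sketch and not part of the project: `is_absolute_config_path` is a hypothetical helper that uses only the standard library and accepts either a Windows-style or a POSIX-style absolute path, regardless of the machine it runs on.

```python
from pathlib import PurePosixPath, PureWindowsPath


def is_absolute_config_path(path_str: str) -> bool:
    """Return True if path_str is absolute in Windows or POSIX style.

    Hypothetical helper for illustration; not part of the project.
    """
    # A Windows absolute path needs both a drive and a root,
    # e.g. "C:/Users/..." (forward or backward slashes both work).
    if PureWindowsPath(path_str).is_absolute():
        return True
    # A POSIX absolute path starts with "/".
    return PurePosixPath(path_str).is_absolute()
```

Note that a path starting with `~` counts as relative here, because the shell expands `~`, not the YAML parser. Write out the full home directory in config.yaml.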

3. Create External Directories

# Windows
mkdir "C:\Users\YourName\Documents\RAG_Data\personal_docs"
mkdir "C:\Users\YourName\Documents\RAG_Data\vector_db"

# Linux/Mac
mkdir -p ~/rag_data/personal_docs
mkdir -p ~/rag_data/vector_db
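
If you prefer one command that works on every platform, the same directories can be created from Python. This is a sketch under the assumption that both folders live under a single base directory; `create_storage_dirs` is a hypothetical helper name, not something the project ships.

```python
from pathlib import Path


def create_storage_dirs(base_dir: Path) -> list[Path]:
    """Create personal_docs and vector_db under base_dir and return them."""
    created = []
    for name in ("personal_docs", "vector_db"):
        target = base_dir / name
        # parents=True creates missing parent directories (like mkdir -p);
        # exist_ok=True makes the call safe to repeat.
        target.mkdir(parents=True, exist_ok=True)
        created.append(target)
    return created
```

For example, `create_storage_dirs(Path.home() / "rag_data")` mirrors the Linux/Mac commands above.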

4. Add Your Documents

Copy your CV, resume, and publications to the external directory:

# Windows
copy cv.pdf "C:\Users\YourName\Documents\RAG_Data\personal_docs\"
copy resume.docx "C:\Users\YourName\Documents\RAG_Data\personal_docs\"

# Linux/Mac
cp cv.pdf ~/rag_data/personal_docs/
cp resume.docx ~/rag_data/personal_docs/

5. Run Ingestion

python rag/ingest_documents.py

The script reads the paths from rag/config.yaml automatically.

Graceful Fallback

What happens if the vector database is not found?

The RAG tool has a built-in graceful fallback:

  1. During initialization: Logs a warning but doesn't crash
  2. During agent execution: Returns a helpful message and continues
  3. Agent behavior: Falls back to ArXiv and web search only

Example log output when the database is not found:

WARNING - Vector database not found at: C:/Users/YourName/Documents/RAG_Data/vector_db
WARNING - RAG functionality will be disabled until you run document ingestion.
INFO - To enable RAG: 1) Add documents to personal_docs, 2) Run 'python rag/ingest_documents.py'

The agent continues normally, using only the ArXiv and web search tools.
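
The fallback pattern described above can be sketched as follows. This is an illustration, not the project's actual code: the function names, the "directory exists and is non-empty" test, and the wording of the returned message are all assumptions. Only the two warning strings are taken from the log output shown above.

```python
import logging
from pathlib import Path

logger = logging.getLogger("rag")


def vector_db_available(persist_directory: str) -> bool:
    """Return True if the vector database directory exists and is non-empty.

    Hypothetical helper: logs warnings instead of raising, so the caller
    can disable RAG and keep using its other tools.
    """
    db_path = Path(persist_directory)
    if db_path.is_dir() and any(db_path.iterdir()):
        return True
    logger.warning("Vector database not found at: %s", persist_directory)
    logger.warning(
        "RAG functionality will be disabled until you run document ingestion."
    )
    return False


def search_personal_docs(query: str, persist_directory: str) -> str:
    """Return a fallback message when the database is missing (sketch only)."""
    if not vector_db_available(persist_directory):
        # Illustrative message; the real tool's wording may differ.
        return "Personal knowledge base unavailable; use ArXiv and web search."
    # The real retrieval call would go here; it is omitted in this sketch.
    return f"(results for {query!r})"
```

Returning a message rather than raising an exception is what lets the agent carry on with its remaining tools.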

Benefits of External Storage

  • No accidental commits: Personal files stay out of Git
  • Centralized storage: One location for all personal documents
  • Reusable database: Share the vector DB across multiple projects
  • Easy backup: Back up one external directory
  • Privacy: Sensitive data is never in the project directory

Troubleshooting

"Vector database not found"

  • Check that the path in config.yaml is correct and absolute
  • Ensure the directory exists
  • Run python rag/ingest_documents.py to create the database

"Permission denied"

  • Ensure you have write permissions to the external directory
  • On Linux/Mac, check directory permissions with ls -la
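
A quick check from Python, as a minimal sketch: `can_write_to` is a hypothetical helper that uses the standard `os.access` call. On Windows, `os.access` checks less than it does on Linux/Mac, so treat a True result there as a hint rather than a guarantee.

```python
import os
from pathlib import Path


def can_write_to(directory: str) -> bool:
    """Return True if directory exists and the current user may write to it."""
    path = Path(directory)
    # W_OK checks write permission; X_OK is also needed to create entries
    # inside a directory on POSIX systems.
    return path.is_dir() and os.access(path, os.W_OK | os.X_OK)
```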

"No documents found"

  • Verify documents are in the personal_docs directory specified in config
  • Check that file extensions are supported (PDF, DOCX, TXT, MD)
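
To see which files would qualify, you can list the supported documents yourself. This is a sketch only: `find_supported_documents` is a hypothetical helper, and it assumes subdirectories are searched and extensions are matched case-insensitively, which may not match what rag/ingest_documents.py does.

```python
from pathlib import Path

# Extensions listed in the troubleshooting notes above.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt", ".md"}


def find_supported_documents(docs_dir: str) -> list[Path]:
    """Return files under docs_dir whose extension is supported, sorted."""
    root = Path(docs_dir)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

An empty result means either the directory path is wrong or none of the files has a supported extension.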