This example shows how to configure the RAG tool to keep your personal documents and vector database in external storage, outside the project directory. Benefits:
- Privacy: Keep personal CV/resume/publications outside the project directory
- Security: Prevent accidental commits of sensitive documents to Git
- Organization: Centralize personal documents in one location
- Flexibility: Share vector database across multiple projects
Pick a location outside the project directory:
Windows Example:
C:\Users\YourName\Documents\RAG_Data\
├── personal_docs\ # Your CV, resume, publications
└── vector_db\ # ChromaDB storage
Linux/Mac Example:
/home/username/rag_data/
├── personal_docs/
└── vector_db/
Edit rag/config.yaml and change the paths to absolute paths:
# Vector Database Settings
vector_db:
  persist_directory: "C:/Users/YourName/Documents/RAG_Data/vector_db"  # Windows
  # persist_directory: "/home/username/rag_data/vector_db"  # Linux/Mac
  collection_name: "personal_knowledge_base"
  embedding_model: "all-MiniLM-L6-v2"
  chunk_size: 500
  chunk_overlap: 100

# Personal Documents Directory
personal_docs:
  directory: "C:/Users/YourName/Documents/RAG_Data/personal_docs"  # Windows
  # directory: "/home/username/rag_data/personal_docs"  # Linux/Mac
Important Notes:
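- Use forward slashes (/) even on Windows. Alternatively, write each backslash as a double backslash (\\), because inside double-quoted YAML strings a single backslash starts an escape sequence.
- Use absolute paths. These start with a drive letter on Windows and with / on Linux/Mac.
- Create the directories before running ingestion.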
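The last two notes can also be checked and applied from Python, which behaves the same on every OS. The following is a minimal sketch. It assumes you pass in the two path strings from your config.yaml. `prepare_rag_dirs` is a hypothetical helper, not part of the project.

```python
from pathlib import Path


def prepare_rag_dirs(*raw_paths: str) -> list[Path]:
    """Hypothetical helper: require absolute paths, then create each directory.

    Path.is_absolute() follows the rules of the OS it runs on: on Windows
    it needs a drive letter and a root ("C:/..."), on Linux/Mac a leading "/".
    """
    prepared = []
    for raw in raw_paths:
        path = Path(raw)
        if not path.is_absolute():
            # Fail early instead of silently creating folders relative to the cwd.
            raise ValueError(f"Path must be absolute: {raw!r}")
        # parents=True also creates RAG_Data/; exist_ok=True makes reruns safe.
        path.mkdir(parents=True, exist_ok=True)
        prepared.append(path)
    return prepared
```

For example, calling `prepare_rag_dirs("/home/username/rag_data/vector_db", "/home/username/rag_data/personal_docs")` creates both folders. A relative path such as "rag_data/vector_db" raises `ValueError` before anything is created. The shell commands below do the same job per platform.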
# Windows
mkdir "C:\Users\YourName\Documents\RAG_Data\personal_docs"
mkdir "C:\Users\YourName\Documents\RAG_Data\vector_db"
# Linux/Mac
mkdir -p ~/rag_data/personal_docs
mkdir -p ~/rag_data/vector_db
Copy your CV, resume, and publications to the external directory:
# Windows
copy cv.pdf "C:\Users\YourName\Documents\RAG_Data\personal_docs\"
copy resume.docx "C:\Users\YourName\Documents\RAG_Data\personal_docs\"
# Linux/Mac
cp cv.pdf ~/rag_data/personal_docs/
cp resume.docx ~/rag_data/personal_docs/
Run the ingestion script:
python rag/ingest_documents.py
The script will automatically use the paths from config.yaml.
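The same config.yaml also sets `chunk_size: 500` and `chunk_overlap: 100` under `vector_db`. To show what these two numbers mean, here is a minimal overlapping-window chunker. It assumes both values are counted in characters. The unit and splitting strategy actually used by rag/ingest_documents.py are not stated in this guide, and `chunk_text` is a hypothetical name.

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 100) -> list[str]:
    """Split text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters, so a sentence cut
    at one boundary still appears whole in the neighbouring chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # 400 with the config values above
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

With the config values above, a 1,200-character document yields three chunks covering characters 0-500, 400-900 and 800-1200. Larger overlaps reduce the risk of splitting relevant passages but store more duplicated text in the vector database.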
What happens if the vector database is not found?
The RAG tool has a built-in graceful fallback:
- During initialization: Logs a warning but doesn't crash
- During agent execution: Returns a helpful message and continues
- Agent behavior: Falls back to ArXiv and web search only
Example log output when the vector database is not found:
WARNING - Vector database not found at: C:/Users/YourName/Documents/RAG_Data/vector_db
WARNING - RAG functionality will be disabled until you run document ingestion.
INFO - To enable RAG: 1) Add documents to personal_docs, 2) Run 'python rag/ingest_documents.py'
Agent will continue normally using only ArXiv and web search tools.
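The fallback behaviour described above can be sketched as follows. This is a minimal illustration, not the RAG tool's actual code. The names `open_vector_db`, `search_personal_docs` and the `query_fn` callback are hypothetical; `query_fn` stands in for the real ChromaDB query. Only the log messages are taken from the example output. The sketch checks only that the directory exists, not that it holds a valid collection.

```python
import logging
from pathlib import Path
from typing import Callable, Optional

logger = logging.getLogger("rag")


def open_vector_db(persist_directory: str) -> Optional[Path]:
    """Return the DB directory if it exists, else log the fallback warnings and return None."""
    path = Path(persist_directory)
    if not path.is_dir():
        # Log instead of raising, so agent start-up continues without RAG.
        logger.warning("Vector database not found at: %s", persist_directory)
        logger.warning("RAG functionality will be disabled until you run document ingestion.")
        logger.info("To enable RAG: 1) Add documents to personal_docs, "
                    "2) Run 'python rag/ingest_documents.py'")
        return None
    return path


def search_personal_docs(db_path: Optional[Path], query: str,
                         query_fn: Callable[[Path, str], str]) -> str:
    """Tool entry point: return a helpful message instead of failing when RAG is off."""
    if db_path is None:
        return ("Personal document search is unavailable because no vector database "
                "was found. Continue with ArXiv and web search.")
    # query_fn is a placeholder for the real ChromaDB query, not shown here.
    return query_fn(db_path, query)
```

Returning a message rather than raising an exception lets the agent treat the missing database as an ordinary tool result and move on to its other tools.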
Benefits of this setup:
✅ No accidental commits - Personal files stay out of Git
✅ Centralized storage - One location for all personal documents
✅ Reusable database - Share vector DB across multiple projects
✅ Easy backup - Backup one external directory
✅ Privacy - Sensitive data never in project directory
"Vector database not found"
- Check that the path in config.yaml is correct and absolute
- Ensure the directory exists
- Run python rag/ingest_documents.py to create the database
"Permission denied"
- Ensure you have write permissions to the external directory
- On Linux/Mac, check directory permissions with ls -la
"No documents found"
- Verify that the documents are in the personal_docs directory specified in config.yaml
- Check that the file extensions are supported (PDF, DOCX, TXT, MD)
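The three checks above can be run in one pass with a small diagnostic script. This is a minimal sketch with several assumptions. `diagnose_rag_paths` is a hypothetical helper. It matches extensions case-insensitively and looks only at the top level of personal_docs. Whether the ingestion script also reads subfolders or upper-case extensions is not stated in this guide.

```python
import os
from pathlib import Path

# Extensions listed as supported in this guide.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt", ".md"}


def diagnose_rag_paths(persist_directory: str, docs_directory: str) -> list[str]:
    """Return human-readable problems; an empty list means no issue was found."""
    problems = []
    for label, raw in (("vector_db", persist_directory), ("personal_docs", docs_directory)):
        path = Path(raw)
        if not path.is_absolute():
            problems.append(f"{label}: path is not absolute: {raw}")
        elif not path.is_dir():
            problems.append(f"{label}: directory does not exist: {raw}")
        elif not os.access(path, os.W_OK):
            problems.append(f"{label}: no write permission: {raw}")
    docs = Path(docs_directory)
    if docs.is_dir():
        # Only top-level files are inspected in this sketch.
        supported = [f for f in docs.iterdir()
                     if f.is_file() and f.suffix.lower() in SUPPORTED_EXTENSIONS]
        if not supported:
            problems.append("personal_docs: no supported documents (PDF, DOCX, TXT, MD)")
    return problems
```

Call it with the two paths from your config.yaml and print each returned line. An empty result means the paths look usable, though it does not confirm that ingestion has already been run.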