A privacy-first RAG pipeline for healthcare that scrubs PHI/PII from clinical documents and integrates AWS Bedrock, designed to support HIPAA-compliant AI inference.
```bash
docker-compose up --build
# API: http://localhost:8000/docs
# UI:  http://localhost:8501
```

```mermaid
graph TD
A["📄 Clinical PDFs"] --> B["🛠️ Ingestion<br/>(Parse & Chunk)"]
B -->|PHI Scrubbing| C["🧬 Embeddings<br/>(Multi-Provider)"]
C --> D[("🌲 Pinecone<br/>Vector DB")]
subgraph RAG["⚙️ RAG Engine"]
D <-->|Semantic Search| E["🔍 Retrieval<br/>& Compression"]
E -->|Relevant Context| F["🏭 LLM Provider<br/>Selection"]
F -->|OpenAI/Anthropic/Bedrock| G["🧠 Generate<br/>Answer"]
end
G -->|Grounded Response| H["🚀 FastAPI<br/>REST API"]
subgraph UI["👥 User Interface Layer"]
H --> I["💻 Streamlit<br/>Dashboard"]
I --> J["🏥 Clinical<br/>Intelligence"]
end
style A fill:#e8f5e9
style D fill:#1b5e20,stroke:#fff,color:#fff,stroke-width:3px
style G fill:#4a148c,stroke:#fff,color:#fff,stroke-width:3px
style RAG fill:#f5f5f5,stroke:#666,stroke-width:2px
style UI fill:#fff3e0,stroke:#ff6f00,stroke-width:2px
style H fill:#ff9800,stroke:#fff,color:#fff,stroke-width:2px
style I fill:#ffb74d,stroke:#fff,color:#333,stroke-width:2px
style J fill:#0d2949,stroke:#fff,color:#fff,stroke-width:3px
```
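The ingestion step in the diagram parses PDFs and splits the text into chunks before embedding. The project's chunking strategy isn't described here, so the following is only a minimal sketch of one common approach, a fixed-size character window with overlap; `chunk_text` and its default sizes are hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper for illustration; the project's real chunker may
    split on tokens, sentences, or PDF layout instead.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once this window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    sample = "Patient presented with chest pain. " * 10
    for i, c in enumerate(chunk_text(sample, chunk_size=120, overlap=30)):
        print(i, len(c))
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the cost of storing some text twice.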
- 📄 Upload PDFs
- 💬 Ask questions → AI answers from docs
- 📥 Export analysis
- 🔐 De-identify PHI (HIPAA)
- ⚡ 20-30% fewer tokens (compression)
- 🔄 Any LLM (swap provider in `.env`)
- ✅ Validated (Faithfulness 1.00, Precision 1.00)
- 📊 **Production Monitoring** (LangSmith tracing)
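PHI de-identification happens before text is embedded. The scrubbing method isn't specified in this README; as a minimal sketch, assuming a regex-based pass over a few structured identifiers, it could look like the code below. The patterns and `scrub_phi` are hypothetical, and regexes alone miss names, addresses, and other free-text identifiers, so a production system would pair them with a named-entity recognizer.

```python
import re

# Hypothetical patterns for a few structured PHI identifiers.
# Names, addresses, and free-text identifiers need NER, not regexes.
PHI_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\(?\b\d{3}\)?[-.\s]?\d{3}[-.\s]\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
}


def scrub_phi(text: str) -> str:
    """Replace each pattern match with a bracketed placeholder such as [SSN]."""
    # Patterns run in insertion order, so SSNs are masked before phone numbers.
    for label, pattern in PHI_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    note = "Call (555) 123-4567 or email jane.doe@example.com; SSN 123-45-6789, seen 03/14/2024."
    print(scrub_phi(note))
```

Placeholders such as `[DATE]` keep the sentence structure intact, which tends to matter for retrieval quality after scrubbing.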
| Metric | Score |
|---|---|
| Faithfulness | 1.00 |
| Answer Relevancy | 0.97 |
| Context Precision | 1.00 |
| Overall | 0.99 |
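For reference, Ragas defines faithfulness as the share of claims in a generated answer that the retrieved context supports, so 1.00 means no unsupported claims were found. The sketch below computes only that final ratio; in Ragas an LLM both extracts the claims and judges each one, whereas here the verdicts are passed in by hand and `faithfulness` is a hypothetical helper name.

```python
def faithfulness(claim_supported: list[bool]) -> float:
    """Fraction of answer claims supported by the retrieved context.

    Hypothetical helper: `claim_supported` holds one verdict per claim.
    In Ragas both the claim extraction and the verdicts come from an
    LLM judge; here they are supplied by hand for illustration.
    """
    if not claim_supported:
        raise ValueError("need at least one claim")
    return sum(claim_supported) / len(claim_supported)


# Three claims, all grounded in the context -> 1.0
print(faithfulness([True, True, True]))
# One unsupported (hallucinated) claim out of four -> 0.75
print(faithfulness([True, True, False, True]))
```

The Overall row is consistent with an unweighted mean of the three scores, (1.00 + 0.97 + 1.00) / 3 = 0.99, although that aggregation is an inference rather than something stated here.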
- SETUP.md - Install
- API.md - REST API
- UI.md - Dashboard
- FEATURES.md - Advanced
- MONITORING.md - 📊 LangSmith Tracing
- TROUBLESHOOTING.md - Help
- DOCKER.md - Deploy
- ARCHITECTURE.md - Design
| Layer | Technology | Purpose |
|---|---|---|
| Backend API | FastAPI | REST endpoints, async request handling |
| Frontend UI | Streamlit | Interactive dashboard, document upload |
| LLM Orchestration | LangChain | Chain orchestration, prompt management |
| Vector Database | Pinecone | Semantic search, embeddings storage |
| Evaluation | Ragas Framework | RAG quality metrics (Faithfulness, Precision, Recall) |
| Containerization | Docker + Compose | Production deployment, multi-service orchestration |
| LLM Providers | OpenAI, Anthropic, AWS Bedrock | Multi-provider support (plug-and-play) |
| Embeddings | OpenAI/Anthropic/Bedrock | Text-to-vector conversion |
| Testing | Pytest | Unit + integration tests |
| Language | Python 3.11+ | Core implementation language |
| Monitoring | LangChain Tracing (LangSmith) | Debug RAG pipeline, token tracking, latency analysis |
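The "plug-and-play" provider support implies a factory that picks an LLM from configuration. The sketch below shows that pattern with stub classes so it runs without API keys; `LLM_PROVIDER`, `get_llm`, and the stub classes are hypothetical names, and a real implementation would return LangChain chat models rather than stubs.

```python
import os
from typing import Callable, Dict, Optional, Protocol


class ChatModel(Protocol):
    def invoke(self, prompt: str) -> str: ...


# Stub providers standing in for real SDK clients (hypothetical).
class OpenAIStub:
    def invoke(self, prompt: str) -> str:
        return f"[openai] {prompt}"


class AnthropicStub:
    def invoke(self, prompt: str) -> str:
        return f"[anthropic] {prompt}"


class BedrockStub:
    def invoke(self, prompt: str) -> str:
        return f"[bedrock] {prompt}"


# Registry mapping a config value to a constructor.
PROVIDERS: Dict[str, Callable[[], ChatModel]] = {
    "openai": OpenAIStub,
    "anthropic": AnthropicStub,
    "bedrock": BedrockStub,
}


def get_llm(provider: Optional[str] = None) -> ChatModel:
    """Build the chat model named by `provider`, else by the LLM_PROVIDER env var."""
    name = (provider or os.getenv("LLM_PROVIDER", "openai")).lower()
    try:
        return PROVIDERS[name]()
    except KeyError:
        raise ValueError(f"unknown provider {name!r}; choose from {sorted(PROVIDERS)}") from None


if __name__ == "__main__":
    print(get_llm("bedrock").invoke("Summarize the discharge note."))
```

With this shape, switching providers is a one-line `.env` change (for example the hypothetical `LLM_PROVIDER=bedrock`) and calling code never imports a vendor SDK directly.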
- ✅ Provider Factory (swap LLMs)
- ✅ Contextual Compression (20-30% fewer tokens)
- ✅ PHI De-identification (HIPAA)
- ✅ Gold QA Benchmarks
- ✅ Ragas Evaluation
- ✅ **LangSmith Monitoring** (production observability)
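Contextual compression trims each retrieved chunk down to the parts relevant to the query before it reaches the LLM, which is where the token savings come from. LangChain's compressors typically use an LLM or embedding similarity to decide what to keep; the sketch below substitutes a much cruder sentence-level keyword-overlap filter so it runs with no dependencies. `compress`, `token_savings`, the stopword list, and counting whitespace-separated words as "tokens" are all illustrative assumptions.

```python
import re

# Tiny illustrative stopword list; real systems use embeddings or an LLM instead.
STOPWORDS = {"the", "a", "an", "of", "for", "is", "what", "and", "to", "in", "with"}


def _terms(text: str) -> set:
    """Lowercased alphanumeric words minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def compress(chunk: str, query: str) -> str:
    """Keep only sentences that share at least one content word with the query."""
    query_terms = _terms(query)
    sentences = re.split(r"(?<=[.!?])\s+", chunk.strip())
    kept = [s for s in sentences if _terms(s) & query_terms]
    return " ".join(kept)


def token_savings(original: str, compressed: str) -> float:
    """Fraction of whitespace-delimited words removed (a rough proxy for LLM tokens)."""
    before = len(original.split())
    return 1 - len(compressed.split()) / before if before else 0.0


if __name__ == "__main__":
    chunk = (
        "Metformin is first-line therapy for type 2 diabetes. The clinic opened in 1998. "
        "Metformin dosing starts at 500 mg daily. Parking is available on level two."
    )
    query = "What is the starting dose of metformin?"
    short = compress(chunk, query)
    print(short)
    print(f"saved {token_savings(chunk, short):.0%} of words")
```

The saving in a toy example says nothing about real workloads; the percentage depends entirely on how much irrelevant text retrieval returns.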
Run Evaluation:
```bash
python eval/evaluate_rag.py
```
Expected Results:
- Faithfulness: 1.00/1.00
- Answer Relevancy: 0.97/1.00
- Context Precision: 1.00/1.00
- Overall Score: 0.99/1.00
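Because the expected scores are stable, they can double as a regression gate, for example in CI. The sketch below compares a dictionary of scores against minimum thresholds; the threshold values and the `check_scores` helper are hypothetical, the metric keys merely mirror Ragas' metric names, and nothing here assumes a particular output format from `eval/evaluate_rag.py`.

```python
# Hypothetical minimums; tune these to your own baseline.
THRESHOLDS = {
    "faithfulness": 0.95,
    "answer_relevancy": 0.90,
    "context_precision": 0.95,
}


def check_scores(scores: dict, thresholds: dict = THRESHOLDS) -> list:
    """Return one message per metric that is missing or below its minimum."""
    failures = []
    for metric, minimum in thresholds.items():
        value = scores.get(metric)
        if value is None:
            failures.append(f"{metric}: missing")
        elif value < minimum:
            failures.append(f"{metric}: {value:.2f} < {minimum:.2f}")
    return failures


if __name__ == "__main__":
    # Scores taken from the expected results listed above.
    scores = {"faithfulness": 1.00, "answer_relevancy": 0.97, "context_precision": 1.00}
    problems = check_scores(scores)
    print("all metrics pass" if not problems else "\n".join(problems))
```

Reporting missing metrics as failures, rather than skipping them, keeps a silently broken evaluation run from passing the gate.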
ValidationException (Bedrock): Ensure your AWS region supports the selected model and that you have active model access in your Bedrock console.
IndexNotFound (Pinecone): Ensure the PINECONE_INDEX_NAME in your .env matches the index you created in the Pinecone dashboard.
ModuleNotFoundError: Ensure you have activated your virtual environment:
```bash
# On Windows (PowerShell):
.venv\Scripts\Activate.ps1
# On Linux/Mac:
source .venv/bin/activate
```
No working Bedrock models found: This is expected if you don't have an AWS account or Bedrock access. The system automatically falls back to Anthropic (Claude 3.5 Sonnet) or OpenAI (GPT-4o), as configured in your `.env`.
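The Bedrock message describes a fallback chain: if no Bedrock model responds, the system moves on to Anthropic or OpenAI. A minimal sketch of that control flow follows, with hypothetical provider functions standing in for real SDK calls; the function names, the `ProviderUnavailable` exception, and the provider order are assumptions.

```python
from typing import Callable, List, Tuple


class ProviderUnavailable(Exception):
    """Raised by a provider that cannot serve requests (no credentials, no model access)."""


def call_with_fallback(prompt: str, providers: List[Tuple[str, Callable[[str], str]]]) -> Tuple[str, str]:
    """Try each (name, call) pair in order; return (name, answer) from the first that works."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderUnavailable as exc:
            errors.append(f"{name}: {exc}")  # remember why, then try the next provider
    raise RuntimeError("no provider available; " + "; ".join(errors))


# Hypothetical stand-ins: Bedrock fails the way it would without AWS access.
def bedrock(prompt: str) -> str:
    raise ProviderUnavailable("No working Bedrock models found")


def anthropic(prompt: str) -> str:
    return f"anthropic answer to: {prompt}"


def openai(prompt: str) -> str:
    return f"openai answer to: {prompt}"


if __name__ == "__main__":
    used, answer = call_with_fallback(
        "Summarize the discharge note.",
        [("bedrock", bedrock), ("anthropic", anthropic), ("openai", openai)],
    )
    print(used, "->", answer)
```

Catching only `ProviderUnavailable`, not every exception, means genuine bugs still surface instead of being masked by a silent switch to another provider.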
For debugging and monitoring RAG pipeline execution, you can enable LangChain tracing via LangSmith:
- Sign up for LangSmith (free tier available).
- Get your API key from the LangSmith dashboard.
- Add the following to your `.env`:
  ```
  LANGCHAIN_TRACING_V2=true
  LANGCHAIN_API_KEY=your-langsmith-api-key
  ```
- Run your queries; traces are sent to LangSmith automatically:
  ```bash
  python test_query.py
  ```
  The system will print: `🔍 LangChain tracing enabled - traces will be sent to LangSmith`
Benefits:
- Monitor token usage and latency
- Debug RAG chain execution step-by-step
- Track LLM calls, embeddings, and retrieved context
- Visualize the complete prompt flow
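The confirmation message printed when you run your queries suggests a startup check on the tracing variables. A sketch of such a check, assuming `.env` has already been loaded into the process environment (for example with python-dotenv's `load_dotenv()`) and that the check simply reads `LANGCHAIN_TRACING_V2` and `LANGCHAIN_API_KEY`; the `tracing_enabled` name is hypothetical and the project's real logic may differ.

```python
import os


def tracing_enabled() -> bool:
    """True when LangSmith tracing is switched on and an API key is present.

    Hypothetical helper: LangChain reads these environment variables itself,
    so this check only decides whether to print a confirmation at startup.
    """
    flag = os.getenv("LANGCHAIN_TRACING_V2", "").strip().lower() == "true"
    has_key = bool(os.getenv("LANGCHAIN_API_KEY", "").strip())
    return flag and has_key


if __name__ == "__main__":
    if tracing_enabled():
        print("🔍 LangChain tracing enabled - traces will be sent to LangSmith")
```

Requiring the key as well as the flag avoids announcing tracing when traces would actually be rejected for missing credentials.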