Skip to content

onkaryadav2541/DocuChat_AI

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

🤖 DocuChat AI

A Hybrid RAG (Retrieval-Augmented Generation) Application that enables secure, cost-effective chat with your PDF documents.

Python Streamlit LangChain Gemini


📖 Overview

DocuChat AI allows users to upload multiple PDF documents and ask questions in natural language. The system provides accurate, context-aware answers by referencing the specific content of your files.

Unlike standard wrappers, this project utilizes a Hybrid Architecture for maximum efficiency:

  1. Local "Brain" (Privacy & Cost): Uses HuggingFace embeddings running locally on your CPU. This ensures your data structure remains private and you pay $0 in embedding costs.
  2. Cloud "Genius" (Speed & Intellect): Uses Google Gemini Flash for high-speed answer generation, utilizing smart routing to find the most stable model available in your region.

🚀 Key Features

  • Zero-Cost Embeddings: Powered by sentence-transformers/all-MiniLM-L6-v2 (Running locally).
  • Smart Model Routing: Automatically detects and switches to the best available Google Gemini model (gemini-flash-latest) to bypass regional restrictions.
  • Persistent Memory: Your "Brain" (./chroma_db) is saved to disk, so you don't have to re-process documents every time you restart.
  • Multi-File Ingestion: Upload and process multiple PDFs simultaneously.
  • Rate-Limit Protection: Optimized logic to prevent API crashes on the Free Tier.

🛠️ Tech Stack

  • Frontend: Streamlit
  • Orchestration: LangChain (Modern LCEL Architecture)
  • Vector Database: ChromaDB
  • LLM: Google Gemini 1.5 Flash / Pro (via langchain-google-genai)
  • Embeddings: HuggingFace (all-MiniLM-L6-v2)

⚙️ Installation

1. Clone the Repository

git clone [https://github.com/YOUR_USERNAME/DocuChat_AI.git](https://github.com/YOUR_USERNAME/DocuChat_AI.git)
cd DocuChat_AI

2. Create a Virtual Environment

# Windows
python -m venv .venv
.venv\Scripts\activate

# Mac/Linux
python3 -m venv .venv
source .venv/bin/activate

3. Install Dependencies

pip install -r requirements.txt

4. Setup API Keys Create a .env file in the root directory and add your key:

GOOGLE_API_KEY=your_actual_api_key_here

🏃‍♂️ Usage

1. Run the Application

streamlit run app.py

2. Build the Knowledge Base

  • Open the sidebar on the left.
  • Upload your PDF documents.
  • Click "Build Brain 🧠".
  • Wait for the "✅ Knowledge Base Built!" message.

3. Chat

  • Ask questions like: "What is the total invoice amount?" or "Summarize the limitations in this contract."
  • The AI will answer based only on the documents provided.

👨‍💻 Author

Developed by Onkar Yadav

  • Built with Python & LangChain.

About

Private RAG System for Enterprise Document Analysis.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages