Live API: https://rag.datawebify.com/api/v1/health
Interactive Docs: https://rag.datawebify.com/docs
Portfolio: datawebify.com/projects/agai4-rag-knowledge-agent
Part of: Agentic AI Portfolio — Project 4 of 50
Enterprise teams waste thousands of hours per month searching through PDFs, policy documents, manuals, and knowledge bases to answer repetitive questions. A typical support or operations team of 3 agents spends 60-70% of their time on lookups that should be instant.
Cost of manual knowledge lookup:
- 3 support agents x $25/hr x 160 hrs/month = $12,000/month
- Average query resolution time: 4-6 minutes
- Error rate from outdated documents: 15-20%
A production-ready RAG (Retrieval Augmented Generation) system that ingests your company documents and answers any question in under 3 seconds — with source citations, conversation memory, and a REST API that plugs into any existing SaaS product.
After deployment:
- Query resolution time: under 3 seconds (vs 4-6 minutes)
- Agent hours saved: 400+ hrs/month
- Cost per query: $0.002 (vs $1.67 human cost)
- Availability: 24/7 with zero agent involvement
| Metric | Before | After | Change |
|---|---|---|---|
| Query resolution time | 4-6 min | under 3 sec | 99% faster |
| Monthly agent cost | $12,000 | $180 (AI cost) | 98.5% savings |
| Document coverage | Manual search | 10,000+ docs instant | 100x scale |
| Availability | Business hours | 24/7 | Always on |
| Accuracy | 80% (human error) | 94%+ (cited sources) | 18% improvement |
┌─────────────────────────────────────────────┐
│ Ingestion Pipeline │
│ (PDF / DOCX / URL → Chunks → Embeds) │
└──────────────────┬──────────────────────────┘
│
▼
┌─────────────────┐
│ Vector Store │
│ (Pinecone) │
└────────┬────────┘
│
┌──────────▼──────────┐
│ RAG Query Agent │
│ (Semantic Search + │
│ GPT-4o-mini) │
└──────────┬──────────┘
│
┌──────────▼──────────┐
│ FastAPI Backend │
│ (Chat + REST API) │
└──────────┬──────────┘
│
┌──────────▼──────────┐
│ Metrics Layer │
│ (Latency, Accuracy, │
│ Query Volume) │
└─────────────────────┘
| Layer | Technology |
|---|---|
| LLM | OpenAI GPT-4o-mini |
| Embeddings | OpenAI text-embedding-3-small (1536d) |
| Vector Database | Pinecone |
| RAG Framework | LangChain |
| Document Loaders | PyMuPDF, python-docx, BeautifulSoup |
| API Layer | FastAPI + Uvicorn |
| Metadata + Logs | Supabase (PostgreSQL) |
| Deployment | Docker + Railway |
| Language | Python 3.12 |
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/v1/health | System health check |
| POST | /api/v1/ingest/file | Upload PDF, DOCX, or TXT |
| POST | /api/v1/ingest/url | Ingest content from URL |
| POST | /api/v1/chat | Ask a question, get cited answer |
| POST | /api/v1/session/new | Start a conversation session |
| GET | /api/v1/session/{id}/history | Retrieve chat history |
| DELETE | /api/v1/session/{id} | End a session |
| GET | /api/v1/metrics | System performance metrics |
Full interactive documentation: https://rag.datawebify.com/docs
Multi-format ingestion Ingest PDF, DOCX, TXT files and any public URL. Automatic chunking with overlap ensures no context is lost at document boundaries.
Semantic search Questions are matched by meaning, not keywords. "What is our cancellation policy?" finds the right clause even if the document says "termination procedure."
Source citations Every answer includes the source document and relevance score. No hallucinations — the model only answers from your documents.
Conversation memory Multi-turn sessions allow follow-up questions. Users can ask "tell me more about that" without repeating context.
Metadata filtering Restrict search to specific document categories, source types, or filenames. Query only HR documents, only legal contracts, etc.
Metrics tracking Every query is logged with latency, retrieval score, and session ID. Business reporting shows queries handled, hours saved, cost per query.
git clone https://github.com/umair801/rag-knowledge-agent
cd rag-knowledge-agentpython -m venv venv
venv\Scripts\activate
pip install -r requirements.txtcp .env.example .env
# Add your API keys to .envuvicorn main:app --reload --port 8000curl -X POST https://rag.datawebify.com/api/v1/ingest/url \
-H "Content-Type: application/json" \
-d '{"url": "https://yoursite.com/docs", "category": "support"}'curl -X POST https://rag.datawebify.com/api/v1/chat \
-H "Content-Type: application/json" \
-d '{"query": "What is your refund policy?", "top_k": 5}'rag-knowledge-agent/
├── app/
│ ├── ingestion/
│ │ ├── loaders.py # PDF, DOCX, URL, TXT loaders
│ │ ├── chunker.py # Token-aware text chunking
│ │ ├── embedder.py # OpenAI embeddings + Pinecone storage
│ │ └── pipeline.py # Orchestration: load → chunk → embed → store
│ ├── retrieval/
│ │ ├── retriever.py # Semantic search via Pinecone
│ │ ├── generator.py # GPT-4o-mini answer generation
│ │ ├── memory.py # Multi-turn conversation sessions
│ │ └── rag_agent.py # Full pipeline orchestrator
│ ├── api/
│ │ ├── routes.py # All FastAPI endpoints
│ │ └── app.py # App factory + middleware
│ ├── metrics/
│ │ ├── tracker.py # Query logging to Supabase
│ │ └── reporter.py # Business metrics reporter
│ └── models/
│ └── schemas.py # Pydantic data models
├── Dockerfile
├── railway.toml
├── requirements.txt
├── main.py
└── README.md
OPENAI_API_KEY= # OpenAI API key
PINECONE_API_KEY= # Pinecone API key
PINECONE_INDEX_NAME= # Pinecone index name
PINECONE_ENVIRONMENT= # Pinecone region
SUPABASE_URL= # Supabase project URL
SUPABASE_KEY= # Supabase anon key
APP_ENV= # development or production
LOG_LEVEL= # INFO or DEBUG- SaaS companies — Answer customer questions from help docs instantly
- Legal firms — Query contracts and case files without manual search
- Healthcare — Search clinical guidelines and compliance documents
- E-commerce — Product catalog and policy Q&A at scale
- Enterprise IT — Internal knowledge base for support teams
Building a RAG knowledge base for your organization?
Muhammad Umair
Agentic AI Specialist and Enterprise Consultant
datawebify.com | github.com/umair801
AgAI_4 of 50 — Enterprise Agentic AI Portfolio