HR RAG System is a Retrieval-Augmented Generation (RAG) application for querying HR employee data using natural language. The system transforms structured employee data into searchable documents, retrieves relevant context using embeddings, and generates answers using an LLM.
HR Data → Document Processing → Embeddings (OpenAI) → Chroma Vector Store → Retrieval → LLM → FastAPI → Response
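As a rough illustration of the first stage in the pipeline above, the sketch below turns one CSV row into a plain-text document ready for embedding. The column names (`EmployeeNumber`, `Age`, `Department`, `JobRole`, `Attrition`), the sample values, and the function name `row_to_document` are assumptions made for illustration; the actual logic in `hr_employee_rag.py` may differ.

```python
import csv
import io

def row_to_document(row: dict) -> str:
    """Render one employee record as a short text document for embedding."""
    # Put the identifier first so ID-based questions match this document.
    lines = [f"Employee {row['EmployeeNumber']}"]
    # Keep the remaining fields as "name: value" lines, skipping empty values.
    for key, value in row.items():
        if key != "EmployeeNumber" and value not in ("", None):
            lines.append(f"{key}: {value}")
    return "\n".join(lines)

# Inline sample standing in for the real CSV file (illustrative values only).
SAMPLE_CSV = """EmployeeNumber,Age,Department,JobRole,Attrition
1,41,Sales,Sales Executive,Yes
2,49,Research & Development,Research Scientist,No
"""

documents = [row_to_document(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
print(documents[0])
```

Keeping one document per employee, with the employee number on the first line, is one simple way to make questions such as "summary of employee 1" retrievable.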
Features:
- Converts HR employee data into structured documents
- Generates embeddings using OpenAI
- Stores vectors in Chroma for efficient retrieval
- Uses LangChain for retrieval and response generation
- Exposes a FastAPI endpoint for querying the system
- Returns context-aware, generated answers from HR data
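The retrieval step in the list above can be pictured with a toy, dependency-free sketch: embed the documents, embed the question, and rank by cosine similarity. The `embed` function here is a bag-of-words stand-in, not the OpenAI embedding model, and a plain Python list takes the place of Chroma; all function names are hypothetical.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the question."""
    q = embed(question)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

docs = [
    "Employee 1 works in Sales as a Sales Executive",
    "Employee 2 works in Research as a Research Scientist",
]
top = retrieve("Who is the research scientist?", docs, k=1)
print(top[0])
```

A real embedding model captures meaning rather than exact word overlap, but the ranking logic follows the same idea.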
Tech stack:
- Python
- OpenAI
- LangChain
- Chroma (Vector Database)
- FastAPI
Example use cases:
- Summarize an employee profile
- Query HR data using natural language
- Retrieve insights from employee records
- Generate answers grounded in internal data
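One common way to keep answers grounded in internal data is to place the retrieved documents in the prompt and instruct the model to answer only from them. The sketch below shows that pattern; the prompt wording and the `build_prompt` helper are assumptions, not the project's actual prompt.

```python
def build_prompt(question: str, context_docs: list[str]) -> str:
    """Assemble a prompt that restricts the model to the retrieved context."""
    # Number each retrieved document so the context is easy to read in logs.
    context = "\n\n".join(
        f"[{i}] {doc}" for i, doc in enumerate(context_docs, start=1)
    )
    return (
        "Answer the question using only the HR records below. "
        "If the records do not contain the answer, say so.\n\n"
        f"Records:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_prompt(
    "Give me a summary of employee 1",
    ["Employee 1\nDepartment: Sales\nJobRole: Sales Executive"],
)
print(prompt)
```

The resulting string would then be sent to the LLM; that call is omitted here because it depends on the client library and model the project uses.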
POST /query

Request body:
{
  "question": "Give me a summary of employee 1"
}

Response:
{
  "answer": "..."
}

This project is structured as an API-based AI system using FastAPI. It can be deployed to cloud platforms, but it is currently intended to run locally or in a development environment because of dataset size and deployment constraints.
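For reference, the `POST /query` exchange shown above can also be driven from Python using only the standard library. This is a sketch that assumes the API is running locally at `http://127.0.0.1:8000` and that the response is a JSON object with an `answer` field, as in the example; `build_query_request` and `ask` are hypothetical helper names.

```python
import json
import urllib.request

def build_query_request(
    question: str, base_url: str = "http://127.0.0.1:8000"
) -> urllib.request.Request:
    """Build the POST /query request without sending it."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def ask(question: str) -> str:
    """Send the question to a running server and return the 'answer' field."""
    request = build_query_request(question)
    with urllib.request.urlopen(request, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))["answer"]

# Building the request needs no server, so it is safe to run anywhere.
req = build_query_request("Give me a summary of employee 1")
print(req.get_method(), req.full_url)
# With the API running: print(ask("Give me a summary of employee 1"))
```

Separating request construction from sending keeps the request shape easy to inspect or unit-test without a live server.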
Project structure:
hr-rag-system/
├── hr_employee_rag.py # RAG pipeline and core logic
├── hr_rag_api.py # FastAPI application
├── hr_data/
│ └── employees/ # Processed employee documents
├── HR-Employee-Attrition.csv # Source dataset
├── requirements.txt # Dependencies
└── README.md
Clone the repository:
git clone https://github.com/moatazsaad/hr-rag-system.git
cd hr-rag-system

Create a virtual environment:
python -m venv env

Activate the environment:
Windows:
env\Scripts\activate
macOS/Linux:
source env/bin/activate

Install dependencies:
pip install -r requirements.txt

Set your OpenAI API key in a .env file:
OPENAI_API_KEY=your_api_key_here

Run the API:
uvicorn hr_rag_api:app --reload

Send a test request:
curl -X POST "http://127.0.0.1:8000/query" \
-H "Content-Type: application/json" \
-d '{"question": "Give me a summary of employee 1"}'

Highlights:
- End-to-end RAG pipeline from structured HR data to generated answers
- Combines embeddings, a vector database, and an LLM
- Exposes a production-style API using FastAPI
- Demonstrates applied GenAI and retrieval-based systems