This is a demonstration project for Retrieval-Augmented Generation (RAG). It uses Sentence Transformers to convert the data into embeddings, a vector database to store the vectors and retrieve the ones relevant to each user query so that answers are context-aware, and an LLM to synthesize an answer for each query.
This repository contains research Jupyter Notebooks, a RAG pipeline file, and a Streamlit web application. The project is customized for Google financial data (2023 and 2024) and Google stock price analysis.
- Data Collection: Gathered Google stock price data and financial reports for 2023 and 2024.
- Preprocessing: Cleaned and structured the data for efficient retrieval and downstream tasks.
- Step-by-step RAG Implementation: Developed and tested the RAG pipeline in Jupyter Notebooks for rapid experimentation.
- Modular Design: Built retriever and planner modules with clear interfaces for easy scalability.
- Evaluation: Assessed the quality of the retriever and planner on Google financial queries.
- Interactive UI: Built a user-friendly Streamlit web app for real-time querying and visualization.
- Logs: Displayed logs and intermediate results for transparency and debugging.
- Python 3.10 or higher
- Jupyter Notebook or JupyterLab environment
- Git (to clone the repository)
git clone https://github.com/dev-nikhil02/Agentic-RAG-demo
cd Agentic-RAG-demo
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
Set your Cohere API key in a .env file:
COHERE_API_KEY=your_api_key
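As a rough illustration of how the key in `.env` could reach Python code, here is a minimal standard-library sketch. The helper names `load_env_file` and `get_cohere_api_key` are hypothetical and not taken from this repository; the project may well use a package such as python-dotenv for this instead.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Read KEY=VALUE lines from a .env file into os.environ (hypothetical helper)."""
    env_path = Path(path)
    if not env_path.exists():
        return
    for raw_line in env_path.read_text().splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Values already set in the real environment take precedence.
        os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))


def get_cohere_api_key():
    """Return the API key, or raise a clear error if it is missing."""
    load_env_file()
    api_key = os.environ.get("COHERE_API_KEY")
    if not api_key:
        raise RuntimeError("COHERE_API_KEY is not set; add it to your .env file")
    return api_key
```

Failing early with an explicit error tends to be easier to debug than an authentication failure later in the pipeline.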
You can install the required Python packages by running:
pip install -r requirements.txt
- Launch Jupyter Notebook:
jupyter notebook
- Open the main notebook file (e.g., Agentic-RAG-demo.ipynb).
- Note: The notebook is intended for research purposes only.
After installing the dependencies, run this command in your terminal to open the app in your browser:
streamlit run app.py
Streamlit UI Features:
- Query input for financial questions about Google
- Display of generated answers with retrieved context
- Logs and intermediate results for transparency
The project architecture is designed around two main components, the retriever and the generator, with a Streamlit UI layer on top:
- Retriever Module:
  - Searches and retrieves relevant documents or passages from the Google financial dataset based on input queries.
  - Uses vector similarity search (e.g., ChromaDB) for efficient retrieval.
- Generator Module:
  - Uses retrieved documents as context to generate informed responses or summaries.
  - Powered by generative transformer models (Cohere).
- Streamlit UI Layer:
  - Provides an interactive interface for users to input queries and view results.
  - Users do not need to select a data source, because the planner module handles that choice.
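To make the retriever's similarity search concrete, here is a minimal, self-contained sketch. It is not the project's implementation: a toy bag-of-words vector stands in for Sentence Transformers embeddings, an in-memory list stands in for the vector database, and the function names and sample passages are made up for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, top_k=2):
    """Return the top_k documents most similar to the query, best first."""
    query_vec = embed(query)
    scored = [(cosine_similarity(query_vec, embed(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


# Made-up placeholder passages, not real content from the dataset.
documents = [
    "Quarterly revenue summary from the annual financial report",
    "Daily closing stock price records for the year",
    "Notes on operating expenses in the financial report",
]
print(retrieve("closing stock price", documents, top_k=1))
```

With a real embedding model the vectors would be dense and capture meaning rather than exact word overlap, but the ranking step (score every candidate against the query, keep the best few) has the same shape.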
Customized RAG Pipeline for Google Financial Data:
- Data Source: Google financial reports (2023, 2024), stock price CSVs.
- Retrieval: Semantic search over financial documents and time-series data.
- Generation: Summarization and Q&A tailored to financial context.
- Logging: Logs are stored for each of the stages above.
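The planning, prompt assembly, and logging steps listed above might look roughly like the sketch below. Everything here is an assumption made for illustration: the keyword-based routing rule, the source names `stock_prices` and `financial_reports`, the prompt wording, and the function names are hypothetical, and the actual planner in this project may decide differently (for example, by asking the LLM).

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("rag_pipeline")

# Hypothetical keyword list; the project's planner may use another rule.
STOCK_KEYWORDS = {"stock", "price", "close", "open", "volume", "share"}


def plan_source(query):
    """Pick a data source for the query using simple keyword matching."""
    words = set(query.lower().replace("?", " ").split())
    source = "stock_prices" if words & STOCK_KEYWORDS else "financial_reports"
    logger.info("planner chose source=%s for query=%r", source, query)
    return source


def build_prompt(query, passages):
    """Assemble the context-augmented prompt that would be sent to the LLM."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    logger.info("building prompt with %d passages", len(passages))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )
```

Logging the planner's choice and the number of retrieved passages at each step is one simple way to get the kind of intermediate results the UI can display.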
Example of generated text with retrieved context for Google financial data.
Streamlit app interface for querying Google financial data and visualizing stock prices.
Contributions, suggestions, and bug reports are welcome! Please open an issue or submit pull requests to improve the project.




