RAG-based financial document Q&A system. Benchmarks Cohere vs Hugging Face embeddings on the FinanceBench dataset.
Compares Cohere and Hugging Face embedding models for financial document retrieval and question answering. Uses the FinanceBench dataset (real-world financial queries with ground-truth answers) to evaluate which embedding model retrieves more relevant context and produces more accurate answers.
graph TD
A[FinanceBench Dataset] --> B[Document Chunking]
B --> C1[Cohere Embeddings]
B --> C2[Hugging Face Embeddings]
C1 --> D[FAISS Vector Store]
C2 --> D
D --> E[Similarity Search]
E --> F1[Google Gemini Generation]
E --> F2[Cohere Generation]
F1 --> G[Evaluation Metrics]
F2 --> G
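
The retrieval half of the diagram (chunking, embedding, similarity search) can be sketched in plain Python. This is a minimal illustration, not the project's code: `toy_embed` is a hypothetical hashed bag-of-words stand-in for the Cohere and Hugging Face embedding models, the brute-force loop in `search` stands in for the FAISS index, and the chunk sizes are arbitrary.

```python
import hashlib
import math
import re


def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into overlapping word windows (sizes are illustrative)."""
    words = text.split()
    step = chunk_size - overlap  # assumes chunk_size > overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def toy_embed(text, dim=256):
    """Hypothetical stand-in embedder: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def search(query, chunks, embed=toy_embed, k=3):
    """Brute-force cosine search over unit vectors; a vector index
    such as FAISS would replace this loop for a real corpus."""
    q = embed(query)
    scored = [(sum(a * b for a, b in zip(q, embed(c))), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
```

Swapping `embed` for a function that calls a real embedding model is the only change needed to compare models on the same chunks; the top-`k` chunks returned by `search` would then be passed to the generation step as context.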
| Model | Precision | Recall | Cosine Similarity |
|---|---|---|---|
| Cohere | Higher | Moderate | Higher |
| Hugging Face | Moderate | Higher | Moderate |
- Cohere performed better on finance-specific retrieval tasks, with higher precision and cosine similarity
- Hugging Face showed better recall across broader queries
- Full results in `cohere_vs_huggingface_results.csv`
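
To turn the per-question results into per-model averages, something like the following could work. It is only a sketch: the column names `model`, `precision`, `recall`, and `cosine_similarity` are assumptions, since the layout of the CSV is not documented here, so adjust them to match the actual header row.

```python
import csv
from collections import defaultdict


def mean_scores_by_model(
    rows,
    model_col="model",  # assumed column name
    metric_cols=("precision", "recall", "cosine_similarity"),  # assumed
):
    """Average each metric per model over an iterable of dict rows."""
    totals = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for row in rows:
        model = row[model_col]
        counts[model] += 1
        for col in metric_cols:
            totals[model][col] += float(row[col])
    return {
        m: {c: totals[m][c] / counts[m] for c in metric_cols}
        for m in counts
    }


def summarize_results(path="cohere_vs_huggingface_results.csv"):
    """Read the results file and return per-model metric averages."""
    with open(path, newline="", encoding="utf-8") as f:
        return mean_scores_by_model(csv.DictReader(f))
```

Calling `summarize_results()` from the repository root would return a nested dictionary keyed by model name, provided the assumed columns exist.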
| Component | Technology |
|---|---|
| Embeddings | Cohere API, Hugging Face `all-MiniLM-L6-v2` |
| Vector Store | FAISS |
| Generation | Google Gemini, Cohere |
| Evaluation | NLTK, scikit-learn (Precision, Recall, F1, Cosine Similarity) |
| Dataset | FinanceBench (PatronusAI) |
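
As a rough illustration of the evaluation metrics listed above, the sketch below scores a generated answer against a ground-truth answer. It assumes token-level overlap for precision, recall, and F1 and a bag-of-words cosine similarity; the project itself uses NLTK and scikit-learn, and its exact metric definitions may differ. The helper names are hypothetical and only the standard library is used.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (a simplification)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_scores(predicted, reference):
    """Token-overlap precision, recall, and F1 between two answers."""
    pred, ref = Counter(tokenize(predicted)), Counter(tokenize(reference))
    common = sum((pred & ref).values())  # shared tokens, with multiplicity
    precision = common / sum(pred.values()) if pred else 0.0
    recall = common / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if common else 0.0
    return precision, recall, f1


def cosine_similarity(a, b):
    """Cosine similarity between bag-of-words count vectors."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return dot / norm if norm else 0.0
```

Here precision is the share of generated tokens that appear in the reference, and recall is the share of reference tokens that the generated answer covers, so a long answer that contains the right figure can score high recall but low precision.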
git clone https://github.com/Atharv279/RAGify-Finance.git
cd RAGify-Finance
pip install -r requirements.txt

Set API keys:

export COHERE_API_KEY="your_key"
export GOOGLE_API_KEY="your_key"

Run:

python main.py

- Add Voyage AI embeddings for comparison
- Expand to additional finance-specific NLP models
- Test with larger document corpora