Atharv279/RAGify-Finance

RAGify Finance

Python LangChain FAISS Gemini

RAG-based financial document Q&A system. Benchmarks Cohere vs Hugging Face embeddings on the FinanceBench dataset.


Overview

Compares Cohere and Hugging Face embedding models for financial document retrieval and question answering. Uses the FinanceBench dataset (real-world financial queries with ground truth answers) to evaluate which approach delivers better results.

Architecture

graph TD
    A[FinanceBench Dataset] --> B[Document Chunking]
    B --> C1[Cohere Embeddings]
    B --> C2[HuggingFace Embeddings]
    C1 --> D[FAISS Vector Store]
    C2 --> D
    D --> E[Similarity Search]
    E --> F1[Google Gemini Generation]
    E --> F2[Cohere Generation]
    F1 --> G[Evaluation Metrics]
    F2 --> G
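The retrieval half of the diagram (chunking, embedding, similarity search) can be sketched in plain Python. This is a minimal illustration, not the repository's code: `chunk_text`, `embed`, `cosine`, and `top_k` are hypothetical names, the bag-of-words `embed` stands in for the Cohere and Hugging Face embedding models, and the brute-force cosine search stands in for the FAISS index.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=12, overlap=4):
    """Split text into overlapping word windows (sizes are illustrative)."""
    words = text.split()
    step = chunk_size - overlap  # assumes overlap < chunk_size
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(text):
    """Toy embedding: lowercase token counts. A real model returns dense vectors."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, chunks, k=2):
    """Return the k chunks most similar to the query as (score, chunk) pairs."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(chunk)), chunk) for chunk in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
```

In the real pipeline the retrieved chunks would then be passed, together with the question, to the generation model; that step is omitted here because it requires API access.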

Results

| Model | Precision | Recall | Cosine Similarity |
|---|---|---|---|
| Cohere | Higher | Moderate | Higher |
| HuggingFace | Moderate | Higher | Moderate |
  • Cohere embeddings scored higher on precision and cosine similarity, making them the stronger option for finance-specific retrieval tasks
  • Hugging Face embeddings achieved higher recall across broader queries
  • Full results in cohere_vs_huggingface_results.csv
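For readers unfamiliar with the metrics in the table, here is one common way to score a generated answer against a ground-truth answer using token overlap. It is a sketch under assumptions: this README does not say how the project computes precision and recall, so the tokenization and the `overlap_scores` helper below are hypothetical and may differ from the repository's evaluation code.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (an assumed tokenization)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_scores(prediction, reference):
    """Token-level precision, recall, and F1 of a prediction against a reference."""
    pred_counts = Counter(tokenize(prediction))
    ref_counts = Counter(tokenize(reference))
    # Multiset intersection: each shared token counts as often as it appears in both.
    common = sum((pred_counts & ref_counts).values())
    if common == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    precision = common / sum(pred_counts.values())
    recall = common / sum(ref_counts.values())
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}
```

Under this definition a short answer that contains only correct tokens gets high precision but lower recall, while a long answer that covers the reference and adds extra text gets the reverse.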

Tech Stack

| Component | Technology |
|---|---|
| Embeddings | Cohere API, HuggingFace all-MiniLM-L6-v2 |
| Vector Store | FAISS |
| Generation | Google Gemini, Cohere |
| Evaluation | NLTK, scikit-learn (Precision, Recall, F1, Cosine Similarity) |
| Dataset | FinanceBench (PatronusAI) |

Setup

git clone https://github.com/Atharv279/RAGify-Finance.git
cd RAGify-Finance
pip install -r requirements.txt

Set API keys:

export COHERE_API_KEY="your_key"
export GOOGLE_API_KEY="your_key"
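To confirm that both keys are visible to Python before running anything, a small check like the one below can help. It is a hypothetical helper, not part of the repository, and it assumes only the two variable names shown above; how `main.py` actually reads the keys is not covered here.

```python
import os

# The two variable names come from the export commands above.
REQUIRED_KEYS = ("COHERE_API_KEY", "GOOGLE_API_KEY")


def missing_keys(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name)]
```

Calling `missing_keys()` with no arguments inspects the current environment; an empty list means both keys are set.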

Run:

python main.py

Future Work

  • Add Voyage AI embeddings for comparison
  • Expand to additional finance-specific NLP models
  • Test with larger document corpora
