Comprehensive benchmark comparing Quantum-Enhanced RAG vs Classical RAG for agricultural information retrieval using real-world data from 50+ trusted sources.
Quantum-Rag-Benchmark-agri/
├── web_crawler.py # Scrapes 50+ agricultural sources
├── quantum_rag.py # Quantum-enhanced RAG system
├── classical_rag.py # Classical RAG baseline
├── compare_rag_results.py # Automated comparison script
├── run_pipeline.py # Master pipeline runner
├── requirements.txt # Python dependencies
├── setup_crawler.sh # Crawler setup script
│
├── agricultural_data_complete/ # Web crawler output
│ ├── txt/ # Clean TXT files (50+ sources)
│ ├── json/ # Metadata files
│ └── logs/ # Scraping statistics
│
├── old/ # Original implementation
│ └── src/
│ ├── quantum_embeddings/ # Quantum feature maps
│ ├── quantum_rag.py # Original quantum RAG
│ └── baseline_rag.py # Original baseline
│
└── *.csv, *.json # Results and logs
Web crawler:
- 50+ Trusted Sources: FAO, USDA, World Bank, CGIAR, research institutions
- Clean TXT Output: plain-text files ready to chunk and index for RAG
- Organized Storage: Separate folders for txt, json, logs
- Automatic Merging: Creates unified corpus file
Quantum RAG:
- Quantum Feature Maps: Angle, Amplitude, IQP embeddings
- PennyLane & Qiskit: Multiple quantum backends
- Hybrid Embeddings: Combines classical + quantum features
- Configurable Qubits: 4-16 qubits supported
Retrieval and generation:
- MiniLM Embeddings: Fast, 384-dimensional
- Qdrant Vector DB: In-memory for speed
- T5 Generation: Local answer generation
- Gemini API Support: Optional cloud LLM
Comparison:
- Automated Benchmarking: 10 test queries
- Multiple Metrics: Speed, similarity, diversity, overlap
- Statistical Analysis: Aggregate results
- Export Formats: JSON and CSV
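Angle encoding is the simplest of the quantum feature maps listed above. The sketch below is not the code in quantum_rag.py. It assumes each qubit i is prepared with a single rotation by angle x_i, which is the scheme PennyLane's AngleEmbedding template uses. Under that assumption the encoded state is a product state, and the fidelity between two encodings has the closed form prod_i cos^2((x_i - y_i)/2). Two steps are hypothetical choices made for illustration: the average pooling that shrinks a 384-dimensional MiniLM vector to n_qubits features, and the pi scaling of features into angles.

```python
import math


def reduce_to_qubits(vec, n_qubits=8):
    """Average-pool a classical embedding into n_qubits features (hypothetical reduction)."""
    size = math.ceil(len(vec) / n_qubits)
    pooled = []
    for i in range(n_qubits):
        chunk = vec[i * size:(i + 1) * size]
        pooled.append(sum(chunk) / len(chunk) if chunk else 0.0)
    return pooled


def to_angles(features, scale=math.pi):
    """Map features to rotation angles; the pi scaling is an assumption."""
    return [scale * x for x in features]


def angle_fidelity(angles_a, angles_b):
    """|<psi(a)|psi(b)>|^2 for product states built from one RY (or RX) rotation per qubit.

    Each qubit contributes cos^2((a_i - b_i) / 2), so no 2^n statevector is needed.
    """
    fidelity = 1.0
    for a, b in zip(angles_a, angles_b):
        fidelity *= math.cos((a - b) / 2) ** 2
    return fidelity


if __name__ == "__main__":
    query = to_angles(reduce_to_qubits([0.1] * 384))
    doc = to_angles(reduce_to_qubits([0.1] * 192 + [-0.1] * 192))
    print(f"fidelity: {angle_fidelity(query, doc):.4f}")
```

Because the closed form avoids simulating the full statevector, this kind of kernel stays cheap even at 16 qubits. Richer maps such as IQP entangle qubits and lose that shortcut.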
pip install -r requirements.txt
playwright install # For web crawler

python run_pipeline.py

This will:
- Check dependencies
- Run web crawler (if no data exists)
- Run automated comparison
- Generate results
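The decision flow listed above can be sketched as a small planning function. This is a hypothetical reconstruction, not the actual run_pipeline.py. It rests on two assumptions. First, crawler output counts as present when agricultural_data_complete/txt/ holds at least one .txt file. Second, REQUIRED is an assumed subset of requirements.txt, written as import names rather than pip names.

```python
import importlib.util
from pathlib import Path

# Assumed import names for a subset of requirements.txt (pip names differ, e.g. qdrant-client).
REQUIRED = ["qdrant_client", "sentence_transformers", "transformers", "crawl4ai"]


def missing_modules(names):
    """Return the modules from `names` that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def has_corpus(data_dir="agricultural_data_complete"):
    """True if the crawler's txt/ folder exists and contains at least one .txt file."""
    txt_dir = Path(data_dir) / "txt"
    return txt_dir.is_dir() and any(txt_dir.glob("*.txt"))


def plan_steps(data_dir="agricultural_data_complete", names=REQUIRED):
    """Return the commands to run: check deps -> crawl if no data -> compare."""
    if missing_modules(names):
        return ["pip install -r requirements.txt"]
    steps = []
    if not has_corpus(data_dir):
        steps.append("python web_crawler.py")
    steps.append("python compare_rag_results.py")
    return steps


if __name__ == "__main__":
    for command in plan_steps():
        print(command)
```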
Or run each step manually:
# Step 1: Collect data (3-5 minutes)
python web_crawler.py
# Step 2: Run comparison
python compare_rag_results.py
# Step 3: Try interactive systems
python quantum_rag.py # Quantum-enhanced
python classical_rag.py # Classical baseline

# Full scraping (50+ sources)
python web_crawler.py
# Quick sample (10 sources for testing)
# Edit web_crawler.py and uncomment:
# asyncio.run(scrape_quick_sample())

python quantum_rag.py

Options:
- Embedding Type: angle, amplitude, classical
- Qubits: 4-16 (default: 8)
- Interactive Mode: Ask questions in real-time
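The qubit setting interacts directly with amplitude encoding, which stores a vector in the 2^n amplitudes of an n-qubit state. At the default of 8 qubits there are 256 amplitudes, fewer than the 384 dimensions of a MiniLM vector, so the vector must be truncated or reduced. With 9 qubits there are 512 amplitudes, enough to hold it with zero padding.

The sketch below assumes truncation or zero padding followed by L2 normalization. PennyLane's AmplitudeEmbedding offers `pad_with` and `normalize` options for the same purpose. This is an illustration, not quantum_rag.py's code.

```python
import math


def qubits_needed(dim):
    """Smallest n with 2**n >= dim (integer arithmetic avoids log2 rounding issues)."""
    return max(1, (dim - 1).bit_length())


def amplitude_encode(vec, n_qubits):
    """Truncate or zero-pad `vec` to 2**n_qubits entries, then L2-normalize it."""
    size = 2 ** n_qubits
    padded = list(vec[:size]) + [0.0] * max(0, size - len(vec))
    norm = math.sqrt(sum(x * x for x in padded))
    if norm == 0:
        raise ValueError("cannot amplitude-encode an all-zero vector")
    return [x / norm for x in padded]


def amplitude_fidelity(state_a, state_b):
    """|<a|b>|^2 for real-valued amplitude vectors."""
    return sum(x * y for x, y in zip(state_a, state_b)) ** 2


if __name__ == "__main__":
    print(qubits_needed(384))                  # 9
    state = amplitude_encode([0.05] * 384, 8)  # 8 qubits: the last 128 values are dropped
    print(len(state))                          # 256
```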
python classical_rag.py

Features:
- Fast classical embeddings
- Same interface as Quantum RAG
- Logs to CSV for comparison
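classical_rag.py splits documents into word chunks with `chunk_size = 500` and `overlap = 50`. A minimal sliding-window chunker with those defaults might look like the sketch below. It assumes plain whitespace tokenization, and the name `chunk_words` is hypothetical.

```python
def chunk_words(text, chunk_size=500, overlap=50):
    """Split text into windows of `chunk_size` words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(1000))
    for chunk in chunk_words(sample):
        words = chunk.split()
        print(words[0], words[-1], len(words))
```

For a 1,000-word document, this produces windows that start at words 0, 450 and 900. The last window holds only the remaining 100 words.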
python compare_rag_results.py

Outputs:
- rag_comparison_results.json: detailed results
- rag_comparison_results.csv: spreadsheet format
The comparison evaluates:
1. Retrieval Speed
   - Average time per query
   - Speed ratio (quantum vs classical)
2. Retrieval Quality
   - Average similarity scores
   - Top-k overlap between systems
3. Source Diversity
   - Variety of sources in results
   - Coverage across corpus
4. Statistical Analysis
   - Aggregate metrics
   - Per-query analysis
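The speed, overlap and diversity metrics above can be computed from each system's top-k results. The definitions below are assumptions chosen for illustration, and compare_rag_results.py may define them differently:
- overlap: shared results divided by k
- diversity: distinct sources divided by the number of results returned
- speed ratio: mean quantum time divided by mean classical time

```python
from statistics import mean


def topk_overlap(ids_a, ids_b):
    """Fraction of results shared by two top-k lists (assumed definition)."""
    k = max(len(ids_a), len(ids_b))
    return len(set(ids_a) & set(ids_b)) / k if k else 0.0


def source_diversity(sources):
    """Distinct sources divided by the number of results returned (assumed definition)."""
    return len(set(sources)) / len(sources) if sources else 0.0


def speed_ratio(quantum_times, classical_times):
    """Mean quantum retrieval time over mean classical time; >1 means quantum was slower."""
    return mean(quantum_times) / mean(classical_times)


if __name__ == "__main__":
    print(topk_overlap(["c1", "c2", "c3"], ["c2", "c3", "c9"]))  # 2 of 3 results shared
    print(source_diversity(["FAO", "FAO", "USDA", "CGIAR"]))     # 3 distinct of 4
```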
In quantum_rag.py:
# Embedding types
- angle: Simple rotation-based (fast)
- amplitude: Dense state preparation (expressive)
- iqp: Instantaneous Quantum Polynomial (complex)
# Qubits
n_qubits = 8 # 4-16 recommended

In classical_rag.py:
# Embedding model
embedder = SentenceTransformer("all-MiniLM-L6-v2")
# Chunk settings
chunk_size = 500 # words
overlap = 50 # words

In web_crawler.py:
# Delay between requests (be polite!)
delay = 3 # seconds
# Retry attempts
max_retries = 3

The web crawler collects from:
International Organizations (10)
- FAO (Food and Agriculture Organization)
- World Bank Agriculture
- CGIAR Research
US Government (8)
- USDA Farming, Crops, Livestock
- Economic Research Service
- National Agricultural Statistics
Other Governments (5)
- UK DEFRA
- EU Agriculture
- Australia, India
Research Institutions (10+)
- CIMMYT, IRRI, ICRISAT
- IFPRI, CSIRO
Other (7)
- AGRIS Database
- Precision Agriculture
- Sustainable Agriculture
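web_crawler.py waits `delay = 3` seconds between requests and sets `max_retries = 3`. The sketch below shows that polite-retry pattern independently of Crawl4AI, under three assumptions:
- `fetch` is any callable you supply (hypothetical here).
- `max_retries` counts total attempts.
- The growing back-off between failed attempts is illustrative, not documented crawler behaviour.

```python
import time


def fetch_with_retries(fetch, url, max_retries=3, delay=3.0, sleep=time.sleep):
    """Call fetch(url), retrying on failure and pausing `delay` seconds after a success.

    max_retries is treated as the total number of attempts (an assumption).
    `sleep` is injectable so tests can run without waiting.
    """
    last_error = None
    for attempt in range(1, max_retries + 1):
        try:
            result = fetch(url)
        except Exception as err:  # real code should catch narrower network errors
            last_error = err
            if attempt < max_retries:
                sleep(delay * attempt)  # back off a little longer after each failure
            continue
        sleep(delay)  # be polite before the caller issues the next request
        return result
    raise RuntimeError(f"giving up on {url} after {max_retries} attempts") from last_error


if __name__ == "__main__":
    attempts = {"n": 0}

    def flaky_fetch(url):
        """Hypothetical fetcher that fails twice before succeeding."""
        attempts["n"] += 1
        if attempts["n"] < 3:
            raise ConnectionError("temporary failure")
        return f"<html>{url}</html>"

    print(fetch_with_retries(flaky_fetch, "https://www.fao.org", delay=0.01))
```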
Dependencies:
- qdrant-client >= 1.7.0
- sentence-transformers >= 2.2.2
- transformers >= 4.35.0
- crawl4ai >= 0.2.0
- pennylane >= 0.33.0
- qiskit >= 0.45.0
- numpy, pandas, tqdm
- google-generativeai (for Gemini API)
- python-dotenv (for .env support)
Hardware requirements:
- CPU: Any modern processor
- RAM: 8GB+ recommended
- Storage: 1GB for data
- GPU: Not required (CPU mode)
For better answer generation (optional Gemini API):
1. Get an API key from https://makersuite.google.com/app/apikey
2. Create a .env file:
   GEMINI_API_KEY=your-api-key-here
3. Run any RAG system; it will auto-detect Gemini
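The auto-detection step can be pictured as a check on GEMINI_API_KEY, with a fallback to the local T5 generator. This is a hedged sketch, not the repository's logic. It assumes the .env file has already been loaded into the environment, which python-dotenv's `load_dotenv()` does. It also treats the placeholder value from the setup steps as missing.

```python
import os

PLACEHOLDER = "your-api-key-here"


def choose_generator(env=None):
    """Return "gemini" when a usable GEMINI_API_KEY is set, otherwise "t5".

    Assumes .env was already loaded into the environment, e.g. with
    python-dotenv: `from dotenv import load_dotenv; load_dotenv()`.
    """
    env = os.environ if env is None else env
    key = env.get("GEMINI_API_KEY", "").strip()
    if key and key != PLACEHOLDER:
        return "gemini"
    return "t5"


if __name__ == "__main__":
    print(f"Answer generator: {choose_generator()}")
```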
Troubleshooting:
Missing dependencies:
pip install -r requirements.txt
playwright install
No data found:
python web_crawler.py
# Wait 3-5 minutes for completion
Quantum libraries not found:
pip install pennylane qiskit
# Or use classical mode
Crawler failures:
- Check internet connection
- Some sites may block automated access
- URLs may have changed
- Check logs in agricultural_data_complete/logs/
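For the "use classical mode" fallback, a script can probe for the quantum libraries before it builds embeddings. The function names below are hypothetical. The sketch assumes any of the quantum embedding types (angle, amplitude, iqp) can run as long as either PennyLane or Qiskit is importable.

```python
import importlib.util

QUANTUM_TYPES = {"angle", "amplitude", "iqp"}


def quantum_backend_available(modules=("pennylane", "qiskit")):
    """True if at least one quantum library can be imported."""
    return any(importlib.util.find_spec(name) is not None for name in modules)


def resolve_embedding_type(requested, available=None):
    """Fall back to "classical" when a quantum type is requested but no backend is installed."""
    if requested not in QUANTUM_TYPES | {"classical"}:
        raise ValueError(f"unknown embedding type: {requested!r}")
    if available is None:
        available = quantum_backend_available()
    if requested in QUANTUM_TYPES and not available:
        print(f"No quantum backend found; using 'classical' instead of '{requested}'.")
        return "classical"
    return requested


if __name__ == "__main__":
    print(resolve_embedding_type("angle"))
```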
Tips:
- Use Quick Sample First: Test with 10 sources before full scraping
- Classical for Speed: Use classical RAG for fast prototyping
- Lower Qubits: Start with 4-6 qubits for faster quantum processing
- Cache Models: Models are cached after first download
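The advice to start with fewer qubits follows from how a full statevector simulator scales. It stores 2^n complex amplitudes, so each extra qubit doubles memory and roughly doubles the work per gate. The arithmetic below assumes 16 bytes per amplitude (complex128). Actual speed also depends on the simulator, the circuit depth and how many vectors are encoded.

```python
def statevector_bytes(n_qubits, bytes_per_amplitude=16):
    """Memory for a dense n-qubit statevector, assuming complex128 amplitudes."""
    return (2 ** n_qubits) * bytes_per_amplitude


if __name__ == "__main__":
    for n in (4, 6, 8, 12, 16):
        print(f"{n:>2} qubits: {2 ** n:>6} amplitudes, {statevector_bytes(n):>9,} bytes")
```

A single 16-qubit state is only 1 MiB. If each indexed chunk and each query is simulated separately, however, that cost is paid thousands of times. This is where starting at 4-6 qubits saves time.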
Expected run times:
- Data Collection: 3-5 minutes (web crawler)
- Indexing: 2-3 minutes (first time)
- Comparison: 1-2 minutes (10 queries)
- Interactive Use: Real-time queries
Total: about 6-10 minutes for the complete pipeline
rag_comparison_results.json structure:
{
"timestamp": "2025-11-03T...",
"num_queries": 10,
"queries": [...],
"aggregates": {
"classical": {...},
"quantum": {...},
"comparison": {...}
}
}

rag_comparison_results.csv columns: query, classical_time, classical_similarity, quantum_time, quantum_similarity, overlap, speedup, score_improvement_pct
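The CSV can be summarized with the standard library. The formulas for `speedup` and `score_improvement_pct` below are assumptions inferred from the column names: speedup as classical_time / quantum_time, and improvement as the relative change in similarity. Check compare_rag_results.py for the exact definitions.

```python
import csv
from pathlib import Path
from statistics import mean


def speedup(classical_time, quantum_time):
    """Assumed definition: values above 1 mean the quantum run was faster."""
    return classical_time / quantum_time


def score_improvement_pct(classical_similarity, quantum_similarity):
    """Assumed definition: relative change in similarity, in percent."""
    return (quantum_similarity - classical_similarity) / classical_similarity * 100.0


def summarize(lines):
    """Average selected columns of the comparison CSV (any iterable of CSV lines)."""
    rows = list(csv.DictReader(lines))
    if not rows:
        raise ValueError("no rows to summarize")
    return {
        "num_queries": len(rows),
        "mean_classical_time": mean(float(r["classical_time"]) for r in rows),
        "mean_quantum_time": mean(float(r["quantum_time"]) for r in rows),
        "mean_overlap": mean(float(r["overlap"]) for r in rows),
    }


if __name__ == "__main__":
    path = Path("rag_comparison_results.csv")
    if path.exists():
        with path.open(newline="") as f:
            print(summarize(f))
```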
Contributions welcome! Areas:
- Additional data sources
- New quantum feature maps
- Evaluation metrics
- Visualization tools
License: see the LICENSE file in the repository.
If you use this benchmark in research, please cite:
@software{quantum_rag_agri_2025,
title={Quantum RAG Benchmark - Agriculture},
author={...},
year={2025},
url={https://github.com/abeer555/Quantum-Rag-Benchmark-agri}
}
- Issues: GitHub Issues
- Docs: See WEB_CRAWLER_README.md for crawler details
- Contact: Repository owner
Built with: PennyLane, Qiskit, Qdrant, Sentence Transformers, Crawl4AI