An AI-powered hybrid search and labeling assistant that fuses Elastic Cloud BM25 + kNN retrieval with Google Vertex AI Gemini 2.0 reasoning — built to revolutionize enterprise knowledge search and evaluation.
## Table of Contents

- Overview
- Features
- Architecture
- Tech Stack
- Installation
- Environment Variables
- Running Locally
- Deployment
- Usage
- Project Structure
- Evaluation Metrics
- Future Upgrades
- Contributors
- License
## Overview

SearchSphere Agent is a full-stack AI search platform that integrates Elastic Cloud hybrid retrieval (BM25 + vector search) with Google Vertex AI Gemini 2.0 for contextual reasoning, evaluation, and dataset labeling.
It helps teams and enterprises find smarter, label faster, and evaluate efficiently — a complete foundation for AI-powered RAG systems.
## Features

- 🔍 Hybrid Search — Combines Elastic BM25 (lexical) + kNN (semantic) retrieval for deep understanding.
- 🤖 Gemini Reasoning — Uses Vertex AI Gemini-2.0-Flash for summaries and responses.
- 🧩 Label Assist — Create ground-truth JSONs interactively for model evaluation.
- 📊 Metrics Dashboard — Live precision@K and latency stats (p50/p95).
- 💬 Conversational Refinement — Natural chat-style interface for query reasoning.
- 🔐 Cloud Ready — Dockerized backend, deployed via Google Cloud Run + Netlify.
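As a sketch of the hybrid retrieval step: Elasticsearch accepts a BM25 `query` clause alongside a top-level `knn` clause in a single search request and combines the scores. The field names here (`content`, `text_vector`) are illustrative assumptions, not this repo's actual index mapping:

```python
# Illustrative Elasticsearch hybrid request body: a lexical BM25 "query"
# clause plus a semantic top-level "knn" clause in one request.
def hybrid_search_body(query_text, query_vector, k=10, num_candidates=120):
    return {
        "query": {"match": {"content": {"query": query_text}}},  # BM25 (lexical)
        "knn": {
            "field": "text_vector",            # dense-vector field (assumed name)
            "query_vector": query_vector,      # embedding of the query text
            "k": k,
            "num_candidates": num_candidates,  # mirrors ES_KNN_NUM_CANDIDATES
        },
        "size": k,
    }

body = hybrid_search_body("what is hybrid search?", [0.1] * 768)
```

The body would be passed to the Elasticsearch client's `search()` call; tuning `num_candidates` trades recall against latency on the kNN side.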
## Architecture

```
Frontend (Next.js 14, TypeScript, Tailwind)
        │
        ▼
Next.js API routes (proxy)
        │
        ▼
Backend (FastAPI)
 ├── Elastic Cloud (BM25 + kNN)
 ├── Vertex AI (Gemini + Embeddings)
 ├── Evaluation Engine
 └── Label Assist Service
```
## Tech Stack

| Layer | Technology |
|---|---|
| Frontend | Next.js 14, React 18, Tailwind CSS, Recharts |
| Backend | FastAPI (Python 3.11), Elastic Cloud, Vertex AI |
| Deployment | Google Cloud Run (backend), Netlify (frontend) |
| Database | Elastic Cloud index (searchsphere_docs) |
| CI/CD | GitHub Actions, Docker |
## Installation

Clone the repository:

```bash
git clone https://github.com/MrigankJaiswal-hub/SearchSphere-Agent.git
cd SearchSphere-Agent
```

Backend:

```bash
cd backend
python -m venv .venv
.venv\Scripts\activate          # Windows
# source .venv/bin/activate     # macOS/Linux
pip install -r requirements.txt
```

Frontend:

```bash
cd ../web
npm install
```

## Environment Variables

Backend:

```env
ELASTIC_CLOUD_ID=your_elastic_cloud_id
ELASTIC_API_KEY=your_elastic_api_key
ELASTIC_INDEX=searchsphere_docs
VERTEX_LOCATION=us-central1
GCP_PROJECT_ID=searchsphere-ai
VERTEX_EMBED_MODEL=text-embedding-005
VERTEX_CHAT_MODEL=gemini-2.0-flash-001
ES_KNN_NUM_CANDIDATES=120
CORS_ORIGIN=*
```

Frontend:

```env
NEXT_PUBLIC_API_BASE=http://127.0.0.1:8080
```

## Running Locally

Backend:

```bash
cd backend
uvicorn app:app --reload --port 8080
```

Frontend:

```bash
cd web
npm run dev
```

Visit 👉 http://localhost:3000
## Deployment

Backend (Google Cloud Run):

```bash
gcloud run deploy searchsphere-backend \
  --source . \
  --region us-central1 \
  --set-env-vars "CORS_ORIGIN=https://your-frontend-url.netlify.app"
```

Frontend (Netlify):

1. Import the `/web` directory from GitHub.
2. Add the environment variable `NEXT_PUBLIC_API_BASE=https://<your-cloudrun-url>`.
3. Deploy the site.
## Usage

1. Visit your deployed frontend.
2. Enter a natural-language query (e.g., “What is hybrid search?”).
3. Observe the fused semantic + lexical search results.
4. View real-time precision and latency metrics.
5. Go to Label Assist, input a query, and export `groundtruth.json`.
6. Upload `groundtruth.json` in Run Evaluation to compute P@K.
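The exact schema of `groundtruth.json` isn't documented here; a minimal sketch, assuming it maps each query to its relevant document IDs, together with the Precision@K computation it would feed:

```python
import json

# Hypothetical groundtruth.json shape: each query maps to relevant doc IDs.
groundtruth = json.loads('{"hybrid search": ["doc-3", "doc-7", "doc-9"]}')

def precision_at_k(retrieved_ids, relevant_ids, k=10):
    """Fraction of the top-k retrieved documents that are relevant."""
    top_k = retrieved_ids[:k]
    if not top_k:
        return 0.0
    relevant = set(relevant_ids)
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    return hits / len(top_k)

retrieved = ["doc-3", "doc-1", "doc-7", "doc-2"]  # system output, best first
score = precision_at_k(retrieved, groundtruth["hybrid search"], k=4)
# 2 of the top 4 are relevant -> 0.5
```

One convention note: this divides by the number of results actually returned (≤ k); some toolkits divide by k itself, which penalizes short result lists.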
## Project Structure

| Folder/File | Purpose | Status |
|---|---|---|
| `/backend` | FastAPI backend | ✅ |
| `/web` | Next.js frontend | ✅ |
| `/docs` | Documentation (README, credits, architecture, eval_matrix.xlsx) | ✅ |
| `/docs/credits.txt` | Acknowledgments | ✅ |
| `/docs/evaluation_matrix.xlsx` | Metrics template | ✅ |
| `/docs/diagram.png` | Architecture diagram (export from draw.io / Lucidchart) | ✅ |
| `.github/workflows/` | Optional CI/CD | Optional ✅ |
| `.env.example` | Example env vars | ✅ |
| `LICENSE` | MIT License | ✅ |
| `README.md` | Overview | ✅ |
Key frontend routes:

| Route | Purpose |
|---|---|
| `/` | Main chat & hybrid search page |
| `/metrics` | Live latency & precision dashboard |
| `/label` | Label Assist tool for dataset generation |

To test evaluation manually:

```bash
curl -X POST "$URL/api/eval/precision" \
  -H "Content-Type: application/json" \
  -d '{"query":"hybrid search","k":10}'
```

Directory layout:

```
SearchSphere-Agent/
├── backend/
│   ├── app.py
│   ├── routes/
│   ├── utils/
│   ├── tests/
│   └── requirements.txt
│
├── web/
│   ├── app/
│   ├── components/
│   ├── lib/
│   ├── public/
│   ├── package.json
│   └── tailwind.config.ts
│
├── scripts/
├── assets/
└── README.md
```
## Evaluation Metrics

- Precision@K: Evaluates top-K relevance.
- Latency Tracking: p50/p95 in milliseconds.
- Label Assist: Exports `groundtruth.json` for retraining.

Example output:

```
p50: 730 ms | p95: 1100 ms | Precision@10: 0.86
```
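The p50/p95 figures can be reproduced from raw per-request timings. A minimal nearest-rank sketch (the sample latencies are made up; production monitoring stacks often interpolate between ranks instead):

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample >= pct% of the data."""
    ordered = sorted(samples)
    if pct <= 0:
        return ordered[0]
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based nearest rank
    return ordered[min(rank, len(ordered)) - 1]

latencies_ms = [640, 700, 730, 760, 810, 950, 1100, 1240]  # per-request timings
p50 = percentile(latencies_ms, 50)   # median latency
p95 = percentile(latencies_ms, 95)   # tail latency
```

Tracking p95 alongside p50 matters because a healthy median can hide a slow tail caused by cold starts or large kNN candidate pools.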
## Future Upgrades

- Multi-modal retrieval (text, images, audio, video).
- Auth & multi-tenant support (Firebase/Cognito).
- Feedback-driven fine-tuning of hybrid fusion weights.
- Enhanced real-time dashboards and analytics.
## Contributors

**Mrigank Jaiswal**, B.Tech
🖥️ Built the full-stack architecture, Elastic-Vertex integration, frontend UI, and deployment automation.
## License

This project is licensed under the MIT License. © 2025 Mrigank Jaiswal