Project that routes multiple-choice and open questions to the most cost-efficient LLM depending on predicted category and heuristic difficulty. It includes a FastAPI backend (api.py), a Vite + React frontend (frontend/), and an on-disk sentence embedder plus an SVM classifier for category prediction (embedder/ and a clasificador_svm.pkl file expected at repository root).
Authors: Jorge Martínez, Jose María Martín, Inés García and Mario Gutiérrez.
Quick summary: The backend computes a difficulty score from the input question (and optional answer choices), predicts a category using sentence embeddings + an SVM classifier, then selects the cheapest model that meets a minimum accuracy threshold (based on tasa_aciertos.csv). The selected LLM is queried and its final answer returned to the frontend.
Demo
- Watch a short demo video of the project on YouTube: Demo video
Repository layout
- api.py: FastAPI backend entrypoint that processes questions and routes them to LLMs.
- CONFIG.json: Local configuration file used by api.py (contains API keys).
- requirements.txt: Python dependencies for the backend.
- tasa_aciertos.csv: Model performance table used for routing.
- embedder/: Folder containing a Sentence Transformer model and tokenizer assets.
- frontend/: Vite + React frontend application.
Features & behavior
- Difficulty estimation: heuristic_difficulty uses question length, Flesch reading ease, and answer-option similarity (text or numeric) to compute a score, which is mapped to one of five labels (Very low, Low, Medium, High, Very high).
- Category prediction: A SentenceTransformer loaded from embedder/ encodes the question and an SVM (clasificador_svm.pkl) predicts a category.
- Model routing: Uses tasa_aciertos.csv to choose the least expensive model that meets a configurable accuracy THRESHOLD (default 0.52). If no data exists, a safe fallback model is chosen.
- Multiple LLM integrations: models are mapped to providers (Groq for llama-3.1-8b-instant, and Azure/other OpenAI endpoints for the other models).
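The answer-option similarity and the score-to-label mapping can be illustrated with a small standalone sketch. This is not the code in api.py: the function names `option_similarity` and `label_for`, the use of `difflib` for text similarity, the numeric scaling, and the equal-width label bins are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations

LABELS = ["Very low", "Low", "Medium", "High", "Very high"]


def _as_number(text):
    """Return text as a float, or None if it is not numeric."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def option_similarity(options):
    """Return a similarity in [0, 1] for a list of answer options (hypothetical helper)."""
    if len(options) < 2:
        return 0.0
    numbers = [_as_number(o) for o in options]
    if all(n is not None for n in numbers):
        # Numeric options: a small spread relative to magnitude means "similar".
        spread = max(numbers) - min(numbers)
        scale = max(abs(n) for n in numbers) or 1.0
        return max(0.0, 1.0 - spread / scale)
    # Text options: mean pairwise character-level similarity.
    pairs = list(combinations(options, 2))
    return sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs) / len(pairs)


def label_for(score):
    """Map a score in [0, 1] to a label using equal-width bins (assumed cut-offs)."""
    index = min(int(max(0.0, score) * len(LABELS)), len(LABELS) - 1)
    return LABELS[index]
```

The intuition is that closely matching options make a question harder to answer by elimination, so a higher similarity would push the difficulty score up. How the real heuristic weighs similarity against length and reading ease is defined in api.py.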
Getting started (backend)
- Create and activate a Python virtual environment (recommended):
python -m venv .venv
.venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Provide API keys and configuration.
  - The app reads CONFIG.json for keys (groq_key, openai_key) by default. For security, prefer setting environment variables and modifying api.py to read them instead of storing secrets in CONFIG.json.
  - If CONFIG.json already contains keys in your local copy, rotate them and remove secrets from the repo before sharing.
- Required artifacts (place in repository root or adjust paths in api.py):
  - clasificador_svm.pkl: trained SVM model used by get_category.
  - embedder/: local SentenceTransformer files (already present). The backend uses SentenceTransformer('embedder').
- Run the backend (development):
uvicorn api:app --host 0.0.0.0 --port 8000 --reload
With this command the backend listens on port 8000. The FastAPI app includes CORS settings that allow http://localhost:5173 (the Vite dev server).
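For the API key step above, one possible way to prefer environment variables while still supporting a local CONFIG.json is sketched here. The helper name `load_keys` and the environment variable names `GROQ_API_KEY` and `OPENAI_API_KEY` are assumptions; only the `groq_key` and `openai_key` field names come from this README.

```python
import json
import os
from pathlib import Path

# Hypothetical mapping from CONFIG.json fields to environment variable names.
ENV_NAMES = {"groq_key": "GROQ_API_KEY", "openai_key": "OPENAI_API_KEY"}


def load_keys(config_path="CONFIG.json", env=None):
    """Return API keys, preferring environment variables over CONFIG.json."""
    env = os.environ if env is None else env
    file_keys = {}
    path = Path(config_path)
    if path.is_file():
        file_keys = json.loads(path.read_text(encoding="utf-8"))

    keys = {}
    for field, env_name in ENV_NAMES.items():
        value = env.get(env_name) or file_keys.get(field)
        if not value:
            raise RuntimeError(f"Missing API key: set {env_name} or '{field}' in {config_path}")
        keys[field] = value
    return keys
```

With something like this in place, api.py could call `load_keys()` once at startup, and CONFIG.json could be left out of the repository entirely on shared machines.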
API Usage
- GET /: health check, returns a tiny JSON message.
- POST /question: process a question. Request body (JSON; the answers field is optional):
{
  "question": "What is photosynthesis?",
  "answers": ["A", "B", "C"]
}
Response (JSON) contains at least:
- model: the vendor model name used for the query (e.g. llama-3.1-8b-instant, gpt-4o-mini, o4-mini).
- category: category predicted by the classifier.
- difficulty: difficulty label (Very low / Low / Medium / High / Very high).
- output: the string returned by the selected LLM (final answer only).
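The request and response shape described above can also be exercised from Python. The sketch below uses only the standard library; the helper names `build_request` and `ask` are hypothetical, and the URL assumes the default development setup on port 8000.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/question"  # assumes the default dev server


def build_request(question, answers=None, url=API_URL):
    """Build a POST request for /question; answers is sent as null when omitted."""
    payload = {
        "question": question,
        "answers": list(answers) if answers is not None else None,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question, answers=None, url=API_URL, timeout=60):
    """Send the question to a running backend and return the decoded JSON response."""
    request = build_request(question, answers, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the backend running, `ask("What is photosynthesis?")` should return a dictionary containing the model, category, difficulty and output fields listed above.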
Example curl request:
curl -X POST http://localhost:8000/question -H "Content-Type: application/json" -d "{\"question\":\"What is photosynthesis?\",\"answers\":null}"
Notes on internals
- Difficulty calculation: uses textstat.flesch_reading_ease and question length. If answer choices are provided, similarity between options is also considered, either as text similarity or as a scaled numerical difference when all options are numeric.
- Category prediction: embedder.encode([input]) produces a vector that is passed to clf.predict(emb), where clf is loaded via joblib.load('clasificador_svm.pkl').
- Routing: route_question looks up the row in tasa_aciertos.csv for the (category, difficulty) pair and iterates model_order = ['Llama-3.1-8B', 'GPT-4o-mini', 'o4-mini'] to return the least expensive model that satisfies the THRESHOLD.
The backend contains a mapping from model names to provider names, with special-case request flows for Groq (llama-3.1-8b-instant) and for Azure OpenAI-like endpoints.
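The routing rule described above can be sketched as a short function. This is an illustration rather than the actual route_question: the in-memory table layout, the `>=` comparison, and the choice of the last model in the order as fallback are assumptions, and the accuracy numbers in the usage example are invented.

```python
MODEL_ORDER = ["Llama-3.1-8B", "GPT-4o-mini", "o4-mini"]  # cheapest first
THRESHOLD = 0.52


def route(accuracy_table, category, difficulty,
          threshold=THRESHOLD, fallback=MODEL_ORDER[-1]):
    """Return the cheapest model whose accuracy meets the threshold.

    accuracy_table maps (category, difficulty) to {model_name: accuracy};
    this shape is assumed here and may differ from tasa_aciertos.csv.
    """
    row = accuracy_table.get((category, difficulty))
    if row is None:
        # No data for this pair: use the fallback (assumed to be the last model).
        return fallback
    for model in MODEL_ORDER:
        if row.get(model, 0.0) >= threshold:
            return model
    # No model is accurate enough: also use the fallback.
    return fallback


# Invented numbers, for illustration only.
example_table = {
    ("math", "High"): {"Llama-3.1-8B": 0.40, "GPT-4o-mini": 0.55, "o4-mini": 0.90},
}
print(route(example_table, "math", "High"))
```

Because the list is ordered by cost, the first model that clears the threshold is also the cheapest acceptable one, so no explicit price comparison is needed.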
Frontend (development)
- Move into the frontend/ folder:
cd frontend
- Install dependencies (requires Node.js + npm):
npm install
- Run dev server:
npm run dev
- Open http://localhost:5173 in the browser. The frontend communicates with the backend at http://localhost:8000 by default (CORS allowed in api.py). If you change the backend port or origin, update the CORS settings and the frontend API base URL accordingly.
Files and artifacts
- tasa_aciertos.csv: required for routing decisions. Do not edit its structure unless you understand how route_question matches CATEGORY and difficulty_labels.
- embedder/: contains SentenceTransformer assets. The current code calls SentenceTransformer('embedder'), so the folder must be readable by the Python process.
- clasificador_svm.pkl: SVM classifier pickle file (required). If it is missing, category prediction will fail.
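Since all three artifacts above are needed before the backend can answer anything, a small pre-flight check can make a missing file obvious. The helper below is hypothetical and not part of api.py; it assumes the artifacts live at the repository root, as described in this README.

```python
from pathlib import Path

# Artifact names taken from this README; paths are relative to the repository root.
REQUIRED_ARTIFACTS = ("tasa_aciertos.csv", "clasificador_svm.pkl", "embedder")


def missing_artifacts(root="."):
    """Return the names of required artifacts that do not exist under root."""
    root = Path(root)
    return [name for name in REQUIRED_ARTIFACTS if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_artifacts()
    if missing:
        print("Missing required artifacts:", ", ".join(missing))
    else:
        print("All required artifacts found.")
```

Running this from the repository root before starting uvicorn gives a clearer message than a stack trace from joblib or sentence_transformers.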
Troubleshooting & common errors
- Import errors for sentence_transformers or missing weights: ensure embedder/ is complete, or change the SentenceTransformer instantiation to download a model from HuggingFace if desired.
- joblib.load('clasificador_svm.pkl') fails: place the trained pickle file at the repository root or change the path in api.py.
- requests to external APIs return auth errors: verify CONFIG.json or the environment variables, and check that the keys are valid and not expired.
- CORS failures: ensure the origin used by the frontend is included in origins inside api.py, or use a wildcard carefully for development.
Security
- Never commit real API keys to the repo.
- CONFIG.json is convenient for local dev but insecure for shared repos. Prefer reading keys from environment variables, or use a secrets manager.
License & Contributing
- License: This project is available under the MIT License. See the top-level LICENSE file for full terms.
- Attribution: Include the LICENSE file in redistributed copies and retain the copyright notice.
- Contributions: If you contribute, please open a pull request and include tests where applicable. Avoid committing credentials or large model artifacts directly into the repo; instead provide instructions for obtaining them.