The V-Coin Oracle is an AI/ML system designed for the Skill Economy app at VIT Vellore. It intelligently suggests fair and data-driven credit prices for student tasks based on four core metrics:
- Complexity (1-10)
- Supply (1-10)
- Demand (1-10)
- Urgency (1-10)
The V-Coin Oracle is a three-stage pipeline: data generation, model training (with hyperparameter tuning and explainability), and real-time inference through a FastAPI service.
The data generator (create_dataset.py) produces synthetic but realistic task data that reflects expected market conditions: prices follow a baseline formula with added noise. Its output is the foundational dataset, vit_skills_data.csv.
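A minimal sketch of this kind of generator is shown here. The column name `price`, the weights in the baseline formula, and the noise level are all assumptions made for illustration; they are not taken from create_dataset.py.

```python
import numpy as np
import pandas as pd

FEATURES = ["complexity", "supply", "demand", "urgency"]


def make_dataset(n_rows: int = 1000, noise_std: float = 5.0, seed: int = 42) -> pd.DataFrame:
    """Build synthetic task rows: integer metrics in 1..10 plus a noisy price."""
    rng = np.random.default_rng(seed)

    # Each metric is drawn uniformly from the integers 1..10 (upper bound is exclusive).
    df = pd.DataFrame({name: rng.integers(1, 11, size=n_rows) for name in FEATURES})

    # Hypothetical baseline formula: price rises with complexity, demand and
    # urgency, and falls as supply grows. The weights are illustrative only.
    baseline = (
        10.0 * df["complexity"]
        + 5.0 * df["demand"]
        + 6.0 * df["urgency"]
        - 4.0 * df["supply"]
    )

    # Gaussian noise stands in for random variation in price negotiation.
    noise = rng.normal(loc=0.0, scale=noise_std, size=n_rows)

    # Clip so that no task ends up with a price below 1 credit.
    df["price"] = (baseline + noise).clip(lower=1.0).round(2)
    return df


dataset = make_dataset()
print(dataset.head())
# To persist it as the pipeline does: dataset.to_csv("vit_skills_data.csv", index=False)
```

Fixing the seed keeps the dataset reproducible between runs, which makes later training results easier to compare.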
The training script (train_oracle.py) fits a machine learning model on the generated data. It runs hyperparameter tuning to select the best configuration it finds and uses SHAP to analyze feature importance, then writes the best model (oracle_model.pkl) and a visual summary (shap_summary.png).
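The tuning and evaluation steps could look roughly like the sketch below. It generates its own stand-in data so that it runs in isolation; the pricing formula, the hyperparameter ranges, and the search budget are assumptions rather than the project's actual settings, and the SHAP step is left out.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Stand-in data so the sketch runs on its own; the real script would read
# vit_skills_data.csv instead. The pricing formula is an assumption.
rng = np.random.default_rng(0)
X = rng.integers(1, 11, size=(600, 4))  # columns: complexity, supply, demand, urgency
y = 10 * X[:, 0] - 4 * X[:, 1] + 5 * X[:, 2] + 6 * X[:, 3] + rng.normal(0, 5, size=600)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Candidate hyperparameters; the ranges are illustrative, not the project's.
param_distributions = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,  # sample 5 of the 27 possible combinations
    cv=3,      # 3-fold cross-validation per combination
    scoring="neg_mean_squared_error",
    random_state=0,
)
search.fit(X_train, y_train)
best_model = search.best_estimator_

# Evaluate on the held-out split with the metrics named in this README.
pred = best_model.predict(X_test)
mse = mean_squared_error(y_test, pred)
metrics = {
    "MSE": mse,
    "RMSE": mse ** 0.5,
    "MAE": mean_absolute_error(y_test, pred),
    "R2": r2_score(y_test, pred),
}
print(search.best_params_)
print(metrics)
# The real pipeline then saves the model, e.g. joblib.dump(best_model, "oracle_model.pkl")
```

A randomized search only samples the grid, so the result is the best configuration among those tried, not a guaranteed optimum.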
The API server (main.py) is a fast, asynchronous web server that loads the trained model into memory at startup. It exposes a /predict-price endpoint that accepts real-time task metrics and returns a suggested fair price.
The project relies on several key Python modules:
- FastAPI / Uvicorn: Powers the high-performance, asynchronous REST API infrastructure for serving predictions.
- Pydantic: Enforces strict data validation and typing for the API request/response models.
- Scikit-learn (sklearn): The core machine learning library used for splitting data, building the RandomForestRegressor, hyperparameter tuning (RandomizedSearchCV), and evaluating performance metrics (MSE, RMSE, MAE, R2).
- Pandas: Used for robust data manipulation, CSV I/O, and structuring the dataset as DataFrames.
- NumPy: Facilitates efficient numerical operations, vectorization, and generating synthetic data with realistic noise distributions.
- Joblib: Provides fast, lightweight serialization to save and load the trained Random Forest model (oracle_model.pkl).
- SHAP: Generates game-theoretic explainability values (SHAP values) to understand exactly how each feature (e.g., complexity, urgency) influences the model's predictions.
- Matplotlib: Used alongside SHAP to render and save the shap_summary.png feature importance plot.
The real world is messy! Real-time task posting data often contains noise (random variations in price negotiation) and outliers (extremes, like a highly urgent task priced incredibly low due to mistakes or incredibly high for various unrelated reasons).
We intentionally model these variations when generating our dataset by adding normally distributed (Gaussian) noise to the baseline prices.
We utilize a Random Forest Regressor to predict prices because it handles 'outliers' and 'noise' remarkably well compared to simpler algorithms (like Linear Regression):
- Robust to Outliers: Random Forests use decision trees, which segment and branch data based on feature thresholds (e.g., urgency > 5) rather than absolute magnitudes. Extreme outlier values are isolated in rare decision paths and do not skew the main decision boundaries or predictions significantly.
- Reduces Variance (Noise): A Random Forest is an ensemble technique. It averages the predictions of hundreds of independent decision trees trained on different random subsets of data (Bagging). This averaging naturally cancels out random 'noise' and prevents the model from overfitting to random anomalies, ensuring stable and reliable pricing logic.
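The variance-reduction argument can be illustrated with a small simulation using only the standard library. This is not a Random Forest: it is an idealized sketch in which each "tree" is a fully independent noisy estimator of one hypothetical price. Real trees in a forest are correlated, so the reduction in practice is smaller than what this shows.

```python
import random
import statistics

random.seed(0)

TRUE_PRICE = 100.0  # hypothetical "fair" price for one task
NOISE_STD = 15.0    # spread of a single noisy estimator
N_TREES = 200       # number of estimators averaged per prediction
N_TRIALS = 500      # how many times the experiment is repeated


def noisy_estimate() -> float:
    """One estimator's guess: the true price plus Gaussian noise."""
    return random.gauss(TRUE_PRICE, NOISE_STD)


# Predictions made by a single estimator, repeated N_TRIALS times.
single = [noisy_estimate() for _ in range(N_TRIALS)]

# Predictions made by averaging N_TREES estimators, repeated N_TRIALS times.
averaged = [
    statistics.fmean([noisy_estimate() for _ in range(N_TREES)])
    for _ in range(N_TRIALS)
]

single_spread = statistics.stdev(single)
averaged_spread = statistics.stdev(averaged)
print(f"single estimator spread: {single_spread:.2f}")
print(f"average of {N_TREES} spread: {averaged_spread:.2f}")
```

For independent estimators the spread of the average shrinks roughly with the square root of the number of estimators, which is why the averaged predictions cluster much more tightly around the true price.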
- Install Dependencies

      pip install fastapi uvicorn scikit-learn pandas numpy joblib pydantic shap matplotlib

- Generate Dataset

      python create_dataset.py

- Train the Model & Generate SHAP explanations

      python train_oracle.py

  (This step creates oracle_model.pkl and shap_summary.png)

- Start the API Server

      fastapi dev main.py
      # or natively:
      # python main.py

- Test the Endpoint

  Send a POST request to /predict-price. High Complexity (8) + Low Supply (2) + High Urgency (10) = High expected output!

      curl -X POST "http://127.0.0.1:8000/predict-price" \
        -H "Content-Type: application/json" \
        -d '{"complexity": 8, "supply": 2, "demand": 9, "urgency": 10}'
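The same request can also be sent from Python. The sketch below uses only the standard library and mirrors the curl command; the helper names `build_request` and `fetch_prediction` are hypothetical, and the response is returned as a plain dict because its exact schema is not specified here.

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:8000/predict-price"


def build_request(complexity: int, supply: int, demand: int, urgency: int) -> urllib.request.Request:
    """Prepare the same JSON POST that the curl command sends."""
    payload = {
        "complexity": complexity,
        "supply": supply,
        "demand": demand,
        "urgency": urgency,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_prediction(request: urllib.request.Request) -> dict:
    """Send the request; requires the API server to be running locally."""
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_request(complexity=8, supply=2, demand=9, urgency=10)
print(request.get_method(), request.full_url)
# With the server running: print(fetch_prediction(request))
```

Building the request separately from sending it makes the payload easy to inspect without a running server.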