RRonium/V-Coin-Oracle

V-Coin Oracle 🔮

The V-Coin Oracle is an AI/ML system designed for the Skill Economy app at VIT Vellore. It suggests fair, data-driven credit prices for student tasks based on four core metrics, each scored from 1 to 10:

  1. Complexity (1-10)
  2. Supply (1-10)
  3. Demand (1-10)
  4. Urgency (1-10)

Infrastructure & Workflow

The V-Coin Oracle is a three-stage pipeline: data generation, model training (with hyperparameter tuning and explainability), and real-time inference through a FastAPI service.

1. Data Generation (create_dataset.py)

Generates synthetic but realistic task data: prices are derived from baseline formulas that reflect expected market conditions, with noise added on top. The output is vit_skills_data.csv, the dataset that the training step consumes.
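A minimal sketch of what such a generator could look like is shown here. The baseline weights, the `price` column name, and the `make_dataset` helper are assumptions made for illustration; the actual formulas in create_dataset.py may differ.

```python
import numpy as np
import pandas as pd

FEATURES = ["complexity", "supply", "demand", "urgency"]


def make_dataset(n_rows: int = 1000, noise_std: float = 5.0, seed: int = 42) -> pd.DataFrame:
    """Build synthetic task rows: integer metrics in 1-10 plus a noisy price."""
    rng = np.random.default_rng(seed)
    # rng.integers excludes the upper bound, so 11 yields values in 1..10.
    df = pd.DataFrame({name: rng.integers(1, 11, size=n_rows) for name in FEATURES})

    # Hypothetical baseline: price rises with complexity, demand, and urgency,
    # and falls as supply grows. The weights are illustrative only.
    baseline = (
        20.0
        + 6.0 * df["complexity"]
        + 3.0 * df["demand"]
        + 4.0 * df["urgency"]
        - 3.0 * df["supply"]
    )
    # Gaussian noise stands in for random variation in price negotiation.
    noise = rng.normal(loc=0.0, scale=noise_std, size=n_rows)
    # Keep prices positive after the noise is added.
    df["price"] = (baseline + noise).clip(lower=1.0).round(2)
    return df
```

Calling `make_dataset().to_csv("vit_skills_data.csv", index=False)` would then write the CSV that the training step reads.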

2. Model Training & Explainability (train_oracle.py)

Trains a machine learning model on the generated data. It performs Hyperparameter Tuning to find the optimal configuration and uses SHAP to analyze feature importance, outputting the best model (oracle_model.pkl) and a visual summary (shap_summary.png).
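The tuning and evaluation step might be sketched roughly as follows. The search space, the `price` target column, and the `train_oracle` helper are assumptions for illustration, not the contents of train_oracle.py.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

FEATURES = ["complexity", "supply", "demand", "urgency"]


def train_oracle(df: pd.DataFrame, target: str = "price", n_iter: int = 10, seed: int = 42):
    """Tune a RandomForestRegressor with a randomized search and score it on held-out data."""
    X_train, X_test, y_train, y_test = train_test_split(
        df[FEATURES], df[target], test_size=0.2, random_state=seed
    )
    # Illustrative search space; the real script may tune other values.
    param_distributions = {
        "n_estimators": [100, 200, 300],
        "max_depth": [None, 5, 10, 20],
        "min_samples_split": [2, 5, 10],
        "min_samples_leaf": [1, 2, 4],
    }
    search = RandomizedSearchCV(
        RandomForestRegressor(random_state=seed),
        param_distributions=param_distributions,
        n_iter=n_iter,
        cv=3,
        scoring="neg_mean_squared_error",
        random_state=seed,
    )
    search.fit(X_train, y_train)
    best_model = search.best_estimator_

    # Evaluate the tuned model on rows it has not seen during the search.
    predictions = best_model.predict(X_test)
    mse = mean_squared_error(y_test, predictions)
    metrics = {
        "mse": float(mse),
        "rmse": float(np.sqrt(mse)),
        "mae": float(mean_absolute_error(y_test, predictions)),
        "r2": float(r2_score(y_test, predictions)),
    }
    return best_model, metrics
```

The returned model could then be persisted with `joblib.dump(best_model, "oracle_model.pkl")` so the API can load it later.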

3. API Inference (main.py)

A fast, asynchronous web server that loads the trained model into memory at startup. It exposes a /predict-price endpoint that accepts task metrics in real time and returns the predicted price.

Modules & Dependencies

The project relies on several key Python modules:

  • FastAPI / Uvicorn: Powers the high-performance, asynchronous REST API infrastructure for serving predictions.
  • Pydantic: Enforces strict data validation and typing for the API request/response models.
  • Scikit-learn (sklearn): The core machine learning library used for splitting data, building the RandomForestRegressor, hyperparameter tuning (RandomizedSearchCV), and evaluating performance metrics (MSE, RMSE, MAE, R2).
  • Pandas: Used for robust data manipulation, CSV I/O, and structuring the dataset as DataFrames.
  • NumPy: Facilitates efficient numerical operations, vectorization, and generating synthetic data with realistic noise distributions.
  • Joblib: Provides fast, lightweight serialization to save and load the trained Random Forest model (oracle_model.pkl).
  • SHAP: Computes game-theoretic attribution values (SHAP values) that show how much each feature (e.g., complexity, urgency) contributes to each of the model's predictions.
  • Matplotlib: Used alongside SHAP to render and save the shap_summary.png feature importance plot.

Handling Outliers and Noise

The real world is messy! Real-time task posting data often contains noise (random variations in price negotiation) and outliers (extremes, like a highly urgent task priced incredibly low due to mistakes or incredibly high for various unrelated reasons).

We intentionally model these variations when generating our dataset by adding normally distributed (Gaussian) noise to the baseline prices.

Why Random Forest? 🌲

We use a Random Forest Regressor to predict prices because it generally handles outliers and noise better than simpler algorithms such as Linear Regression:

  • Robust to Outliers: Random Forests are built from decision trees, which split the data on feature thresholds (e.g., urgency > 5) rather than fitting coefficients to absolute magnitudes. An extreme value tends to end up isolated in a rarely used branch, so it has limited effect on the splits and predictions for typical tasks.
  • Reduces Variance (Noise): A Random Forest is an ensemble. It averages the predictions of many decision trees (often hundreds), each trained on a different bootstrap sample of the data (bagging). Averaging cancels much of the random noise that any single tree picks up and reduces overfitting to anomalies, which keeps the suggested prices stable.
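The variance-reduction point can be checked with a small experiment: fit one fully grown decision tree and one forest on the same noisy data, then compare their errors on held-out rows. The data formula and the `compare_tree_and_forest` helper below are illustrative assumptions, not part of the project.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def compare_tree_and_forest(n_rows: int = 1500, noise_std: float = 10.0, seed: int = 0) -> dict:
    """Return held-out MSE for a single tree and for a bagged forest."""
    rng = np.random.default_rng(seed)
    # Columns: complexity, supply, demand, urgency, each an integer in 1..10.
    X = rng.integers(1, 11, size=(n_rows, 4))
    # Hypothetical baseline price plus heavy Gaussian noise.
    baseline = 20 + 6 * X[:, 0] - 3 * X[:, 1] + 3 * X[:, 2] + 4 * X[:, 3]
    y = baseline + rng.normal(0.0, noise_std, size=n_rows)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    # A fully grown tree memorizes the noise in its training rows.
    tree = DecisionTreeRegressor(random_state=seed).fit(X_train, y_train)
    # The forest averages many such trees, each fit on a bootstrap sample.
    forest = RandomForestRegressor(n_estimators=200, random_state=seed).fit(X_train, y_train)

    return {
        "tree_mse": float(mean_squared_error(y_test, tree.predict(X_test))),
        "forest_mse": float(mean_squared_error(y_test, forest.predict(X_test))),
    }
```

On data this noisy, the forest's held-out error should come out clearly below the single tree's, which is the averaging effect described above.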

How to Run

  1. Install Dependencies

    pip install "fastapi[standard]" uvicorn scikit-learn pandas numpy joblib pydantic shap matplotlib
  2. Generate Dataset

    python create_dataset.py
  3. Train the Model & Generate SHAP explanations

    python train_oracle.py

    (This step creates oracle_model.pkl and shap_summary.png)

  4. Start the API Server

    fastapi dev main.py
    # or natively:
    # python main.py
  5. Test the Endpoint

    Send a POST request to /predict-price.

    High complexity (8), low supply (2), high demand (9), and high urgency (10) should produce a high predicted price:

    curl -X POST "http://127.0.0.1:8000/predict-price" \
         -H "Content-Type: application/json" \
         -d '{"complexity": 8, "supply": 2, "demand": 9, "urgency": 10}'
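The same request can be sent from Python using only the standard library. This is a sketch: the helper names are hypothetical, and the shape of the JSON response depends on how main.py defines its response model.

```python
import json
import urllib.request


def build_predict_request(
    metrics: dict, base_url: str = "http://127.0.0.1:8000"
) -> urllib.request.Request:
    """Build the POST request for /predict-price without sending it."""
    body = json.dumps(metrics).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict-price",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_price(metrics: dict, base_url: str = "http://127.0.0.1:8000") -> dict:
    """Send the request to a running server and return the decoded JSON reply."""
    request = build_predict_request(metrics, base_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server from step 4 running, `predict_price({"complexity": 8, "supply": 2, "demand": 9, "urgency": 10})` would return the server's JSON response as a dict.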
