V-Coin Oracle 🔮

The V-Coin Oracle is an AI/ML system designed for the Skill Economy app at VIT Vellore. It suggests fair, data-driven credit prices for student tasks based on four core metrics:

Complexity (1-10)
Supply (1-10)
Demand (1-10)
Urgency (1-10)

Infrastructure & Workflow

The V-Coin Oracle is built as a pipeline that handles data generation, model training (with hyperparameter tuning and explainability), and real-time inference through a FastAPI server.

1. Data Generation (`create_dataset.py`)

Generates realistic synthetic task data that reflects expected market conditions: each task is priced with a baseline formula and then perturbed with random noise. The output is the foundational dataset, vit_skills_data.csv.
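A minimal sketch of this generation step, assuming a linear baseline formula. The weights, the `generate_tasks` name, and the `price` column are hypothetical; the README does not document the real formula.

```python
import numpy as np
import pandas as pd

# Feature columns mirror the Oracle's four metrics; "price" is an assumed target name
FEATURES = ["complexity", "supply", "demand", "urgency"]


def generate_tasks(n_rows: int = 1000, noise_std: float = 5.0, seed: int = 42) -> pd.DataFrame:
    """Sample synthetic tasks and price them with a hypothetical baseline plus Gaussian noise."""
    rng = np.random.default_rng(seed)
    # Each metric is an integer on the 1-10 scale (the upper bound of integers() is exclusive)
    df = pd.DataFrame(rng.integers(1, 11, size=(n_rows, len(FEATURES))), columns=FEATURES)
    # Hypothetical weights: harder, scarcer, more demanded and more urgent tasks cost more
    baseline = (
        20
        + 6 * df["complexity"]
        - 3 * df["supply"]
        + 4 * df["demand"]
        + 5 * df["urgency"]
    )
    # Normally distributed noise stands in for random variation in price negotiation
    noisy = baseline + rng.normal(0.0, noise_std, size=n_rows)
    # Clip so every task costs at least one credit, then round to two decimals
    df["price"] = np.clip(noisy, 1.0, None).round(2)
    return df
```

Calling `generate_tasks().to_csv("vit_skills_data.csv", index=False)` would write a dataset in the shape the training step expects. Fixing the seed keeps the dataset reproducible between runs.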

2. Model Training & Explainability (`train_oracle.py`)

Trains a Random Forest regression model on the generated data. It runs hyperparameter tuning to find the best configuration and uses SHAP to analyze feature importance, then saves the best model (oracle_model.pkl) and a visual summary (shap_summary.png).
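A sketch of the tuning-and-evaluation loop using scikit-learn's `RandomizedSearchCV`. The search space, the `train_oracle` name, and the `price` target column are assumptions for illustration, not the contents of `train_oracle.py`.

```python
from typing import Optional

import joblib
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

FEATURES = ["complexity", "supply", "demand", "urgency"]
TARGET = "price"  # assumed target column name


def train_oracle(df: pd.DataFrame, n_iter: int = 10, seed: int = 42,
                 model_path: Optional[str] = None):
    """Tune a RandomForestRegressor, report hold-out metrics, optionally save the best model."""
    X_train, X_test, y_train, y_test = train_test_split(
        df[FEATURES], df[TARGET], test_size=0.2, random_state=seed
    )
    # Hypothetical search space; the real script's ranges are not documented in this README
    param_distributions = {
        "n_estimators": [100, 200, 300],
        "max_depth": [None, 5, 10, 20],
        "min_samples_leaf": [1, 2, 4],
        "max_features": [1.0, "sqrt"],
    }
    search = RandomizedSearchCV(
        RandomForestRegressor(random_state=seed),
        param_distributions=param_distributions,
        n_iter=n_iter,
        cv=3,
        scoring="neg_mean_squared_error",
        random_state=seed,
    )
    search.fit(X_train, y_train)  # samples n_iter configurations, 3-fold CV each
    best = search.best_estimator_  # refit on the full training split by default

    preds = best.predict(X_test)
    mse = mean_squared_error(y_test, preds)
    metrics = {
        "MSE": float(mse),
        "RMSE": float(np.sqrt(mse)),
        "MAE": float(mean_absolute_error(y_test, preds)),
        "R2": float(r2_score(y_test, preds)),
    }
    if model_path is not None:
        joblib.dump(best, model_path)  # e.g. "oracle_model.pkl"
    return best, search.best_params_, metrics
```

Random search samples a fixed number of configurations instead of trying every combination, which keeps tuning time predictable as the search space grows.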

3. API Inference (`main.py`)

A fast, asynchronous web server that loads the trained model into memory once at startup. It exposes a /predict-price endpoint that accepts a task's four metrics and returns the model's predicted credit price.

Modules & Dependencies

The project relies on several key Python modules:

FastAPI / Uvicorn: Powers the high-performance, asynchronous REST API infrastructure for serving predictions.
Pydantic: Enforces strict data validation and typing for the API request/response models.
Scikit-learn (sklearn): The core machine learning library used for splitting data, building the RandomForestRegressor, hyperparameter tuning (RandomizedSearchCV), and evaluating performance metrics (MSE, RMSE, MAE, R²).
Pandas: Used for robust data manipulation, CSV I/O, and structuring the dataset as DataFrames.
NumPy: Facilitates efficient numerical operations, vectorization, and generating synthetic data with realistic noise distributions.
Joblib: Provides fast, lightweight serialization to save and load the trained Random Forest model (oracle_model.pkl).
SHAP: Computes SHAP values, a game-theoretic (Shapley value) attribution method, to show how much each feature (e.g., complexity, urgency) contributes to each of the model's predictions.
Matplotlib: Used alongside SHAP to render and save the shap_summary.png feature importance plot.

Handling Outliers and Noise

The real world is messy! Real task-posting data often contains noise (random variation from price negotiation) and outliers (extreme values, such as a highly urgent task priced far too low by mistake, or priced unusually high for unrelated reasons).

We intentionally model this variation when generating the dataset by adding normally distributed (Gaussian) noise to the baseline prices.
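The README only describes Gaussian noise. The helper below also injects a small fraction of extreme prices, a hypothetical extension (the name `add_noise_and_outliers` and its parameters are illustrative) for checking how a model copes with the kind of outliers described above.

```python
import numpy as np


def add_noise_and_outliers(prices, noise_std=5.0, outlier_frac=0.02, outlier_scale=3.0, seed=0):
    """Add Gaussian noise to every price, then scale a random subset into outliers."""
    rng = np.random.default_rng(seed)
    prices = np.asarray(prices, dtype=float)
    # Gaussian noise on every row (returns a new array; the input is not modified)
    noisy = prices + rng.normal(0.0, noise_std, size=prices.shape)
    # Pick distinct rows to turn into outliers
    n_out = int(round(outlier_frac * prices.size))
    idx = rng.choice(prices.size, size=n_out, replace=False)
    # Each outlier is randomly either underpriced (e.g. a typo) or overpriced
    factors = rng.choice([1.0 / outlier_scale, outlier_scale], size=n_out)
    noisy[idx] *= factors
    # Keep every price at or above one credit
    return np.clip(noisy, 1.0, None), idx
```

Returning the outlier indices makes it easy to compare a model's error on clean rows against its error on the corrupted ones.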

Why Random Forest? 🌲

We use a Random Forest Regressor to predict prices because it handles outliers and noise more gracefully than simpler algorithms such as Linear Regression:

Robust to Outliers: Random Forests are built from decision trees, which split data on feature thresholds (e.g., urgency > 5) rather than on absolute magnitudes. Because a split depends only on the ordering of feature values, an extreme feature value cannot drag a threshold far away; its influence stays confined to the few leaves it falls into. An extreme price still shifts the average of the leaf it lands in, but averaging across many trees dilutes that effect.
Reduces Variance (Noise): A Random Forest is an ensemble technique. It averages the predictions of many decision trees, each trained on a different bootstrap sample of the data (bagging). Each individual tree partly memorizes the noise in its own sample, but those errors differ from tree to tree, so averaging cancels much of them out. This reduces overfitting to random anomalies and keeps the pricing logic stable.
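A small experiment can make the variance-reduction argument concrete. The sketch retrains a single fully grown decision tree and a 100-tree forest on freshly noised labels several times, then measures how much their predictions move between retrainings. The pricing rule, noise level, and data sizes are hypothetical choices for the demonstration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
# Hypothetical task grid: four metrics on the 1-10 scale
X_train = rng.integers(1, 11, size=(500, 4)).astype(float)
X_eval = rng.integers(1, 11, size=(200, 4)).astype(float)


def true_price(X: np.ndarray) -> np.ndarray:
    """Hypothetical noise-free pricing rule used only for this experiment."""
    return 20 + 6 * X[:, 0] - 3 * X[:, 1] + 4 * X[:, 2] + 5 * X[:, 3]


def prediction_spread(make_model, n_repeats: int = 5, noise_std: float = 10.0) -> float:
    """Retrain on freshly noised labels and measure how much predictions move."""
    preds = []
    for seed in range(n_repeats):
        noise = np.random.default_rng(100 + seed).normal(0.0, noise_std, size=len(X_train))
        model = make_model(seed).fit(X_train, true_price(X_train) + noise)
        preds.append(model.predict(X_eval))
    # Variance across retrainings at each evaluation point, averaged over points
    return float(np.stack(preds).var(axis=0).mean())


tree_spread = prediction_spread(lambda s: DecisionTreeRegressor(random_state=s))
forest_spread = prediction_spread(
    lambda s: RandomForestRegressor(n_estimators=100, random_state=s)
)
print(f"single tree: {tree_spread:.1f}  forest: {forest_spread:.1f}")
```

The forest's predictions should vary noticeably less between retrainings than the single tree's, which is the variance reduction that bagging is meant to provide.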

How to Run

Install Dependencies

```
pip install "fastapi[standard]" uvicorn scikit-learn pandas numpy joblib pydantic shap matplotlib
```

Generate Dataset
```
python create_dataset.py
```
Train the Model & Generate SHAP explanations
```
python train_oracle.py
```
(This step creates oracle_model.pkl and shap_summary.png)

Start the API Server

```
fastapi dev main.py
# or run it directly with Python:
# python main.py
```

Test the Endpoint

Send a POST request to /predict-price. High complexity (8), low supply (2), high demand (9), and high urgency (10) should yield a high predicted price:

```
curl -X POST "http://127.0.0.1:8000/predict-price" \
     -H "Content-Type: application/json" \
     -d '{"complexity": 8, "supply": 2, "demand": 9, "urgency": 10}'
```
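For clients written in Python, the same request can be sent with only the standard library. This is a sketch; `build_predict_request` and `predict_price` are hypothetical helper names, and the response is returned as a plain dict because its exact schema is not documented here.

```python
import json
import urllib.request


def build_predict_request(metrics: dict, base_url: str = "http://127.0.0.1:8000") -> urllib.request.Request:
    """Build the same POST request as the curl example, without sending it."""
    body = json.dumps(metrics).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict-price",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_price(metrics: dict, base_url: str = "http://127.0.0.1:8000", timeout: float = 5.0) -> dict:
    """Send the request to a running Oracle server and decode the JSON response."""
    request = build_predict_request(metrics, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `predict_price({"complexity": 8, "supply": 2, "demand": 9, "urgency": 10})` would return the server's JSON reply.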
