This project is an end-to-end machine learning application designed to predict the outcomes of professional boxing matches. It features a complete data pipeline, from data collection and feature engineering to model training, evaluation, and explainability.
To make the model accessible, it's deployed as an interactive web application using Streamlit. The app uses a hybrid data system, relying on a local dataset for core stats while performing live lookups on BoxRec and Wikipedia to ensure predictions are based on the most current fighter information available.
Try the interactive predictor yourself: boxing-match-predictor.streamlit.app
- Ensemble Model Predictions: Utilises an ensemble of XGBoost, Random Forest, and Logistic Regression models, reaching ~87% prediction accuracy on the test set.
- Live Data Integration: Fetches up-to-date fighter stats (age, wins, losses) at prediction time by scraping BoxRec and Wikipedia, so predictions reflect the most recent records those sources provide.
- Explainable AI: Generates SHAP feature-contribution charts that show how much each feature pushed a prediction one way or the other, making the model's decision-making transparent.
- Fallback System: If a fighter isn't in the local dataset, the app scrapes their data automatically and fills any missing stats with dataset averages, so it can make a reasonable prediction for almost any professional boxer.
- Interactive UI: A clean, user-friendly Streamlit interface where anyone can enter two fighters and get an instant prediction.
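How the three models' outputs are combined isn't specified here. A common approach is soft voting, where each model's predicted win probability is averaged (optionally with weights). The stdlib sketch below assumes soft voting; `soft_vote` is a hypothetical helper, and the probabilities are made up rather than real model outputs.

```python
from typing import Mapping, Optional


def soft_vote(probabilities: Mapping[str, float],
              weights: Optional[Mapping[str, float]] = None) -> float:
    """Combine per-model probabilities that fighter A wins into one score.

    Soft voting: a (weighted) average of each model's predicted probability.
    """
    if not probabilities:
        raise ValueError("need at least one model probability")
    if weights is None:
        # Equal weights: a plain average across models.
        weights = {name: 1.0 for name in probabilities}
    total_weight = sum(weights[name] for name in probabilities)
    weighted_sum = sum(p * weights[name] for name, p in probabilities.items())
    return weighted_sum / total_weight


# Hypothetical outputs from the three models for one matchup.
probs = {"xgboost": 0.72, "random_forest": 0.64, "logistic_regression": 0.59}
p_a = soft_vote(probs)
winner = "Fighter A" if p_a >= 0.5 else "Fighter B"
print(f"P(Fighter A wins) = {p_a:.3f} -> {winner}")
```

With real fitted models, scikit-learn's `VotingClassifier(estimators=[...], voting="soft", weights=[...])` performs the same averaging over each estimator's `predict_proba` output.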
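Age is one stat that goes stale in a static dataset, which is part of why the live lookup matters. Assuming the scraper extracts a date of birth (the actual BoxRec/Wikipedia parsing isn't shown here), turning it into a current age takes a small stdlib step. `age_on` is a hypothetical helper name.

```python
from datetime import date
from typing import Optional


def age_on(birth_date: date, on: Optional[date] = None) -> int:
    """Return age in whole years on the given date (default: today)."""
    on = on or date.today()
    # Subtract one year if this year's birthday hasn't happened yet.
    had_birthday = (on.month, on.day) >= (birth_date.month, birth_date.day)
    return on.year - birth_date.year - (0 if had_birthday else 1)


# Hypothetical scraped value in ISO format (YYYY-MM-DD).
dob = date.fromisoformat("1990-06-15")
print(age_on(dob, on=date(2024, 6, 14)))  # the day before the birthday
print(age_on(dob, on=date(2024, 6, 15)))  # on the birthday
```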
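To build intuition for what a SHAP contribution means, the Logistic Regression member is the easiest case. If features are treated as independent, the exact SHAP value of feature i in log-odds space is coef_i × (x_i − mean_i), and these contributions plus the baseline log-odds add up exactly to the model's output. The sketch below uses hypothetical feature names, coefficients, and means, not the project's trained values. In practice the `shap` library computes these values for you (for example with `LinearExplainer` for linear models and `TreeExplainer` for tree ensembles such as XGBoost and Random Forest).

```python
import math
from typing import Dict


def linear_shap(coefs: Dict[str, float], means: Dict[str, float],
                x: Dict[str, float]) -> Dict[str, float]:
    """Exact SHAP values (log-odds) for a linear model, assuming independent features."""
    return {name: coefs[name] * (x[name] - means[name]) for name in coefs}


# Hypothetical model: each feature is fighter A's stat minus fighter B's.
coefs = {"age_diff": -0.08, "win_diff": 0.05, "loss_diff": -0.12}
intercept = 0.1
means = {"age_diff": 0.0, "win_diff": 2.0, "loss_diff": 0.5}
x = {"age_diff": -4.0, "win_diff": 10.0, "loss_diff": 1.0}

contributions = linear_shap(coefs, means, x)
# Baseline = model output at the feature means.
baseline = intercept + sum(coefs[n] * means[n] for n in coefs)
log_odds = baseline + sum(contributions.values())

# Additivity: baseline + contributions equals the model's direct output.
assert math.isclose(log_odds, intercept + sum(coefs[n] * x[n] for n in coefs))

# Largest contributions first, as a SHAP bar chart would order them.
for name, value in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:>10}: {value:+.3f}")
print(f"P(A wins) = {1 / (1 + math.exp(-log_odds)):.3f}")
```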
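You can think of the fallback's imputation step as filling every stat the scraper couldn't find with that stat's mean over the local dataset. Here is a stdlib sketch under that assumption. The field names and numbers are hypothetical. With Pandas, which the project uses, `df.fillna(df.mean(numeric_only=True))` does the equivalent for a whole table.

```python
from statistics import mean
from typing import Dict, List, Optional


def column_means(dataset: List[Dict[str, float]]) -> Dict[str, float]:
    """Mean of each stat across the local dataset."""
    keys = dataset[0].keys()
    return {k: mean(row[k] for row in dataset) for k in keys}


def impute(scraped: Dict[str, Optional[float]],
           means: Dict[str, float]) -> Dict[str, float]:
    """Replace missing (None or absent) stats with the dataset mean."""
    filled = {}
    for k, m in means.items():
        value = scraped.get(k)
        # Test `is None` so a genuine 0 (e.g. zero losses) is kept, not replaced.
        filled[k] = m if value is None else value
    return filled


# Hypothetical local dataset and a partially scraped fighter.
dataset = [
    {"age": 28, "wins": 20, "losses": 2, "reach_cm": 180},
    {"age": 32, "wins": 30, "losses": 4, "reach_cm": 190},
]
scraped = {"age": 25, "wins": 12, "losses": 0, "reach_cm": None}
print(impute(scraped, column_means(dataset)))
```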
- Data Science & Machine Learning: Python, Pandas, Scikit-learn, XGBoost, SHAP, Imbalanced-learn
- Web Application & Scraping: Streamlit, Selenium, Beautiful Soup, Requests, Wikipedia
- Version Control: Git, Git LFS (for handling large model files)
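Because large files are stored with Git LFS, a clone made without Git LFS installed contains small text pointer files instead of the real binaries, and loading them fails. Every pointer file starts with the line `version https://git-lfs.github.com/spec/v1`, so a quick stdlib check can catch this. The sketch below assumes the files live under `models/`; the helper names are hypothetical. If it finds pointers, run `git lfs install` followed by `git lfs pull` to fetch the real files.

```python
from pathlib import Path
from typing import List

LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """True if the file is a Git LFS pointer rather than the real content."""
    with path.open("rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def find_lfs_pointers(directory: str = "models") -> List[Path]:
    """List files under `directory` that are still un-fetched LFS pointers."""
    root = Path(directory)
    if not root.is_dir():
        return []
    return [p for p in sorted(root.rglob("*")) if p.is_file() and is_lfs_pointer(p)]


if __name__ == "__main__":
    pointers = find_lfs_pointers("models")
    if pointers:
        print("Run `git lfs install` and `git lfs pull`; these are still pointers:")
        for p in pointers:
            print(" ", p)
    else:
        print("No Git LFS pointer files found in models/.")
```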
Follow these steps to set up and run the project on your local machine.
Prerequisites
Ensure you have Python 3.9 or later and Git installed on your system.
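To confirm your interpreter meets the 3.9+ requirement before creating the environment, you can run a one-off check like the one below. It is a hypothetical helper, not a script included in the repository.

```python
import sys

MIN_VERSION = (3, 9)


def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION) -> bool:
    """Compare the interpreter's (major, minor) version to the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(map(str, sys.version_info[:3]))
    if meets_minimum():
        print(f"Python {current} OK")
    else:
        sys.exit(f"Python {current} found; 3.9 or later is required.")
```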
Clone the Repository
Open your terminal, navigate to your desired directory, and clone the repository.
git clone https://github.com/Rokuu010/Boxing-Match-Predictor.git
cd Boxing-Match-Predictor
Set Up a Virtual Environment
It's highly recommended to use a virtual environment to manage project dependencies.
# Create the virtual environment
python -m venv venv

# Activate the environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate
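To confirm the virtual environment is active, so that `pip install` targets it rather than the system Python, compare `sys.prefix` with `sys.base_prefix`. Inside a venv the two differ. Here is a small sketch; `in_virtualenv` is a hypothetical helper name.

```python
import sys


def in_virtualenv() -> bool:
    """True inside a venv, where sys.prefix differs from sys.base_prefix."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment active; activate venv before installing.")
```

The same check works as a one-liner: `python -c "import sys; print(sys.prefix != sys.base_prefix)"`.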
Install Dependencies
Install all the required libraries from the requirements.txt file:
pip install -r requirements.txt
(Note: Selenium 4.6 and later include Selenium Manager, which automatically downloads a ChromeDriver matching your installed Chrome. Google Chrome itself must already be installed.)
Train the Model
Run the training script. This will process the data and generate the machine learning models.
python train.py
This creates the necessary model files and saves them in the models/ directory.
Run the Web App
Finally, launch the Streamlit application.
streamlit run app.py
Your browser should automatically open with the application running locally.