This project contains a simple, step-by-step notebook implementation for the assignment requirements using the MAGIC dataset.
- `src\assignment2_classification.ipynb` - main notebook
- `data\raw\magic04.data` - raw dataset input
- `data\processed\processed_balanced.csv` - balanced dataset
- `data\processed\train.csv` - training split (70%)
- `data\processed\test.csv` - testing split (30%)
- Python 3.10 or newer
- pandas
- matplotlib
- scikit-learn
- jupyter
Install:
`pip install pandas matplotlib scikit-learn jupyter`

- Put the raw file in `data\raw\magic04.data`.
- Open `src\assignment2_classification.ipynb`.
- Run all cells in order.
- Load the raw dataset with explicit column names.
- Balance classes by downsampling `g` to match `h`.
- Split data into stratified train/test (70/30).
- Tune `n_estimators` for AdaBoost and Random Forest using cross-validation.
- Train and evaluate:
  - Decision Tree
  - Naive Bayes
  - AdaBoost
  - Random Forest
- Report accuracy, precision, recall, F1-score, and confusion matrix for each model.
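
The tuning and evaluation steps listed above could look roughly like the sketch below. It is not the notebook's actual code. It runs on synthetic data from `make_classification` so it works without the raw file, and several choices are assumptions: `GridSearchCV` as the cross-validation tool, `GaussianNB` as the Naive Bayes variant, the candidate values for `n_estimators`, and the helper name `tune_n_estimators`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the balanced dataset (assumption: 10 numeric features).
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)


def tune_n_estimators(estimator, candidates=(10, 50, 100), cv=5):
    """Choose n_estimators by cross-validated accuracy on the training split.

    Hypothetical helper; the candidate grid is illustrative only.
    """
    search = GridSearchCV(
        estimator, {"n_estimators": list(candidates)}, cv=cv, scoring="accuracy"
    )
    search.fit(X_train, y_train)
    # best_estimator_ is refit on the full training split by default.
    return search.best_estimator_, search.best_params_["n_estimators"]


ada, ada_n = tune_n_estimators(AdaBoostClassifier(random_state=0))
rf, rf_n = tune_n_estimators(RandomForestClassifier(random_state=0))

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0).fit(X_train, y_train),
    "Naive Bayes": GaussianNB().fit(X_train, y_train),
    "AdaBoost": ada,
    "Random Forest": rf,
}

results = {}
for name, model in models.items():
    pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred),
        "recall": recall_score(y_test, pred),
        "f1": f1_score(y_test, pred),
        "confusion_matrix": confusion_matrix(y_test, pred),
    }
    scalars = {k: round(v, 3) for k, v in results[name].items() if k != "confusion_matrix"}
    print(name, scalars)
    print(results[name]["confusion_matrix"])
```

With the real data the labels are the strings `g` and `h`, so `precision_score`, `recall_score`, and `f1_score` need an explicit `pos_label` (for example `pos_label="g"`, if gamma events are treated as the positive class).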
After running the notebook, processed CSV files are saved to `data\processed\`.
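
As a minimal sketch of how those processed files might be produced (loading, balancing, splitting, saving), the following assumes the raw file is headerless and comma-separated, and takes the column names from the UCI description of the MAGIC dataset. The function names, the seed, and the shuffle after balancing are illustrative choices, not taken from the notebook. The demo uses a tiny in-memory sample and a temporary directory so it runs without the real data.

```python
import io
import tempfile
from pathlib import Path

import pandas as pd
from sklearn.model_selection import train_test_split

# Column names per the UCI dataset description (assumption; the notebook may differ).
COLUMNS = [
    "fLength", "fWidth", "fSize", "fConc", "fConc1", "fAsym",
    "fM3Long", "fM3Trans", "fAlpha", "fDist", "class",
]


def load_raw(source):
    """Read the headerless raw file, assigning explicit column names."""
    return pd.read_csv(source, header=None, names=COLUMNS)


def balance_classes(df, label_col="class", majority="g", minority="h", seed=42):
    """Randomly downsample the majority class to the minority class size."""
    minority_rows = df[df[label_col] == minority]
    majority_rows = df[df[label_col] == majority].sample(
        n=len(minority_rows), random_state=seed
    )
    balanced = pd.concat([majority_rows, minority_rows])
    # Shuffle so the two classes are interleaved in the saved file.
    return balanced.sample(frac=1, random_state=seed).reset_index(drop=True)


def split_and_save(balanced, out_dir, label_col="class", seed=42):
    """Make a stratified 70/30 split and write the three CSV files."""
    train, test = train_test_split(
        balanced, test_size=0.30, stratify=balanced[label_col], random_state=seed
    )
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    balanced.to_csv(out_dir / "processed_balanced.csv", index=False)
    train.to_csv(out_dir / "train.csv", index=False)
    test.to_csv(out_dir / "test.csv", index=False)
    return train, test


# Demo on made-up rows: 14 of class g and 10 of class h.
rows = [",".join(["1.0"] * 10 + [label]) for label in ["g"] * 14 + ["h"] * 10]
df = load_raw(io.StringIO("\n".join(rows)))
balanced = balance_classes(df)

with tempfile.TemporaryDirectory() as tmp:
    train, test = split_and_save(balanced, tmp)
    written = sorted(p.name for p in Path(tmp).iterdir())

print(balanced["class"].value_counts().to_dict())
print(len(train), len(test), written)
```

For a real run, the same functions would be called as `load_raw(Path("data") / "raw" / "magic04.data")` and `split_and_save(balanced, Path("data") / "processed")`.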