A machine learning project that predicts weather type (Rainy, Sunny, Cloudy, Snowy) from meteorological features using Support Vector Machine (SVM) classifiers.

- Source: Weather Type Classification – Kaggle (Nikhil Narayan)
- File: `weather_classification_data.csv`
- Size: 13,200 rows × 11 columns


| Feature | Type | Description |
|---|---|---|
| `temperature` | int | Temperature in °C |
| `humidity` | int | Humidity percentage |
| `wind_speed` | float | Wind speed in km/h |
| `precipitation (%)` | int | Precipitation percentage |
| `cloud_cover` | str | Cloud cover description |
| `atmospheric_pressure` | float | Atmospheric pressure in hPa |
| `uv_index` | int | UV index |
| `season` | str | Season of recording |
| `visibility (km)` | float | Visibility in kilometres |
| `location` | str | Type of location |
| `weather_type` | str | Target: Rainy / Sunny / Cloudy / Snowy |

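
As a rough way to check a loaded file against the feature table, the sketch below summarises a DataFrame's shape, its missing values, and any listed columns it lacks. It assumes the CSV headers match the names in the table exactly, which may not hold for the raw Kaggle file; `summarise` and the column-list constants are hypothetical names used only for illustration.

```python
import pandas as pd

# Column names as listed in the feature table (assumption: the actual
# CSV headers may be spelled or capitalised differently).
NUMERIC_COLS = [
    "temperature", "humidity", "wind_speed", "precipitation (%)",
    "atmospheric_pressure", "uv_index", "visibility (km)",
]
CATEGORICAL_COLS = ["cloud_cover", "season", "location"]
TARGET_COL = "weather_type"


def summarise(df: pd.DataFrame) -> dict:
    """Return the shape, total missing values, and any expected columns that are absent."""
    expected = NUMERIC_COLS + CATEGORICAL_COLS + [TARGET_COL]
    return {
        "shape": df.shape,
        "missing_values": int(df.isna().sum().sum()),
        "absent_columns": [c for c in expected if c not in df.columns],
    }
```

With the dataset in the working directory, this could be called as `summarise(pd.read_csv("weather_classification_data.csv"))`; a non-empty `absent_columns` list would suggest the headers need renaming first.
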

- `weather_classification_data.csv`: dataset
- `weather_classification.ipynb`: main notebook
- `README.md`


Install the dependencies with `pip install pandas scikit-learn seaborn matplotlib`. Python 3.8+ is recommended.

The notebook is organised into six tasks:
- Load the CSV into a pandas DataFrame
- Check shape, missing values, and data types
- Visualise key features:
  - `season` → pie chart
  - `temperature`, `humidity`, `wind_speed` → histograms
  - `precipitation (%)` → box plot
- One-hot encode the categorical columns: `cloud_cover`, `location`, `season`
- Apply `StandardScaler` to all seven numerical features
- Make a 70/30 train-test split (`random_state=42`)
- Train `SVC(kernel='linear')`
- Evaluate with accuracy score, classification report, and confusion matrix
- Train `SVC(kernel='rbf')` on the same split
- Compare accuracy and evaluation metrics against the linear kernel
- Train a custom RBF SVM with `C=0.5`, `gamma='auto'`, `degree=2` (scikit-learn uses `degree` only for the polynomial kernel, so it has no effect here)
- Observe the effect on accuracy and per-class metrics
- Build a `Pipeline([StandardScaler → SVC(rbf)])`
- Fit and evaluate end-to-end in a single object: clean, reproducible, and guarded against leakage because the scaler is fitted on the training data only
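
The preprocessing, training, and evaluation steps listed above can be sketched in one function. This is an illustration under stated assumptions, not the notebook's code: it wraps encoding and scaling in a `ColumnTransformer` inside the pipeline (the notebook may preprocess the columns directly instead), it assumes the column names from the feature table, and `build_model` and `compare_kernels` are hypothetical helper names. The `degree=2` setting is left out because `SVC` ignores it for the RBF kernel.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC

# Assumed column names, taken from the feature table.
NUMERIC_COLS = [
    "temperature", "humidity", "wind_speed", "precipitation (%)",
    "atmospheric_pressure", "uv_index", "visibility (km)",
]
CATEGORICAL_COLS = ["cloud_cover", "season", "location"]
TARGET_COL = "weather_type"


def build_model(**svc_params) -> Pipeline:
    """Scale numeric columns, one-hot encode categorical ones, then fit an SVC."""
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), NUMERIC_COLS),
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL_COLS),
    ])
    return Pipeline([("preprocess", preprocess), ("svc", SVC(**svc_params))])


def compare_kernels(df: pd.DataFrame) -> dict:
    """Train linear, RBF, and custom RBF SVMs on one 70/30 split and score each."""
    X = df[NUMERIC_COLS + CATEGORICAL_COLS]
    y = df[TARGET_COL]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42
    )
    configs = {
        "linear": {"kernel": "linear"},
        "rbf": {"kernel": "rbf"},
        "rbf_custom": {"kernel": "rbf", "C": 0.5, "gamma": "auto"},
    }
    results = {}
    for name, params in configs.items():
        # Preprocessing is fitted on the training split only, inside the pipeline.
        model = build_model(**params).fit(X_train, y_train)
        y_pred = model.predict(X_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, y_pred),
            "report": classification_report(y_test, y_pred, zero_division=0),
            "confusion": confusion_matrix(y_test, y_pred),
        }
    return results
```

Calling `compare_kernels(pd.read_csv("weather_classification_data.csv"))` would return one entry per configuration, so the three models can be compared on the same test set.
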

Open the notebook with `jupyter notebook weather_classification.ipynb` and run all cells in order. Each task section is self-contained with inline comments.

- SVM (Support Vector Machine): Finds the optimal hyperplane that maximises the margin between classes.
- Linear kernel: Works well when classes are (approximately) linearly separable in the original feature space.
- RBF kernel: Implicitly maps data to a higher-dimensional space via the kernel trick, so it can fit non-linear boundaries.
- Pipeline: Chains preprocessing and modelling into one estimator, preventing data leakage during cross-validation.
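
To make the kernel and pipeline points concrete, here is a small sketch on synthetic data (two concentric rings from `make_circles`, not the weather dataset). It cross-validates a scaler-plus-SVC pipeline with each kernel; because the scaler sits inside the pipeline, it is refitted on the training folds of every split. The helper name `cv_accuracy` and the data parameters are illustrative choices.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, non-linearly separable data: an inner ring inside an outer ring.
X, y = make_circles(n_samples=400, noise=0.1, factor=0.4, random_state=42)


def cv_accuracy(kernel: str) -> float:
    """Mean 5-fold accuracy of StandardScaler followed by an SVC with the given kernel."""
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    return cross_val_score(model, X, y, cv=5).mean()


linear_acc = cv_accuracy("linear")
rbf_acc = cv_accuracy("rbf")
print(f"linear: {linear_acc:.3f}  rbf: {rbf_acc:.3f}")
```

No straight line separates the two rings, so the linear kernel should score close to chance here, while the RBF kernel can fit the circular boundary.
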