Refactor K-Means clustering to use Wine Quality dataset by kaleb-kebede · Pull Request #11 · softwareWCU/Unsupervised-Machine-Learning-Clustering-

kaleb-kebede · 2025-12-24T22:49:57Z

Summary

This PR updates the Unsupervised Learning assignment to use the wine-quality-white-and-red.csv dataset. The previous medicineData.csv contained only 4 rows, which was insufficient for training a valid K-Means model.

Key Changes

Dataset Switch: Changed the source URL to the Wine Quality dataset (~6,000 rows) to generate meaningful clusters.
Preprocessing:
- Added duplicate removal.
- Applied StandardScaler to standardize each feature to zero mean and unit variance (crucial for K-Means, which is distance-based).
Model Tuning:
- Generated an Elbow Method graph to determine the optimal number of clusters ($k$).
- Selected optimal_k = 4 based on the inertia plot.
Visualization:
- Used PCA to reduce dimensionality to 2D for plotting.
- Added centroids to the scatter plot for clarity.
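
For reviewers who want to see the shape of the pipeline without opening the notebook, here is a minimal sketch of the steps listed above. It is not the code from this PR: it runs on synthetic data from `make_blobs` instead of the wine CSV, and the column names, the `range(1, 9)` sweep, `n_init=10`, and `random_state=42` are all assumptions chosen for illustration. Only `optimal_k = 4` is taken from the description.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the wine data (assumption: the notebook reads a CSV instead).
X_raw, _ = make_blobs(n_samples=600, centers=4, n_features=6, random_state=42)
df = pd.DataFrame(X_raw, columns=[f"feature_{i}" for i in range(6)])
# Append 20 repeated rows so the duplicate-removal step has something to do.
df = pd.concat([df, df.iloc[:20]], ignore_index=True)

# Preprocessing: drop exact duplicate rows, then standardize every feature
# to zero mean and unit variance so no single feature dominates distances.
df = df.drop_duplicates().reset_index(drop=True)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(df)

# Elbow method: record the inertia (within-cluster sum of squares) for each k.
inertias = {}
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_scaled)
    inertias[k] = km.inertia_

# The PR reports optimal_k = 4 for its data; reused here for illustration only.
optimal_k = 4
kmeans = KMeans(n_clusters=optimal_k, n_init=10, random_state=42)
labels = kmeans.fit_predict(X_scaled)

# Visualization prep: project samples and centroids into the same 2D PCA space.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)
centroids_2d = pca.transform(kmeans.cluster_centers_)
```

Plotting `inertias` against `k` gives the elbow curve, and `X_2d` colored by `labels` with `centroids_2d` overlaid gives the scatter plot. Note that the centroids are transformed with the already fitted `pca` rather than refit, so they land in the same coordinates as the samples.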

Results

The model now segments the wine samples into distinct clusters and assigns a cluster label to new input data.
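
One detail worth checking when predicting on new input: the sample has to pass through the same fitted scaler as the training data before it reaches K-Means. The description does not show how the notebook does this, so the following is only a sketch of one way to guarantee it, using a scikit-learn `Pipeline`. The two synthetic groups, the feature scales, and `new_sample` are hypothetical values, not wine measurements.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Two well-separated synthetic groups whose features live on very different scales.
group_a = rng.normal(loc=[0.0, 0.0], scale=[1.0, 100.0], size=(100, 2))
group_b = rng.normal(loc=[10.0, 1000.0], scale=[1.0, 100.0], size=(100, 2))
X_train = np.vstack([group_a, group_b])

# Bundling the scaler with K-Means means predict() applies the training-time
# mean and variance to any new sample automatically.
model = make_pipeline(
    StandardScaler(),
    KMeans(n_clusters=2, n_init=10, random_state=0),
)
model.fit(X_train)

# Hypothetical new observation, given in the original (unscaled) units.
new_sample = np.array([[9.5, 980.0]])
predicted_cluster = model.predict(new_sample)[0]
```

If the scaler and the model are kept as separate objects instead, the equivalent call is `kmeans.predict(scaler.transform(new_sample))`; calling `fit_transform` on the new sample would be a bug, because it would rescale the sample against itself.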

Refactor K-Means clustering to use Wine Quality dataset

802eac1

kaleb-kebede wants to merge 1 commit into softwareWCU:main from kaleb-kebede:main
