
Refactor K-Means clustering to use Wine Quality dataset #11

Open
kaleb-kebede wants to merge 1 commit into softwareWCU:main from kaleb-kebede:main


@kaleb-kebede

Summary

This PR updates the Unsupervised Learning assignment to use the wine-quality-white-and-red.csv dataset. The previous medicineData.csv contained only 4 rows, too few samples to train a meaningful K-Means model.

Key Changes

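  • Dataset Switch: Changed the source URL to the Wine Quality dataset (~6,000 rows), which has enough samples to form meaningful clusters.
  • Preprocessing:
    • Added duplicate removal.
    • Implemented StandardScaler to standardize each feature to zero mean and unit variance (crucial for K-Means, whose Euclidean distances would otherwise be dominated by features with large ranges).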
  • Model Tuning:
    • Generated an Elbow Method plot (inertia versus the number of clusters $k$) to choose $k$.
    • Selected optimal_k = 4 based on the inertia plot.
  • Visualization:
    • Used PCA to reduce the data to two dimensions for plotting.
    • Added centroids to the scatter plot for clarity.
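The preprocessing, tuning, and visualization steps listed above can be sketched roughly as follows. This is a minimal sketch, not the PR's actual code: it assumes scikit-learn and pandas, and because the dataset URL is not shown here, a synthetic `make_blobs` DataFrame with 11 numeric columns (hypothetical names `feature_0` to `feature_10`) stands in for wine-quality-white-and-red.csv. Non-numeric columns are simply dropped in this sketch.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def preprocess(df: pd.DataFrame) -> tuple[np.ndarray, StandardScaler]:
    """Drop duplicate rows, keep numeric columns, and standardize them."""
    numeric = df.drop_duplicates().select_dtypes(include="number")
    scaler = StandardScaler()
    X = scaler.fit_transform(numeric)  # each column -> mean 0, std 1
    return X, scaler


def elbow_inertias(X: np.ndarray, k_values) -> dict[int, float]:
    """Fit K-Means for each k and record inertia (within-cluster sum of squares)."""
    return {
        k: KMeans(n_clusters=k, n_init=10, random_state=42).fit(X).inertia_
        for k in k_values
    }


# Synthetic stand-in for the wine CSV (assumption: 11 numeric features).
features, _ = make_blobs(n_samples=600, n_features=11, centers=4, random_state=0)
df = pd.DataFrame(features, columns=[f"feature_{i}" for i in range(11)])
df = pd.concat([df, df.head(20)], ignore_index=True)  # inject 20 duplicate rows

X, scaler = preprocess(df)
inertias = elbow_inertias(X, range(1, 9))
for k, inertia in inertias.items():
    print(f"k={k}: inertia={inertia:.1f}")

optimal_k = 4  # in the PR this value is read off the elbow plot
kmeans = KMeans(n_clusters=optimal_k, n_init=10, random_state=42).fit(X)

# Fit PCA once, then project both samples and centroids into the same 2D space.
pca = PCA(n_components=2).fit(X)
points_2d = pca.transform(X)
centroids_2d = pca.transform(kmeans.cluster_centers_)
```

The `inertias` values can be drawn as a line chart for the elbow plot, and `points_2d` (colored by `kmeans.labels_`) plus `centroids_2d` can be passed to a scatter plot. Because PCA is linear, each projected centroid is exactly the mean of its cluster's projected points, so the markers sit at the visual centers of the clusters.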

Results

The model now segments the wine samples into 4 clusters and assigns a cluster to new input data.
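For predicting on new input, the important detail is that a new sample must be scaled with the scaler fitted on the training data before calling `predict`. A minimal sketch under the same assumptions as the pipeline sketch (scikit-learn, synthetic stand-in data, hypothetical column names and a hypothetical `predict_cluster` helper); it is not taken from the PR:

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the training data (assumption: 11 numeric features).
features, _ = make_blobs(n_samples=600, n_features=11, centers=4, random_state=0)
columns = [f"feature_{i}" for i in range(11)]
train = pd.DataFrame(features, columns=columns)

scaler = StandardScaler().fit(train)
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42).fit(scaler.transform(train))


def predict_cluster(sample: dict[str, float]) -> int:
    """Assign one new sample to a cluster using the already-fitted scaler and model."""
    row = pd.DataFrame([sample], columns=columns)
    # Reuse the training mean/std: calling fit_transform here would rescale
    # the single row against itself and produce meaningless input.
    return int(kmeans.predict(scaler.transform(row))[0])


new_sample = train.iloc[0].to_dict()  # hypothetical new input; here a known training row
cluster = predict_cluster(new_sample)
print(f"Predicted cluster: {cluster}")
```

Feeding a training row back in is a cheap sanity check: its predicted cluster should match the label the model already assigned to it.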
