Wine Quality Profiling using K-Means Clustering (Unsupervised Learning) - by Cherinet Bekele #13
chere-collab wants to merge 1 commit into softwareWCU:main from
Wine Quality Profiling and Classification using K-Means Clustering
1. Data Cleaning & Preparation
Encoding: The text column type was converted to numeric values (0 for red, 1 for white).
Scaling: All numerical columns (such as alcohol, density, and residual sugar) were standardized to zero mean and unit variance, so that features measured on large scales do not dominate the Euclidean distances that K-Means relies on.
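The two preparation steps can be sketched in plain NumPy as follows. This is a minimal sketch, not the PR's code: the helper names (`encode_type`, `fit_scaler`, `transform`) and the sample values are hypothetical. If the project used scikit-learn, `StandardScaler` (fit on the original data, `transform` on new data) plays the same role as `fit_scaler` and `transform` here.

```python
import numpy as np

# Hypothetical helpers illustrating the two preparation steps.
TYPE_CODES = {"red": 0, "white": 1}  # encoding stated in the PR: 0 = red, 1 = white

def encode_type(types):
    """Convert text wine types to numeric codes (case-insensitive)."""
    return np.array([TYPE_CODES[t.strip().lower()] for t in types])

def fit_scaler(X):
    """Learn per-column mean and standard deviation from the original data."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant columns: avoid dividing by zero
    return mean, std

def transform(X, mean, std):
    """Z-score scaling: (value - column mean) / column std."""
    return (np.asarray(X, dtype=float) - mean) / std

# Made-up values; columns are alcohol (% vol) and density (g/cm^3).
raw = np.array([[9.4, 0.9978], [12.8, 0.9910], [10.5, 0.9951], [11.0, 0.9930]])
types = encode_type(["Red", "white", "red", "White"])
mean, std = fit_scaler(raw)
X_scaled = transform(raw, mean, std)
print(types)
print(X_scaled.round(2))
```

Keeping the `mean` and `std` returned by `fit_scaler` is what allows a new wine to be scaled by the same rules later, instead of refitting the scaler on the new sample.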
2. Exploratory Visualization
Correlation Analysis: Pairwise correlations between the chemical columns (such as pH and fixed acidity) were computed and displayed as a heatmap.
Initial Plotting: The distributions of selected columns, such as alcohol and density, were plotted to check how naturally red and white wines separate on these features.
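A minimal sketch of the correlation step, assuming the chemical columns are held in a NumPy array with one row per wine and that Pearson correlation is used (the default of pandas `DataFrame.corr`). `np.corrcoef` with `rowvar=False` returns the matrix a heatmap displays, for example via matplotlib's `imshow`; the helper `strongest_pairs` and the sample values are hypothetical.

```python
import numpy as np

def correlation_matrix(X):
    """Pearson correlation between every pair of columns (rows = wines, columns = features)."""
    return np.corrcoef(np.asarray(X, dtype=float), rowvar=False)

def strongest_pairs(corr, names, top=3):
    """List the column pairs with the largest absolute correlation, strongest first."""
    n = len(names)
    pairs = [(names[i], names[j], corr[i, j]) for i in range(n) for j in range(i + 1, n)]
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)[:top]

# Made-up sample of four wines; columns: fixed acidity, pH, alcohol.
names = ["fixed acidity", "pH", "alcohol"]
X = np.array([
    [7.4, 3.51, 9.4],
    [6.3, 3.30, 10.1],
    [8.1, 3.20, 9.8],
    [6.8, 3.40, 11.2],
])
corr = correlation_matrix(X)
for a, b, r in strongest_pairs(corr, names):
    print(f"{a} vs {b}: r = {r:+.2f}")
```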
3. Determining the Optimal Number of Clusters (K)
Mathematical Calculation: The within-cluster sum of squares (WCSS) was computed on the full set of features for a range of cluster counts, to compare candidate values of K.
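For reference, the WCSS used in this step has the standard definition below, where $C_k$ is the set of (scaled) wine vectors $\mathbf{x}_i$ assigned to cluster $k$ and $\boldsymbol{\mu}_k$ is that cluster's centroid, the column-wise mean:

```latex
\mathrm{WCSS}(K) = \sum_{k=1}^{K} \sum_{\mathbf{x}_i \in C_k} \left\lVert \mathbf{x}_i - \boldsymbol{\mu}_k \right\rVert^2,
\qquad
\boldsymbol{\mu}_k = \frac{1}{\lvert C_k \rvert} \sum_{\mathbf{x}_i \in C_k} \mathbf{x}_i
```

Because the best achievable WCSS never increases as K grows, the raw minimum cannot pick K by itself. A common heuristic, assumed here, is to plot WCSS against K and choose the "elbow" where adding clusters stops reducing WCSS substantially.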
4. K-Means Clustering
Partitioning: Every wine was assigned to one of three clusters (K = 3) according to its values on all 12 feature columns.
Centroid Calculation: For each cluster, the arithmetic mean of every column (e.g., average volatile acidity, sulphates, and quality) was computed, giving the updated centroid of that cluster and a summary of its profile.
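Partitioning and centroid calculation are the two alternating halves of Lloyd's algorithm, the standard K-Means procedure. Below is a from-scratch NumPy sketch under stated assumptions, not the PR's code: `kmeans`, `init_farthest_first`, and the toy data are hypothetical. For simplicity it seeds centroids with farthest-first traversal, a swapped-in choice (scikit-learn's `KMeans`, for instance, defaults to k-means++ seeding). It also returns the WCSS, so the same function can produce the WCSS-versus-K values from step 3.

```python
import numpy as np

def init_farthest_first(X, k, rng):
    """Seed centroids: one random row, then repeatedly the row farthest from all seeds so far."""
    idx = [int(rng.integers(len(X)))]
    min_d2 = ((X - X[idx[0]]) ** 2).sum(axis=1)
    for _ in range(k - 1):
        nxt = int(min_d2.argmax())
        idx.append(nxt)
        min_d2 = np.minimum(min_d2, ((X - X[nxt]) ** 2).sum(axis=1))
    return X[idx].copy()

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's K-Means: alternate nearest-centroid assignment and mean update."""
    X = np.asarray(X, dtype=float)
    centroids = init_farthest_first(X, k, np.random.default_rng(seed))
    for _ in range(n_iter):
        # Partitioning: squared Euclidean distance from every wine to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Centroid calculation: column-wise mean of the wines in each cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break
    # Final assignment and WCSS against the final centroids.
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    wcss = float(d2[np.arange(len(X)), labels].sum())
    return labels, centroids, wcss

# Made-up, already-standardized data: three well-separated groups of 20 points.
rng = np.random.default_rng(42)
blobs = np.vstack([rng.normal(loc, 0.2, size=(20, 2)) for loc in ([0, 0], [3, 3], [0, 4])])
labels, centroids, wcss = kmeans(blobs, k=3)
print(np.bincount(labels), round(wcss, 2))

# WCSS for several values of K, as in the step that determines K.
for k in range(1, 6):
    print(k, round(kmeans(blobs, k)[2], 2))
```

The returned centroids are in standardized units; multiplying each by the column standard deviations and adding the column means (`centroids * std + mean`) expresses the cluster profile in original units such as % alcohol.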
5. New Data Prediction
Classification: A new wine's features are scaled with the same means and standard deviations learned from the original data, rather than with parameters refitted on the new sample.
Assignment: The scaled sample is assigned to the cluster whose final centroid (cluster mean) is nearest to it in Euclidean distance.
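The prediction step reduces to scaling with the stored parameters and taking the nearest centroid. The sketch below assumes the training `mean`/`std` and the final `centroids` (in standardized units) are available; `predict_cluster` and all numbers are hypothetical.

```python
import numpy as np

def predict_cluster(x_new, mean, std, centroids):
    """Scale one new wine with the training mean/std, then return the nearest centroid's index."""
    z = (np.asarray(x_new, dtype=float) - mean) / std  # same scaling rules as the original data
    d2 = ((centroids - z) ** 2).sum(axis=1)           # squared Euclidean distance to each centroid
    return int(d2.argmin()), d2

# Hypothetical values: two features (alcohol, density) and three centroids in standardized units.
mean = np.array([10.5, 0.9945])
std = np.array([1.2, 0.0030])
centroids = np.array([[-1.0, 1.0], [1.2, -1.1], [0.1, 0.0]])
cluster, d2 = predict_cluster([12.3, 0.9912], mean, std, centroids)
print(cluster)  # prints 1
```

If scikit-learn was used, calling `KMeans.predict` on the output of the fitted scaler's `transform` performs the same nearest-centroid assignment.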