This project applies K-Means Clustering, an unsupervised machine learning algorithm, to cluster universities based on their academic and institutional characteristics.
The project uses the College Dataset and evaluates how well the generated clusters match the actual classification of universities as Private or Public.
Objectives:
- Perform data preprocessing
- Normalize numerical features
- Apply K-Means Clustering
- Visualize clusters
- Compare predicted clusters with actual labels
- Evaluate clustering performance
Dataset: College.csv
The dataset contains various university attributes such as:
- Number of Applications
- Acceptance Rate
- Enrollment
- Tuition Fees
- Graduation Rate
- Student-Faculty Ratio
- Out-of-State Tuition
- Undergraduate Students
- Private/Public Status
Tools and libraries:
- Python
- Jupyter Notebook
- Pandas
- NumPy
- Matplotlib
- Seaborn
- Scikit-Learn
Workflow:
- Import Libraries
- Load Dataset
- Data Cleaning
- Feature Scaling
- Apply K-Means Clustering
- Visualize Clusters
- Evaluate using:
  - Confusion Matrix
  - Classification Report
  - Accuracy Score
K-Means is an unsupervised machine learning algorithm that partitions data points into a predefined number of clusters, assigning each point to the cluster with the nearest centroid.
In this project:
- Number of Clusters = 2
- Random State = 42
- Feature Scaling using StandardScaler
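The settings above could be wired together roughly as follows. This is a minimal sketch, not the notebook's actual code: it uses a small synthetic DataFrame as a stand-in for College.csv, the column names `Outstate`, `F.Undergrad`, and `Private` are assumptions, and `n_init=10` is an assumed value that the project does not specify.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for College.csv (hypothetical values and column names).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Outstate": np.concatenate([rng.normal(12000, 1500, 50), rng.normal(6500, 1000, 50)]),
    "F.Undergrad": np.concatenate([rng.normal(1500, 400, 50), rng.normal(9000, 1500, 50)]),
    "Private": ["Yes"] * 50 + ["No"] * 50,
})

# Keep the label out of the features: K-Means must not see it.
features = df.drop(columns=["Private"])

# Standardize each feature to zero mean and unit variance so that
# large-valued columns do not dominate the distance computation.
X_scaled = StandardScaler().fit_transform(features)

# Two clusters, fixed seed for reproducibility (n_init=10 is an assumption).
kmeans = KMeans(n_clusters=2, random_state=42, n_init=10)
df["Cluster"] = kmeans.fit_predict(X_scaled)
```

Scaling matters here because K-Means relies on Euclidean distance; without it, a feature measured in thousands would outweigh one measured in single digits.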
The notebook includes a scatter plot showing the clustering results using:
- Out-of-State Tuition
- Full-Time Undergraduate Students
The generated clusters are compared with the original university categories (Private/Public) using:
- Confusion Matrix
- Classification Report
- Accuracy Score
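Because K-Means assigns arbitrary cluster IDs (0 or 1), the IDs must first be matched to the Private/Public labels for these metrics to be meaningful. Together, the metrics show how well the unsupervised clusters recover the actual university categories.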
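The comparison might look like the sketch below. The label and cluster arrays are made-up illustrative values, not results from the notebook, and the "pick the better of the two mappings" step is one simple way to align cluster IDs with labels, assumed here rather than taken from the project.

```python
import numpy as np
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Hypothetical ground truth: 1 = Private, 0 = Public.
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0])
# Hypothetical K-Means output; the IDs happen to be flipped relative to the labels.
clusters = np.array([0, 0, 0, 1, 1, 1, 1, 0])

# Cluster IDs are arbitrary, so try both mappings (as-is and flipped)
# and keep the one that agrees more often with the true labels.
flipped = 1 - clusters
if accuracy_score(y_true, clusters) >= accuracy_score(y_true, flipped):
    y_pred = clusters
else:
    y_pred = flipped

cm = confusion_matrix(y_true, y_pred)
acc = accuracy_score(y_true, y_pred)

print(cm)
print(classification_report(y_true, y_pred, target_names=["Public", "Private"]))
print(f"Accuracy: {acc:.2f}")
```

Note that the labels are used only for evaluation after clustering; they play no part in fitting K-Means.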
University-Clustering/
│
├── College.csv
├── cluster universities.ipynb
├── README.md
├── requirements.txt
├── .gitignore
└── LICENSE
Clone the repository:
    git clone https://github.com/yourusername/University-Clustering.git
Move into the project:
    cd University-Clustering
Install dependencies:
    pip install -r requirements.txt
Run the notebook:
    jupyter notebook
Possible future improvements:
- Elbow Method for selecting optimal K
- Silhouette Score
- PCA Visualization
- Interactive Plots
- Hyperparameter Optimization
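As an illustration of the first two items, the sketch below computes inertia (for an elbow plot) and the silhouette score over a range of K values. It is not part of the current notebook: the data are synthetic blobs, and the K range of 2 to 6 and `n_init=10` are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: three well-separated 2D blobs.
rng = np.random.default_rng(0)
centers = ([0, 0], [5, 5], [0, 5])
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(40, 2)) for c in centers])
X_scaled = StandardScaler().fit_transform(X)

inertias, silhouettes = {}, {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, random_state=42, n_init=10).fit(X_scaled)
    # Inertia: within-cluster sum of squared distances (plot vs. k for the elbow).
    inertias[k] = model.inertia_
    # Silhouette: mean cohesion/separation score in [-1, 1]; higher is better.
    silhouettes[k] = silhouette_score(X_scaled, model.labels_)

# Pick the K with the highest silhouette score.
best_k = max(silhouettes, key=silhouettes.get)
print(f"Best K by silhouette: {best_k}")
```

Plotting `inertias` against K and looking for the bend gives the elbow heuristic; the silhouette score offers a numeric criterion that can be compared across K directly.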
Om Patil
Machine Learning Project
This project is licensed under the MIT License.