SHREYAPATIL4/Cluster-Universities

🎓 University Clustering using K-Means Machine Learning

📌 Project Overview

This project applies K-Means Clustering, an unsupervised machine learning algorithm, to cluster universities based on their academic and institutional characteristics.

The project uses the College Dataset and evaluates how well the generated clusters match the actual classification of universities as Private or Public.


🎯 Objectives

  • Perform data preprocessing
  • Normalize numerical features
  • Apply K-Means Clustering
  • Visualize clusters
  • Compare predicted clusters with actual labels
  • Evaluate clustering performance

📂 Dataset

Dataset: College.csv

The dataset contains various university attributes such as:

  • Number of Applications
  • Acceptance Rate
  • Enrollment
  • Tuition Fees
  • Graduation Rate
  • Student-Faculty Ratio
  • Out-of-State Tuition
  • Undergraduate Students
  • Private/Public Status
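
As a rough sketch of how the dataset might be loaded and inspected with Pandas: the column names below (`Private`, `Apps`, `Outstate`, `F.Undergrad`, and so on) follow the commonly distributed version of the College dataset and are an assumption about this repository's copy. The inline sample and its values are made up so the snippet runs without the real file.

```python
import io

import pandas as pd

# Hypothetical stand-in for College.csv; the values are made up for illustration.
SAMPLE_CSV = """Name,Private,Apps,Accept,Enroll,F.Undergrad,Outstate,S.F.Ratio,Grad.Rate
College A,Yes,1000,800,300,1200,15000,11.5,70
College B,No,9000,6000,2500,12000,7000,18.2,55
College C,Yes,1500,1100,400,1600,18000,10.1,82
"""


def load_colleges(source):
    """Read the CSV, using the first column (institution name) as the index."""
    return pd.read_csv(source, index_col=0)


# Pass "College.csv" instead of the StringIO object to read the real file.
df = load_colleges(io.StringIO(SAMPLE_CSV))

print(df.shape)                        # rows, columns
print(df["Private"].value_counts())    # class balance of the held-out label
print(df.isna().sum().sum(), "missing values")
```

Checking the class balance and the missing-value count up front makes it easier to judge the later cleaning and evaluation steps.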

🛠 Technologies Used

  • Python
  • Jupyter Notebook
  • Pandas
  • NumPy
  • Matplotlib
  • Seaborn
  • Scikit-Learn

📊 Machine Learning Workflow

  1. Import Libraries
  2. Load Dataset
  3. Data Cleaning
  4. Feature Scaling
  5. Apply K-Means Clustering
  6. Visualize Clusters
  7. Evaluate using:
    • Confusion Matrix
    • Classification Report
    • Accuracy Score
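
Steps 3 and 4 of the workflow could look roughly like the sketch below. It assumes the label column is named `Private` with `Yes`/`No` values and that rows with missing values are simply dropped; `prepare_features` is a hypothetical helper, not a function from the notebook.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler


def prepare_features(df, label_col="Private"):
    """Return (scaled numeric features, 0/1 labels) from a raw frame."""
    clean = df.dropna()
    # Encode Yes/No as 1/0. The label is held out and never used for clustering.
    labels = (clean[label_col] == "Yes").astype(int)
    numeric = clean.drop(columns=[label_col]).select_dtypes(include="number")
    # StandardScaler gives each column zero mean and unit variance.
    scaled = StandardScaler().fit_transform(numeric)
    features = pd.DataFrame(scaled, index=numeric.index, columns=numeric.columns)
    return features, labels


# Toy frame with made-up values, including one row with a missing entry.
toy = pd.DataFrame({
    "Private": ["Yes", "No", "Yes", "No"],
    "Outstate": [15000, 7000, 18000, None],
    "F.Undergrad": [1200, 12000, 1600, 9000],
})
X, y = prepare_features(toy)
print(X.round(2))
print(y.tolist())
```

Scaling matters for K-Means because it relies on Euclidean distance, so an unscaled column with large values (such as tuition in dollars) would otherwise dominate the clustering.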

📈 Algorithm Used

K-Means Clustering

K-Means is an unsupervised machine learning algorithm that partitions data points into a predefined number of clusters (K), assigning each point to the cluster whose centroid is nearest.
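
To make the idea concrete, here is a minimal NumPy sketch of the standard iterative procedure (often called Lloyd's algorithm): assign each point to its nearest centroid, recompute the centroids, and repeat until they stop moving. It is an illustration only; the project itself relies on Scikit-Learn, and `lloyd_kmeans` is a hypothetical name.

```python
import numpy as np


def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Plain K-Means: returns (labels, centroids) for data X of shape (n, d)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n, k).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if it has none).
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels, centroids


# Two obvious groups of made-up points.
points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
labels, centroids = lloyd_kmeans(points, k=2)
print(labels)
print(centroids)
```

Which group receives ID 0 and which receives ID 1 depends on the random starting centroids, so the IDs themselves carry no meaning.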

In this project:

  • Number of Clusters = 2
  • Random State = 42
  • Feature Scaling using StandardScaler
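
With Scikit-Learn, the configuration listed above might be expressed as in the sketch below. The data is a synthetic stand-in generated with `make_blobs`, not the college dataset, and `n_init=10` is an assumption added here to make the number of restarts explicit; the README only specifies the cluster count and random state.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the numeric university features (not the real data).
X_raw, _ = make_blobs(n_samples=200, centers=2, n_features=4, random_state=0)
X = StandardScaler().fit_transform(X_raw)

# Settings from the list above; n_init is an assumed value.
kmeans = KMeans(n_clusters=2, random_state=42, n_init=10)
cluster_ids = kmeans.fit_predict(X)

print(np.bincount(cluster_ids))        # number of points in each cluster
print(kmeans.cluster_centers_.shape)   # (n_clusters, n_features)
print(kmeans.inertia_)                 # within-cluster sum of squared distances
```

Fixing `random_state` makes the centroid initialization, and therefore the resulting cluster IDs, reproducible between runs.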

📷 Visualization

The notebook includes a scatter plot showing the clustering results using:

  • Out-of-State Tuition
  • Full-Time Undergraduate Students
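
A plot of this kind could be produced along the following lines. The values are randomly generated placeholders, the choice of which feature goes on which axis is an assumption, and plain Matplotlib is used here even though the notebook may use Seaborn.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
# Made-up values standing in for the two plotted columns.
outstate = np.concatenate([rng.normal(7000, 1500, 50), rng.normal(16000, 2500, 50)])
f_undergrad = np.concatenate([rng.normal(9000, 3000, 50), rng.normal(2000, 800, 50)])
cluster_ids = np.repeat([0, 1], 50)  # placeholder for the K-Means output

fig, ax = plt.subplots(figsize=(7, 5))
points = ax.scatter(outstate, f_undergrad, c=cluster_ids, cmap="coolwarm", alpha=0.7)
ax.set_xlabel("Out-of-State Tuition")
ax.set_ylabel("Full-Time Undergraduate Students")
ax.set_title("K-Means clusters (K = 2)")
# Build one legend entry per distinct cluster ID.
ax.legend(*points.legend_elements(), title="Cluster")
fig.tight_layout()
```

In a Jupyter notebook the figure renders inline without the `matplotlib.use("Agg")` line; elsewhere it can be written to disk with `fig.savefig`.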

📋 Results

The generated clusters are compared with the original university categories (Private/Public) using:

  • Confusion Matrix
  • Classification Report
  • Accuracy Score

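This shows how closely the two clusters found by K-Means correspond to the Private/Public split. K-Means assigns cluster IDs arbitrarily, so cluster 0 does not necessarily correspond to a particular category; the IDs may need to be matched to the actual labels before these scores are meaningful.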
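
The comparison described above can be sketched as follows. The labels and cluster IDs are toy values made up for illustration, and `align_clusters` is a hypothetical helper: the README does not say whether the notebook aligns cluster IDs with the labels, so that step is an assumption.

```python
import numpy as np
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix


def align_clusters(y_true, cluster_ids):
    """Flip binary cluster IDs when the flipped version agrees better with y_true."""
    cluster_ids = np.asarray(cluster_ids)
    flipped = 1 - cluster_ids
    if accuracy_score(y_true, flipped) > accuracy_score(y_true, cluster_ids):
        return flipped
    return cluster_ids


# Toy values: 1 = Private, 0 = Public.
y_true = np.array([1, 1, 1, 0, 0, 1, 0, 1])
cluster_ids = np.array([0, 0, 1, 1, 1, 0, 1, 0])

y_pred = align_clusters(y_true, cluster_ids)
print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred, target_names=["Public", "Private"]))
print(accuracy_score(y_true, y_pred))
```

The true labels are used only for this after-the-fact comparison; they play no part in fitting the clusters.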


📁 Project Structure

University-Clustering/
│
├── College.csv
├── cluster universities.ipynb
├── README.md
├── requirements.txt
├── .gitignore
└── LICENSE

▶️ Installation

Clone the repository

git clone https://github.com/yourusername/University-Clustering.git

Move into the project

cd University-Clustering

Install dependencies

pip install -r requirements.txt

Run the notebook

jupyter notebook "cluster universities.ipynb"

Future Improvements

  • Elbow Method for selecting optimal K
  • Silhouette Score
  • PCA Visualization
  • Interactive Plots
  • Hyperparameter Optimization
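
As a possible starting point for the first two items, the sketch below computes the inertia (the quantity plotted in the Elbow Method) and the silhouette score for several values of K. It runs on synthetic `make_blobs` data rather than the college dataset, and the range of K values is an arbitrary choice.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (not the college dataset).
X_raw, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=0)
X = StandardScaler().fit_transform(X_raw)

results = {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, random_state=42, n_init=10).fit(X)
    # Inertia feeds the elbow plot; silhouette measures cluster separation.
    results[k] = (model.inertia_, silhouette_score(X, model.labels_))

for k, (inertia, silhouette) in results.items():
    print(f"k={k}  inertia={inertia:.1f}  silhouette={silhouette:.3f}")

# One simple selection rule: take the K with the highest silhouette score.
best_k = max(results, key=lambda k: results[k][1])
print("best k by silhouette:", best_k)
```

Inertia generally keeps falling as K grows, which is why the elbow is read off a plot, whereas the silhouette score can be compared directly across values of K.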

Author

Om Patil

Machine Learning Project


License

This project is licensed under the MIT License.
