This project was developed as part of the IBM SkillsBuild AIML Internship. It aims to predict employee salary classes based on demographic and professional features using various machine learning algorithms.
The goal is to build a machine learning model that accurately predicts employee salaries from features such as education, experience, work class, and occupation. The model is intended to help HR departments make fair, data-driven salary decisions.
- Name: Adult Income Dataset
- Source: UCI Machine Learning Repository
- Attributes Used:
  - Education
  - Years of Experience
  - Work Class
  - Occupation
  - Marital Status
  - etc.
- Python 3.x
- Jupyter Notebook
- pandas, numpy
- matplotlib, seaborn
- scikit-learn
- Data Collection: Used the Adult Income dataset
- Data Cleaning: Removed missing values and duplicates
- Feature Engineering: Label encoding of categorical columns
- Model Building: Trained Linear Regression, Decision Tree, and Random Forest
- Evaluation: Measured model performance using R² Score and MSE
- Result: Random Forest Regressor achieved the highest R² score
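
The workflow above can be sketched roughly as follows. This is a minimal illustration rather than the project's notebook: it runs on a small synthetic table so that it stays self-contained, and the column names (`education`, `workclass`, `experience`, `salary`), the helper function names, the 80/20 split, and the model settings are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeRegressor

# Hypothetical column names; the real dataset uses its own schema.
CATEGORICAL_COLUMNS = ["education", "workclass"]
EDUCATION_BONUS = {"HS-grad": 0, "Bachelors": 10_000, "Masters": 20_000}


def make_toy_data(n_rows=300, seed=0):
    """Build a synthetic stand-in for the real data (values are made up)."""
    rng = np.random.default_rng(seed)
    education = rng.choice(list(EDUCATION_BONUS), size=n_rows)
    workclass = rng.choice(["Private", "Self-emp", "Gov"], size=n_rows)
    experience = rng.integers(0, 30, size=n_rows)
    bonus = np.array([EDUCATION_BONUS[e] for e in education])
    salary = 30_000 + 1_500 * experience + bonus + rng.normal(0, 3_000, size=n_rows)
    return pd.DataFrame(
        {
            "education": education,
            "workclass": workclass,
            "experience": experience,
            "salary": salary,
        }
    )


def clean(df):
    """Data cleaning: drop rows with missing values, then drop duplicates."""
    return df.dropna().drop_duplicates().reset_index(drop=True)


def encode_categoricals(df):
    """Feature engineering: label-encode each categorical column."""
    out = df.copy()
    for col in CATEGORICAL_COLUMNS:
        out[col] = LabelEncoder().fit_transform(out[col])
    return out


def evaluate_models(df, target="salary", seed=0):
    """Train the three regressors and report R² and MSE on a held-out split."""
    X = df.drop(columns=[target])
    y = df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    models = {
        "Linear Regression": LinearRegression(),
        "Decision Tree": DecisionTreeRegressor(random_state=seed),
        "Random Forest": RandomForestRegressor(n_estimators=100, random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        scores[name] = {
            "r2": r2_score(y_test, pred),
            "mse": mean_squared_error(y_test, pred),
        }
    return scores


toy = encode_categoricals(clean(make_toy_data()))
for name, s in evaluate_models(toy).items():
    print(f"{name}: R2={s['r2']:.2f}, MSE={s['mse']:.0f}")
```

Scores printed by this sketch come from the synthetic data and are not comparable to the project's reported results.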
| Model | R² Score |
|---|---|
| Linear Regression | 0.39 |
| Decision Tree | 0.56 |
| Random Forest | 0.78 |
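
As a reminder of what the two evaluation metrics measure, the sketch below computes MSE and R² from their standard definitions in plain Python. The function names `mse` and `r2` and the four sample values are illustrative assumptions, not project data.

```python
def mse(y_true, y_pred):
    """Mean squared error: the average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """R²: 1 minus (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up values for illustration only.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]

print(mse(y_true, y_pred))  # 0.5
print(r2(y_true, y_pred))   # 0.9
```

An R² of 1.0 means the predictions match the targets exactly, while 0.0 means the model does no better than always predicting the mean of the targets.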
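The Random Forest Regressor produced the most accurate predictions, with an R² score of 0.78 compared with 0.56 for the Decision Tree and 0.39 for Linear Regression. The project demonstrates that machine learning can support fair and efficient salary estimation based on employee attributes.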
- Include more features like job performance, location, and company size
- Improve accuracy using hyperparameter tuning and advanced models
- Deploy as a web-based HR tool for real-time salary prediction
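
One possible way to approach the hyperparameter tuning item is a cross-validated grid search with scikit-learn's `GridSearchCV`. The sketch below is only an assumption about how that could look: the parameter grid, the 3-fold setting, the helper name `tune_random_forest`, and the synthetic data from `make_regression` are all placeholders, not choices made in this project.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV


def tune_random_forest(X, y, seed=0):
    """Search a small, illustrative grid and return the fitted search object."""
    param_grid = {
        "n_estimators": [50, 100],
        "max_depth": [None, 5],
        "min_samples_leaf": [1, 5],
    }
    search = GridSearchCV(
        RandomForestRegressor(random_state=seed),
        param_grid,
        scoring="r2",  # same metric as the results table
        cv=3,
    )
    search.fit(X, y)
    return search


# Synthetic regression data as a stand-in for the encoded salary features.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
search = tune_random_forest(X, y)
print(search.best_params_)
print(round(search.best_score_, 3))
```

`search.best_estimator_` is refit on the full data passed to `fit` by default, so it can be used directly for prediction once the search finishes.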