Water Potability Prediction Using Machine Learning

This project focuses on predicting whether water is safe for drinking based on its physicochemical properties using machine learning classification techniques.

Problem Statement

Access to safe drinking water is a critical public health concern. The goal of this project is to build and compare machine learning models that can classify water samples as potable or non-potable based on measured water quality parameters.

Dataset

Source: Kaggle – Water Potability Dataset
Link: https://www.kaggle.com/datasets/adityakadiwal/water-potability
Description:
The dataset contains water quality metrics such as pH, hardness, solids, chloramines, sulfate, conductivity, organic carbon, trihalomethanes, and turbidity, along with a binary target variable indicating potability.

Note: The dataset is not included in this repository. Please download it directly from Kaggle using the link above.
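
After downloading, a quick look at row count, missing values, and class balance is a useful first check. The sketch below assumes the file is saved as `water_potability.csv` and that the target column is named `Potability`; both names are assumptions, so adjust them to match your download. The `summarize` helper is hypothetical, and the small inline frame only stands in for the real data.

```python
import numpy as np
import pandas as pd


def summarize(df: pd.DataFrame, target: str = "Potability") -> dict:
    """Return row count, per-column missing counts, and class shares."""
    return {
        "rows": len(df),
        "missing": df.isna().sum().to_dict(),
        "class_share": df[target].value_counts(normalize=True).to_dict(),
    }


# Stand-in for: df = pd.read_csv("water_potability.csv")  (assumed file name)
df = pd.DataFrame(
    {
        "ph": [7.1, np.nan, 6.4, 8.0],
        "Sulfate": [330.0, 310.5, np.nan, np.nan],
        "Potability": [0, 1, 0, 0],
    }
)
summary = summarize(df)
print(summary)
```

Columns with many missing values are the ones most affected by the imputation step described under Approach.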

Approach

Data loading and initial exploration
Handling missing values using statistical imputation
Feature scaling where required
Training and evaluation of multiple classification models
Performance comparison across models
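
The imputation and scaling steps above can be sketched as follows. This is a minimal illustration on synthetic data, not the notebook's exact code: the array shapes, missing-value rate, split ratio, and random seeds are all assumptions. It uses mean imputation, in line with the Limitations section.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in: 200 samples, 9 features, about 10% of values missing.
X = rng.normal(loc=5.0, scale=2.0, size=(200, 9))
X[rng.random(X.shape) < 0.1] = np.nan
y = rng.integers(0, 2, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fit imputation and scaling on the training split only, then reuse the
# learned means and scales on the test split to avoid data leakage.
preprocess = make_pipeline(SimpleImputer(strategy="mean"), StandardScaler())
X_train_prep = preprocess.fit_transform(X_train)
X_test_prep = preprocess.transform(X_test)
```

Fitting the preprocessing on the training split alone keeps test-set statistics out of the model, which gives a more honest performance estimate.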

Machine Learning Models Used

Logistic Regression
Random Forest Classifier
Support Vector Machine (SVM)
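
A minimal sketch of training the three models side by side is shown below. It runs on synthetic data from `make_classification`, and the hyperparameters are illustrative defaults rather than the notebook's settings. Logistic Regression and SVM are wrapped with a scaler because both are sensitive to feature scale; Random Forest is not, so it is used directly.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in with 9 features, mirroring the dataset's feature count.
X, y = make_classification(
    n_samples=400, n_features=9, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Random Forest": RandomForestClassifier(random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

# Fit each model and record its accuracy on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: {scores[name]:.3f}")
```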

Evaluation

Model performance was evaluated using classification metrics such as accuracy and confusion matrix analysis; the confusion matrix breaks each model's errors down into false positives and false negatives. Comparing the models on these metrics highlights their strengths and limitations on real-world environmental data.
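
To show how the two metrics relate, the sketch below computes accuracy and a confusion matrix with scikit-learn. The labels are hypothetical and hand-written for illustration (1 = potable, 0 = not potable); they are not results from this project.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels: 1 = potable, 0 = not potable.
y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 0, 1, 0, 1, 0, 1, 0, 0, 0]

acc = accuracy_score(y_true, y_pred)

# Rows are true classes, columns are predicted classes, in label order [0, 1].
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()

print(f"accuracy = {acc:.2f}")
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
```

Accuracy is the share of correct predictions, `(tn + tp) / total`, so it hides how the errors split between the two classes; the confusion matrix makes that split visible.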

Tools and Technologies

Python
NumPy
Pandas
Matplotlib
Seaborn
Scikit-learn
Google Colab

Project Structure

water-potability-prediction/
├── notebooks/
│   └── water_potability.ipynb
├── requirements.txt
├── .gitignore
├── LICENSE
└── README.md

How to Run

Clone the repository
Install dependencies:
```
pip install -r requirements.txt
```
Download the dataset from Kaggle
Open and run the notebook in Jupyter Notebook or Google Colab

Limitations

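Mean imputation shrinks the variance of imputed features and can weaken their correlations with other features
Accuracy alone may not fully capture real-world risk, especially when unsafe water is predicted as potable (a false positive if potable is treated as the positive class)
Hyperparameter tuning could improve the models, and additional metrics such as precision, recall, and F1 score would give a fuller picture of their reliability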
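
As a sketch of the additional metrics mentioned above, the example below reports precision, recall, and F1 alongside accuracy. The labels are hypothetical (1 = potable, 0 = not potable) and are not project results. Whether a given error is called a false positive or a false negative depends on which class is treated as positive, so the sketch reports recall for each class explicitly.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Hypothetical labels: 1 = potable, 0 = not potable.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]

metrics = {
    "accuracy": accuracy_score(y_true, y_pred),
    # Share of "potable" predictions that are truly potable.
    "precision_potable": precision_score(y_true, y_pred, pos_label=1),
    # Share of potable samples the model identifies.
    "recall_potable": recall_score(y_true, y_pred, pos_label=1),
    # Share of unsafe samples the model correctly flags as unsafe.
    "recall_unsafe": recall_score(y_true, y_pred, pos_label=0),
    "f1_potable": f1_score(y_true, y_pred, pos_label=1),
}

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this toy case accuracy is 0.7, yet one in three unsafe samples is labelled potable, which is the kind of risk accuracy alone does not reveal.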

Program Context

This project was completed as part of the IBM SkillsBuild – Mastering Data with Machine Learning program in collaboration with CSRBOX.

License

This project is licensed under the MIT License.
