Prediction of Water Potability Using Machine Learning Models

Ravishankara Kulamarva; Suresha D; Anantha Krishna Kamath; Arjun Bhat BS; Niranjan Sandesh Nayak; Ashwini A Kamath

Authors

Ravishankara Kulamarva, Department of CSE, A J Institute of Engineering and Technology, Mangalore, India; Canara Engineering College, Sudheendra Nagar, Benjanapadavu, Visvesvaraya Technological University, Belagavi, Karnataka, India
Suresha D, Department of CSE, A J Institute of Engineering and Technology, Mangalore, India
Anantha Krishna Kamath, Canara Engineering College, Sudheendra Nagar, Benjanapadavu, Visvesvaraya Technological University, Belagavi, Karnataka, India
Arjun Bhat BS, Canara Engineering College, Sudheendra Nagar, Benjanapadavu, Visvesvaraya Technological University, Belagavi, Karnataka, India
Niranjan Sandesh Nayak, Canara Engineering College, Sudheendra Nagar, Benjanapadavu, Visvesvaraya Technological University, Belagavi, Karnataka, India
Ashwini A Kamath, Mangalore Institute of Technology and Engineering, Badaga Mijar, Moodbidri, Karnataka, India

Keywords:

Water Potability, Machine Learning, Random Forest, XGBoost, Stacking Ensemble, Water Quality Prediction, Data Preprocessing

Abstract

Despite advances in improving water quality, contamination and water-related diseases continue to pose a serious threat to the world population. Classical laboratory techniques for water quality evaluation yield reliable results, but they are costly, slow, and impractical in remote locations. To overcome these limitations, we propose a machine learning approach that assesses water potability from physicochemical parameters. The framework applies Random Forest, XGBoost, and a Stacking classifier trained on the UCI Water Quality Dataset, which contains 3,276 instances and 10 key features. Several preprocessing steps, namely KNN imputation, outlier handling, normalization, and class balancing with SMOTE, were applied to improve the predictive capability of the models. In our experiments, the best-performing model achieved 92.8% accuracy, 91.2% precision, 90.4% recall, and a 90.8% F1-score. Feature importance analysis identified pH, turbidity, and sulfate content as the most influential predictors. The proposed approach offers an intelligent and cost-efficient means of water quality assessment that could also be integrated with IoT-based technologies for real-time evaluation.
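The preprocessing and stacking steps named in the abstract can be sketched roughly as follows. This is an illustrative outline, not the authors' implementation: it runs on synthetic data rather than the water quality dataset, uses scikit-learn's `GradientBoostingClassifier` as a stand-in for XGBoost, replaces library SMOTE with a small hand-written interpolation routine (`smote_oversample`, a hypothetical helper), omits outlier handling, and assumes a logistic regression meta-learner and all hyperparameters shown.

```python
import numpy as np
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.impute import KNNImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler


def smote_oversample(X, y, k=5, seed=0):
    """Minimal SMOTE-style oversampling for a binary problem (hypothetical helper).

    New minority samples are interpolated between a minority point and one of
    its k nearest minority neighbours until both classes have equal size.
    """
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_new = counts.max() - counts.min()
    if n_new == 0:
        return X, y
    X_min = X[y == minority]
    # Ask for k + 1 neighbours: the closest match of each point is itself.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    neighbours = nn.kneighbors(X_min, return_distance=False)[:, 1:]
    base = rng.integers(0, len(X_min), size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    synthetic = X_min[base] + gap * (X_min[picked] - X_min[base])
    return np.vstack([X, synthetic]), np.concatenate([y, np.full(n_new, minority)])


# Synthetic stand-in data (assumption): 10 numeric features, an imbalanced
# binary label, and about 5% of the values missing.
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 10))
score = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.5, size=400)
y = (score > 0.5).astype(int)
X[rng.random(X.shape) < 0.05] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Fit the imputer and scaler on the training split only, to avoid leakage.
imputer = KNNImputer(n_neighbors=5).fit(X_train)
scaler = StandardScaler().fit(imputer.transform(X_train))
X_train_p = scaler.transform(imputer.transform(X_train))
X_test_p = scaler.transform(imputer.transform(X_test))

# Balance the training split only; the test split keeps its natural ratio.
X_bal, y_bal = smote_oversample(X_train_p, y_train)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=42)),
        # Stand-in for XGBoost (assumption), to keep the sketch scikit-learn only.
        ("gb", GradientBoostingClassifier(n_estimators=50, random_state=42)),
    ],
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X_bal, y_bal)
accuracy = stack.score(X_test_p, y_test)
print(f"held-out accuracy on synthetic data: {accuracy:.3f}")
```

The printed accuracy describes only the synthetic data and says nothing about the results reported above. Fitting the imputer, scaler, and oversampler on the training split alone is a design choice made here so that no information from the held-out split leaks into training.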
