Prediction of Water Potability Using Machine Learning Models

Authors

  • Ravishankara Kulamarva, Department of CSE, A J Institute of Engineering and Technology, Mangalore, India, Canara Engineering College, Sudheendra Nagar, Benjanapadavu, Visvesvaraya Technological University, Belagavi, Karnataka, India
  • Suresha D, Department of CSE, A J Institute of Engineering and Technology, Mangalore, India
  • Anantha Krishna Kamath, Canara Engineering College, Sudheendra Nagar, Benjanapadavu, Visvesvaraya Technological University, Belagavi, Karnataka, India
  • Arjun Bhat BS, Canara Engineering College, Sudheendra Nagar, Benjanapadavu, Visvesvaraya Technological University, Belagavi, Karnataka, India
  • Niranjan Sandesh Nayak, Canara Engineering College, Sudheendra Nagar, Benjanapadavu, Visvesvaraya Technological University, Belagavi, Karnataka, India
  • Ashwini A Kamath, Mangalore Institute of Technology and Engineering, Badaga Mijar, Moodbidri, Karnataka, India

Keywords:

Water Potability, Machine Learning, Random Forest, XGBoost, Stacking Ensemble, Water Quality Prediction, Data Preprocessing

Abstract

Despite advances in improving water quality, contamination and water-related diseases remain a serious threat to the world population. Conventional laboratory methods of water quality evaluation yield reliable results, but they are costly, slow, and impractical in remote locations. To overcome these limitations, we propose a machine learning-based approach that assesses water potability from physicochemical parameters. The framework applies Random Forest, XGBoost, and a Stacking classifier trained on the UCI Water Quality Dataset, which contains 3,276 instances and 10 characteristics. Data preprocessing, comprising KNN imputation, outlier handling, normalization, and class balancing with SMOTE, was applied to improve the predictive capability of the models. Experiments show that the best-performing model achieves 92.8% accuracy, 91.2% precision, 90.4% recall, and a 90.8% F1-score. Feature importance analysis identified pH, turbidity, and sulfate content as the most influential predictors. The proposed approach offers an intelligent and cost-efficient means of water quality assessment and could be integrated with IoT-based technologies for real-time evaluation.
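
The reported metrics can be related to one another with a short calculation. The sketch below is illustrative only and is not the authors' code: it assumes the F1-score is the usual harmonic mean of precision and recall, and the helper name `f1_score` is hypothetical. Under that assumption, the precision and recall stated in the abstract give an F1-score that matches the stated figure.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (hypothetical helper)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Values taken from the abstract, expressed as fractions.
reported_precision = 0.912
reported_recall = 0.904

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.3f}")  # prints "F1 = 0.908"
```

The result agrees with the 90.8% F1-score in the abstract, so the three figures are mutually consistent under the standard definition.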

Published

2026-05-08