Prediction of Water Potability Using Machine Learning Models
Keywords:
Water Potability, Machine Learning, Random Forest, XGBoost, Stacking Ensemble, Water Quality Prediction, Data Preprocessing
Abstract
Despite advances in water treatment, contamination and water-related diseases remain a serious threat to populations worldwide. Classical laboratory techniques for water quality evaluation yield reliable results, but they are costly, slow, and poorly suited to remote locations. To overcome these limitations, we propose a machine learning-based approach that assesses water potability from physicochemical parameters. The framework applies Random Forest, XGBoost, and a Stacking classifier, trained on the UCI Water Quality Dataset, which comprises 3,276 instances and 10 attributes. Data preprocessing, including KNN imputation, outlier handling, normalization, and class balancing with SMOTE, was applied to improve the predictive capability of the models. In our experiments, the best-performing model achieves 92.8% accuracy, 91.2% precision, 90.4% recall, and a 90.8% F1-score. Feature importance analysis identifies pH, turbidity, and sulfate content as the most influential predictors. The proposed approach offers an intelligent and cost-efficient means of water quality assessment and could be integrated with IoT-based technologies for real-time evaluation.
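As a brief illustration of how the reported metrics relate to one another, the following minimal sketch computes accuracy, precision, recall, and F1-score from confusion-matrix counts, and checks that the reported F1-score matches the harmonic mean of the reported precision and recall. The function names and the example counts are hypothetical and chosen only for illustration; they are not taken from the experiments.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the F1-score, the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute standard binary classification metrics from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives,
    and true negatives, with "potable" treated as the positive class.
    """
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1_from_precision_recall(precision, recall),
    }


# Consistency check on the values reported in the abstract.
reported_precision = 0.912
reported_recall = 0.904
implied_f1 = f1_from_precision_recall(reported_precision, reported_recall)
print(f"F1 implied by reported precision and recall: {implied_f1:.3f}")

# Hypothetical counts, for illustration only (not experimental results).
example = classification_metrics(tp=90, fp=10, fn=10, tn=90)
print(example)
```

The implied value rounds to 0.908, which agrees with the reported 90.8% F1-score.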