Estimation of Water Quality Parameters With Data-Driven Model

2016, Journal - American Water Works Association

Assessment of surface water quality is important in the management of water resources (Dogan et al. 2009). Water quality in rivers is paramount to the well-being of nature and humans, and surface water quality is usually related to the type of surrounding industries, agriculture, and human activities. Water is withdrawn from the hydrologic cycle to meet various needs and is then returned (Banejad & Olyaie 2011). Given the essential role of rivers in meeting agricultural, industrial, and urban needs, it is necessary to regularly monitor and evaluate their water quality. As rivers pass through different regions, water quality and the levels of hydrochemical parameters change. Because water quality has gradually declined over time, regulatory bodies in various countries have taken measures to mitigate the damage.

Ecologically acceptable water management calls for accurate modeling, forecasting, and analysis of water quality in rivers (Durdu 2010). Numerous models have been developed for the management of water quality, such as QUAL2E, Water Quality Analysis Simulation, and the US Army Corps of Engineers' Hydrologic Engineering Center-5Q (Chen et al. 2003). Using these models is time-consuming and expensive; therefore, development of cost-effective models is encouraged.

Because water quality standards vary, different parameters are used as quality indicators. Parameters frequently measured at water quality monitoring stations include ammonia, cadmium, chemical oxygen demand, chlorine, copper, dissolved phosphorus, lead, nitrogen dioxide, suspended solids, total nitrogen, total phosphorus, zinc, sodium, sodium adsorption ratio, sulfate ions, bicarbonate ions, electrical conductivity (EC), total dissolved solids (TDS), and pH. EC and TDS are two of the main parameters used to determine the quality of drinking and agricultural water because they directly reflect the total concentration of salt in water.
High EC and TDS values are not desirable in water used for irrigation because salt affects plant growth through osmosis (Phocaides 2000).

Advances in data science and data mining methods, such as neural networks (NNs), fuzzy inference methods, support vector machines (SVMs), and k-nearest neighbors (k-NN), have made it possible to solve complex, high-dimensional problems. The general principle behind these methods is to explore hidden relationships in large volumes of data and to build models that reflect the physical processes governing the system under study. A data-derived model represents a relationship between input variables and output variables. Such a model can be highly accurate because it can capture any relationship expressed in the data, including those arising from the underlying physics and chemistry.
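
To make the idea of a data-derived input-output relationship concrete, the following is a minimal sketch of k-NN regression, one of the methods named above. It is an illustration under stated assumptions, not the model used in this study: the function name `knn_predict`, the choice of two input variables, and all numeric values are hypothetical and synthetic.

```python
import math


def knn_predict(train_x, train_y, query, k=3):
    """Estimate the output for `query` as the mean output of its k nearest training points."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training points")
    # Euclidean distance from the query to every training input.
    distances = [(math.dist(x, query), y) for x, y in zip(train_x, train_y)]
    # Sort by distance only, so ties keep their original order.
    distances.sort(key=lambda pair: pair[0])
    nearest = distances[:k]
    return sum(y for _, y in nearest) / k


# Synthetic, illustrative records only (NOT measurements from the study).
# Each input is a pair of hypothetical hydrochemical readings in arbitrary
# units; each output is a made-up value standing in for a quantity such as TDS.
train_x = [(1.0, 1.0), (2.0, 1.0), (3.0, 2.0), (8.0, 7.0), (9.0, 8.0)]
train_y = [100.0, 120.0, 140.0, 400.0, 420.0]

estimate = knn_predict(train_x, train_y, query=(2.0, 1.5), k=3)
print(estimate)  # 120.0: the mean of the three closest outputs (120, 100, 140)
```

The sketch encodes no physical or chemical equation; the estimate comes entirely from stored input-output pairs, which is the sense in which such a model is data-derived. In practice the inputs would usually be scaled before distances are computed, and `k` would be chosen by validation.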