Machine learning algorithms for predicting air pollutants

Atmospheric particulate matter, commonly known as PM, consists of solid particles and liquid droplets suspended in ambient air. High concentrations of PM are known to cause serious adverse health effects in humans, especially fine particles known as PM2.5. Beyond health effects, environmental effects are also clearly observed. This work aims to estimate the likelihood of PM2.5 exceeding a pre-defined safety threshold. Multiple machine learning models are explored. In particular, classification models are built on meteorological data and air pollutant features measured at different altitudes above ground level. These features are shifted back by various time steps, yielding more informative time-lagged features. Furthermore, a feature selection technique is applied to identify a desirable subset of important features. A re-sampling technique is also employed to address the class imbalance of the response variable in the original data set. The proposed models are evaluated on a case study using data collected from an air monitoring station in Bangkok, Thailand.
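As an illustration only, the time-lagging and re-sampling steps can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation used in this work: the function names (`make_lagged_rows`, `random_oversample`), the feature names (`pm25`, `wind`), the toy readings, and the threshold of 50.0 are all hypothetical, and plain random oversampling stands in for whichever re-sampling technique was actually used.

```python
import random


def make_lagged_rows(series, lags, threshold):
    """Turn time-ordered records into (lagged features, exceedance label) rows.

    series: list of dicts with identical numeric keys, oldest record first.
    lags: positive integer time-step offsets to shift the features back by.
    threshold: hypothetical PM2.5 level above which the label is 1.
    """
    max_lag = max(lags)
    rows = []
    for t in range(max_lag, len(series)):
        features = {}
        for lag in lags:
            # Copy every measurement taken `lag` steps before time t.
            for name, value in series[t - lag].items():
                features[f"{name}_lag{lag}"] = value
        label = 1 if series[t]["pm25"] > threshold else 0
        rows.append((features, label))
    return rows


def random_oversample(rows, seed=0):
    """Duplicate randomly chosen minority-class rows until classes are balanced."""
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for row in rows:
        by_class[row[1]].append(row)
    minority, majority = sorted(by_class.values(), key=len)
    if not minority:
        return list(rows)  # Only one class present; nothing to duplicate.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra


# Toy, made-up readings (not data from the Bangkok station).
series = [
    {"pm25": 20.0, "wind": 3.0},
    {"pm25": 25.0, "wind": 2.5},
    {"pm25": 30.0, "wind": 2.0},
    {"pm25": 55.0, "wind": 1.0},
    {"pm25": 28.0, "wind": 2.2},
    {"pm25": 22.0, "wind": 3.1},
]

rows = make_lagged_rows(series, lags=[1, 2], threshold=50.0)
balanced = random_oversample(rows)
```

In practice, re-sampling of this kind is applied to the training split only, so that evaluation data keep the original class proportions.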