Machine Learning algorithms: a study on noise

Comparative Analysis of Machine Learning Regression Algorithms on Air Pollution Dataset

International Journal of Scientific Research in Computer Science, Engineering and Information Technology

Air pollution has both acute and chronic effects on human health, affecting a number of different systems and organs. Examining and protecting air quality has become one of the most essential activities for governments in many industrial and urban areas today. Air pollutants, such as carbon monoxide (CO), sulfur dioxide (SO2), nitrogen oxides (NOx), volatile organic compounds (VOCs), ozone (O3), heavy metals, and respirable particulate matter (PM2.5 and PM10), differ in their chemical composition, reaction properties, emission, time of disintegration and ability to diffuse over long or short distances. The main objective of this paper is to build a model for predicting the Air Quality Index (AQI) of specific cities using several machine learning algorithms, namely Multiple Linear Regression, K Nearest Neighbours (KNN), Support Vector Machine (SVM) and Decision Tree, and to evaluate and compare the performance of each algorithm based on its accuracy score and errors. Air ...
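As a rough illustration of one of the algorithms named above, the following is a minimal sketch of K Nearest Neighbours regression for an AQI-like target. The function name `knn_predict`, the two-feature layout (for example PM2.5 and PM10 concentrations), and all numbers are made-up assumptions for illustration; they are not taken from the paper.

```python
import math

def knn_predict(train_x, train_y, query, k=3):
    """Predict a target as the mean target of the k nearest training points."""
    # Pair every training target with its Euclidean distance to the query.
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    nearest = dists[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Hypothetical training data: (feature_1, feature_2) -> AQI-like value.
train_x = [(10, 20), (12, 22), (50, 60), (55, 65), (90, 100)]
train_y = [40.0, 45.0, 110.0, 120.0, 200.0]

low = knn_predict(train_x, train_y, (11, 21), k=2)
mid = knn_predict(train_x, train_y, (52, 62), k=2)
print(low, mid)
```

In practice the features would usually be scaled first, since Euclidean distance is dominated by whichever feature has the largest numeric range.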

Machine Learning Algorithms: A study on noise sensitivity

2003

Abstract. In this study, results of a variety of ML algorithms are tested against datasets artificially polluted with noise. Two noise models are tested, each studied over a range of noise levels from 0 to 50% [...] algorithm, a linear regression algorithm, a decision tree, an M5 algorithm, a decision table classifier, a voting interval scheme as well as a hyper pipes classifier.
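The abstract does not say which two noise models were used. As one plausible sketch of artificially polluting a dataset, the code below flips class labels at a chosen noise level. The function `add_label_noise` and the label-flipping scheme itself are assumptions for illustration, not the study's method.

```python
import random

def add_label_noise(labels, noise_level, classes, seed=0):
    """Return a copy of labels in which each entry is replaced, with
    probability noise_level, by a different class chosen uniformly."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    noisy = []
    for y in labels:
        if rng.random() < noise_level:
            alternatives = [c for c in classes if c != y]
            noisy.append(rng.choice(alternatives))
        else:
            noisy.append(y)
    return noisy

labels = ["a", "b"] * 500
for level in (0.0, 0.1, 0.25, 0.5):
    noisy = add_label_noise(labels, level, classes=["a", "b"])
    changed = sum(1 for y, z in zip(labels, noisy) if y != z)
    print(f"noise level {level:.2f}: {changed / len(labels):.3f} of labels changed")
```

Training each algorithm on the noisy labels and scoring it on clean test labels, level by level, would give a noise-sensitivity curve per algorithm.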

Machine learning algorithms in air quality modeling

Global Journal of Environmental Science and Management, 2019

Modern studies in the field of environmental science and engineering show that deterministic models struggle to capture the relationship between the concentration of atmospheric pollutants and their emission sources. Recent advances in statistical modeling based on machine learning approaches have emerged as a solution to these issues. The type of input variable largely affects the performance of an algorithm; however, it is not yet known why one algorithm is preferred over another for a given task. The work aims at highlighting the underlying principles of machine learning techniques and their role in enhancing prediction performance. The study reviews the 38 most relevant studies in the field of environmental science and engineering that have applied machine learning techniques during the last 6 years. The review explores several aspects of the studies, such as: 1) the role of input predictors to improve the prediction accuracy; 2) geographically...

Machine learning algorithms for predicting air pollutants

E3S Web of Conferences, 2019

Atmospheric particulate matter, commonly known as PM, consists of solid particles and liquid droplets suspended in ambient air. A high concentration of PM, especially of small particles known as PM2.5, is known to cause serious adverse health effects in humans. Besides health effects, environmental effects are also clearly observed. This work aims to estimate the likelihood of PM2.5 exceeding a pre-defined safety threshold. Multiple machine learning models are explored. In particular, classification models are implemented based on meteorological data and air pollutant features measured at different altitudes above ground level. These features are shifted back by various time steps, resulting in more informative time-lagged features. Furthermore, a feature selection technique is implemented to identify a desirable set of important features. A re-sampling technique is also employed to address class imbalance in the response value of the original data set. The proposed models are evaluated on a case study whose data set was collected from an air monitoring station located in Bangkok, Thailand.
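The time-lagged features and the threshold-exceedance target described above can be sketched as follows. The function `make_lagged_rows`, the chosen lags, the threshold of 35.0, and the sample readings are all hypothetical; the abstract does not specify them.

```python
def make_lagged_rows(series, lags, threshold):
    """Build (features, label) pairs: features are the values observed
    `lag` steps earlier, label is 1 if the current value exceeds threshold."""
    max_lag = max(lags)
    rows = []
    # Start at max_lag so every requested lag has a value to look back to.
    for t in range(max_lag, len(series)):
        features = [series[t - lag] for lag in lags]
        label = 1 if series[t] > threshold else 0
        rows.append((features, label))
    return rows

# Made-up PM2.5 readings at consecutive time steps.
pm25 = [12.0, 15.0, 30.0, 42.0, 38.0, 20.0]
rows = make_lagged_rows(pm25, lags=[1, 2, 3], threshold=35.0)
for features, label in rows:
    print(features, label)
```

The first `max(lags)` time steps are dropped because they have no complete history, which is the usual cost of adding lagged features.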

Investigation on Machine Learning Approaches for Environmental Noise Classifications

Journal of Electrical and Computer Engineering, 2023

This project aims to investigate the best machine learning (ML) algorithm for classifying sounds originating from the environment that were considered noise pollution in smart cities. Sound collection was carried out using the necessary sound capture tools, after which ML classification models were utilized for sound recognition. Additionally, noise pollution monitoring using Python was conducted to provide accurate results for sixteen different types of noise that were collected in sixteen cities in Malaysia. The numbers on the diagonal represent the correctly classified noises from the test set. Using these correlation matrices, the F1 score was calculated, and a comparison was performed for all models. The best model was found to be random forest.
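A matrix whose diagonal holds the correctly classified samples is normally a confusion matrix, so the sketch below assumes that layout (rows are true classes, columns are predicted classes) and derives per-class and macro-averaged F1 from it. The function names and the 3x3 example counts are illustrative assumptions; the paper's own averaging scheme is not stated in the abstract.

```python
def per_class_f1(confusion):
    """confusion[i][j] = number of samples of true class i predicted as j."""
    n = len(confusion)
    scores = []
    for k in range(n):
        tp = confusion[k][k]                               # diagonal entry
        fn = sum(confusion[k]) - tp                        # rest of row k
        fp = sum(confusion[i][k] for i in range(n)) - tp   # rest of column k
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores

def macro_f1(confusion):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(confusion)
    return sum(scores) / len(scores)

# Made-up counts for three noise classes.
confusion = [
    [8, 2, 0],
    [1, 7, 2],
    [0, 1, 9],
]
print(per_class_f1(confusion), macro_f1(confusion))
```

Here F1 is written as 2TP / (2TP + FP + FN), which is algebraically the same as the harmonic mean of precision and recall.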

A Comparative Study on Air Quality Analysis and Prediction Using Machine Learning Techniques

Serial Publication, 2019

Air pollution is a complex mixture of toxic components with considerable impacts on humans. The increase in air pollution is of concern for many cities in India and other developing countries around the world. In this paper, air pollution analysis and prediction are carried out using machine learning techniques. Bengaluru is one of India's fastest growing metropolises and, although benefiting economically from its rapid development, has a rapidly deteriorating environment. The data sets were collected from government repositories for the Bengaluru region, i.e., the Karnataka Pollution Control Board (KPCB). The collected datasets were pre-processed; pre-processing includes clustering of the datasets using the K-Means algorithm. The clustered datasets were then labelled. These labelled datasets were subjected to various machine learning algorithms such as Multinomial logistic regression, Decision tree and Random forest in order to analyze the air pollution. The three models were compared to obtain the best-fit model for analysis. To predict the next day's air pollutant data, normalization is applied to the datasets. Using the standard deviation and mean value, the next day's air pollutant data was obtained. The algorithms were compared based on accuracy. Generally the Random forest algorithm gives good results, but in this case Decision tree and Multinomial logistic regression gave high accuracy.
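The abstract does not state which normalization was applied. A common choice that uses exactly the mean and standard deviation is z-score normalization, sketched below under that assumption; the function name `zscore` and the sample values are hypothetical.

```python
from statistics import mean, pstdev

def zscore(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values identical: there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Made-up daily pollutant readings.
readings = [10.0, 20.0, 30.0]
normalized = zscore(readings)
print(normalized)
```

A normalized prediction z can be mapped back to the original units as z * sigma + mu, using the statistics of the training data.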

Air Quality Prediction based on Supervised Machine Learning Methods

International Journal of Innovative Technology and Exploring Engineering, 2019

Air pollution generally refers to the release of toxins into the air that are harmful to human well-being and to the planet as a whole. It can be described as one of the most dangerous threats that humanity has ever faced. It causes damage to animals, crops, forests, etc. To address this problem, transport sectors have to predict air quality from pollutants using machine learning techniques. Consequently, air quality assessment and prediction have become a significant research area. The aim is to investigate machine learning based techniques for air quality prediction. The air quality dataset is preprocessed with univariate, bivariate and multivariate analysis, missing value treatment, data validation, and data cleaning/preparation. Then, air quality is predicted using supervised machine learning techniques such as Logistic Regression, Random Forest, K-Nearest Neighbors, Decision Tree and Support Vector Machines. The performance of various machine learning algorithms is ...

Air Pollution Prediction using Machine Learning

2020 IEEE Bombay Section Signature Conference (IBSSC), 2020

Industrial pollution is one of the most serious problems faced today. Long-term exposure to air pollution causes severe health issues, including respiratory and lung disorders. At present, laws regarding industrial pollution monitoring and control are not stringent enough. The working dataset includes air parameters for both ambient air and stack emissions. Various Machine Learning (ML) algorithms were applied to this data to predict the emission rate, and a comparative analysis was carried out. These algorithms were implemented in Python, and the mean squared error of each was measured to check its accuracy. Among all the algorithms, the multi-layer perceptron model had the least error. Air dispersion models are then applied to the predicted emission rate to calculate the dispersion of pollutants from the source, that is, at the stack level.
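The comparison by mean squared error described above can be sketched in a few lines. The function name, the two model labels, and all numbers are hypothetical and only show how the metric ranks competing predictions.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Made-up emission rates and the predictions of two hypothetical models.
actual = [2.0, 4.0, 6.0]
predictions = {
    "model_a": [2.5, 3.5, 6.5],
    "model_b": [1.0, 4.0, 8.0],
}
errors = {name: mean_squared_error(actual, p) for name, p in predictions.items()}
best = min(errors, key=errors.get)  # the model with the lowest error
print(errors, best)
```

Because the errors are squared, a single large miss (as in `model_b`) is penalized more heavily than several small ones.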

Air Quality Prediction by Machine Learning

International Journal of Scientific Research in Science and Technology, 2021

An air quality monitoring system measures different air pollutants in different areas in order to maintain good air quality. This is a pressing issue at present. Air is contaminated by the release of hazardous gases into the environment from industries, vehicular emissions, and so forth. Air pollution has now reached critical levels, and the pollution level in many major cities has crossed the air quality index value set by the government. It significantly affects human health. With advances in ML technology, it is now possible to predict pollutants based on past data. In this paper we present a device that takes current pollutant readings and, with the help of past pollutant data, runs an ML-based algorithm to predict future pollution data. The sensed data is saved in an Excel sheet for further evaluation. These sen...

The Influence of Data Length on the Performance of Artificial Intelligence Models in Predicting Air Pollution

Advances in Meteorology

Air pollution is one of humanity's most critical environmental issues and is considered contentious in several countries worldwide. As a result, accurate prediction is critical in human health management and government decision-making for environmental management. In this study, three artificial intelligence (AI) approaches, namely group method of data handling neural network (GMDHNN), extreme learning machine (ELM), and gradient boosting regression (GBR) tree, are used to predict the hourly concentration of PM2.5 at the Dorset station in Canada. The investigation was performed to quantify the effect of data length on AI modeling performance. Accordingly, nine different ratios (50/50, 55/45, 60/40, 65/35, 70/30, 75/25, 80/20, 85/15, and 90/10) are employed to split the data into training and testing datasets for assessing the performance of the applied models. The results showed that the data division significantly impacted the model's capacity, and the 60/40 ra...
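The nine train/test ratios listed above can be reproduced with a simple split. The sketch assumes a chronological split, with the earliest records used for training, which the abstract does not confirm; `split_by_ratio` and the 200-record stand-in series are hypothetical.

```python
def split_by_ratio(records, train_percent):
    """Split records in order: the first train_percent % for training,
    the remainder for testing."""
    cut = len(records) * train_percent // 100
    return records[:cut], records[cut:]

# Stand-in for an hourly PM2.5 series; the values are just indices.
hourly = list(range(200))
ratios = [50, 55, 60, 65, 70, 75, 80, 85, 90]

sizes = {}
for pct in ratios:
    train, test = split_by_ratio(hourly, pct)
    sizes[pct] = (len(train), len(test))
    print(f"{pct}/{100 - pct}: train={len(train)} test={len(test)}")
```

Keeping the records in time order avoids training on observations that come after the test period, which matters for time series such as hourly concentrations.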