Developing machine learning models for air temperature estimation using MODIS data

8-Day and Daily Maximum and Minimum Air Temperature Estimation via Machine Learning Method on a Climate Zone to Global Scale

Remote Sensing

Air temperature (Ta) is a required input in a wide range of applications, e.g., agriculture. Land Surface Temperature (LST) products from Moderate Resolution Imaging Spectroradiometer (MODIS) are widely used to estimate Ta. Previous studies of these products in Ta estimation, however, were generally applied in small areas and with a small number of meteorological stations. This study designed both temporal and spatial experiments to estimate 8-day and daily maximum and minimum Ta (Tmax and Tmin) on three spatial scales: climate zone, continental and global scales from 2009 to 2018, using the Random Forest (RF) method based on MODIS LST products and other auxiliary data. Factors contributing to the relation between LST and Ta were determined based on physical models and equations. Temporal and spatial experiments were defined by the rules of dividing the training and validation datasets for the RF method, in which the stations selected in the training dataset were all included or not...
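The distinction between the temporal and spatial experiments can be sketched in a few lines. Because the abstract is cut off, the exact splitting rules are not known; the sketch below assumes that a temporal experiment keeps the same stations in both sets and holds out years, while a spatial experiment holds out whole stations. The `Sample` layout, the function names, and the values are hypothetical.

```python
from collections import namedtuple

# Hypothetical record layout: one row per station and year (illustrative only).
Sample = namedtuple("Sample", ["station", "year", "lst", "ta"])


def temporal_split(samples, holdout_years):
    """Assumed temporal experiment: same stations, validation on held-out years."""
    train = [s for s in samples if s.year not in holdout_years]
    valid = [s for s in samples if s.year in holdout_years]
    return train, valid


def spatial_split(samples, holdout_stations):
    """Assumed spatial experiment: validation stations never appear in training."""
    train = [s for s in samples if s.station not in holdout_stations]
    valid = [s for s in samples if s.station in holdout_stations]
    return train, valid


# Made-up values, only to exercise the two splits.
samples = [
    Sample("A", 2009, 290.1, 288.4),
    Sample("A", 2018, 291.0, 289.2),
    Sample("B", 2009, 300.5, 297.9),
    Sample("B", 2018, 301.2, 298.8),
]

t_train, t_valid = temporal_split(samples, {2018})
s_train, s_valid = spatial_split(samples, {"B"})
```

A spatial split of this kind tests whether the model generalizes to locations it has never seen, which is usually the harder of the two settings.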

A Neural Network Regression Model for Estimating Maximum Daily Air Temperature Using LANDSAT-8 Data

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences

The Urban Heat Island (UHI) phenomenon is a pressing problem for highly industrialized areas, with serious risks for public health. Weather stations guarantee long-term accurate observations of weather parameters, such as Air Temperature (AT), but lack appropriate spatial coverage. Numerous studies have argued that satellite Land Surface Temperature (LST) is a relevant parameter for estimating AT maps, exploring both linear regression and Machine Learning algorithms. This study proposes a Neural Network (NN) regression model for estimating the maximum AT from Landsat-8 data. The approach has been tested in a morphologically varied region (Puglia, Italy) using a large stack of data acquired from 2018 to 2020. The algorithm uses the median values of LST and Normalized Difference Vegetation Index (NDVI), computed using different buffer radii around the location of each reference weather station (250 m, 1000 m, and 2000 m), to train the NN model with a K-fold cross-validation strategy. The reference dataset was split into three sets using a stratified sampling approach that considers the different station categories: rural, high-density urban, and low-density urban areas. The algorithm was tested with different learning rates (LR) (0.001 and 0.005). The results show that the NN model accuracy improves as the buffer radius increases, minimizing the difference in R² between training and evaluation data, with an overall accuracy consistently higher than 0.84. Future research could investigate more input variables in the NN model, such as morphology or climate variables, and test the algorithm on larger areas.
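The buffer-based feature extraction described in this abstract can be illustrated with a minimal sketch. It assumes pixels are given as `(x, y, value)` tuples in a projected coordinate system with units of metres; the function name `buffer_median` and the pixel values are hypothetical and not taken from the paper.

```python
import math
from statistics import median


def buffer_median(pixels, station_xy, radius_m):
    """Median of pixel values whose centres lie within radius_m of the station."""
    sx, sy = station_xy
    inside = [v for x, y, v in pixels if math.hypot(x - sx, y - sy) <= radius_m]
    return median(inside) if inside else None


# Made-up LST pixels along a line east of a station located at (0, 0).
pixels = [
    (0, 0, 30.0),
    (200, 0, 31.0),
    (900, 0, 33.0),
    (1500, 0, 35.0),
    (3000, 0, 40.0),
]

# One feature per buffer radius named in the abstract (250 m, 1000 m, 2000 m).
features = {r: buffer_median(pixels, (0, 0), r) for r in (250, 1000, 2000)}
```

The same function would be applied to NDVI pixels to build the second input feature for each radius.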

Modeling Air Temperature in Forested Areas using Machine Learning

WSEAS TRANSACTIONS ON COMPUTERS

Air temperature is a fundamental measure of the Earth’s climate but is only measured at fixed locations. Land surface temperature can be measured widely using satellites. To estimate air temperature (Ta) from the surface temperature (Ts) measured on the forested slopes of Kilimanjaro, four models with unique sets of inputs were tested using five machine learning algorithms. The Root Mean Square Error (RMSE) for each model was compared with that of a benchmark model, and models and algorithms were ranked according to their RMSE. Reliability and consistency rankings of the models and algorithms were calculated, and the best model and algorithm were determined. The results of the novel models were compared with the benchmark model. All models outperformed the benchmark model in the consistency ranking, while three out of four models outperformed the benchmark model in the reliability ranking. Thus, machine learning improves the estimation of air temperature in this forested environment.

Interpolation of Instantaneous Air Temperature Using Geographical and MODIS Derived Variables with Machine Learning Techniques

2019

Several methods have been tried to estimate air temperature using satellite imagery. In this paper, the results of two machine learning algorithms, Support Vector Machine and Random Forest, are compared with Multivariate Linear Regression, TVX and Ordinary kriging. Several geographic, remote sensing and time variables are used as predictors. The validation is carried out using four different statistics on a daily basis, allowing the use of ANOVA to compare the results. The main conclusion is that Random Forest with residual kriging produces the best results (R² = 0.612 ± 0.019, NSE = 0.578 ± 0.025, RMSE = 1.068 ± 0.027, PBIAS = -0.172 ± 0.046), whereas TVX produces the least accurate results. The environmental conditions in the study area are not really suited to TVX; moreover, this method only takes into account satellite data. On the other hand, regression methods (Support Vector Machine, Random Forest and Multivariate Linear Regression) use several parameters that are ea...
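The four validation statistics named above can be computed as in the following sketch. The abstract does not give the formulas, so two assumptions are made here: R² is taken as the squared Pearson correlation (which would explain why it is reported separately from NSE), and PBIAS uses the sign convention 100 · Σ(sim − obs) / Σ(obs); other sources use the opposite sign. The sample values are invented.

```python
import math


def rmse(obs, sim):
    """Root mean square error."""
    return math.sqrt(sum((s - o) ** 2 for o, s in zip(obs, sim)) / len(obs))


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations about their mean."""
    mean_o = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_o) ** 2 for o in obs)
    return 1 - sse / sst


def r_squared(obs, sim):
    """Squared Pearson correlation (assumed definition of R^2 here)."""
    mo, ms = sum(obs) / len(obs), sum(sim) / len(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov ** 2 / (var_o * var_s)


def pbias(obs, sim):
    """Percent bias; sign convention assumed as 100 * sum(sim - obs) / sum(obs)."""
    return 100 * sum(s - o for o, s in zip(obs, sim)) / sum(obs)


# Invented example: predictions that are perfectly correlated but 1 degree too warm.
obs = [10.0, 12.0, 14.0, 16.0]
sim = [11.0, 13.0, 15.0, 17.0]
```

With these invented values the correlation-based R² is 1 while NSE is 0.8, which shows how a constant bias lowers NSE without affecting R² under the assumed definitions.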

MODIS-Based Estimation of Terrestrial Latent Heat Flux over North America Using Three Machine Learning Algorithms

Remote Sensing, 2017

Terrestrial latent heat flux (LE) is a key component of the global terrestrial water, energy, and carbon exchanges. Accurate estimation of LE from moderate resolution imaging spectroradiometer (MODIS) data remains a major challenge. In this study, we estimated the daily LE for different plant functional types (PFTs) across North America using three machine learning algorithms: artificial neural network (ANN), support vector machines (SVM), and multivariate adaptive regression spline (MARS), driven by MODIS and Modern Era Retrospective Analysis for Research and Applications (MERRA) meteorology data. These three predictive algorithms, which were trained and validated using observed LE over the period 2000-2007, all proved to be accurate. However, ANN outperformed the other two algorithms for the majority of the tested configurations for most PFTs and was the only method that arrived at 80% precision for LE estimation. We also applied the three machine learning algorithms to MODIS data and MERRA meteorology to map the average annual terrestrial LE of North America during 2002-2004 at a spatial resolution of 0.05°, which proved to be useful for estimating the long-term LE over North America.

Evaluating machine learning approaches for the interpolation of monthly air temperature at Mt. Kilimanjaro, Tanzania

Spatial Statistics, 2015

Spatially high-resolution climate information is required for a variety of applications including, but not limited to, functional biodiversity research. In order to scale the generally plot-based research findings to a landscape level, spatial interpolation methods of meteorological variables are required. Based on a network of temperature observation plots across the southern slopes of Mt. Kilimanjaro, the skill of 14 machine learning algorithms in predicting spatial temperature patterns is tested and evaluated against the heavily utilized kriging approach. Based on a 10-fold cross-validation testing design, regression trees generally perform better than linear and non-linear regression models. The best individual performance was achieved by the stochastic gradient boosting model, followed by Cubist, random forest and model averaged neural networks, which, except for the latter, are all regression tree-based algorithms. While these machine learning algorithms perform better than kriging in a quantitative evaluation, the overall visual interpretation of the resulting air temperature maps is ambiguous. Here, a combined Cubist and residual kriging approach can be considered the best solution.
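A 10-fold cross-validation design of the kind mentioned above can be sketched as follows. This is not the paper's pipeline: a simple least-squares line of temperature against elevation stands in for the 14 algorithms, and the elevations, the lapse rate of 0.006 °C per metre, and all function names are assumptions made for illustration.

```python
import math
import random


def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists so that each index is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, folds[i]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def cross_validated_rmse(xs, ys, k=10):
    """Pool squared errors over all held-out folds and return the RMSE."""
    sq_errors = []
    for train, test in k_fold_indices(len(xs), k):
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        sq_errors += [(ys[i] - (slope * xs[i] + intercept)) ** 2 for i in test]
    return math.sqrt(sum(sq_errors) / len(sq_errors))


# Synthetic plots: temperature falls linearly with elevation (assumed lapse rate).
elev = [1000 + 100 * i for i in range(30)]
temp = [25.0 - 0.006 * e for e in elev]
score = cross_validated_rmse(elev, temp, k=10)
```

Because the synthetic data are exactly linear, the cross-validated error is essentially zero; with real plot data the same loop would give an out-of-sample RMSE that can be compared across algorithms.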

Creating Air Temperature Models for High- Elevation Desert Areas Using Machine Learning

Journal of Computational Innovation and Analytics (JCIA)

The standard way to measure the air temperature (Ta), the key variable in climate change studies, is at 2 m height above the surface at a fixed location (weather station). In contrast, the surface temperature (Ts) can be measured by satellites over large areas. Estimation of Ta from Ts is one potential way of overcoming shortages due to uneven or irregular distributions of weather stations. However, whether this is successful has not been assessed in high-elevation regions, where it is particularly important. In this study, we estimate Ta in the high-elevation desert zone of Kilimanjaro (>4500 m) using four models (five models including the benchmark model) with unique sets of inputs using five machine learning (ML) algorithms. Note that different combinations of Ta and Ts were tested as inputs to evaluate the potential of Ts as a proxy for Ta. The Root Mean Square Error (RMSE) for each model was compared with a benchmark model and ranked according to their R...

Comparative Between Neural Networks Generate Predictions for Global Solar Radiation and Air Temperature

Blucher Engineering Proceedings, 2021

Technology is becoming an increasingly important and indispensable tool in human life, making it necessary to develop various forms of renewable energy. Over time, however, it has become necessary to improve this technology so that it becomes more advanced and efficient. The purpose of this research is to compare the results of three distinct neural networks for producing two-hour forecasts, using the database made available by the Instituto Nacional de Meteorologia (INMET). The results indicate that the K-Nearest Neighbors Regression network proved to be more effective for estimating Global Solar Radiation (W/m²) and the Multi-Layer Perceptron for forecasting Air Temperature (°C).

Prediction of daily global solar radiation and air temperature using six machine learning algorithms; a case of 27 European countries

Ecological Informatics

The prediction of global solar radiation in a region is of great importance as it provides investors and politicians with more detailed knowledge about the solar resource of that region, which can be very beneficial for large-scale solar energy development. In this sense, the main objective of this study is to predict the daily global solar radiation data of 27 cities (Brussels, Paris, Lisbon, Madrid…), located in 27 countries, which have mostly different solar radiation distributions in Europe. In this research, six different machine learning algorithms (Linear Model (LM), Decision Tree (DT), Support Vector Machine (SVM), Deep Learning (DL), Random Forest (RF) and Gradient Boosted Trees (GBT)) are used. In the training of these algorithms, daily air temperature (Ta), wind speed (Va), relative humidity (RH) and solar radiation of these cities are used. The data are supplied by the Meteonorm tool and are grouped in two periods (1960-1990; 2000-2019). To decide on the success of these algorithms, four different statistical metrics (Average Relative Error (ARE), Average Absolute Error (AAE), Root Mean Squared Error (RMSE), and R² (R-squared)) are discussed in the study. In addition, the air temperature and global solar radiation of these cities in 2050 and 2100 were forecast using three of the most recent Intergovernmental Panel on Climate Change (IPCC) scenarios (RCP 2.6, RCP 4.5, and RCP 8.5). The results show that the ARE, R² and RMSE values of all algorithms range from 0.114 to 6.321, from 0.382 to 0.985, and from 0.145 to 2.126 MJ/m², respectively. Analysing all the algorithms, the Decision Tree exhibited the worst result in terms of the R² and RMSE metrics. Among the six prediction algorithms, DL was recognized as the only algorithm that exceeded the t-critical value (the t-critical value is the cutoff between retaining or rejecting the null hypothesis).
Overall, all six machine learning algorithms used in this research can be applied to predict the daily global solar radiation data with good accuracy. Despite this, the SVM model is the best model among all six models used. It is followed by the DL, LM, GBT, RF and DT, respectively.

A Generic Machine Learning-Based Framework for Predictive Modeling of Land Surface Temperature

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 2023

In the realm of data analytics, machine learning (ML) is one of the most successful techniques for making predictions. The ability of ML algorithms has also been studied in various aspects of land surface temperature (LST) besides its retrieval. The few investigations on LST retrieval using ML algorithms suggested that it may potentially obtain the LST values incorporating relevant variables of land surface parameters; however, the variables and ML models used differ, and so do their accuracies. The accuracy of the model is affected by the variables' contribution, their quality and quantity, and the fulfilment of each technique's assumptions. Hence, this study provides a wide range of LST indicators to be employed for LST retrieval using a widely used ML algorithm, random forest. The ML algorithm framework for LST prediction is illustrated with significant spectral indices and terrain parameters across the highly industrialised district of Jharkhand, India. With the exception of one (aspect) variable, the analysis shows that all 20 variables that were included as independent factors were significant and equally contributed to the model. The model built with all the variables, including the aspect of the terrain, obtained an RMSE of 1.13 °C and an R² of 0.48. However, after the removal of aspect, the model obtained an R² of 0.89 and an RMSE of 0.74 °C. The performance of the model upon consecutive removal of less significant variables is evaluated, and the study makes clear how crucial it is to consider several environmental or land-use factors that could be pertinent to LST.