Regression analysis of spatial data

Analyzing spatial ecological data using linear regression and wavelet analysis

Stochastic Environmental Research and Risk Assessment, 2007

Spatial (two-dimensional) distributions in ecology are often influenced by spatial autocorrelation. In standard regression models, however, observations are assumed to be statistically independent. In this paper we present an alternative to other methods that allow for autocorrelation. We show that the theory of wavelets provides an efficient method to remove autocorrelations in regression models using data sampled on a regular grid. Wavelets are particularly suitable for data analysis without any prior knowledge of the underlying correlation structure. We illustrate our new method, called the wavelet-revised model, by applying it to multiple regression for both normal linear models and logistic regression. Results are presented for computationally simulated data and real ecological data (distribution of species richness and distribution of the plant species Dianthus carthusianorum throughout Germany). These results are compared to those of generalized linear models and models based on generalized estimating equations. We recommend wavelet-revised models, in particular, as a method for logistic regression using large datasets.

Spatial Regression Models for Field Trials: A Comparative Study and New Ideas

Frontiers in Plant Science, 2022

Naturally occurring variability within a study region harbors valuable information on relationships between biological variables. Yet, spatial patterns within these study areas, e.g., in field trials, violate the assumption of independence of observations, setting particular challenges in terms of hypothesis testing, parameter estimation, feature selection, and model evaluation. We evaluate a number of spatial regression methods in a simulation study, including more realistic spatial effects than employed so far. Based on our results, we recommend generalized least squares (GLS) estimation for experimental as well as for observational setups and demonstrate how it can be incorporated into popular regression models for high-dimensional data such as regularized least squares. This new method is available in the BioConductor R-package pengls. Inclusion of a spatial error structure improves parameter estimation and predictive model performance in low-dimensional settings and also improves feature selection in high-dimensional settings by reducing "red-shift": the preferential selection of features with spatial structure. In addition, we argue that the absence of spatial autocorrelation (SAC) in the model residuals should not be taken as a sign of a good fit, since it may result from overfitting the spatial trend. Finally, we confirm our findings in a case study on the prediction of winter wheat yield based on multispectral measurements.
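As a reminder of what GLS estimation with a spatial error structure involves, the textbook form of the estimator is sketched below. The exponential correlation function, the range parameter \rho, and the generic penalty P(\beta) with tuning parameter \lambda are illustrative assumptions; the abstract does not state which covariance model or penalty the pengls package uses.

```latex
% Linear model with spatially correlated errors (assumed form)
y = X\beta + \varepsilon, \qquad \varepsilon \sim \mathcal{N}\!\left(0,\ \sigma^{2}\Sigma\right)

% One common choice of spatial correlation (assumption): exponential decay
% with the distance d_{ij} between observations i and j, range \rho > 0
\Sigma_{ij} = \exp\!\left(-\frac{d_{ij}}{\rho}\right)

% Generalized least squares estimator
\hat{\beta}_{\mathrm{GLS}} = \left(X^{\top}\Sigma^{-1}X\right)^{-1} X^{\top}\Sigma^{-1} y

% Generic penalized variant for high-dimensional data (assumed form)
\hat{\beta}_{\lambda} = \arg\min_{\beta}\ (y - X\beta)^{\top}\Sigma^{-1}(y - X\beta) + \lambda\, P(\beta)
```

When \Sigma is the identity matrix, the first estimator reduces to ordinary least squares, which is the independence assumption the abstract describes as violated by spatial patterns.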

A Wavelet-Based Extension of Generalized Linear Models to Remove the Effect of Spatial Autocorrelation

Geographical Analysis, 2010

Biogeographical studies are often based on a statistical analysis of data sampled in a spatial context. However, in many cases standard analyses such as regression models violate the assumption of independently and identically distributed errors. In this article, we show that the theory of wavelets provides a method to remove autocorrelation in generalized linear models (GLMs). Autocorrelation can be described by smooth wavelet coefficients at small scales. Therefore, data can be decomposed into uncorrelated and correlated parts. Using an appropriate linear transformation, we are able to extend GLMs to autocorrelated data. We illustrate our new method, called the wavelet-revised model (WRM), by applying it to multiple regression with response variables conforming to various distributions. Results are presented for simulated data and real biogeographical data (species counts of the plant genus Utricularia [bladderworts] in grid cells throughout Germany). The results of our WRM are compared with those of GLMs and models based on generalized estimating equations. We recommend WRMs, especially as a method that allows for spatial nonstationarity. The technique developed for lattice data is applicable without any prior knowledge of the real autocorrelation structure.
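The idea of splitting lattice data into smooth and detail parts can be illustrated with a single level of the two-dimensional Haar wavelet transform. This is only a minimal sketch of the decomposition step, not the authors' WRM: the choice of the Haar wavelet, the function names haar2d_level1 and energy, and the toy grid are all assumptions made for illustration.

```python
def haar2d_level1(grid):
    """One level of the orthonormal 2D Haar transform.

    `grid` is a list of equal-length rows with even dimensions.
    Returns four sub-grids: smooth, horizontal, vertical and diagonal
    detail coefficients, one value per 2x2 block of the input.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    if n_rows % 2 or n_cols % 2:
        raise ValueError("grid dimensions must be even")
    smooth, horiz, vert, diag = [], [], [], []
    for i in range(0, n_rows, 2):
        s_row, h_row, v_row, d_row = [], [], [], []
        for j in range(0, n_cols, 2):
            a, b = grid[i][j], grid[i][j + 1]
            c, d = grid[i + 1][j], grid[i + 1][j + 1]
            s_row.append((a + b + c + d) / 2.0)  # local average (smooth part)
            h_row.append((a - b + c - d) / 2.0)  # left-right contrast
            v_row.append((a + b - c - d) / 2.0)  # top-bottom contrast
            d_row.append((a - b - c + d) / 2.0)  # diagonal contrast
        smooth.append(s_row)
        horiz.append(h_row)
        vert.append(v_row)
        diag.append(d_row)
    return smooth, horiz, vert, diag


def energy(grid):
    """Sum of squared values in a grid."""
    return sum(value * value for row in grid for value in row)


# Toy lattice with a smooth spatial trend (hypothetical data).
trend = [[float(i + j) for j in range(4)] for i in range(4)]
smooth, horiz, vert, diag = haar2d_level1(trend)

# Share of the total sum of squares carried by the smooth coefficients.
smooth_share = energy(smooth) / energy(trend)
print(f"smooth share of energy: {smooth_share:.3f}")
```

Because the transform is orthonormal, the sums of squares of the four sub-grids add up to that of the input, so the smooth share shows how much of the pattern sits in the slowly varying component.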

The Use of Geographically Weighted Regression for Spatial Prediction: An Evaluation of Models Using Simulated Data Sets

Mathematical Geosciences, 2010

Increasingly, the geographically weighted regression (GWR) model is being used for spatial prediction rather than for inference. Our study compares GWR as a predictor to (a) its global counterpart of multiple linear regression (MLR); (b) traditional geostatistical models such as ordinary kriging (OK) and universal kriging (UK), with MLR as a mean component; and (c) hybrids, where kriging models are specified with GWR as a mean component. For this purpose, we test the performance of each model on data simulated with differing levels of spatial heterogeneity (with respect to data relationships in the mean process) and spatial autocorrelation (in the residual process). Our results demonstrate that kriging (in a UK form) should be the preferred predictor, reflecting its optimal statistical properties. However, the GWR-kriging hybrids perform with merit and, as such, a predictor of this form may provide a worthy alternative to UK for particular (non-stationary relationship) situations when UK models cannot be reliably calibrated. GWR predictors tend to perform more poorly than their more complex GWR-kriging counterparts, but both GWR-based models are useful in that they provide extra information on the spatial processes generating the data that are being predicted.

A Robust Test of Spatial Predictive Models: Geographic Cross-Validation

Journal of Environmental Informatics, 2011

Predictive modeling is an important tool for identifying areas for conservation prioritization. But the reliability of any model depends on how well its predictions can be generalized beyond the area surveyed. Recent work points to the potential for enhancing predictive power by incorporating such spatial processes as autocorrelation or the influence of location, so this study addressed two questions: (1) what effect do model complexity, spatial autocorrelation and spatial location have on model accuracy? (2) how generalizable are different methods when applied to new geographic test regions? On average, predictive power declined 22.7% ± 2.7% SE when models were used to predict occurrences in "unsampled" geographic test regions. Overall variability in performance depended on the method used. AUTO and GAM models tended to be amongst the least variable, but results depended upon species. Our results suggest that models with complex functional relationships between the response and predictor variables (such as GAMs fit with up to 5 knots) tended to either improve accuracy, or perform more consistently across species, but not both at the same time. In general, it is very difficult to accurately extrapolate model predictions into unsampled geographic areas. However, we found that habitat specialists such as the Sedge Wren were consistently well predicted, regardless of method, and that autocorrelated regression (using a Gibbs sampler and simulation of presence/absence) could be more reliably generalized for species showing strong social structure (e.g., patchiness). GWR was especially sensitive to the plots used to train the model.
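The general idea of testing a model on "unsampled" geographic regions can be sketched as a block-wise split of the survey plots. The square blocks, the block size, the function names and the coordinates below are hypothetical; the abstract does not describe how the study delimited its test regions.

```python
def assign_blocks(coords, block_size):
    """Map each (x, y) coordinate to the identifier of its square block."""
    return [(int(x // block_size), int(y // block_size)) for x, y in coords]


def geographic_folds(coords, block_size):
    """Yield (train_indices, test_indices), holding out one block at a time.

    Every plot inside the held-out block goes to the test set, so the model
    is always evaluated on a region it has not seen during training.
    """
    blocks = assign_blocks(coords, block_size)
    for held_out in sorted(set(blocks)):
        test = [i for i, block in enumerate(blocks) if block == held_out]
        train = [i for i, block in enumerate(blocks) if block != held_out]
        yield train, test


# Hypothetical plot coordinates (arbitrary units).
coords = [(0.5, 0.5), (1.5, 0.2), (5.1, 0.3), (5.9, 1.0), (0.2, 5.5), (6.0, 6.0)]
folds = list(geographic_folds(coords, block_size=5.0))

for train, test in folds:
    print("train:", train, "test:", test)
```

Unlike a random split, this keeps neighbouring plots together in either the training or the test set, which is what makes the evaluation a test of extrapolation rather than interpolation.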

Modeling spatially-varying ecological relationships using geographically weighted generalized linear model: A simulation study based on longline seabird bycatch

Fisheries Research, 2016

Geographically weighted regression (GWR) is a relatively new technique to explore spatially-varying relationships between biological and environmental processes. It allows parameters to vary over space and assumes data to follow a normal distribution. We extend GWR to a geographically weighted generalized linear model (GW-GLM) by incorporating statistical distributions other than the normal distribution (i.e., the binomial distribution). We demonstrate the application of GW-GLM with an empirical example, U.S. Atlantic pelagic longline seabird bycatch. Due to the high percentage of zero observations in the seabird bycatch data, we analyzed the positive catch rates (number of seabirds caught per 1000 hooks) and the probability of catching a seabird separately. Parameter estimates exhibited considerable spatial variation, especially for target catch rate when analyzing the positive catch data, and for intercept, water depth and water temperature when estimating the probability of catching seabirds. We compared model performance of GW-GLM with a global generalized linear model, a mixed effect model with a random areal effect, and a spatial expansion model that is an early technique to model spatially-varying ecological relationships by modeling each of the parameters as a function of location. The GW-GLM performed best. Simulations with hypothetical datasets having different percentages of zeros showed that, regardless of the zero percentage in the data, GW-GLM performed best on average. Applying a range of bandwidths indicated that the GW-GLM was more robust to an overestimated bandwidth than an underestimated bandwidth.
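How basic GWR lets parameters vary over space can be sketched as a weighted least-squares fit repeated at each target location, with weights that decay with distance. The sketch covers only the normal-distribution case with a single predictor, not the GW-GLM extension; the Gaussian kernel, the function names and the simulated transect are assumptions made for illustration.

```python
import math


def gaussian_weight(distance, bandwidth):
    """Gaussian kernel: weight decays smoothly with distance from the target."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def gwr_local_fit(coords, x, y, target, bandwidth):
    """Weighted least squares for y = b0 + b1 * x, centred on `target`.

    Returns the local (intercept, slope) estimated at the target location.
    """
    w = [gaussian_weight(math.dist(c, target), bandwidth) for c in coords]
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical transect: the true slope is 1 in the west and 3 in the east.
coords = [(float(i), 0.0) for i in range(20)]
x = [float(i % 5) for i in range(20)]
y = [(1.0 if i < 10 else 3.0) * xi for i, xi in enumerate(x)]

west = gwr_local_fit(coords, x, y, target=(2.0, 0.0), bandwidth=2.0)
east = gwr_local_fit(coords, x, y, target=(17.0, 0.0), bandwidth=2.0)
pooled = gwr_local_fit(coords, x, y, target=(2.0, 0.0), bandwidth=1e6)

print("west slope:", round(west[1], 3))
print("east slope:", round(east[1], 3))
print("very large bandwidth slope:", round(pooled[1], 3))
```

With a very large bandwidth all weights are nearly equal and the local fit collapses towards a single global regression, which is one way to see why the choice of bandwidth matters for how much spatial variation the model can express.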

Comment on “Methods to account for spatial autocorrelation in the analysis of species distributional data: a review”

Ecography, 2009

In a recent paper, Dormann et al. conducted a review of approaches to account for spatial autocorrelation in species distribution models. As the review was the first of its kind in the ecological literature, it has the potential to be an important and influential source of information guiding research. Although many spatial autocovariance approaches may seem redundant in the spatial processes they reflect, seemingly subtle differences in approach can have major implications for the resulting description of the data and conclusions drawn. Though Dormann et al.'s review of the available approaches was a step in the right direction, we think that their simulation study ignored important concepts, which leads us to question some of their conclusions.

Coefficient shifts in geographical ecology: an empirical evaluation of spatial and non-spatial regression

Ecography, 2009

A major focus of geographical ecology and macroecology is to understand the causes of spatially structured ecological patterns. However, achieving this understanding can be complicated when using multiple regression, because the relative importance of explanatory variables, as measured by regression coefficients, can shift depending on whether spatially explicit or non-spatial modeling is used. However, the extent to which coefficients may shift and why shifts occur are unclear. Here, we analyze the relationship between environmental predictors and the geographical distribution of species richness, body size, range size and abundance in 97 multi-factorial data sets. Our goal was to compare standardized partial

Spatial Econometric Issues for Bio-Economic and Land-Use Modelling

Journal of Agricultural Economics, 2007

We survey the literature on spatial bio-economic and land-use modelling and assess its thematic development. Unobserved site-specific heterogeneity is a feature of almost all the surveyed works, and this feature, it seems, has stimulated significant methodological innovation. In an attempt to improve the suitability with which the prototype incorporates heterogeneity, we consider modelling alternatives and extensions. We discuss solutions and conjecture others.