Hierarchical models in ecology: confidence intervals, hypothesis testing, and model selection using data cloning

Data cloning: easy maximum likelihood estimation for complex ecological models using Bayesian Markov chain Monte Carlo methods

Ecology Letters, 2007

We introduce a new statistical computing method, called data cloning, to calculate maximum likelihood estimates and their standard errors for complex ecological models. Although the method uses the Bayesian framework and exploits the computational simplicity of the Markov chain Monte Carlo (MCMC) algorithms, it provides valid frequentist inferences such as the maximum likelihood estimates and their standard errors. The inferences are completely invariant to the choice of the prior distributions and therefore avoid the inherent subjectivity of the Bayesian approach. The data cloning method is easily implemented using standard MCMC software. Data cloning is particularly useful for analysing ecological situations in which hierarchical statistical models, such as state-space models and mixed effects models, are appropriate. We illustrate the method by fitting two nonlinear population dynamics models to data in the presence of process and observation noise.
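The idea in the abstract above can be illustrated with a toy case. The following is a minimal sketch, not the authors' implementation: it assumes a binomial observation model with a conjugate Beta prior, so the posterior for the cloned data is available in closed form and no MCMC run is needed. The function name `cloned_posterior` and the example counts are hypothetical. As the number of clones K grows, the posterior mean approaches the maximum likelihood estimate and K times the posterior variance approaches the asymptotic variance of that estimate, whichever prior is used.

```python
def cloned_posterior(successes, trials, clones, prior_a=1.0, prior_b=1.0):
    """Mean and variance of the Beta posterior for a binomial probability
    when the observed data set is replicated `clones` times.

    Assumption: a conjugate Beta(prior_a, prior_b) prior, used here in place
    of MCMC so that the example stays exact and self-contained.
    """
    a = prior_a + clones * successes
    b = prior_b + clones * (trials - successes)
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1.0))
    return mean, var


# Hypothetical data: 7 successes in 20 trials.
successes, trials = 7, 20
mle = successes / trials                 # maximum likelihood estimate
mle_var = mle * (1.0 - mle) / trials     # inverse Fisher information at the MLE

# Two deliberately different priors: both should lead to the same answer.
for prior_a, prior_b in [(1.0, 1.0), (10.0, 2.0)]:
    for k in (1, 10, 100, 1000):
        mean, var = cloned_posterior(successes, trials, k, prior_a, prior_b)
        print(f"prior=({prior_a}, {prior_b}) K={k:5d} "
              f"mean={mean:.4f} K*var={k * var:.5f}")

print(f"MLE={mle:.4f} asymptotic variance={mle_var:.5f}")
```

For hierarchical models without a closed-form posterior, the same recipe would be carried out by passing K copies of the data to standard MCMC software and scaling the resulting posterior variance by K.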

Binomial-beta hierarchical models for ecological inference

1999

The authors develop binomial-beta hierarchical models for ecological inference using insights from the literature on hierarchical models based on Markov chain Monte Carlo algorithms and King's ecological inference model. The new approach reveals some features of the data that King's approach does not, can be easily generalized to more complicated problems such as general R × C tables, allows the data analyst to adjust for covariates, and provides a formal evaluation of the significance of the covariates.

Accounting for uncertainty in ecological analysis: the strengths and limitations of hierarchical statistical modeling

Ecological Applications, 2009

Analyses of ecological data should account for the uncertainty in the process(es) that generated the data. However, accounting for these uncertainties is a difficult task, since ecology is known for its complexity. Measurement and/or process errors are often the only sources of uncertainty modeled when addressing complex ecological problems, yet analyses should also account for uncertainty in sampling design, in model specification, in parameters governing the specified model, and in initial and boundary conditions. Only then can we be confident in the scientific inferences and forecasts made from an analysis. Probability and statistics provide a framework that accounts for multiple sources of uncertainty. Given the complexities of ecological studies, the hierarchical statistical model is an invaluable tool. This approach is not new in ecology, and there are many examples (both Bayesian and non-Bayesian) in the literature illustrating the benefits of this approach. In this article, we provide a baseline for concepts, notation, and methods, from which discussion on hierarchical statistical modeling in ecology can proceed. We have also planted some seeds for discussion and tried to show where the practical difficulties lie. Our thesis is that hierarchical statistical modeling is a powerful way of approaching ecological analysis in the presence of inevitable but quantifiable uncertainties, even if practical issues sometimes require pragmatic compromises.

Ecological non-linear state space model selection via adaptive particle Markov chain Monte Carlo (AdPMCMC)

2010

We develop a novel advanced Particle Markov chain Monte Carlo algorithm that is capable of sampling from the posterior distribution of non-linear state space models for both the unobserved latent states and the unknown model parameters. We apply this novel methodology to five population growth models, including models with strong and weak Allee effects, and test if it can efficiently sample from the complex likelihood surface that is often associated with these models. Utilising real and also synthetically generated data sets we examine the extent to which observation noise and process error may frustrate efforts to choose between these models.

The Binomial–Beta Hierarchical Model for Ecological Inference: Methodological Issues and Fast Implementation via the ECM Algorithm

2002

The binomial-beta hierarchical model is a recent contribution to ecological inference (EI). Developed for the 2 × 2 table case and from a Bayesian perspective, the model is characterized by the compounding of binomial and beta distributions into a hierarchical structure. From a sample of aggregate observations, this model allows inference about the values of unobservable disaggregate variables. The paper reviews this EI model with two purposes. First, it proposes a faster way to use the model in practice, based on explicit modeling of the disaggregate data generation process along with posterior maximization implemented via the ECM algorithm, and illustrates it with an application to a real dataset. Second, it points out limitations of using marginal posteriors for binomial probabilities as the vehicle of inference (chiefly, the failure to respect the accounting identity) instead of the predictive distributions for the disaggregate proportions. The concluding section suggests principles for EI model building in general and directions for further research.

Bayesian and frequentist inference for ecological inference: The R × C case

2001

In this paper we propose Bayesian and frequentist approaches to ecological inference, based on R × C contingency tables, including a covariate. The proposed Bayesian model extends the binomial-beta hierarchical model developed by King, Rosen and Tanner (1999) from the 2 × 2 case to the R × C case. As in the 2 × 2 case, the inferential procedure employs Markov chain Monte Carlo (MCMC) methods. As such, the resulting MCMC analysis is rich but computationally intensive.

A unified approach to model selection using the likelihood ratio test

Methods in Ecology and Evolution, 2011

1. Ecological count data typically exhibit complexities such as overdispersion and zero-inflation, and are often weakly associated with a relatively large number of correlated covariates. The use of an appropriate statistical model for inference is therefore essential. A common selection criterion for choosing between nested models is the likelihood ratio test (LRT). Widely used alternatives to the LRT are based on information-theoretic metrics such as the Akaike Information Criterion.
2. It is widely believed that the LRT can only be used to compare the performance of nested models, i.e. in situations where one model is a special case of another. There are many situations in which it is important to compare non-nested models, so, if true, this would be a substantial drawback of using LRTs for model comparison. In reality, however, it is possible to use the LRT for comparing both nested and non-nested models. This fact is well established in the statistical literature, but not widely used in ecological studies.
3. The main obstacle to the use of the LRT with non-nested models has, until relatively recently, been the fact that it is difficult to explicitly write down a formula for the distribution of the LRT statistic under the null hypothesis that one of the models is true. With modern computing power it is possible to overcome this difficulty by using a simulation-based approach.
4. To demonstrate the practical application of the LRT to both nested and non-nested model comparisons, a case study involving data on questing tick (Ixodes ricinus) abundance is presented. These data contain complexities typical in ecological analyses, such as zero-inflation and overdispersion, for which comparison between models of differing structure (e.g. non-nested models) is of particular importance.
5. Choosing between competing statistical models is an essential part of any applied ecological analysis. The LRT is a standard statistical test for comparing nested models. By use of simulation the LRT can also be used in an analogous fashion to compare non-nested models, thereby providing a unified approach for model comparison within the null hypothesis testing paradigm. A simple practical guide is provided on how to apply this approach to the key models required in the analyses of count data.
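The simulation-based approach described in points 3 and 5 can be sketched as a parametric bootstrap. The example below is an illustration under stated assumptions rather than the procedure from the paper: it compares a Poisson model (taken as the null) with a geometric model, two non-nested count models chosen only because both have closed-form maximum likelihood estimates. The function names, the number of simulations, and the synthetic data are all hypothetical.

```python
import math

import numpy as np


def loglik_poisson(y):
    """Maximized Poisson log-likelihood (rate estimated by the sample mean)."""
    lam = y.mean()
    if lam == 0:
        return 0.0  # all counts are zero: the fitted model gives them probability 1
    log_factorials = sum(math.lgamma(v + 1) for v in y)
    return float(y.sum() * math.log(lam) - len(y) * lam - log_factorials)


def loglik_geometric(y):
    """Maximized geometric log-likelihood for counts 0, 1, 2, ..."""
    m = y.mean()
    if m == 0:
        return 0.0
    p = 1.0 / (1.0 + m)  # closed-form maximum likelihood estimate
    return float(len(y) * math.log(p) + y.sum() * math.log(1.0 - p))


def lrt_stat(y):
    """Twice the log-likelihood difference: alternative minus null."""
    return 2.0 * (loglik_geometric(y) - loglik_poisson(y))


def simulated_lrt_pvalue(y, n_sim=199, seed=0):
    """Approximate the null distribution of the LRT statistic by simulating
    data sets from the fitted null (Poisson) model and refitting both models."""
    rng = np.random.default_rng(seed)
    t_obs = lrt_stat(y)
    lam_hat = y.mean()
    t_null = np.array([
        lrt_stat(rng.poisson(lam_hat, size=len(y))) for _ in range(n_sim)
    ])
    p_value = (1 + np.sum(t_null >= t_obs)) / (n_sim + 1)
    return t_obs, float(p_value)


# Synthetic, overdispersed counts (geometric with mean 4); not real tick data.
data = np.random.default_rng(1).geometric(0.2, size=200) - 1
t_obs, p_value = simulated_lrt_pvalue(data)
print(f"LRT statistic = {t_obs:.2f}, simulated p-value = {p_value:.3f}")
```

The same loop would apply to zero-inflated or negative binomial models, with the closed-form fits replaced by numerical maximization.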

Multi-Model Inference in Biogeography

Multi-model inference (MMI) aims to contribute to the production of scientific knowledge by simultaneously comparing the evidence data provide for multiple hypotheses, each represented as a model. With roots in the method of ‘multiple working hypotheses’, MMI techniques have been advocated as an alternative to null-hypothesis significance testing. In this paper, we review two complementary MMI techniques – model selection and model averaging – and highlight examples of their use by biogeographers. Model selection provides a means to simultaneously compare multiple models to evaluate how well each is supported by data, and potentially to identify the best supported model(s). When model selection indicates no clear ‘best’ model, model averaging is useful to account for parameter uncertainty. Both techniques can be implemented in information theoretic and Bayesian frameworks and we outline the debate about interpretations of the different approaches. We summarise recommendations for avoiding philosophical and methodological pitfalls, and suggest when each technique is best used. We advocate a pragmatic approach to MMI, one that emphasises the ‘thoughtful, science-based, a priori’ modelling that others have argued is vital to ensure valid scientific inference.
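As a small illustration of model selection and model averaging in the information-theoretic framework mentioned above, the sketch below computes Akaike weights from a set of AIC values and uses them to average a parameter estimate across models. It is an assumption that the reviewed paper uses exactly this weighting; the AIC values, the per-model estimates, and the function names are made up for the example.

```python
import math


def akaike_weights(aic_values):
    """Convert AIC values into weights that sum to 1.

    Each weight is proportional to exp(-delta / 2), where delta is the
    difference between a model's AIC and the smallest AIC in the set.
    """
    best = min(aic_values)
    relative = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(relative)
    return [r / total for r in relative]


def model_average(estimates, weights):
    """Weighted average of one parameter's estimates across models."""
    return sum(w * e for w, e in zip(weights, estimates))


# Hypothetical AIC values and slope estimates from three candidate models.
aics = [100.0, 102.0, 110.0]
slopes = [0.50, 0.40, 0.10]

weights = akaike_weights(aics)
for aic, w in zip(aics, weights):
    print(f"AIC={aic:6.1f} weight={w:.3f}")
print(f"model-averaged slope = {model_average(slopes, weights):.3f}")
```

When one weight dominates, selecting that model and averaging give nearly the same answer; averaging matters most when several models have similar support.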