Simplicial regression. The normal model

2011

Regression models with compositional response have been studied from the beginning of the log-ratio approach for analysing compositional data. These early approaches suggested the statistical hypothesis of logistic-normality of the compositional residuals to test the model and its coefficients. Also, the Dirichlet distribution has been proposed as an alternative model for compositional residuals, but it leads to restrictive and not easy-to-use regressions. Recent advances on the Euclidean geometry of the simplex and on the logistic-normal distribution allow re-formulating simplicial regression with logistic-normal residuals. Estimation of the model is presented as a least-squares problem in the simplex and is formulated in terms of orthonormal coordinates. This estimation decomposes into simple linear regression models which can be assessed independently. Marginal normality of the coordinate-residuals suffices to check influence of covariables using standard regression tests. Examples illustrate the proposed procedures.
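As a rough illustration of the least-squares formulation described in this abstract, the following Python sketch maps compositions to orthonormal (ilr) coordinates, fits every coordinate by ordinary least squares, and maps the fitted values back to the simplex. The Helmert-type basis and the function names (`helmert_basis`, `fit_simplicial_regression`) are assumptions made for this example; the paper itself may use a different basis and notation.

```python
import numpy as np

def closure(x):
    """Rescale positive vectors so that each row sums to one."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def helmert_basis(D):
    """(D-1) x D matrix with orthonormal rows, each orthogonal to the ones vector.

    This is one possible contrast matrix for ilr coordinates (an assumption here).
    """
    V = np.zeros((D - 1, D))
    for i in range(1, D):
        V[i - 1, :i] = 1.0 / i
        V[i - 1, i] = -1.0
        V[i - 1] *= np.sqrt(i / (i + 1.0))
    return V

def ilr(Y):
    """Orthonormal log-ratio coordinates of the compositions in the rows of Y."""
    V = helmert_basis(Y.shape[-1])
    # Rows of V sum to zero, so log(Y) @ V.T equals clr(Y) @ V.T.
    return np.log(Y) @ V.T

def ilr_inv(Z):
    """Map coordinates back to compositions on the simplex."""
    V = helmert_basis(Z.shape[-1] + 1)
    return closure(np.exp(Z @ V))

def fit_simplicial_regression(X, Y):
    """Least-squares fit of compositional responses Y on the design matrix X.

    Each column of the returned coefficient matrix is an ordinary linear
    regression of one ilr coordinate on X, so the columns can be examined
    one at a time with standard regression tools.
    """
    Z = ilr(Y)
    B, *_ = np.linalg.lstsq(X, Z, rcond=None)
    fitted = ilr_inv(X @ B)
    return B, fitted
```

Because the coordinates are orthonormal, minimising the sum of squared Aitchison distances between observed and fitted compositions is the same as minimising the sum of squared Euclidean residuals in coordinates, which is why the problem splits into one regression per coordinate.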

Simplicial regression. The normal model

2013

Regression models with compositional response have been studied from the beginning of the log-ratio approach for analysing compositional data. These early approaches suggested the statistical hypothesis of logistic-normality of the compositional residuals to test the model and its coefficients. Also, the Dirichlet distribution has been proposed as an alternative model for compositional residuals, but it leads to restrictive and not easy-to-use regressions. Recent advances on the Euclidean geometry of the simplex and on the logistic-normal distribution allow re-formulating simplicial regression with logistic-normal residuals. Estimation of the model is presented as a least-squares problem in the simplex and is formulated in terms of orthonormal coordinates. This estimation decomposes into simple linear regression models which can be assessed independently. Marginal normality of the coordinate-residuals suffices to check influence of covariables using standard regression tests. Exampl...

Modelling compositional data using Dirichlet regression models

Compositional data are non-negative proportions with unit-sum. These types of data arise whenever we classify objects into disjoint categories and record their resulting relative frequencies, or partition a whole measurement into percentage contributions from its various parts. Under the unit-sum constraint, the elementary concepts of covariance and correlation are misleading. Therefore, compositional data are rarely analyzed with the usual multivariate statistical methods. Aitchison (1986) introduced the logratio analysis to model compositional data. Campbell and Mosimann (1987a) suggested the Dirichlet Covariate Model as a null model for such data. In this paper we investigate the Dirichlet Covariate Model and compare it to the logratio analysis. Maximum likelihood estimation methods are developed and the sampling distributions of these estimates are investigated. Measures of total variability and goodness of fit are proposed to assess the adequacy of the suggested models in anal...

Modelling Compositional Data. The Sample Space Approach

Handbook of Mathematical Geosciences, 2018

Compositions describe parts of a whole and carry relative information. Compositional data appear in all fields of science, and their analysis requires paying attention to the appropriate sample space. The log-ratio approach proposes the simplex, endowed with the Aitchison geometry, as an appropriate representation of the sample space. The main characteristics of the Aitchison geometry are presented, which open the door to statistical analysis addressed to extract the relative, not absolute, information. As a consequence, compositions can be represented in Cartesian coordinates by using an isometric log-ratio transformation. Standard statistical techniques can be used with these coordinates.
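For orientation, the standard definitions behind the Aitchison geometry mentioned in this abstract can be written out as follows. The symbols ($\mathcal{C}$ for closure, $g(\cdot)$ for the geometric mean, $V$ for a contrast matrix) are notation chosen for this summary and may differ from the chapter's own.

```latex
% Compositions x, y in the D-part simplex; \mathcal{C} rescales a positive vector to unit sum.
\begin{align*}
  x \oplus y &= \mathcal{C}\,(x_1 y_1, \dots, x_D y_D)
    && \text{(perturbation)} \\
  \alpha \odot x &= \mathcal{C}\,(x_1^{\alpha}, \dots, x_D^{\alpha})
    && \text{(powering)} \\
  \langle x, y \rangle_a &= \sum_{i=1}^{D} \ln\frac{x_i}{g(x)} \, \ln\frac{y_i}{g(y)},
    \qquad g(x) = \Bigl(\prod_{i=1}^{D} x_i\Bigr)^{1/D}
    && \text{(Aitchison inner product)} \\
  \operatorname{clr}(x) &= \Bigl(\ln\frac{x_1}{g(x)}, \dots, \ln\frac{x_D}{g(x)}\Bigr)
    && \text{(centred log-ratio)} \\
  \operatorname{ilr}(x) &= V^{\top} \operatorname{clr}(x),
    \qquad V \in \mathbb{R}^{D \times (D-1)},\; V^{\top} V = I_{D-1},\; V^{\top}\mathbf{1}_D = 0
    && \text{(isometric log-ratio)}
\end{align*}
% Isometry: \langle x, y \rangle_a = \langle \operatorname{ilr}(x), \operatorname{ilr}(y) \rangle,
% so distances and angles in the simplex coincide with Euclidean ones in coordinates.
```

The isometry in the last comment is what justifies applying standard statistical techniques to the coordinates.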

Geometric approach to statistical analysis on the simplex

Stochastic Environmental Research and Risk Assessment, 2001

The geometric interpretation of the expected value and the variance in real Euclidean space is used as a starting point to introduce metric counterparts on an arbitrary finite dimensional Hilbert space. This approach allows us to define general reasonable properties for estimators of parameters, like metric unbiasedness and minimum metric variance, resulting in a useful tool to better understand the logratio approach to the statistical analysis of compositional data, whose natural sample space is the simplex.

A folded model for compositional data analysis

A folded type model is developed for analyzing compositional data. The proposed model, which is based upon the α-transformation for compositional data, provides a new and flexible class of distributions for modeling data defined on the simplex sample space. Despite its seemingly complex structure, employment of the EM algorithm guarantees efficient parameter estimation. The model is validated through simulation studies and examples which illustrate that the proposed model performs better in terms of capturing the data structure, when compared to the popular logistic normal distribution.

Analysis of compositional data using Dirichlet covariate models

Typescript. Thesis (Ph. D.) -- American University, 2003. American University, Dept. of Mathematics and Statistics. Dissertation advisor: Robert W. Jernigan. Includes bibliographical references (leaves 148-151). Dissertation Abstracts: 64:1788B, Oct. 2003. University Microfilms, Inc. order no. 30-87067.

Modeling asymmetric compositional data

Acta Scientiarum. Technology, 2014

Compositional data belong to the simplex sample space, but they are transformed to the sample space of the real numbers using the additive log-ratio transformation to allow the application of standard statistical techniques. This study aims to model compositional skewed data of three soil components after additive log-ratio transformation. The current modeling was done for compositional data of sand, silt and clay (simplex), and bivariate data (real) using the standard skew theory with and without the inclusion of the covariate soil porosity. The analyses were run using the R statistical software and the package sn, and the goodness-of-fit was found after applying the covariate.
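The additive log-ratio step mentioned in this abstract can be sketched in Python as below. The choice of the last part as the common divisor and the function names `alr` and `alr_inv` are assumptions for illustration; the study itself worked in R with the package sn, and the skew modelling is not reproduced here.

```python
import numpy as np

def alr(x):
    """Additive log-ratio transform: log of each part over the last part.

    A D-part composition (e.g. sand, silt, clay) becomes a (D-1)-vector of
    unconstrained real numbers. Using the last part as divisor is an
    assumption of this sketch.
    """
    x = np.asarray(x, dtype=float)
    return np.log(x[..., :-1] / x[..., -1:])

def alr_inv(y):
    """Inverse transform: append a zero, exponentiate, and rescale to unit sum."""
    y = np.asarray(y, dtype=float)
    zeros = np.zeros(y.shape[:-1] + (1,))
    e = np.exp(np.concatenate([y, zeros], axis=-1))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical soil sample: 60% sand, 30% silt, 10% clay.
soil = np.array([0.6, 0.3, 0.1])
coords = alr(soil)          # bivariate real vector (log 6, log 3)
recovered = alr_inv(coords) # back to the original composition
```

Any bivariate model fitted to `coords` can be carried back to the simplex with `alr_inv`.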

The $\alpha$-$k$-$NN$ regression for compositional data

arXiv: Methodology, 2020

Compositional data arise in many real-life applications and versatile methods for properly analyzing this type of data in the regression context are needed. This paper, through use of the $\alpha$-transformation, extends the classical $k$-$NN$ regression to what is termed $\alpha$-$k$-$NN$ regression, yielding a highly flexible non-parametric regression model for compositional data. Unlike many of the recommended regression models for compositional data, zero values (which commonly occur in practice) are not problematic and they can be incorporated into the proposed model without modification. Extensive simulation studies and real-life data analyses highlight the advantage of using $\alpha$-$k$-$NN$ regression for complex relationships between the compositional response data and Euclidean predictor variables. Both suggest that $\alpha$-$k$-$NN$ regression can lead to more accurate predictions compared to current regression models which assume a, sometimes restrictive, parametric re...
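The abstract does not spell out the estimator, so the following Python sketch rests on an assumption: the prediction is taken to be a power-type mean of the responses of the k nearest neighbours, computed by averaging the closed α-powers and then inverting the power. The function names and the Euclidean distance on the predictors are likewise illustrative choices, not taken from the paper.

```python
import numpy as np

def power_closure(Y, alpha):
    """Raise each part to the power alpha and rescale rows to unit sum."""
    P = np.power(Y, alpha)
    return P / P.sum(axis=-1, keepdims=True)

def alpha_mean(Y, alpha):
    """Power-type mean of the compositions in the rows of Y (assumed form).

    alpha = 1 gives the arithmetic mean; alpha = 0 is treated as the closed
    geometric mean, which requires strictly positive parts.
    """
    if alpha == 0:
        g = np.exp(np.log(Y).mean(axis=0))
        return g / g.sum()
    m = power_closure(Y, alpha).mean(axis=0)
    r = np.power(m, 1.0 / alpha)
    return r / r.sum()

def alpha_knn_predict(X_train, Y_train, x_new, k=3, alpha=0.5):
    """Predict a composition at x_new from its k nearest training points."""
    dist = np.linalg.norm(X_train - x_new, axis=1)  # Euclidean predictor distance
    nearest = np.argsort(dist)[:k]
    return alpha_mean(Y_train[nearest], alpha)

# Hypothetical data: one predictor, three-part responses, one containing a zero.
X_train = np.array([[0.0], [1.0], [2.0], [10.0]])
Y_train = np.array([[0.2, 0.3, 0.5],
                    [0.4, 0.4, 0.2],
                    [0.0, 0.5, 0.5],
                    [0.1, 0.1, 0.8]])
pred = alpha_knn_predict(X_train, Y_train, np.array([1.0]), k=3, alpha=0.5)
```

With a positive α no logarithm is taken, so the zero in the third training response causes no difficulty, in line with the abstract's remark about zero values. In practice k and α would be tuned, for example by cross-validation.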

Compositional data: the sample space and its structure

TEST, 2019

The log-ratio approach to compositional data (CoDa) analysis has now entered a mature phase. The principles and statistical tools introduced by J. Aitchison in the eighties have proven successful in solving a number of applied problems. The algebraic-geometric structure of the sample space, tailored to those principles, was developed at the beginning of the millennium. Two main ideas completed J. Aitchison's seminal work: the conception of compositions as equivalence classes of proportional vectors, and their representation in the simplex endowed with an interpretable Euclidean structure. These achievements allowed the representation of compositions in meaningful coordinates (preferably Cartesian), as well as orthogonal projections compatible with the Aitchison distance introduced two decades before. These ideas and concepts are reviewed up to the normal distribution on the simplex and the associated central limit theorem. Exploratory tools, specifically designed for CoDa, are also reviewed. To illustrate the adequacy and interpretability of the sample space structure, a new inequality index, based on the Aitchison norm, is proposed. Most concepts are illustrated with an example of mean household gross income per capita in Spain.