Compositional data: the sample space and its structure
Related papers
Modelling Compositional Data. The Sample Space Approach
Handbook of Mathematical Geosciences, 2018
Compositions describe parts of a whole and carry relative information. Compositional data appear in all fields of science, and their analysis requires paying attention to the appropriate sample space. The log-ratio approach proposes the simplex, endowed with the Aitchison geometry, as an appropriate representation of the sample space. The main characteristics of the Aitchison geometry are presented, which open the door to statistical analysis addressed to extract the relative, not absolute, information. As a consequence, compositions can be represented in Cartesian coordinates by using an isometric log-ratio transformation. Standard statistical techniques can be used with these coordinates.
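The isometry claimed in this abstract can be checked numerically. The sketch below is illustrative only and does not come from the paper: it assumes one particular orthonormal basis (so-called pivot coordinates), and the function names `closure`, `clr`, `ilr_pivot` and `euclid` are hypothetical helpers introduced here. It compares the Aitchison distance, computed as the Euclidean distance between centred log-ratio vectors, with the Euclidean distance between the ilr coordinates.

```python
import math

def closure(x, total=1.0):
    """Rescale positive parts so that they sum to `total`."""
    s = sum(x)
    return [total * v / s for v in x]

def clr(x):
    """Centred log-ratio: log of each part minus the mean of the logs."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [l - mean_log for l in logs]

def ilr_pivot(x):
    """ilr coordinates in a pivot basis (an assumed choice of basis).

    Coordinate i contrasts part i with the geometric mean of the
    parts that follow it, scaled so that the basis is orthonormal.
    """
    logs = [math.log(v) for v in x]
    coords = []
    for i in range(len(logs) - 1):
        rest = logs[i + 1:]
        r = len(rest)
        coords.append(math.sqrt(r / (r + 1)) * (logs[i] - sum(rest) / r))
    return coords

def euclid(a, b):
    """Ordinary Euclidean distance between two coordinate vectors."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

# Two made-up 3-part compositions.
x = closure([10.0, 30.0, 60.0])
y = closure([25.0, 25.0, 50.0])

d_clr = euclid(clr(x), clr(y))              # Aitchison distance via clr
d_ilr = euclid(ilr_pivot(x), ilr_pivot(y))  # distance in ilr coordinates
print(abs(d_clr - d_ilr) < 1e-12)
```

Any other orthonormal basis of the simplex would give different coordinates but the same distances, which is why standard techniques can be applied to the coordinates.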
Aitchison's Compositional Data Analysis 40 Years On: A Reappraisal
2022
The development of John Aitchison's approach to compositional data analysis is followed since his paper read to the Royal Statistical Society in 1982. Aitchison's logratio approach, which was proposed to solve the problematic aspects of working with data with a fixed sum constraint, is summarized and reappraised. It is maintained that the principles on which this approach was originally built, the main one being subcompositional coherence, are not required to be satisfied exactly -- quasi-coherence is sufficient in practice. This opens up the field to using simpler data transformations with easier interpretations and also for variable selection to be possible to make results parsimonious. The additional principle of exact isometry, which was subsequently introduced and not in Aitchison's original conception, imposed the use of isometric logratio transformations, but these have been shown to be problematic to interpret. If this principle is regarded as important, it can b...
Compositional data and their analysis: an introduction
Geological Society, London, Special Publications, 2006
Compositional data are those which contain only relative information. They are parts of some whole. In most cases they are recorded as closed data, i.e. data summing to a constant, such as 100% (whole-rock geochemical data being classic examples). Compositional data have important and particular properties that preclude the application of standard statistical techniques on such data in raw form. Standard techniques are designed to be used with data that are free to range from -∞ to +∞. Compositional data are always positive and range only from 0 to 100, or any other constant, when given in closed form. If one component increases, others must, perforce, decrease, whether or not there is a genetic link between these components. This means that the results of standard statistical analysis of the relationships between raw components or parts in a compositional dataset are clouded by spurious effects. Although such analyses may give apparently interpretable results, they are, at best, approximations and need to be treated with considerable circumspection. The methods outlined in this volume are based on the premise that it is the relative variation of components which is of interest, rather than absolute variation. Log-ratios of components provide the natural means of studying compositional data. In this contribution the basic terms and operations are introduced using simple numerical examples to illustrate their computation and to familiarize the reader with their use.
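A small numerical illustration of why log-ratios are preferred over raw closed values: the sketch below uses made-up measurements and a hypothetical `closure` helper, neither taken from the paper. The percentages of two parts change when a third part is dropped and the data are re-closed, while their log-ratio does not.

```python
import math

def closure(parts, total=100.0):
    """Rescale positive parts so that they sum to `total` (here, percent)."""
    s = sum(parts)
    return [total * p / s for p in parts]

# Hypothetical measurements of three components, arbitrary units.
raw = [12.0, 3.0, 45.0]

full = closure(raw)      # all three parts, closed to 100
sub = closure(raw[:2])   # subcomposition of the first two parts, re-closed

# The closed percentages depend on which parts are included...
print(full[:2], sub)

# ...but the log-ratio of the first two parts does not.
log_ratio_full = math.log(full[0] / full[1])
log_ratio_sub = math.log(sub[0] / sub[1])
print(math.isclose(log_ratio_full, log_ratio_sub))
```

The first part moves from 20% to 80% purely because the third part was removed, which is the kind of spurious effect the abstract warns about.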
Groups of Parts and Their Balances in Compositional Data Analysis
Mathematical Geology, 2005
Amalgamation of parts of a composition has been extensively used as a technique of analysis to achieve reduced dimension, as was discussed during the CoDaWork'03 meeting (Girona, Spain, 2003). It was shown to be a non-linear operation in the simplex that does not preserve distances under perturbation. The discussion motivated the introduction in the present paper of concepts such as group of parts, balance between groups, and sequential binary partition, which are intended to provide tools of compositional data analysis for dimension reduction. Key concepts underlying this development are the established tools of subcomposition, coordinates in an orthogonal basis of the simplex, balancing element and, in general, the Aitchison geometry in the simplex. Main new results are: a method to analyze grouped parts of a compositional vector through the adequate coordinates in an ad hoc orthonormal basis; and the study of balances of groups of parts (inter-group analysis) as an orthogonal projection similar to that used in standard subcompositional analysis (intra-group analysis). A simulated example compares results when testing equal centers of two populations using amalgamated parts and balances; it shows that, in certain circumstances, results from both analyses can disagree.
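To make the notions of balance and sequential binary partition concrete, here is a minimal sketch. It assumes the usual balance formula, a normalised log-ratio of the geometric means of two groups of parts; the partition `sbp`, the composition `x` and the function name `balance` are made up for illustration and are not taken from the paper.

```python
import math

def balance(x, signs):
    """Balance between the parts marked +1 and the parts marked -1.

    Computes sqrt(r*s/(r+s)) * ln(gm(plus group) / gm(minus group)),
    where r and s are the group sizes; parts marked 0 are ignored.
    """
    plus = [math.log(v) for v, sign in zip(x, signs) if sign > 0]
    minus = [math.log(v) for v, sign in zip(x, signs) if sign < 0]
    r, s = len(plus), len(minus)
    return math.sqrt(r * s / (r + s)) * (sum(plus) / r - sum(minus) / s)

# A hypothetical sequential binary partition of a 4-part composition:
# first split {1,2} vs {3,4}, then split each group in two.
sbp = [
    [+1, +1, -1, -1],
    [+1, -1, 0, 0],
    [0, 0, +1, -1],
]

x = [0.1, 0.2, 0.3, 0.4]  # made-up composition
balances = [balance(x, row) for row in sbp]
print(balances)
```

The first balance compares the two groups (inter-group), while the other two describe the structure within each group (intra-group). Because each balance uses geometric means rather than sums of parts, it is not the same quantity as a log-ratio of amalgamated parts.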
The Mathematics of Compositional Analysis
Austrian Journal of Statistics, 2016
The term compositional data analysis is historically associated with the approach based on the logratio transformations introduced in the eighties. Two main principles of this methodology are scale invariance and subcompositional coherence. New developments and concepts that emerged in the last decade revealed the need to clarify the concepts of compositions, compositional sample space and subcomposition. In this work the mathematics of compositional analysis based on an equivalence relation is presented. A logarithmic isomorphism between quotient spaces induces a metric space structure for compositions. The logratio compositional analysis is the statistical analysis of compositions based on this structure, consisting of analysing logratio coordinates.
Compositional Data Analysis: Where Are We and Where Should We Be Heading?
Mathematical Geology, 2005
We take stock of the present position of compositional data analysis, of what has been achieved in the last 20 years, and then make suggestions as to what may be sensible avenues of future research. We take an uncompromisingly applied mathematical view, that the challenge of solving practical problems should motivate our theoretical research; and that any new theory should be thoroughly investigated to see if it may provide answers to previously abandoned practical considerations. Indeed a main theme of this lecture will be to demonstrate this applied mathematical approach by a number of challenging examples.
Foreword: Advances in Compositional Data
Mathematical Geology, 2005
This issue of Mathematical Geology presents some results of CoDaWork'03, a workshop on compositional data analysis held at the University of Girona (Spain) in October 2003. The aim of this workshop was to bring together people from different branches of science (mathematicians, statisticians, geologists, archeologists, economists, biologists, sociologists, computer scientists) with one common denominator: their interest in compositional data analysis. About 40 scientists from all over the world worked together for 3 days, putting forward not only new advances in both theory and applications, but also doubts and problems regarding how to deal with this kind of data. An intense and fruitful debate was the consequence, along with a general wish to get together again in due time to follow up the progress made in the meanwhile. The papers collected in this special issue are a selection of those presented at the meeting, updated and reviewed, which reflect the essential parts of the state of the art of compositional data analysis in the geosciences. A short note first introduces the basic concepts underlying compositional data analysis, and shows how to compute them with simple examples. The next four papers are essentially case studies; two of them use hydrological data, and two petrological data, to show how available methods can be applied and what can be got out of them. The next paper introduces the reader to a software package which can be useful in starting a proper analysis of compositional data and which can be downloaded from the internet for free (the address is included in the paper). An overview follows, which should be of interest to all those who would like either to introduce themselves to this fascinating field, or who want to keep track in general of what is going on.
Finally, the last paper suggests an answer to a point of intense debate at the meeting, namely: is amalgamation a dimension reducing technique which is compatible with the Aitchison geometry on the simplex? Although it is an essentially theoretical paper, it opens new paths for applied research.
Exploring Compositional Data with the CoDa-Dendrogram
2011
Within the special geometry of the simplex, the sample space of compositional data, compositional orthonormal coordinates allow the application of any multivariate statistical approach. The search for meaningful coordinates has suggested balances (between two groups of parts), based on a sequential binary partition of a D-part composition, and a representation in the form of a CoDa-dendrogram. Projected samples are represented in a dendrogram-like graph showing: (a) the way of grouping parts; (b) the explanatory role of subcompositions generated in the partition process; (c) the decomposition of the variance; (d) the center and quantiles of each balance. The representation is useful for the interpretation of balances and to describe the sample in a single diagram independently of the number of parts. Also, samples of two or more populations, as well as several samples from the same population, can be represented in the same graph, as long as they have the same parts registered. The approach is illustrated with an example of food consumption in Europe.