CoDaPack. An Excel and Visual Basic based software of compositional data analysis. Current version and discussion for upcoming versions

Compositional data and their analysis: an introduction

Geological Society, London, Special Publications, 2006

Compositional data are those which contain only relative information. They are parts of some whole. In most cases they are recorded as closed data, i.e. data summing to a constant, such as 100%, with whole-rock geochemical data being classic examples. Compositional data have important and particular properties that preclude the application of standard statistical techniques to such data in raw form. Standard techniques are designed to be used with data that are free to range from −∞ to +∞. Compositional data are always positive and range only from 0 to 100, or any other constant, when given in closed form. If one component increases, others must, perforce, decrease, whether or not there is a genetic link between these components. This means that the results of standard statistical analysis of the relationships between raw components or parts in a compositional dataset are clouded by spurious effects. Although such analyses may give apparently interpretable results, they are, at best, approximations and need to be treated with considerable circumspection. The methods outlined in this volume are based on the premise that it is the relative variation of components which is of interest, rather than absolute variation. Log-ratios of components provide the natural means of studying compositional data. In this contribution the basic terms and operations are introduced using simple numerical examples to illustrate their computation and to familiarize the reader with their use.
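The closure operation and a log-ratio transformation described above can be sketched in a few lines. This is a minimal illustration, not taken from the paper: the three-part sample, the function names `closure` and `clr`, and the choice of the centred log-ratio are all assumptions made for the example.

```python
import numpy as np

def closure(x, total=1.0):
    """Rescale positive parts so that they sum to `total`."""
    x = np.asarray(x, dtype=float)
    return total * x / x.sum(axis=-1, keepdims=True)

def clr(x):
    """Centred log-ratio: log of each part over the geometric mean of all parts."""
    log_x = np.log(np.asarray(x, dtype=float))
    return log_x - log_x.mean(axis=-1, keepdims=True)

# Hypothetical three-part analysis in arbitrary units (not from the source).
raw = np.array([12.0, 3.0, 45.0])

percent = closure(raw, total=100.0)   # closed form: [20., 5., 75.]
clr_raw = clr(raw)
clr_percent = clr(percent)            # identical to clr_raw: only ratios matter
```

Because the transformation depends only on ratios between parts, the result is the same whether the data are expressed in raw units, in percent, or as parts per one.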

Why and how should geologists use compositional data analysis?

Wikibooks, 2008

Compositional data arise naturally in several branches of science, including geology. In geochemistry, for example, these constrained data typically occur when one normalizes raw data or obtains the output of a constrained estimation procedure, such as parts per one, percentages, ppm, ppb, molar concentrations, etc. Compositional data have proved difficult to handle statistically because of the awkward constraint that the components of each vector must sum to unity. The special property of compositional data (the fact that the determinations on each specimen sum to a constant) means that the variables involved in the study occur in a constrained space defined by the simplex, a restricted part of real space. Pearson was the first to point out the dangers that may befall the analyst who attempts to interpret correlations between ratios whose numerators and denominators contain common parts. More recently, Aitchison, Pawlowsky-Glahn, S. Thió, and other statisticians have developed the concept of Compositional Data Analysis, pointing out the dangers of misinterpreting closed data when they are treated with “normal” statistical methods. It is important for geochemists, and geologists in general, to be aware that the usual multivariate statistical techniques are not applicable to constrained data. It is also important for us to have access to appropriate techniques as they become available. This is the principal aim of this book. Starting from a hypothetical model of a copper mineralization associated with a felsic intrusive, with specific relationships between certain elements, I will show how “normal” correlation methods fail to identify some of these embedded relationships and how they can produce other, spurious correlations. From there, I will test the same model after transforming the data using the CLR, ALR, and ILR transformations with the aid of the CoDaPack software.
Since I addressed this publication to geologists and geoscientists in general, I have kept the mathematical formulae to a minimum and have not included any theoretical demonstrations. The “mathematically curious geologist”, if such a category exists, can find all of those in the list of recommended sources in the reference section.
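The effect of closure on correlations can be reproduced with a small simulation. The sketch below is not the author's copper-mineralization model: the three lognormal variables, their parameters, the sample size, and the seed are hypothetical choices that make the effect easy to see.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Three independently simulated positive "abundances" (hypothetical values).
# The third part varies much more strongly than the first two.
a = rng.lognormal(mean=0.0, sigma=0.2, size=n)
b = rng.lognormal(mean=0.0, sigma=0.2, size=n)
c = rng.lognormal(mean=0.0, sigma=1.0, size=n)
raw = np.column_stack([a, b, c])

# Closure: express every sample as parts per one.
closed = raw / raw.sum(axis=1, keepdims=True)

corr_raw = np.corrcoef(raw, rowvar=False)        # near zero off the diagonal
corr_closed = np.corrcoef(closed, rowvar=False)  # clearly non-zero off the diagonal

# A log-ratio of two parts is untouched by closure.
logratio_raw = np.log(a / b)
logratio_closed = np.log(closed[:, 0] / closed[:, 1])
```

Although the three variables are generated independently, the closed proportions of the first two parts come out positively correlated with each other and negatively correlated with the third, purely because they share a denominator. The log-ratio of the first two parts is the same before and after closure.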

Compositional Data in Geostatistics: A Log-Ratio Based Framework to Analyze Regionalized Compositions

Mathematical Geosciences

Problems with compositional data, like spurious correlation and negative bias, are well known in the Geosciences. Not so well known is the fact that the same problems appear when dealing with regionalized compositions. Here, these problems are illustrated, and a solution, based on the principle of working in coordinates using orthonormal logratio representations, is presented. This approach offers a tool for standard geostatistical studies. One of the advantages the method has is that it allows the usual inconsistencies with indicator kriging to be overcome through simplicial indicator kriging. A general way of modelling crossvariograms of coordinates, based on the matrix valued variation variogram, is discussed. In summary, the main aspects related to the modelling and analysis of regionalized compositions have had satisfactory solutions found for them. The proposed methodology is illustrated with public data from a survey concerning arsenic contamination in underground water in Bangladesh. This research has received financial support from the Spanish Ministry of Education and Science under project METhods for COmpositional analysis of DAta (CODAMET) (Ref: RTI2018-095518-B-C21, 2019-2021). We thank E. Grunsky and an anonymous reviewer for their constructive comments, which contributed to improving the paper.
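The idea of working in orthonormal log-ratio coordinates can be illustrated for a single pair of compositions. This sketch covers only the coordinate step, not the geostatistical workflow of the paper; the particular basis (a pivot-type contrast matrix), the function names, and the two sample compositions are assumptions made for the example.

```python
import numpy as np

def clr(x):
    """Centred log-ratio transform."""
    log_x = np.log(np.asarray(x, dtype=float))
    return log_x - log_x.mean(axis=-1, keepdims=True)

def pivot_basis(d):
    """Return a d x (d-1) matrix whose columns are orthonormal and sum to zero."""
    v = np.zeros((d, d - 1))
    for i in range(d - 1):
        m = d - i - 1                          # number of parts after part i
        v[i, i] = np.sqrt(m / (m + 1))
        v[i + 1:, i] = -1.0 / np.sqrt(m * (m + 1))
    return v

def ilr(x, basis):
    """Orthonormal log-ratio coordinates with respect to `basis`."""
    return clr(x) @ basis

# Two hypothetical three-part compositions.
x = np.array([0.20, 0.05, 0.75])
y = np.array([0.10, 0.30, 0.60])
V = pivot_basis(3)

aitchison_dist = np.linalg.norm(clr(x) - clr(y))    # distance on the simplex
coord_dist = np.linalg.norm(ilr(x, V) - ilr(y, V))  # ordinary Euclidean distance
```

The two distances agree, which is what allows standard tools designed for unconstrained real data to be applied to the coordinates.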

Foreword: Advances in Compositional Data

Mathematical Geology, 2005

This issue of Mathematical Geology presents some results of CoDaWork'03, a workshop on compositional data analysis held at the University of Girona (Spain) in October 2003. The aim of this workshop was to bring together people from different branches of science (mathematicians, statisticians, geologists, archeologists, economists, biologists, sociologists, computer scientists) with one common denominator: their interest in compositional data analysis. About 40 scientists from all over the world worked together for 3 days, putting forward not only new advances in both theory and applications, but also doubts and problems regarding how to deal with this kind of data. The consequence was an intense and fruitful debate and a general wish to meet again in due time to follow up on the progress made in the meantime. The papers collected in this special issue are a selection of those presented at the meeting, updated and reviewed, which reflect the essential parts of the state of the art of compositional data analysis in the geosciences. A short note first introduces the basic concepts underlying compositional data analysis and shows how to compute them with simple examples. The next four papers are essentially case studies; two of them use hydrological data and two use petrological data to show how available methods can be applied and what can be obtained from them. The next paper introduces the reader to a software package which can be useful in starting a proper analysis of compositional data and which can be downloaded from the internet for free (the address is included in the paper). An overview follows, which should be of interest to all those who would like either to be introduced to this fascinating field or to keep track in general of what is going on.
Finally, the last paper suggests an answer to a point of intense debate at the meeting, namely: is amalgamation a dimension reducing technique which is compatible with the Aitchison geometry on the simplex? Although it is an essentially theoretical paper, it opens new paths for applied research.

Aitchison’s Compositional Data Analysis 40 Years on: A Reappraisal

Statistical Science

The development of John Aitchison's approach to compositional data analysis is traced from his paper read to the Royal Statistical Society in 1982. Aitchison's logratio approach, which was proposed to solve the problematic aspects of working with data with a fixed sum constraint, is summarized and reappraised. It is maintained that the principles on which this approach was originally built, the main one being subcompositional coherence, are not required to be satisfied exactly; quasi-coherence is sufficient in practice. This opens up the field to using simpler data transformations with easier interpretations, and also makes variable selection possible so that results are more parsimonious. The additional principle of exact isometry, which was introduced subsequently and was not part of Aitchison's original conception, imposed the use of isometric logratio transformations, but these have been shown to be problematic to interpret. If this principle is regarded as important, it can be relaxed by showing that simpler transformations are quasi-isometric. It is concluded that the isometric and related logratio transformations, such as pivot logratios, are not a prerequisite for good practice, and this conclusion is fully supported by a case study in geochemistry provided as an appendix.
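Subcompositional coherence, the principle named above, can be shown with a toy case. The four-part composition below is hypothetical and the sketch demonstrates only the exact form of the principle, not the quasi-coherence argument of the paper.

```python
import numpy as np

def closure(x):
    """Rescale positive parts so that they sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

# Hypothetical four-part composition: [0.1, 0.3, 0.2, 0.4]
full = closure([10.0, 30.0, 20.0, 40.0])

# Subcomposition: drop the last part and re-close the remaining three.
sub = closure(full[:3])

# The proportions of the retained parts change after re-closing,
# but the log-ratio between any two of them does not.
logratio_full = np.log(full[0] / full[1])
logratio_sub = np.log(sub[0] / sub[1])
```

Statistics computed on raw proportions therefore depend on which parts happen to be included in the analysis, whereas statistics computed on log-ratios of the retained parts do not.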

An Experimental Comparison of Cokriging of Regionalized Compositional Data Using Four Different Methods. Case Study: Bauxites in Hungary

An important problem in the geosciences is the estimation or prediction of regionalized compositions. In fact, it is usual to deal with data such as percentages, concentrations, ppm, ..., and use them to estimate values in other locations. Compositional data have been regarded as difficult to work with because of the so-called constant sum constraint. Following Aitchison (1986), any meaningful statement about a composition can be expressed in terms of logratios, but those transformations, and their backtransformations, are not always easy to deal with. The aim of this paper is to compare results obtained applying different methodologies developed in geostatistics, with samples of compositional data from a bauxite deposit in Halimba II (Hungary). Firstly, a classical geostatistics study is done using raw data; secondly applying two well-known transformations in compositional data analysis: additive logratio (ALR) and centered logratio (CLR); thirdly, the Fast Fourier Transform (FFT) m...
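The additive logratio transformation and its back-transformation can be sketched as follows. This is a minimal illustration under stated assumptions: the last part is used as the divisor, the function names are made up for the example, and the three-part composition is hypothetical rather than taken from the Halimba data. It does not cover the cokriging step.

```python
import numpy as np

def alr(x):
    """Additive log-ratio: log of each part over the last part."""
    x = np.asarray(x, dtype=float)
    return np.log(x[..., :-1] / x[..., -1:])

def alr_inv(y):
    """Back-transform alr coordinates to a composition summing to 1."""
    y = np.asarray(y, dtype=float)
    # The divisor part has log-ratio 0 against itself, so append exp(0) = 1.
    ones = np.ones(y.shape[:-1] + (1,))
    expanded = np.concatenate([np.exp(y), ones], axis=-1)
    return expanded / expanded.sum(axis=-1, keepdims=True)

# Hypothetical three-part composition (parts per one).
comp = np.array([0.55, 0.25, 0.20])

coords = alr(comp)           # two unconstrained real coordinates
recovered = alr_inv(coords)  # back on the simplex
```

The round trip returns the original composition, and any pair of real numbers maps back to a valid composition, so values estimated in the transformed space always back-transform to positive parts with a constant sum.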

An experimental comparison of cokriging of regionalised compositional data using four different methods: case study: Bauxites in Hungary

2004

An important problem in the geosciences is the estimation or prediction of regionalized compositions. In fact, it is usual to deal with data such as percentages, concentrations, ppm, ..., and use them to estimate values in other locations. Compositional data have been regarded as difficult to work with because of the so-called constant sum constraint. Following Aitchison (1986), any meaningful statement about a composition can be expressed in terms of logratios, but those transformations, and their backtransformations, are not always easy to deal with. The aim of this paper is to compare results obtained applying different methodologies developed in geostatistics, with samples of compositional data from a bauxite deposit in Halimba II (Hungary). Firstly, a classical geostatistics study is done using raw data; secondly applying two well-known transformations in compositional data analysis: additive logratio (ALR) and centered logratio (CLR); thirdly, the Fast Fourier Transform (FFT) method...