Cluster analysis applied to regional geochemical data: Problems and possibilities

A large regional geochemical data set of O-horizon samples from a 188,000 km² area in the European Arctic, analysed for 38 chemical elements, pH, electrical conductivity (both in a water extraction) and loss on ignition (LOI, 480 °C), was used to test the influence of different variants of cluster analysis on the results obtained. Due to the nature of regional geochemical data (neither normal nor log-normal, strongly skewed, often multi-modal data distributions), cluster analysis results usually depend strongly on the clustering algorithm selected. Deleting or adding just one element (variable) in the input matrix can also drastically change the results of cluster analysis. Different variants of cluster analysis can lead to surprisingly different results even when using exactly the same input data. Given that selection of elements is often based on availability of analytical packages (or detection limits) rather than on geochemical reasoning, this is a disturbing result. Cluster analysis can be used to group samples and to develop ideas about the multivariate geochemistry of the data set at hand. It should not be misused as a statistical "proof" of certain relationships in the data. The use of cluster analysis as an exploratory data analysis tool requires a powerful program system, able to present the results in a number of easy-to-grasp graphics. In the context of this work, such a tool has been developed as a package for the R statistical software.
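The dependence on the chosen algorithm can be illustrated with a toy case. The following is a minimal Python sketch, not the authors' R package: it runs a naive agglomerative clustering on one made-up, right-skewed variable and changes only the linkage rule. The function name `agglomerate` and the data values are assumptions introduced for illustration.

```python
def agglomerate(values, linkage, n_clusters=2):
    """Naive agglomerative clustering of 1-D values.

    `linkage` reduces all pairwise distances between two clusters to one
    number: `min` gives single linkage, `max` gives complete linkage.
    """
    clusters = [[v] for v in values]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pair_dists = [abs(a - b) for a in clusters[i] for b in clusters[j]]
                d = linkage(pair_dists)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge the closest pair of clusters (j > i, so deleting j is safe).
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


# Hypothetical concentrations with gaps that widen towards the upper tail.
values = [0.0, 1.0, 2.1, 3.3, 4.6, 6.0, 7.5, 9.1]

single = agglomerate(values, min)
complete = agglomerate(values, max)

print(single)    # [[0.0, 1.0, 2.1, 3.3, 4.6, 6.0, 7.5], [9.1]]
print(complete)  # [[0.0, 1.0, 2.1, 3.3], [4.6, 6.0, 7.5, 9.1]]
```

With identical input, single linkage isolates the highest value as its own cluster, while complete linkage splits the samples into two groups of four. This is the kind of disagreement the abstract warns about.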

Cluster analysis for time series based on organic geochemical proxies

Organic Geochemistry, 2020

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Cluster analysis of a regional-scale soil geochemical dataset in northern California

Applied Geochemistry, 2011

A regional-scale soil geochemical study was conducted within a 22,000 km² area in northern California including the Sierra Nevada, Sacramento Valley, and northern Coast Range. Over 1300 soil samples were chemically analyzed for 42 elements. The distribution of distinct groups of elements demonstrates the interplay of geologic, hydrologic, geomorphologic and anthropogenic factors; however, it is difficult to fully appreciate the complexity of geochemical transport and weathering processes on a landscape-scale in an area of very complex geology with such a large dataset containing more than 40 variables. To examine the data from a perspective of multi-element groupings, cluster analyses were applied to the dataset. The analysis identified several groups of elements whose spatial patterns could be related to specific geologic sources.

The application of fuzzy c-means cluster analysis and non-linear mapping to geochemical datasets: examples from Portugal

Applied Geochemistry, 1988

In the interpretation of relatively small multivariate datasets, deviations from homogeneity may cause severe problems. In these cases fuzzy c-means cluster analysis (FCM) and non-linear mapping (NLM) are conceptually suited to discern structure in the datasets. In particular, the combined use of FCM and NLM furnishes a powerful method to find meaningful data groupings within a dataset. This is illustrated with two case studies, for water and combined water and stream sediment analyses, respectively, where FCM and NLM were applied. The results are easily related to geology, mineral occurrences and environmental factors.
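For readers unfamiliar with FCM, the following is a minimal Python sketch of the textbook fuzzy c-means updates (membership and centre formulas with fuzzifier m). It is not the implementation used in the paper; the function names, the starting centres and the sample points are assumptions made up for illustration.

```python
import math


def memberships_for(points, centers, m, eps=1e-12):
    """Membership of each point in each cluster; every row sums to 1."""
    power = 2.0 / (m - 1.0)
    rows = []
    for p in points:
        # Clamp distances so a point sitting on a centre does not divide by zero.
        dists = [max(math.dist(p, c), eps) for c in centers]
        rows.append([1.0 / sum((d_i / d_j) ** power for d_j in dists)
                     for d_i in dists])
    return rows


def fuzzy_c_means(points, centers, m=2.0, n_iter=50):
    """Alternate membership and centre updates for a fixed number of iterations."""
    centers = [list(c) for c in centers]
    n_dims = len(points[0])
    for _ in range(n_iter):
        u = memberships_for(points, centers, m)
        for i in range(len(centers)):
            weights = [row[i] ** m for row in u]
            total = sum(weights)
            centers[i] = [sum(w * p[k] for w, p in zip(weights, points)) / total
                          for k in range(n_dims)]
    return centers, memberships_for(points, centers, m)


# Two hypothetical, well-separated groups of samples in two variables.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, u = fuzzy_c_means(points, centers=[(0, 0), (10, 10)])

for p, row in zip(points, u):
    print(p, [round(x, 3) for x in row])
```

Unlike a hard partition, each sample keeps a graded membership in every cluster, so samples lying between groups show up as intermediate values rather than being forced into one class.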

Geochemical data handling, using multivariate statistical methods for environmental monitoring and pollution studies

Environmental Technology & Innovation, 2020

The management and analysis of regional geochemical data

Journal of Geochemical Exploration, 1980

Experiences in the U.S. National Uranium Resource Evaluation Program and Canadian Uranium Reconnaissance Program related to the use of geochemical exploration methods are drawn upon to review concepts and standards for data management and analysis. The topics discussed include: (a) field data acquisition; (b) quality control; (c) data management; (d) univariate statistical analysis; (e) initial data presentation; (f) multivariate statistical analysis; and (g) derivative maps. It is concluded that computer usage is essential if the data are to be compiled to desired levels of quality assurance in a timely and efficient fashion for interpretation and distribution. In general the costs for data management and analysis are some 10% of total costs; of this 10% some three-quarters is expended on compilation, quality control, editing and archiving. The use of the mathematical and statistical methods, with appropriate presentation techniques, can greatly assist the geochemist in interpretation. The objective of geochemical data analysis in the context of this paper is to identify that small proportion of the samples which relate to mineralization. A variety of tools, both simple and complex, are illustrated and discussed from this viewpoint. However, it is stressed that data analysis is only a tool and the results must be critically reviewed in terms of their geochemical implications before acceptance and incorporation into an interpretation.

Geochemical regional surveys: comparative analysis of data from soils and stream sediments

Geochemical maps based on soils and on stream sediments were compared. It was found that the natural background for elements in the two media is quite similar. The same applies to their spatial patterns, except for P. The ratio of stream sediment to soil element concentrations increases with increasing soil acidity for Ca, Cu, K, Ba, and also for Mn, Sr and Zn, suggesting that element mobilization under low pH is an important process controlling sediment composition.

Multivariate analysis of regional-scale geochemical data for environmental monitoring

2016

A compositional multivariate approach is used to analyse regional-scale soil geochemical data obtained as part of the Tellus Project generated by the Geological Survey of Northern Ireland (GSNI). The multi-element total concentration data presented comprise XRF analyses of 6862 rural soil samples collected at 20 cm depth on a non-aligned grid at one site per 2 km². Censored data were imputed using published detection limits. Using these imputed values for 46 elements (including LOI), each soil sample site was assigned to the regional geology map provided by GSNI initially using the dominant lithology for the map polygon. Northern Ireland includes a diversity of geology representing a stratigraphic record from the Mesoproterozoic, up to and including the Palaeogene. However, the advance of ice sheets and their meltwaters over the last 100,000 years has left at least 80% of the bedrock covered by superficial deposits, including glacial till and post-glacial alluvium and peat. The question...

Application of a normalization procedure in determining regional geochemical baselines

Environmental Earth Sciences, 1997

The regional variability of some geochemical parameters in the Gulf of Trieste is considered in terms of their relationship with Al, used as a normalization factor. Baselines calculated from these relationships are used to determine a simple enrichment factor for each element, defined as the ratio between the actual and predicted baseline value. The normalization procedure permits a new non-dimensional reference baseline to be obtained that could help to assess the size of possible anomalies and to provide information on the diffusion and dispersion patterns of pollutants inside the monitored area.
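The enrichment factor defined above (actual value divided by the baseline value predicted from Al) can be sketched in Python as follows. The abstract does not state the form of the element-Al relationship, so the ordinary least-squares straight line used here is an assumption, as are the function names and all numbers, which are made up for illustration.

```python
from statistics import mean


def fit_baseline(al, element):
    """Least-squares line element = intercept + slope * Al (assumed linear form)."""
    al_mean, el_mean = mean(al), mean(element)
    sxx = sum((x - al_mean) ** 2 for x in al)
    sxy = sum((x - al_mean) * (y - el_mean) for x, y in zip(al, element))
    slope = sxy / sxx
    intercept = el_mean - slope * al_mean
    return intercept, slope


def enrichment_factor(al_value, element_value, intercept, slope):
    """Ratio of the measured value to the baseline predicted from Al."""
    return element_value / (intercept + slope * al_value)


# Hypothetical background samples: Al in %, Zn in mg/kg.
al_background = [2.0, 3.0, 4.0, 5.0, 6.0]
zn_background = [40.0, 60.0, 80.0, 100.0, 120.0]
intercept, slope = fit_baseline(al_background, zn_background)

# A hypothetical sample with Al = 4 % and Zn = 160 mg/kg.
ef = enrichment_factor(4.0, 160.0, intercept, slope)
print(round(ef, 2))  # 2.0, i.e. twice the Zn expected for that Al content
```

Because the factor is a ratio of two concentrations in the same units, it is non-dimensional, and values near 1 indicate samples that follow the baseline.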