PARAFAC. Tutorial and applications

Trilinear least-squares and unfolded-PLS coupled to residual trilinearization: New chemometric tools for the analysis of four-way instrumental data

Chemometrics and Intelligent Laboratory Systems, 2006

A new chemometric algorithm for the analysis of four-way data is described: trilinear least-squares (TLLS) coupled to residual trilinearization (RTL). It was developed as an extension of one variant of bilinear least-squares (BLLS), the so-called singular value decomposition-least-squares (SVD-LS) coupled to residual bilinearization (RBL). Monte Carlo numerical simulations are used to compare its performance with that of the well-known parallel factor analysis (PARAFAC) model and with the combination of unfolded partial least-squares (PLS) and RTL. The latter method was developed as an extension of unfolded-PLS coupled to RBL. An experimental system is also studied, based on kinetic measurements of the time evolution of excitation-emission fluorescence matrices for mixtures of two anticancer drugs. The kinetic reaction is the oxidation of leucovorin and methotrexate by potassium permanganate to give highly fluorescent compounds. Both chemometric methodologies exploit the second-order advantage of the multi-way data employed, allowing analyte concentrations to be estimated even in the presence of uncalibrated components in the samples. The methods described here are new alternatives to PARAFAC for analyses based on four-way instrumental data.
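The unfolded-PLS half of the comparison can be sketched as follows. Each sample's three-way measurement (for example excitation × emission × reaction time) is unfolded into one long vector, and a PLS1 model is then calibrated on those vectors. This is a minimal Python/NumPy sketch under stated assumptions. The function names `pls1_fit`/`pls1_predict`, the array sizes and the noise-free simulated data are all hypothetical. The RTL step, which is what deals with uncalibrated components, is not implemented, so this model alone does not show the second-order advantage.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """PLS1 (NIPALS-style) on unfolded data: X is samples x variables, y is one analyte."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean            # mean-centre X and y
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)               # weight vector
        t = E @ w                            # scores
        tt = t @ t
        p = E.T @ t / tt                     # X loadings
        qa = (f @ t) / tt                    # y loading
        E = E - np.outer(t, p)               # deflate X
        f = f - qa * t                       # deflate y
        W.append(w); P.append(p); q.append(qa)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)      # regression vector for centred X
    return b, x_mean, y_mean

def pls1_predict(X, b, x_mean, y_mean):
    return (X - x_mean) @ b + y_mean

# Hypothetical noise-free four-way data: each sample is an I x J x K array
# (e.g. excitation x emission x reaction time) built from two analytes.
rng = np.random.default_rng(0)
I, J, K = 15, 20, 10
A, B, C = rng.random((I, 2)), rng.random((J, 2)), rng.random((K, 2))

def sample(conc):
    # trilinear signal: sum_f conc[f] * a_f (outer) b_f (outer) c_f
    return np.einsum('f,if,jf,kf->ijk', conc, A, B, C)

Y_cal = rng.random((12, 2))                                            # calibration concentrations
X_cal = np.stack([sample(c) for c in Y_cal]).reshape(len(Y_cal), -1)  # unfold to 12 x (I*J*K)
b, x_mean, y_mean = pls1_fit(X_cal, Y_cal[:, 0], n_components=2)

c_test = np.array([0.3, 0.7])
pred = pls1_predict(sample(c_test).reshape(1, -1), b, x_mean, y_mean)
```

Because the simulated test sample is noise-free and contains only the two calibrated analytes, `pred[0]` should be close to 0.3. A third, uncalibrated component in the test sample would generally bias this prediction; that is the situation the RTL step is intended to handle.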

Chemometrics and Intelligent Laboratory Systems

1997

This paper explains the multi-way decomposition method PARAFAC and its use in chemometrics. PARAFAC is a generalization of PCA to higher-order arrays, but some of its characteristics are quite different from those of the ordinary two-way case. There is no rotation problem in PARAFAC, and, e.g., pure spectra can be recovered from multi-way spectral data. Unlike in PCA, components cannot be estimated successively, because this gives a model with a poorer fit than the simultaneous solution. Finally, scaling and centering are not as straightforward in the multi-way case as in the two-way case. An important advantage of using multi-way methods instead of unfolding methods is that the estimated models are very simple in a mathematical sense, and therefore more robust and easier to interpret. All these aspects, and more, are explained in this tutorial. A Matlab implementation containing most of the features explained in the text is also available. Three examples show ho...
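The following sketch makes the simultaneous estimation concrete. It is a minimal PARAFAC fit in Python/NumPy using plain alternating least squares (ALS). In each step one loading matrix is updated by linear least squares while the other two are held fixed. This is not the Matlab implementation mentioned above. The function names, random initialization, stopping rule and simulated rank-2 data are assumptions made for illustration. The last lines check the "no rotation problem" property: PARAFAC components are determined only up to scaling and permutation, so the recovered columns are compared with the true ones after normalization.

```python
import numpy as np

def khatri_rao(B, C):
    # column-wise Kronecker product; row index is j*K + k
    return np.einsum('jf,kf->jkf', B, C).reshape(-1, B.shape[1])

def parafac_als(X, rank, n_iter=1000, tol=1e-12, seed=0):
    I, J, K = X.shape
    rng = np.random.default_rng(seed)
    A, B, C = (rng.standard_normal((n, rank)) for n in (I, J, K))
    X1 = X.reshape(I, J * K)                       # X1[i, j*K + k] = X[i, j, k]
    X2 = X.transpose(1, 0, 2).reshape(J, I * K)    # X2[j, i*K + k]
    X3 = X.transpose(2, 0, 1).reshape(K, I * J)    # X3[k, i*J + j]
    prev = np.inf
    for _ in range(n_iter):
        # all components are updated together, mode by mode (not one component at a time)
        A = X1 @ np.linalg.pinv(khatri_rao(B, C).T)
        B = X2 @ np.linalg.pinv(khatri_rao(A, C).T)
        C = X3 @ np.linalg.pinv(khatri_rao(A, B).T)
        err = np.linalg.norm(X1 - A @ khatri_rao(B, C).T)
        if prev - err <= tol * err:                # stop when the fit no longer improves
            break
        prev = err
    return A, B, C

def congruence(U, V):
    # absolute cosines between normalized columns of U and V
    Un = U / np.linalg.norm(U, axis=0)
    Vn = V / np.linalg.norm(V, axis=0)
    return np.abs(Un.T @ Vn)

# Hypothetical noise-free rank-2 data
rng = np.random.default_rng(42)
A0, B0, C0 = rng.standard_normal((8, 2)), rng.standard_normal((7, 2)), rng.standard_normal((6, 2))
X = np.einsum('if,jf,kf->ijk', A0, B0, C0)

A, B, C = parafac_als(X, rank=2)
X_hat = np.einsum('if,jf,kf->ijk', A, B, C)
rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
match = congruence(A, A0).max(axis=0)    # best match for each true column
```

A successive alternative would fit one component, subtract it and then fit the next. As the abstract notes, that generally gives a poorer fit than updating all components together as done here.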

Multi-way principal components- and PLS-analysis

Journal of Chemometrics, 1987

The Lohmöller-Wold decomposition of multi-way (three-way, four-way, etc.) data arrays is combined with the nonlinear iterative partial least squares (NIPALS) algorithm to provide multi-way solutions of principal components analysis (PCA) and partial least squares modelling in latent variables (PLS).
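Suppose the decomposition is read as unfolding (matricizing) an I × J × K array into an I × JK matrix. Under that assumption, multi-way PCA reduces to ordinary PCA of the unfolded matrix, and NIPALS computes it one component at a time by power iteration and deflation. The Python/NumPy sketch below illustrates that reading; it is not the original authors' code. The function name, convergence tolerance, starting vector and simulated data are hypothetical. The scores are checked against an SVD of the same unfolded, centred matrix.

```python
import numpy as np

def nipals_pca(X, n_components, tol=1e-12, max_iter=5000):
    """NIPALS PCA on a column-centred copy of X; returns scores T and loadings P."""
    E = X - X.mean(axis=0)
    T, P = [], []
    for _ in range(n_components):
        t = E[:, np.argmax((E ** 2).sum(axis=0))].copy()   # start from the largest-variance column
        for _ in range(max_iter):
            p = E.T @ t
            p /= np.linalg.norm(p)                          # unit-length loading
            t_new = E @ p
            converged = np.linalg.norm(t_new - t) <= tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
        E = E - np.outer(t, p)                              # deflate before the next component
        T.append(t); P.append(p)
    return np.array(T).T, np.array(P).T

# Hypothetical three-way data: 10 objects x 6 variables x 5 conditions
rng = np.random.default_rng(3)
I, J, K = 10, 6, 5
scores = rng.standard_normal((I, 2)) * np.array([5.0, 2.0])
loads = rng.standard_normal((2, J * K))
X3 = (scores @ loads).reshape(I, J, K) + 0.01 * rng.standard_normal((I, J, K))

X_unf = X3.reshape(I, J * K)             # unfold: object mode kept, other modes concatenated
T, P = nipals_pca(X_unf, n_components=2)

# reference: SVD of the same centred matrix (component signs are arbitrary)
U, s, Vt = np.linalg.svd(X_unf - X_unf.mean(axis=0), full_matrices=False)
```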

Multiway data in chemometrics: exploratory data analysis in chemistry with soft multi-linear modelling (1999), http://www.stat.fi/isi99/proceedings/arkisto/varasto/ande0487.pdf

2014

The most popular models for the analysis of multi-way data structures, i.e., the Tucker and CANDECOMP-PARAFAC (CP) models, were developed in the domain of numerical psychology. However, the data collected by many modern analytical chemical instruments are multi-way data structures by definition. Many applications have shown that data analysis gains in robustness and information when meaningful parameters are derived from appropriately chosen multi-linear models of such measurements. This presentation covers the two basic models of widest general interest, together with two applications from the sugar industry. Both applications are based on fluorescence measurements of aqueous samples. In the first application, we argue that the parameters estimated by the three-way CP model are estimates of the pure underlying spectral profiles of the observed chemical species. Hence, the CP model is applied in a curve-resolution approach, and the estimated parameters allow us to uniquely identif...
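The relation between the two models can be checked numerically in two ways. First, CP is the special case of Tucker3 with an R × R × R superdiagonal core. Second, a Tucker3 model can absorb any invertible transformation of a loading matrix into its core without changing the fit. CP does not share this rotational freedom, which is part of why, under the usual uniqueness conditions, CP parameters can be interpreted as pure profiles. This is a minimal Python/NumPy sketch; the function names, sizes, transformation matrix and random loadings are hypothetical.

```python
import numpy as np

def tucker3(G, A, B, C):
    # X[i,j,k] = sum_{p,q,r} G[p,q,r] * A[i,p] * B[j,q] * C[k,r]
    return np.einsum('pqr,ip,jq,kr->ijk', G, A, B, C)

def cp(A, B, C):
    # X[i,j,k] = sum_f A[i,f] * B[j,f] * C[k,f]
    return np.einsum('if,jf,kf->ijk', A, B, C)

rng = np.random.default_rng(1)
R = 3
A, B, C = rng.random((5, R)), rng.random((4, R)), rng.random((6, R))

# 1) CP as Tucker3 with a superdiagonal core
G = np.zeros((R, R, R))
G[np.arange(R), np.arange(R), np.arange(R)] = 1.0
X_cp, X_tucker = cp(A, B, C), tucker3(G, A, B, C)

# 2) Tucker3 rotational freedom: transform A by an invertible S, counter-transform the core
S = np.array([[1.0, 0.5, 0.0],
              [0.2, 1.0, 0.3],
              [0.0, 0.4, 1.0]])                    # hypothetical invertible matrix (det = 0.78)
G_rot = np.einsum('ps,sqr->pqr', np.linalg.inv(S), G)
X_rot = tucker3(G_rot, A @ S, B, C)

# same data, but the new core is no longer superdiagonal, i.e. no longer a CP model
off_diag = G_rot.copy()
off_diag[np.arange(R), np.arange(R), np.arange(R)] = 0.0
```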

A chemometrics toolbox based on projections and latent variables

Journal of Chemometrics, 2014

A personal view is given of the gradual development of projection methods (also called bilinear, latent variable, and more) and their use in chemometrics. We start with principal components analysis (PCA), which is the basis for more elaborate methods for more complex problems. These include soft independent modeling of class analogy, partial least squares (PLS), hierarchical PCA and PLS, PLS-discriminant analysis, orthogonal projection to latent structures (OPLS), OPLS-discriminant analysis, and more. From its start around 1970, this development was strongly influenced by Bruce Kowalski and his group in Seattle. A key influence was his realization that the multidimensional data profiles emerging from spectrometers, chromatographs, and other electronic instruments contained interesting information that the then-current one-variable-at-a-time approaches to chemical data analysis did not recognize. This led to the adoption of what statistics calls the data-analytical approach, also known as the data-driven approach, soft modeling, and more. This approach, combined with PCA and later PLS, turned out to work very well in the analysis of chemical data. This is because of the close correspondence between, on the one hand, the matrix decomposition at the heart of PCA and PLS and, on the other hand, the analogy concept on which so much of chemical theory and experimentation is based. This correspondence extends to the numerical and conceptual stability and good approximation properties of these models. The development is informally summarized, described, and illustrated by a few examples and anecdotes.

Do Spectra Live in the Matrix? A Brief Tutorial on Applications of Factor Analysis to Resolving Spectral Datasets of Mixtures

Journal of Fluorescence

In spite of the rapid growth of data-processing software, which has allowed huge advances in many fields of chemistry, some research issues remain problematic. A standard example of a troublesome challenge is the analysis of multi-component mixtures. The classical approach to such a problem consists of separating each component from a sample and measuring it individually. The advent of computers, however, gave rise to a relatively new domain of data processing, chemometrics, focused on decomposing the signal recorded for the sample rather than the sample itself. Regrettably, very few chemometric methods are used in everyday laboratory routines. The authors believe that a brief, ‘user-friendly’, guide-like article on several ‘flagship’ chemometric algorithms may, at least partly, stimulate interest in these techniques among researchers in many fields of chemistry. In the paper, five different techniques of fa...
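Many factor-analysis approaches to mixture spectra share one basic step: estimating how many components a data matrix contains from its singular values. The Python/NumPy sketch below illustrates that step under stated assumptions; it is not one of the five techniques reviewed in the paper. It simulates Gaussian bands for two hypothetical species, mixes them at designed concentrations and adds small noise. It then counts the singular values above a threshold that is chosen arbitrarily for this noise level.

```python
import numpy as np

x = np.arange(100.0)                                            # hypothetical wavelength channels
S = np.column_stack([np.exp(-(x - 40) ** 2 / (2 * 8 ** 2)),     # pure spectrum of species 1
                     np.exp(-(x - 60) ** 2 / (2 * 8 ** 2))])    # pure spectrum of species 2
c1 = np.linspace(0.1, 1.0, 8)
Cm = np.column_stack([c1, c1[::-1]])                            # concentrations in 8 mixtures
rng = np.random.default_rng(0)
D = Cm @ S.T + 1e-4 * rng.standard_normal((8, 100))             # bilinear mixture model D = C S^T + E

s = np.linalg.svd(D, compute_uv=False)
n_components = int(np.sum(s > 1e-2))                            # threshold chosen for this noise level
```

Only the rank is recovered here. Turning the abstract factors into physically meaningful spectra and concentrations requires additional constraints or assumptions, which is where specific factor-analysis techniques differ.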

Multi‐Way Analysis with Applications in the Chemical Sciences

2004

… is multi-way analysis?
1.2 Conceptual aspects of multi-way data analysis
1.3 Hierarchy of multivariate data structures in chemistry
1.4 Principal component analysis and PARAFAC
1.5 Summary
2 Array definitions and properties
2.1 Introduction
2.2 Rows, columns and tubes; frontal, lateral and horizontal slices
2.3 Elementary operations
2.4 Linearity concepts
2.5 Rank of two-way arrays
2.6 Rank of three-way arrays
2.7 Algebra of multi-way analysis
2.8 Summary
Appendix 2.A
3 Two-way component and regression models
3.1 Models for two-way one-block data analysis: component models
3.2 Models for two-way two-block data analysis: regression models
3.3 Summary
Appendix 3.A: some PCA results
Appendix 3.B: PLS algorithms
4 Three-way component and regression models
4.1 Historical introduction to multi-way models
4.2 Models for three-way one-block data: three-way component models
4.3 Models for three-way two-block data: three-way regression models
4.4 Summary
Appendix 4.A: alternative notation for the PARAFAC model
Appendix 4.B: alternative notations for the Tucker3 model
5 Some properties of three-way component models
5.1 Relationships between three-way component models
5.2 Rotational freedom and uniqueness in three-way component models
5.3 Properties of Tucker3 models
5.4 Degeneracy problem in PARAFAC models
5.5 Summary
6 Algorithms
6.1 Introduction
6.2 Optimization techniques
6.3 PARAFAC algorithms
6.4 Tucker3 algorithms
6.5 Tucker2 and Tucker1 algorithms
6.6 Multi-linear partial least squares regression
6.7 Multi-way covariates regression models
6.8 Core rotation in Tucker3 models
6.9 Handling missing data
6.10 Imposing non-negativity
6.11 Summary
Appendix 6.A: closed-form solution for the PARAFAC model
Appendix 6.B: proof that the weights in trilinear PLS1 can be obtained from a singular value decomposition
7 Validation and diagnostics
7.1 What is validation?
7.2 Test-set and cross-validation
7.3 Selecting which model to use
7.4 Selecting the number of components
7.…

A comparison of algorithms for fitting the PARAFAC model

Computational Statistics & Data Analysis, 2006

A multitude of algorithms has been developed to fit a trilinear PARAFAC model to a three-way array. The limitations and advantages of some of the available methods (GRAM-DTLD, PARAFAC-ALS, ASD, SWATLD, PMF3 and dGN) are compared. The algorithms are explained in general terms, together with two approaches to accelerate them: line search and compression. To compare the methods, 720 artificial data sets were generated with varying level and type of noise, collinearity of the factors, and rank. Two PARAFAC models were fitted to each data set: the first with the correct number of factors F and the second with F + 1 components. The purpose was to assess the sensitivity of the different approaches to over-factoring, i.e., extracting more components than the rank of the array. The algorithms were also tested on two real data sets of fluorescence measurements, again extracting both the correct and an excessive number of factors. The evaluations are based on the number of iterations needed to reach convergence, time consumption, quality of the solution, and the resources (primarily memory) required for the calculations.
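Compression can be illustrated with a Tucker-style projection in three steps. Each mode of the array is projected onto a few leading singular vectors, PARAFAC is fitted to the small core, and the loadings are then mapped back to the original space. This Python/NumPy sketch shows one possible way to compress, not the specific implementation evaluated in the paper, and line search is not shown. The function names, sizes and noise-free rank-2 data are hypothetical. Because the simulated data lie exactly in the retained subspaces, the decompressed model should reproduce the full array.

```python
import numpy as np

def unfold(X, mode):
    # mode-n unfolding: the chosen mode becomes the rows
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def khatri_rao(B, C):
    # column-wise Kronecker product, rows ordered to match unfold()
    return np.einsum('jf,kf->jkf', B, C).reshape(-1, B.shape[1])

def parafac_als(X, rank, n_iter=1000, tol=1e-12, seed=0):
    rng = np.random.default_rng(seed)
    A, B, C = (rng.standard_normal((n, rank)) for n in X.shape)
    X1, X2, X3 = unfold(X, 0), unfold(X, 1), unfold(X, 2)
    prev = np.inf
    for _ in range(n_iter):
        A = X1 @ np.linalg.pinv(khatri_rao(B, C).T)
        B = X2 @ np.linalg.pinv(khatri_rao(A, C).T)
        C = X3 @ np.linalg.pinv(khatri_rao(A, B).T)
        err = np.linalg.norm(X1 - A @ khatri_rao(B, C).T)
        if prev - err <= tol * err:          # stop when the fit no longer improves
            break
        prev = err
    return A, B, C

def compress(X, dims):
    # orthonormal basis per mode from leading left singular vectors of each unfolding
    U, V, W = (np.linalg.svd(unfold(X, m), full_matrices=False)[0][:, :d]
               for m, d in enumerate(dims))
    G = np.einsum('ijk,ip,jq,kr->pqr', X, U, V, W)   # small core: X projected onto the bases
    return G, U, V, W

# Hypothetical noise-free rank-2 array of size 30 x 25 x 20
rng = np.random.default_rng(0)
R = 2
A0, B0, C0 = (rng.standard_normal((n, R)) for n in (30, 25, 20))
X = np.einsum('if,jf,kf->ijk', A0, B0, C0)

G, U, V, W = compress(X, (R, R, R))      # 30x25x20 -> 2x2x2
a, b, c = parafac_als(G, R)              # fit PARAFAC on the small core
A, B, C = U @ a, V @ b, W @ c            # map loadings back to the original space
X_hat = np.einsum('if,jf,kf->ijk', A, B, C)
rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
```

Each ALS iteration here works on a 2 × 2 × 2 core instead of the 30 × 25 × 20 array, which is where the saving comes from.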

Principal component analysis

Principal component analysis is one of the most important and powerful methods in chemometrics as well as in a wealth of other areas. This paper provides a description of how to understand, use, and interpret principal component analysis. The paper focuses on the use of principal component analysis in typical chemometric areas but the results are generally applicable.
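As a compact reference point, PCA of a mean-centred data matrix can be computed from its singular value decomposition. The scores are U·s, the loadings are the right singular vectors, and the variance explained by each component is proportional to its squared singular value. This Python/NumPy sketch uses a hypothetical function name and random data. It is one common way to compute PCA, not a method prescribed by the paper.

```python
import numpy as np

def pca(X, n_components):
    Xc = X - X.mean(axis=0)                              # column mean-centring
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]           # scores
    P = Vt[:n_components].T                              # orthonormal loadings
    explained = s[:n_components] ** 2 / np.sum(s ** 2)   # fraction of total variance
    return T, P, explained

rng = np.random.default_rng(7)
X = rng.standard_normal((20, 5))
T, P, explained = pca(X, n_components=5)
Xc = X - X.mean(axis=0)
```

With all components retained, the model reproduces the centred data exactly. Truncating to fewer components gives the best low-rank approximation in the least-squares sense.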