Jack-knife technique for outlier detection and estimation of standard errors in PARAFAC models

Detecting outlying samples in a parallel factor analysis model

Analytica Chimica Acta, 2011

To explore multi-way data, different methods have been proposed. Here, we study the popular PARAFAC (parallel factor analysis) model, which expresses multi-way data in a more compact way without ignoring the underlying complex structure. To estimate the score and loading matrices, an alternating least squares procedure is typically used. It is, however, well known that least squares techniques are sensitive to outlying observations, which can make the fitted models useless when outliers are present in the data. In this paper, we present a robust PARAFAC method. Essentially, it searches for an outlier-free subset of the data, on which the classical PARAFAC algorithm can then be performed. An outlier map is constructed to identify outliers. Simulations and examples show the robustness of our approach.

Standard error of prediction in parallel factor analysis of three-way data

Chemometrics and intelligent laboratory systems, 2004

A simple approach is described to calculate sample-specific standard errors for the concentrations predicted by a three-way parallel factor (PARAFAC) analysis model. It involves a first-order error propagation equation in which the correct sensitivity and leverage values are introduced. A comparison is made with a related unidimensional partial least-squares (PLS) model, specifically as regards the required leverage values. Monte Carlo simulation results obtained by adding random noise to both concentrations and instrumental signals for theoretical binary mixtures are in good agreement with the proposed approach. An experimental multicomponent example was studied by a similar Monte Carlo approach, and the obtained standard errors are also in agreement with the calculated values. Implications concerning the limit of detection are discussed.

A comparison of algorithms for fitting the PARAFAC model

Computational Statistics & Data Analysis, 2006

A multitude of algorithms have been developed to fit a trilinear PARAFAC model to a three-way array. Limits and advantages of some of the available methods (i.e. GRAM-DTLD, PARAFAC-ALS, ASD, SWATLD, PMF3 and dGN) are compared. The algorithms are explained in general terms together with two approaches to accelerate them: line search and compression. In order to compare the different methods, 720 sets of artificial data were generated with varying level and type of noise, collinearity of the factors and rank. Two PARAFAC models were fitted on each data set: the first having the correct number of factors F and the second with F + 1 components (the objective being to assess the sensitivity of the different approaches to the over-factoring problem, i.e. when the number of extracted components exceeds the rank of the array). The algorithms have also been tested on two real data sets of fluorescence measurements, again by extracting both the right and an exceeding number of factors. The evaluations are based on: number of iterations necessary to reach convergence, time consumption, quality of the solution and amount of resources required for the calculations (primarily memory).
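As a rough illustration of what fitting a trilinear model by alternating least squares involves, the following is a minimal sketch only, not the PARAFAC-ALS implementation evaluated in the paper. It assumes a dense three-way NumPy array with no missing values, uses a random start, and omits line search and compression. The names `khatri_rao` and `parafac_als` are hypothetical helpers introduced here.

```python
import numpy as np


def khatri_rao(U, V):
    """Column-wise Kronecker product; row (u * V.shape[0] + v) holds U[u] * V[v]."""
    F = U.shape[1]
    return (U[:, None, :] * V[None, :, :]).reshape(-1, F)


def parafac_als(X, F, n_iter=1000, tol=1e-10, seed=0):
    """Fit X[i, j, k] ~ sum_f A[i, f] * B[j, f] * C[k, f] by plain ALS (sketch)."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    # Random starting values (an assumption; other initializations exist).
    B = rng.standard_normal((J, F))
    C = rng.standard_normal((K, F))
    # Unfold the array along each mode so every update is an ordinary
    # linear least-squares problem.
    X1 = X.reshape(I, J * K)                      # columns indexed by (j, k)
    X2 = np.moveaxis(X, 1, 0).reshape(J, I * K)   # columns indexed by (i, k)
    X3 = np.moveaxis(X, 2, 0).reshape(K, I * J)   # columns indexed by (i, j)
    prev = np.inf
    for _ in range(n_iter):
        # Update each loading matrix in turn with the other two held fixed.
        A = X1 @ np.linalg.pinv(khatri_rao(B, C)).T
        B = X2 @ np.linalg.pinv(khatri_rao(A, C)).T
        C = X3 @ np.linalg.pinv(khatri_rao(A, B)).T
        sse = float(np.sum((X1 - A @ khatri_rao(B, C).T) ** 2))
        # Stop when the relative decrease in the residual sum of squares is tiny.
        if np.isfinite(prev) and prev - sse <= tol * prev:
            break
        prev = sse
    return A, B, C, sse


# Usage on a noise-free, synthetic rank-2 array (illustrative data only).
rng = np.random.default_rng(42)
A0, B0, C0 = (rng.standard_normal((n, 2)) for n in (8, 7, 6))
X_demo = np.einsum("if,jf,kf->ijk", A0, B0, C0)
A_hat, B_hat, C_hat, sse_demo = parafac_als(X_demo, F=2)
print("relative residual:", sse_demo / np.sum(X_demo ** 2))
```

Each pass solves three linear least-squares problems, which is why the residual sum of squares cannot increase between passes. The number of passes needed can still be large, which is the kind of behaviour the comparison above measures through iteration counts and time consumption.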

A new efficient method for determining the number of components in PARAFAC models

Journal of Chemometrics, 2003

A new diagnostic called the core consistency diagnostic (CORCONDIA) is suggested for determining the proper number of components for multiway models. It applies especially to the parallel factor analysis (PARAFAC) model, but also to other models that can be considered as restricted Tucker3 models. It is based on scrutinizing the 'appropriateness' of the structural model based on the data and the estimated parameters of gradually augmented models. A PARAFAC model (employing dimension-wise combinations of components for all modes) is called appropriate if adding other combinations of the same components does not improve the fit considerably. It is proposed to choose the largest model that is still sufficiently appropriate. Using examples from a range of different types of data, it is shown that the core consistency diagnostic is an effective tool for determining the appropriate number of components in, e.g., PARAFAC models. However, it is also shown, using simulated data, that the theoretical understanding of CORCONDIA is not yet complete.
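The idea of checking whether "other combinations of the same components" improve the fit can be sketched numerically. The example below assumes the commonly cited form of the diagnostic, 100 * (1 - sum((g - t)^2) / F), where g is the least-squares Tucker3 core computed from the PARAFAC loadings and t is the superdiagonal core of ones. The function name `core_consistency` and the synthetic data are illustrative assumptions, not the authors' code.

```python
import numpy as np


def core_consistency(X, A, B, C):
    """Core consistency (in percent) of a PARAFAC model with loadings A, B, C."""
    F = A.shape[1]
    # Least-squares Tucker3 core given the loadings: project every mode of X
    # onto the pseudo-inverse of the corresponding loading matrix.
    G = np.einsum(
        "ijk,pi,qj,rk->pqr",
        X,
        np.linalg.pinv(A),
        np.linalg.pinv(B),
        np.linalg.pinv(C),
    )
    # Ideal PARAFAC core: ones on the superdiagonal, zeros elsewhere.
    T = np.zeros((F, F, F))
    idx = np.arange(F)
    T[idx, idx, idx] = 1.0
    return 100.0 * (1.0 - np.sum((G - T) ** 2) / F)


# Synthetic two-component loadings (illustrative only).
rng = np.random.default_rng(0)
A, B, C = (rng.standard_normal((n, 2)) for n in (6, 5, 4))

# Exactly trilinear data: the core equals the superdiagonal target.
X_trilinear = np.einsum("if,jf,kf->ijk", A, B, C)

# Add one cross-combination of the same components (a1, b2, c1), which a
# PARAFAC model cannot represent but a Tucker3 core can.
X_interaction = X_trilinear + np.einsum("i,j,k->ijk", A[:, 0], B[:, 1], C[:, 0])

print(core_consistency(X_trilinear, A, B, C))
print(core_consistency(X_interaction, A, B, C))
```

In the second case the core gains a single off-superdiagonal element, so the diagnostic drops well below 100 even though the loadings are unchanged. In practice the loadings would come from a fitted model rather than being known in advance.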

On the Detection of the Correct Number of Factors in Two-Facet Models by Means of Parallel Analysis

Educational and Psychological Measurement, 2021

Methods for optimal factor rotation of two-facet loading matrices have recently been proposed. However, the problem of the correct number of factors to retain for rotation of two-facet loading matrices has rarely been addressed in the context of exploratory factor analysis. Most previous studies were based on the observation that two-facet loading matrices may be rank deficient when the salient loadings of each factor have the same sign. It was shown here that full-rank two-facet loading matrices are, in principle, possible, when some factors have positive and negative salient loadings. Accordingly, the current simulation study on the number of factors to extract for two-facet models was based on rank-deficient and full-rank two-facet population models. The number of factors to extract was estimated from traditional parallel analysis based on the mean of the unreduced eigenvalues as well as from nine other rather traditional versions of parallel analysis (based on the 95th percentile of eigenvalues, based on reduced eigenvalues, based on eigenvalue differences). Parallel analysis based on the mean eigenvalues of the correlation matrix with the squared multiple correlations of each variable with the remaining variables inserted in the main diagonal had the highest detection rates for most of the two-facet factor models. Recommendations for the identification of the correct number of factors are based on the simulation results, on the results of an empirical example data set, and on the conditions for approximately rank-deficient and full-rank two-facet models.
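For orientation, traditional parallel analysis based on the mean of the unreduced eigenvalues can be sketched as follows. This covers only that baseline variant, not the percentile, reduced-eigenvalue, or eigenvalue-difference versions compared in the study, and it does not model two-facet structure. The function name `parallel_analysis`, the number of simulated data sets, and the synthetic loadings are assumptions made for illustration.

```python
import numpy as np


def parallel_analysis(data, n_sim=200, seed=0):
    """Count the leading correlation-matrix eigenvalues that exceed the mean
    eigenvalues obtained from uncorrelated normal data of the same size."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    # Eigenvalues of the observed (unreduced) correlation matrix, descending.
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    sims = np.empty((n_sim, p))
    for s in range(n_sim):
        noise = rng.standard_normal((n, p))
        sims[s] = np.sort(np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False)))[::-1]
    reference = sims.mean(axis=0)
    # Retain factors until the first observed eigenvalue fails to exceed
    # its reference value.
    exceeds = observed > reference
    n_factors = p if exceeds.all() else int(np.argmin(exceeds))
    return n_factors, observed, reference


# Synthetic example: 8 variables, two uncorrelated factors, 4 variables each.
rng = np.random.default_rng(1)
n_obs = 500
loadings = np.zeros((8, 2))
loadings[:4, 0] = 0.8
loadings[4:, 1] = 0.8
scores = rng.standard_normal((n_obs, 2))
unique = 0.6 * rng.standard_normal((n_obs, 8))  # unique variance 1 - 0.8**2
data = scores @ loadings.T + unique

n_factors, observed, reference = parallel_analysis(data)
print("factors retained:", n_factors)
```

Swapping `sims.mean(axis=0)` for a percentile of the simulated eigenvalues would give one of the alternative criteria mentioned above. The reduced-eigenvalue versions would additionally require replacing the diagonal of the correlation matrix with squared multiple correlations.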

A closed-form solution for Parallel Factor (PARAFAC) Analysis

2008 IEEE International Conference on Acoustics, Speech and Signal Processing, 2008

Parallel Factor Analysis (PARAFAC) is a branch of multi-way signal processing that has received increased attention recently. This is due to the large class of applications as well as the milestone identifiability results demonstrating the superiority to matrix (two-way) analysis approaches. A significant amount of research was dedicated to iterative methods to estimate the factors from noisy data. In many situations these require many iterations and are not guaranteed to converge to the global optimum. Therefore, suboptimal closed-form solutions were proposed as initializations.

Robust PARAFAC for incomplete data

Journal of Chemometrics, 2012

Different methods exist to explore multi-way data. In this article we focus on the widely used PARAFAC (Parallel factor analysis) model, which expresses multi-way data in a more compact way without ignoring the underlying complex structure. An alternating least squares procedure is typically used to fit the PARAFAC model. It is however well known that least squares techniques are very sensitive to outliers and hence the PARAFAC model as a whole is a nonrobust method. Therefore Engelen and Hubert [2011] have proposed a robust alternative, which can deal with fully observed data, possibly contaminated by outlying samples. In this paper we present an approach to perform PARAFAC on data which contain both outlying cases and missing elements.

PARAFAC. Tutorial and applications

This paper explains the multi-way decomposition method PARAFAC and its use in chemometrics. PARAFAC is a generalization of PCA to higher order arrays, but some of the characteristics of the method are quite different from the ordinary two-way case. There is no rotation problem in PARAFAC, and, e.g., pure spectra can be recovered from multi-way spectral data. One cannot, as in PCA, estimate components successively, as this will give a model with poorer fit than if the simultaneous solution is estimated. Finally, scaling and centering are not as straightforward in the multi-way case as in the two-way case. An important advantage of using multi-way methods instead of unfolding methods is that the estimated models are very simple in a mathematical sense, and therefore more robust and easier to interpret. All these aspects plus more are explained in this tutorial, and an implementation in Matlab code is available that contains most of the features explained in the text. Three examples show how PARAFAC can be used for specific problems. The applications include subjects such as: analysis of variance by PARAFAC, a five-way application of PARAFAC, PARAFAC with half the elements missing, PARAFAC constrained to positive solutions, and PARAFAC for regression as in principal component regression.