Detecting outlying samples in a parallel factor analysis model

Robust factor analysis in the presence of normality violations, missing data, and outliers: Empirical questions and possible solutions

Although factor analysis (FA) is a mainstay of psychometric methods, several reviews suggest that it is often applied without testing whether the data support it, and that the decision-making process or guiding principles that provide evidential support for FA techniques are seldom reported. Researchers often defer such decisions to the default settings of widely used software packages and, unaware of their limitations, may unwittingly misuse FA. This paper discusses robust analytical alternatives for answering nine important questions in exploratory factor analysis (EFA) and provides R commands for running complex analyses, in the hope of encouraging and empowering substantive researchers on a journey of discovery towards more knowledgeable and judicious use of robust alternatives in FA. It aims to make solutions to problems such as skewness, missing values, determining the number of factors to extract, and calculating the standard errors of loadings accessible to the general substantive researcher.

A New Algorithm for Computing Disjoint Orthogonal Components in the Parallel Factor Analysis Model with Simulations and Applications to Real-World Data

Mathematics

In this paper, we extend the use of disjoint orthogonal components to three-way table analysis with the parallel factor analysis model. Traditional methods, such as scaling, orthogonality constraints, non-negativity constraints, and sparse techniques, do not guarantee that interpretable loading matrices are obtained in this model. We propose a novel heuristic algorithm that allows simple structure loading matrices to be obtained by calculating disjoint orthogonal components. This algorithm is also an alternative approach for solving the well-known degeneracy problem. We carry out computational experiments by utilizing simulated and real-world data to illustrate the benefits of the proposed algorithm.
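To make the "disjoint orthogonal components" constraint concrete, here is a minimal two-way sketch, not the paper's three-way heuristic. It assumes a fixed, hypothetical assignment `labels` of each variable to one component and fills each column with the first principal axis of that block of variables. The function name and arguments are illustrative only.

```python
import numpy as np

def disjoint_orthogonal_loadings(X, labels, n_components):
    """Build a p x R loading matrix in which every variable loads on exactly
    one component (disjoint supports) and the columns are orthonormal.

    X      : n x p data matrix, assumed column-centred
    labels : length-p integer array assigning each variable to a component
             (hypothetical fixed assignment; finding it is the hard part)
    """
    n, p = X.shape
    B = np.zeros((p, n_components))
    for r in range(n_components):
        cols = np.flatnonzero(labels == r)
        if cols.size == 0:
            continue  # empty component: its column stays zero
        # Leading right singular vector of the block = its first principal axis
        _, _, vt = np.linalg.svd(X[:, cols], full_matrices=False)
        B[cols, r] = vt[0]
    return B
```

Because the columns have non-overlapping supports, orthogonality comes for free and each unit-norm singular vector gives orthonormal columns. The cited algorithm's contribution is the search for a good assignment within the PARAFAC model, which this sketch does not attempt.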

Outliers, Leverage Observations, and Influential Cases in Factor Analysis: Using Robust Procedures to Minimize Their Effect

Sociological Methodology, 2008

Parallel to developments in regression diagnostics, this paper defines good and bad leverage observations in factor analysis. Outliers are observations that deviate from the factor model, not from the center of the data cloud. The effects of each kind of outlying observation on the normal-distribution-based maximum likelihood estimator and the associated likelihood ratio statistic are studied analytically. The distinction between outliers and leverage observations also clarifies the roles of three robust procedures based on different Mahalanobis distances. Each robust procedure is designed to minimize the effect of certain outlying observations, but only the procedure with a residual-based distance properly controls the effect of outliers. Empirical results illustrate the strengths and weaknesses of each procedure and support those obtained
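The distinction between leverage (distance in factor space) and outlyingness (distance from the model) can be sketched as follows. This is a minimal illustration, assuming an already fitted model with mean `mu`, loadings `Lam` and uniquenesses `psi`; it uses Bartlett factor scores, takes their squared norm as a simple leverage measure, and uses the Psi-weighted residual sum of squares as the residual-based distance. These formulas are our assumptions and are not necessarily the exact distances used in the paper.

```python
import numpy as np

def factor_distances(X, mu, Lam, psi):
    """Return (leverage, residual) distances for each row of X under the
    factor model x = mu + Lam f + e with Cov(e) = diag(psi)."""
    W = Lam / psi[:, None]                  # Psi^{-1} Lam, shape p x k
    M = Lam.T @ W                           # Lam' Psi^{-1} Lam, shape k x k
    Xc = X - mu
    # Bartlett (weighted least-squares) factor scores, shape n x k
    F = np.linalg.solve(M, W.T @ Xc.T).T
    E = Xc - F @ Lam.T                      # residuals from the factor model
    leverage = np.sum(F ** 2, axis=1)       # how far out along the factors
    residual = np.sum(E ** 2 / psi, axis=1) # how far off the model
    return leverage, residual
```

A "good" leverage point has a large first distance and a small second one; an outlier in the paper's sense has a large residual distance regardless of its position along the factors.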

Robust PARAFAC for incomplete data

Journal of Chemometrics, 2012

Different methods exist to explore multi-way data. In this article we focus on the widely used PARAFAC (Parallel factor analysis) model, which expresses multi-way data in a more compact way without ignoring the underlying complex structure. An alternating least squares procedure is typically used to fit the PARAFAC model. However, least squares techniques are well known to be very sensitive to outliers, so the classical PARAFAC fit as a whole is not robust. Engelen and Hubert [2011] therefore proposed a robust alternative that can deal with fully observed data, possibly contaminated by outlying samples. In this paper we present an approach to perform PARAFAC on data that contain both outlying cases and missing elements.
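The alternating least squares fit mentioned above can be sketched in a few lines of NumPy. Each sweep updates one mode's loadings by ordinary least squares with the other two held fixed. Missing cells are handled here by simple imputation (fill them with the current model estimate before each sweep); this is a common non-robust baseline and is not the robust procedure proposed in the paper. Function names are hypothetical.

```python
import numpy as np

def reconstruct(A, B, C):
    # X_hat[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]
    return np.einsum('ir,jr,kr->ijk', A, B, C)

def parafac_als(X, rank, mask=None, n_iter=500, seed=0):
    """Fit a rank-`rank` PARAFAC model to a 3-way array by ALS.

    mask : optional boolean array, True where X is observed. Missing cells
           are replaced by the current fit before each sweep (plain
           imputation, not a robust method)."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A, B, C = (rng.standard_normal((n, rank)) for n in (I, J, K))
    Xf = X
    for _ in range(n_iter):
        if mask is not None:
            Xf = np.where(mask, X, reconstruct(A, B, C))
        # Each update solves the normal equations  mode @ G = MTTKRP
        A = np.einsum('ijk,jr,kr->ir', Xf, B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = np.einsum('ijk,ir,kr->jr', Xf, A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = np.einsum('ijk,ir,jr->kr', Xf, A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C
```

Because every update is a least squares step, a single gross outlier can pull all three loading matrices, which is the sensitivity the robust approach is designed to remove.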

Jack-knife technique for outlier detection and estimation of standard errors in PARAFAC models

In recent years, multi-way analysis has become increasingly important because it has proved to be a valuable tool, e.g. for interpreting data provided by instrumental methods that describe the multivariate and complex reality of a given problem. Parallel factor analysis (PARAFAC) is one of the most widely used multi-way models. Despite its usefulness in many applications, to date no tool is available in the literature to estimate the standard errors associated with its parameter estimates. In this study, we apply the so-called jack-knife technique to PARAFAC in order to obtain standard errors for the parameter estimates of the PARAFAC model. The jack-knife technique is also shown to be useful for detecting outliers. An example of fluorescence data (emission/excitation landscapes) is used to show the applicability of the method.
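The leave-one-sample-out jack-knife can be sketched generically: refit the estimator with each sample removed, then scale the spread of those estimates. For PARAFAC, `estimator` would refit the model and return the loadings; that step additionally requires matching component order, sign and scale across refits, which this sketch assumes has been handled and does not show. The function name is hypothetical.

```python
import numpy as np

def jackknife(data, estimator):
    """Leave-one-sample-out jack-knife.

    data      : array whose first axis indexes samples
    estimator : function mapping a data array to a parameter (array)
    Returns (standard errors, leave-one-out estimates)."""
    n = data.shape[0]
    loo = np.array([estimator(np.delete(data, i, axis=0)) for i in range(n)])
    centred = loo - loo.mean(axis=0)
    # SE_jack = sqrt((n - 1)/n * sum_i (theta_(i) - theta_bar)^2)
    se = np.sqrt((n - 1) / n * np.sum(centred ** 2, axis=0))
    return se, loo
```

The leave-one-out estimates are returned as well because they are what makes outlier detection possible: a sample whose removal shifts the estimates far more than the others is influential. For the sample mean the jack-knife standard error reduces exactly to the usual s / sqrt(n).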

Parallel factor analysis with constraints on the configurations: An overview

The purpose of the paper is to present an overview of recent developments in the use of constraints in conjunction with the Parallel Factor Analysis (parafac) model. Constraints and the way they can be incorporated in the estimation process of the model are reviewed. Emphasis is placed on the relatively new triadic algorithm, which provides a large number of new ways to use the parafac model.
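As one example of incorporating a constraint into the estimation process, non-negativity can be imposed inside an ALS sweep by replacing the unconstrained least squares update of a mode with a row-wise non-negative least squares solve (here `scipy.optimize.nnls`). This is a widely used approach, not the triadic algorithm discussed in the overview; the function name is hypothetical.

```python
import numpy as np
from scipy.optimize import nnls

def nonneg_mode1_update(X, B, C):
    """One non-negativity-constrained ALS update of the first-mode loadings A
    with B and C held fixed: each row a_i solves
        min ||x_i - Z a_i||  subject to  a_i >= 0."""
    I, J, K = X.shape
    R = B.shape[1]
    # Z[(j, k), r] = B[j, r] * C[k, r]; C-order reshape matches X[i].reshape(-1)
    Z = np.einsum('jr,kr->jkr', B, C).reshape(J * K, R)
    return np.array([nnls(Z, X[i].reshape(-1))[0] for i in range(I)])
```

The other two modes are updated analogously by permuting the roles of the axes; a full constrained fit simply alternates such updates until the fit stops improving.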

Maximum likelihood estimation of factor analysis using the ECME algorithm with complete and incomplete data

1998

Factor analysis is a standard tool in educational testing contexts, which can be fit using the EM algorithm (Dempster, Laird and Rubin (1977)). An extension of EM, called the ECME algorithm (Liu and Rubin (1994)), can be used to obtain ML estimates more efficiently in factor analysis models. ECME has an E-step, identical to the E-step of EM, but instead of EM's M-step, it has a sequence of CM (conditional maximization) steps, each of which maximizes either the constrained expected complete-data log-likelihood, as with the ECM algorithm (Meng and Rubin (1993)), or the constrained actual log-likelihood. For factor analysis, we use two CM steps: the first maximizes the expected complete-data log-likelihood over the factor loadings given fixed uniquenesses, and the second maximizes the actual likelihood over the uniquenesses given fixed factor loadings. We also describe EM and ECME for ML estimation of factor analysis from incomplete data, which arise in applications of factor analysis in educational testing contexts. ECME shares with EM its monotone increase in likelihood and stable convergence to an ML estimate, but converges more quickly than EM. This more rapid convergence not only can shorten CPU time but, at least as importantly, allows for a substantially easier assessment of convergence, as shown by examples. We believe that the application of ECME to factor analysis illustrates the role that extended EM-type algorithms, such as the even more general AECM algorithm (Meng and van Dyk (1997)) and the PX-EM algorithm (Liu, Rubin and Wu (1997)), can play in fitting complex models that can arise in educational testing contexts.
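For reference, the plain EM baseline that ECME accelerates can be sketched for complete data. The E-step needs only beta = Lam' Sigma^{-1}; the M-step updates the loadings and then the uniquenesses in closed form. ECME would replace the uniqueness update with a step that maximizes the actual likelihood over Psi, which is not shown. Function names are hypothetical, and the sketch works from the divide-by-n sample covariance.

```python
import numpy as np

def fa_loglik(S, Lam, psi, n):
    # Gaussian log-likelihood of n centred cases with ML covariance S
    Sigma = Lam @ Lam.T + np.diag(psi)
    _, logdet = np.linalg.slogdet(Sigma)
    p = S.shape[0]
    return -0.5 * n * (p * np.log(2 * np.pi) + logdet + np.trace(np.linalg.solve(Sigma, S)))

def fa_em(X, k, n_iter=200, seed=0):
    """Plain EM for ML factor analysis (complete data)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n
    Lam = 0.1 * rng.standard_normal((p, k))
    psi = np.diag(S).copy()
    history = []
    for _ in range(n_iter):
        Sigma = Lam @ Lam.T + np.diag(psi)
        beta = np.linalg.solve(Sigma, Lam).T          # k x p, Lam' Sigma^{-1}
        # E-step: average expected second moment E[z z' | x] over cases
        Ezz = np.eye(k) - beta @ Lam + beta @ S @ beta.T
        # M-step: loadings, then uniquenesses given the new loadings
        Lam = S @ beta.T @ np.linalg.inv(Ezz)
        psi = np.diag(S - Lam @ beta @ S).copy()
        history.append(fa_loglik(S, Lam, psi, n))
    return Lam, psi, history
```

The recorded log-likelihoods should never decrease, which is the monotonicity property that ECME shares with EM.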

On the Detection of the Correct Number of Factors in Two-Facet Models by Means of Parallel Analysis

Educational and Psychological Measurement, 2021

Methods for optimal factor rotation of two-facet loading matrices have recently been proposed. However, the problem of the correct number of factors to retain for rotation of two-facet loading matrices has rarely been addressed in the context of exploratory factor analysis. Most previous studies were based on the observation that two-facet loading matrices may be rank deficient when the salient loadings of each factor have the same sign. It was shown here that full-rank two-facet loading matrices are, in principle, possible, when some factors have positive and negative salient loadings. Accordingly, the current simulation study on the number of factors to extract for two-facet models was based on rank-deficient and full-rank two-facet population models. The number of factors to extract was estimated from traditional parallel analysis based on the mean of the unreduced eigenvalues as well as from nine other rather traditional versions of parallel analysis (based on the 95th percentile of eigenvalues, based on reduced eigenvalues, based on eigenvalue differences). Parallel analysis based on the mean eigenvalues of the correlation matrix with the squared multiple correlations of each variable with the remaining variables inserted in the main diagonal had the highest detection rates for most of the two-facet factor models. Recommendations for the identification of the correct number of factors are based on the simulation results, on the results of an empirical example data set, and on the conditions for approximately rank-deficient and full-rank two-facet models.
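Parallel analysis itself is easy to sketch: compare the observed eigenvalues with those of correlation matrices computed from random normal data of the same size, and retain the leading factors whose observed eigenvalues exceed the reference. The default below is the traditional mean criterion on unreduced eigenvalues; `reduced=True` inserts squared multiple correlations on the diagonal, the variant the abstract reports as most accurate for most two-facet models. Names and defaults are illustrative assumptions.

```python
import numpy as np

def eigvals_desc(R, reduced):
    if reduced:
        R = R.copy()
        # squared multiple correlations: 1 - 1 / diag(R^{-1})
        np.fill_diagonal(R, 1.0 - 1.0 / np.diag(np.linalg.inv(R)))
    return np.sort(np.linalg.eigvalsh(R))[::-1]

def parallel_analysis(X, n_sim=200, reduced=False, seed=0):
    """Return (number of factors, observed eigenvalues, reference eigenvalues)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    obs = eigvals_desc(np.corrcoef(X, rowvar=False), reduced)
    sims = np.array([
        eigvals_desc(np.corrcoef(rng.standard_normal((n, p)), rowvar=False), reduced)
        for _ in range(n_sim)
    ])
    ref = sims.mean(axis=0)  # np.percentile(sims, 95, axis=0) gives the 95th-percentile variant
    n_factors = 0
    while n_factors < p and obs[n_factors] > ref[n_factors]:
        n_factors += 1
    return n_factors, obs, ref
```

The stopping rule counts only consecutive leading eigenvalues above the reference, so a later eigenvalue that happens to exceed its random counterpart does not add a factor.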

Shifted factor analysis, Part I: Models and properties

Journal of Chemometrics, 2003

The factor model is modified to deal with the problem of factor shifts. This problem arises with sequential data (e.g. time series, spectra, digitized images) if the profiles of the latent factors shift position up or down the sequence of measurements: such shifts disturb multilinearity, so standard factor/component models no longer apply. To deal with this, we modify the model(s) to include an explicit mathematical representation of any factor shifts present in a data set; in this way the model can both adjust for the shifts and describe/recover their patterns. Shifted factor versions of both two-way and three-way (or higher-way) factor models are developed. The results of applying them to synthetic data support the theoretical argument that these models have stronger uniqueness properties; they can provide unique solutions in both two-way and three-way cases where equivalent non-shifted versions are under-identified. For uniqueness to hold, however, the factors must shift independently; two or more factors that show the same pattern of shifts will not be uniquely resolved if they are not already uniquely determined. Another important restriction is that the models, in their current form, do not work well when the shifts are accompanied by substantial changes in factor profile shape. Three-way factor models such as Parafac, and shifted factor models such as those described here, may be just two of many ways that factor analysis can incorporate additional information to make the parameters identifiable.
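Why shifts disturb multilinearity can be seen with a small two-way simulation: without shifts, data built from R profiles have rank R, but once each sample shifts each profile by its own amount, the rank exceeds R and an ordinary R-factor model no longer fits exactly. The sketch assumes circular integer shifts via `np.roll` for simplicity; the paper's exact shift representation may differ, and the function name is hypothetical.

```python
import numpy as np

def shifted_two_way(A, profiles, shifts):
    """X[i, :] = sum_r A[i, r] * profile_r shifted by shifts[i, r] positions.

    A        : I x R factor weights
    profiles : R x T factor profiles
    shifts   : I x R integer shifts (applied circularly, an assumption)"""
    I, R = A.shape
    X = np.zeros((I, profiles.shape[1]))
    for i in range(I):
        for r in range(R):
            X[i] += A[i, r] * np.roll(profiles[r], shifts[i, r])
    return X
```

With all shifts set to zero the result is just `A @ profiles`; the shifted model recovers a low-rank description by estimating `shifts` alongside `A` and `profiles`.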

Robust Latent Factor Analysis for Precise Representation of High-Dimensional and Sparse Data

IEEE/CAA Journal of Automatica Sinica, 2021

High-dimensional and sparse (HiDS) matrices commonly arise in various industrial applications, e.g., recommender systems (RSs), social networks, and wireless sensor networks. Since they contain rich information, how to represent them accurately is of great significance. A latent factor (LF) model is one of the most popular and successful ways to address this issue. Current LF models mostly adopt an L2-norm-oriented loss to represent an HiDS matrix, i.e., they aggregate the errors between observed and predicted values with the L2-norm. Yet the L2-norm is sensitive to outliers, which usually exist in such matrices. For example, an HiDS matrix from RSs commonly contains many outlier ratings due to heedless or malicious users. To address this issue, this work proposes a smooth L1-norm-oriented latent factor (SL-LF) model. Its main idea is to form the loss with the smooth L1-norm rather than the L2-norm, which gives the model both strong robustness and high accuracy in predicting the missing data of an HiDS matrix. Experimental results on eight HiDS matrices generated by industrial applications show that the proposed SL-LF model is not only robust to outliers but also achieves significantly higher prediction accuracy than state-of-the-art models when predicting the missing data of HiDS matrices.
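The effect of swapping the loss can be sketched with a small latent factor model trained by stochastic gradient descent on the observed entries only. We assume the common Huber-type smooth L1 (quadratic for |x| < 1, linear beyond); the paper's exact definition, regularization and learning scheme may differ, and all names and hyperparameters here are hypothetical.

```python
import numpy as np

def smooth_l1(x):
    # 0.5 x^2 for |x| < 1, |x| - 0.5 otherwise (assumed definition)
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * x ** 2, ax - 0.5)

def smooth_l1_grad(x):
    # derivative of smooth_l1: x inside [-1, 1], sign(x) outside
    return np.clip(x, -1.0, 1.0)

def fit_latent_factors(rows, cols, vals, shape, rank=2, lr=0.05,
                       reg=0.01, epochs=200, seed=0):
    """SGD on the observed entries (rows[t], cols[t], vals[t]) only.
    Returns P (m x rank) and Q (n x rank) with predictions P @ Q.T."""
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((shape[0], rank))
    Q = 0.1 * rng.standard_normal((shape[1], rank))
    for _ in range(epochs):
        for t in rng.permutation(len(vals)):
            u, v = rows[t], cols[t]
            err = vals[t] - P[u] @ Q[v]
            g = smooth_l1_grad(err)   # bounded, so one outlier's pull is capped
            pu = P[u].copy()
            P[u] += lr * (g * Q[v] - reg * P[u])
            Q[v] += lr * (g * pu - reg * Q[v])
    return P, Q
```

With an L2 loss the per-entry gradient would be `err` itself, so a corrupted rating pulls the factors in proportion to its error; the clipped gradient limits that pull to magnitude 1.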