Inference for robust canonical variate analysis

Fast and robust discriminant analysis

Computational Statistics & Data Analysis, 2004

The goal of discriminant analysis is to obtain rules that describe the separation between groups of observations and to classify new observations into one of the known groups. In the classical approach, discriminant rules are often based on the empirical mean and covariance matrix of the data, or of parts of the data. Because these estimates are highly influenced by outlying observations, they become unreliable for contaminated data sets. Robust discriminant rules are obtained by inserting robust estimates of location and scatter into generalized maximum likelihood rules at normal distributions. This approach makes it possible to discriminate between several populations with equal or unequal covariance structures and with equal or unequal membership probabilities. In particular, the highly robust MCD estimator is used, as it can be computed very quickly for large data sets. The probability of misclassification is also estimated in a robust way. The performance of the new method is investigated through several simulations and by applying it to some real data sets.
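The plug-in construction described above, inserting robust location and scatter estimates into the normal maximum likelihood rule, can be sketched in Python. This is a minimal illustration under simplifying assumptions: the distance-based trimming below is a crude stand-in for the MCD estimator the paper actually uses, and all function names are hypothetical.

```python
import numpy as np

def robust_plugin_rule(groups, priors, trim=0.1):
    """Fit a quadratic plug-in discriminant rule, replacing the classical
    mean/covariance of each group with estimates computed after discarding
    the most outlying points (a simple stand-in for the MCD estimator).
    `groups` is a list of (n_k, p) arrays; `priors` the membership probs."""
    params = []
    for X, pi in zip(groups, priors):
        # rank points by distance to the coordinatewise median and keep
        # only the central (1 - trim) fraction before estimating
        med = np.median(X, axis=0)
        d = np.linalg.norm(X - med, axis=1)
        keep = X[np.argsort(d)[: int(np.ceil((1 - trim) * len(X)))]]
        mu, cov = keep.mean(axis=0), np.cov(keep, rowvar=False)
        params.append((mu, np.linalg.inv(cov),
                       np.linalg.slogdet(cov)[1], np.log(pi)))
    def classify(x):
        # generalized ML rule at the normal model: maximize
        # -0.5 log|S_k| - 0.5 (x-m_k)' S_k^{-1} (x-m_k) + log pi_k
        scores = [-0.5 * ld - 0.5 * (x - mu) @ P @ (x - mu) + lp
                  for mu, P, ld, lp in params]
        return int(np.argmax(scores))
    return classify
```

Because the covariances are estimated per group, the rule handles unequal covariance structures; unequal membership probabilities enter through the `log pi_k` term.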

Robust Linear Discriminant Analysis

Journal of Mathematics and Statistics

Linear Discriminant Analysis (LDA) is one of the most commonly employed classification methods. It creates a linear discriminant function that yields an optimal classification rule between two or more groups under the assumptions of normality and homoscedasticity (equal covariance matrices). However, parametric LDA relies heavily on the sample mean vectors and the pooled sample covariance matrix, which are sensitive to non-normality. To overcome this sensitivity to non-normality and heteroscedasticity, this study proposes two new robust LDA models. In these models, an automatic trimmed mean and its corresponding winsorized mean replace the mean vector in the parametric LDA. For the covariance matrix, the study introduces two robust approaches, namely winsorization and the multiplication of Spearman's rho by the corresponding robust scale estimator used in the trimming process. Simulated and real financial data are used to test the performance of the proposed methods in terms of misclassification rate. The numerical results show that the new methods perform better than the parametric LDA and the robust LDA with the S-estimator. These new models can therefore be recommended as alternatives to the parametric LDA when non-normality and heteroscedasticity (unequal covariance matrices) exist.

Robust Linear Discriminant Analysis with Highest Breakdown Point Estimator

Journal of Telecommunication, Electronic and Computer Engineering, 2018

Linear Discriminant Analysis (LDA) is a supervised classification technique concerned with the relationship between a categorical variable and a set of interrelated variables. The main objective of LDA is to create a rule that distinguishes between populations and allocates future observations to previously defined populations. LDA yields an optimal discriminant rule between two or more groups under the assumptions of normality and homoscedasticity. Nevertheless, the classical estimates, the sample mean and sample covariance matrix, are highly affected when these ideal conditions are violated. To abate these problems, a new robust LDA rule using high breakdown point estimators is proposed in this article. A winsorized approach is used to estimate the location measure, while the multiplication of Spearman's rho and the rescaled median absolute deviation is used to estimate the scatter measure, replacing the sample mean and sample covariance matrix, respectively. A simulation and real-data study was conducted to evaluate the performance of the proposed model, measured in terms of misclassification error rates. The computational results show that the proposed LDA is consistently better than the classical LDA and comparable with existing robust LDAs.
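The location and scatter recipe described in the two abstracts above (a winsorized mean, and Spearman's rho rescaled by a robust scale) can be sketched as follows. This assumes a symmetric winsorization with k points per tail and the usual 1.4826 consistency factor for the MAD; the helper names are mine, not the papers'.

```python
import numpy as np

def winsorized_mean(X, k=1):
    """Coordinatewise winsorized mean: in each column, the k smallest values
    are replaced by the (k+1)-th smallest and the k largest by the (k+1)-th
    largest before averaging."""
    Xs = np.sort(X, axis=0)
    Xs[:k] = Xs[k]          # pull the lower tail up
    Xs[-k:] = Xs[-k - 1]    # pull the upper tail down
    return Xs.mean(axis=0)

def spearman_mad_scatter(X):
    """Robust scatter estimate: pairwise Spearman's rho rescaled by each
    variable's median absolute deviation (x 1.4826 for consistency at the
    normal), replacing the sample covariance matrix."""
    ranks = np.argsort(np.argsort(X, axis=0), axis=0).astype(float)
    rho = np.corrcoef(ranks, rowvar=False)   # Pearson correlation of ranks
    s = 1.4826 * np.median(np.abs(X - np.median(X, axis=0)), axis=0)
    return rho * np.outer(s, s)
```

Both estimates are insensitive to a few gross outliers: the winsorization caps extreme values, and ranks are unaffected by how far out an extreme point lies.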

Robust generalised quadratic discriminant analysis

Pattern Recognition, 2021

Quadratic discriminant analysis (QDA) is a widely used statistical tool for classifying observations from different multivariate normal populations. The generalized quadratic discriminant analysis (GQDA) classifier, which extends the QDA and minimum Mahalanobis distance (MMD) classifiers to populations with underlying elliptically symmetric distributions, competes quite favorably with the QDA classifier when the latter is optimal and performs much better when QDA fails under non-normal underlying distributions, e.g. the Cauchy distribution. However, the classification rule in GQDA is based on the sample mean vector and sample dispersion matrix of a training sample, which are extremely non-robust under data contamination. Since real-world data are quite commonly vulnerable to outliers, the lack of robustness of the classical estimators of the mean vector and the dispersion matrix significantly reduces the efficiency of the GQDA classifier, increasing the misclassification errors. The present paper investigates the performance of the GQDA classifier when the classical estimators of the mean vector and the dispersion matrix are replaced by various robust counterparts. Applications to various real data sets as well as simulation studies reveal far better performance of the proposed robust versions of the GQDA classifier. A comparative study is made to advocate the appropriate choice of robust estimators for a given degree of contamination of the data.

Comparative performance of several robust linear discriminant analysis methods

REVSTAT Statistical Journal, 2007

• The problem of the non-robustness of the classical estimates in the setting of quadratic and linear discriminant analysis has been addressed by many authors: Todorov et al. [19, 20], Chork and Rousseeuw [1], Hawkins and McLachlan [4], He and Fung [5], Croux and Dehon [2], ...

Robust classification with flexible discriminant analysis in heterogeneous data

ArXiv, 2022

Linear and Quadratic Discriminant Analysis are well-known classical methods but can suffer heavily from non-Gaussian distributions and/or contaminated datasets, mainly because the underlying Gaussian assumption is not robust. To fill this gap, this paper presents a new robust discriminant analysis in which each data point is drawn from its own arbitrary elliptically symmetric (ES) distribution with its own arbitrary scale parameter. Such a model allows for possibly very heterogeneous, independent but non-identically distributed samples. After a new decision rule is derived, it is shown that maximum-likelihood parameter estimation and classification are very simple, fast, and robust compared to state-of-the-art methods.

Discriminant analysis for compositional data and robust parameter estimation

Computational Statistics, 2012

Compositional data, i.e. data carrying only relative information, need to be transformed before the standard discriminant analysis methods, which are designed for Euclidean space, can be applied. Here it is investigated, for linear, quadratic, and Fisher discriminant analysis, which transformations lead to invariance of the resulting discriminant rules. Moreover, it is shown that robust parameter estimation requires not only an appropriate transformation but also affine equivariant estimators of location and covariance. An example and simulated data demonstrate the effects of working in an inappropriate space for discriminant analysis.
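As an illustration of moving compositions into a space where discriminant analysis applies, here is a sketch of the centered and isometric log-ratio transforms. These are standard constructions; the specific Helmert-type basis below is one common choice, not necessarily the one used in the paper.

```python
import numpy as np

def clr(X):
    """Centered log-ratio transform: maps compositions (rows of strictly
    positive parts) into a hyperplane of Euclidean space; each output row
    sums to zero, so the image is singular."""
    L = np.log(X)
    return L - L.mean(axis=1, keepdims=True)

def ilr(X):
    """Isometric log-ratio transform: orthonormal coordinates for the clr
    hyperplane, giving nonsingular (D-1)-dimensional data suitable for
    linear, quadratic, or Fisher discriminant analysis."""
    D = X.shape[1]
    # Helmert-type orthonormal basis of the plane {v : sum(v) = 0}
    H = np.zeros((D, D - 1))
    for j in range(1, D):
        H[:j, j - 1] = 1.0 / np.sqrt(j * (j + 1))
        H[j, j - 1] = -j / np.sqrt(j * (j + 1))
    return clr(X) @ H
```

Because the clr image is singular, covariance-based rules cannot be fit there directly; the ilr coordinates avoid this while preserving distances, which is one reason the choice of transformation matters for invariance of the discriminant rule.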

Discriminant procedures based on efficient robust discriminant coordinates

Journal of Nonparametric Statistics, 2007

For multivariate data collected over groups, discriminant analysis is a two-stage procedure: separation and allocation. In the traditional least-squares procedure, separation of the training data into groups is accomplished by maximizing the Lawley–Hotelling test statistic for differences between group means. This produces a set of discriminant coordinates that are used to visualize the data. Using the nearest center rule,
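The two stages can be sketched as follows: discriminant coordinates from the eigenvectors of W⁻¹B, where B and W are the between- and within-group scatter matrices (maximizing the Lawley–Hotelling trace criterion), followed by nearest-center allocation in those coordinates. This is the classical least-squares version only, not the efficient robust coordinates the paper develops.

```python
import numpy as np

def discriminant_coordinates(groups, q=1):
    """Least-squares discriminant coordinates: top-q eigenvectors of
    W^{-1} B, plus a nearest-center allocation rule in those coordinates."""
    mus = [X.mean(axis=0) for X in groups]
    grand = np.vstack(groups).mean(axis=0)
    # between-group and within-group scatter matrices
    B = sum(len(X) * np.outer(m - grand, m - grand)
            for X, m in zip(groups, mus))
    W = sum((X - m).T @ (X - m) for X, m in zip(groups, mus))
    vals, vecs = np.linalg.eig(np.linalg.solve(W, B))
    order = np.argsort(vals.real)[::-1]
    A = vecs[:, order[:q]].real              # separation stage
    centers = np.array([m @ A for m in mus])
    def allocate(x):                         # allocation stage
        return int(np.argmin(np.linalg.norm(x @ A - centers, axis=1)))
    return A, allocate
```

With g groups, B has rank at most g - 1, so at most g - 1 informative coordinates exist; plotting the first two is the usual visualization.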

On the consistency and robustness properties of linear discriminant analysis

2002

Strong consistency of linear discriminant analysis is established under wide assumptions on the class conditional densities. Robustness to the presence of a mild degree of class dispersion heterogeneity is also analyzed. The results may help explain analytically the frequently good behavior of linear discrimination techniques in applications.

Unbiased bootstrap error estimation for linear discriminant analysis

EURASIP Journal on Bioinformatics and Systems Biology, 2014

Convex bootstrap error estimation is a popular tool for classifier error estimation in gene expression studies. A basic question is how to determine the weight for the convex combination between the basic bootstrap estimator and the resubstitution estimator such that the resulting estimator is unbiased at finite sample sizes. The well-known 0.632 bootstrap error estimator uses asymptotic arguments to propose a fixed 0.632 weight, whereas the more recent 0.632+ bootstrap error estimator attempts to set the weight adaptively. In this paper, we study the finite sample problem in the case of linear discriminant analysis under Gaussian populations. We derive exact expressions for the weight that guarantee unbiasedness of the convex bootstrap error estimator in the univariate and multivariate cases, without making asymptotic simplifications. Using exact computation in the univariate case and an accurate approximation in the multivariate case, we obtain the required weight and show that it...