Fast and robust discriminant analysis

Robust Linear Discriminant Analysis with Highest Breakdown Point Estimator

Journal of Telecommunication, Electronic and Computer Engineering, 2018

Linear Discriminant Analysis (LDA) is a supervised classification technique concerned with the relationship between a categorical variable and a set of interrelated variables. The main objective of LDA is to create a rule to distinguish between populations and to allocate future observations to previously defined populations. LDA yields an optimal discriminant rule between two or more groups under the assumptions of normality and homoscedasticity. Nevertheless, the classical estimates, the sample mean and sample covariance matrix, are highly affected when these ideal conditions are violated. To abate these problems, a new robust LDA rule using high breakdown point estimators is proposed in this article. A winsorized approach was used to estimate the location measure, while the product of Spearman's rho and the rescaled median absolute deviation was used to estimate the scatter measure, replacing the sample mean and sample covariance matrix, respectively. Simulation and real data studies were conducted to evaluate the performance of the proposed model, measured in terms of misclassification error rates. The computational results showed that the proposed LDA is consistently better than the classical LDA and comparable with existing robust LDAs.
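The two robust building blocks named in this abstract, a winsorized location estimate and the rescaled median absolute deviation, can be sketched in a few lines. This is an illustrative pure-Python sketch, not the authors' code; the function names and the symmetric-winsorization choice are assumptions.

```python
from statistics import median

def winsorized_mean(data, k):
    """Symmetric winsorized mean: replace the k smallest values by the
    (k+1)-th smallest and the k largest by the (k+1)-th largest, then
    average.  Requires 2*k < len(data)."""
    x = sorted(data)
    n = len(x)
    w = [x[k]] * k + x[k:n - k] + [x[n - k - 1]] * k
    return sum(w) / n

def rescaled_mad(data):
    """Median absolute deviation about the median, rescaled by 1.4826
    so it is consistent for the standard deviation under normality."""
    m = median(data)
    return 1.4826 * median(abs(v - m) for v in data)

# A single gross outlier barely moves either estimate:
print(winsorized_mean([1, 2, 3, 4, 100], 1))  # 3.0
print(rescaled_mad([1, 2, 3, 4, 100]))        # 1.4826
```

Both estimators have bounded influence: the outlier 100 is pulled back to the largest retained value in the mean, and contributes only its rank, not its magnitude, to the MAD.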

Robust generalised quadratic discriminant analysis

Pattern Recognition, 2021

Quadratic discriminant analysis (QDA) is a widely used statistical tool to classify observations from different multivariate normal populations. The generalized quadratic discriminant analysis (GQDA) classification rule, which generalizes the QDA and the minimum Mahalanobis distance (MMD) classifiers to discriminate between populations with underlying elliptically symmetric distributions, competes quite favorably with the QDA classifier when the latter is optimal and performs much better when QDA fails under non-normal underlying distributions, e.g. the Cauchy distribution. However, the classification rule in GQDA is based on the sample mean vector and the sample dispersion matrix of a training sample, which are extremely non-robust under data contamination. Since real-world data are often highly vulnerable to outliers, the lack of robustness of the classical estimators of the mean vector and the dispersion matrix significantly reduces the efficiency of the GQDA classifier, increasing the misclassification errors. The present paper investigates the performance of the GQDA classifier when the classical estimators of the mean vector and the dispersion matrix used therein are replaced by various robust counterparts. Applications to various real data sets as well as simulation studies reveal far better performance of the proposed robust versions of the GQDA classifier. A comparative study has been made to advocate the appropriate choice of robust estimators for a given degree of contamination of the data.
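The "plug-in" idea behind this line of work, keeping the quadratic discriminant score but swapping the classical mean and dispersion for robust estimates, can be illustrated with a minimal 2-D score function. This is a generic QDA sketch under a Gaussian working model, not the GQDA rule of the paper; all names are mine.

```python
import math

def inv2(S):
    """Inverse and determinant of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = S
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]], det

def qda_score(x, mu, S, prior):
    """Quadratic discriminant score for one group: larger is better.
    mu and S may be any location/dispersion estimates, so robust
    counterparts plug in without changing the rule itself."""
    Sinv, det = inv2(S)
    diff = [x[0] - mu[0], x[1] - mu[1]]
    q = sum(diff[i] * Sinv[i][j] * diff[j]
            for i in range(2) for j in range(2))
    return -0.5 * math.log(det) - 0.5 * q + math.log(prior)

# A point at the first group's centre scores higher for that group:
I = [[1.0, 0.0], [0.0, 1.0]]
print(qda_score((0.5, 0.5), (0, 0), I, 0.5) >
      qda_score((0.5, 0.5), (5, 5), I, 0.5))  # True
```

The observation is assigned to the group with the highest score; contamination only enters through `mu` and `S`, which is exactly where the robust estimators are substituted.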

Robust Linear Discriminant Analysis

Journal of Mathematics and Statistics

Linear Discriminant Analysis (LDA) is among the most commonly employed methods for classification. This method, which creates a linear discriminant function, yields an optimal classification rule between two or more groups under the assumptions of normality and homoscedasticity (equal covariance matrices). However, the calculation of parametric LDA relies heavily on the sample mean vectors and the pooled sample covariance matrix, which are sensitive to non-normality. To overcome the sensitivity of this method towards non-normality as well as heteroscedasticity, this study proposes two new robust LDA models. In these models, an automatic trimmed mean and its corresponding winsorized mean are employed to replace the mean vector in the parametric LDA. Meanwhile, for the covariance matrix, this study introduces two robust approaches, namely winsorization and the multiplication of Spearman's rho with the corresponding robust scale estimator used in the trimming process. Simulated and real financial data are used to test the performance of the proposed methods in terms of misclassification rate. The numerical results show that the new methods perform better than the parametric LDA and the robust LDA with an S-estimator. Thus, these new models can be recommended as alternatives to the parametric LDA when non-normality and heteroscedasticity (unequal covariance matrices) exist.
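The "Spearman's rho times robust scales" construction mentioned here builds each scatter-matrix entry from a rank correlation and two univariate robust scales. A minimal pure-Python sketch (my own naming, and assuming no tied values for brevity):

```python
def ranks(v):
    """Ranks 1..n; assumes no ties, for brevity."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_rho(x, y):
    """Spearman's rank correlation = Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx)
           * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

def robust_cov_entry(x, y, sx, sy):
    """Off-diagonal scatter entry: rank correlation times the two
    robust scale estimates (e.g. rescaled MADs) of the variables."""
    return spearman_rho(x, y) * sx * sy

print(spearman_rho([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```

Because ranks are unchanged by monotone outlying movements within a variable, the resulting scatter entries resist contamination that would dominate the sample covariance.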

Discriminant procedures based on efficient robust discriminant coordinates

Journal of Nonparametric Statistics, 2007

For multivariate data collected over groups, discriminant analysis is a two-stage procedure: separation and allocation. For the traditional least squares procedure, separation of training data into groups is accomplished by the maximization of the Lawley–Hotelling test for differences between group means. This produces a set of discriminant coordinates which are used to visualize the data. Using the nearest center rule, ...

Robust Linear Discriminant Rule using Double Trimming Location Estimator with Robust Mahalanobis Squared Distance

Pertanika Journal of Science and Technology

The commonly employed classical linear discriminant rule, based on the classical mean and covariance, is highly sensitive to outliers. Therefore, outlier influence on location and scale estimation will affect the accuracy of a discriminant rule and lead to high misclassification rates. Past studies used the classical Mahalanobis Squared Distance (MSD) to alleviate the problem. However, the shortcoming of the highly sensitive mean and covariance can still affect the distance computation, causing masking and swamping effects. In a previous study, researchers proposed a double trimming procedure that adopted the MSD-based α-trimmed mean into the MSD-based α-trimmed median to construct a robust classifier. However, the proposed procedure has an overlooked flaw because it employed the classical MSD in the computation. Thus, this study proposed to employ a robust MSD for the distance-based trimmed median procedure. The improved trimmed median was then used to construct a robust linear discriminant rule ...
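The distance-based trimming step described here, drop the observations with the largest (robust) squared distances, then take a coordinatewise median of what remains, can be sketched generically. This is my own illustrative helper, not the authors' procedure; it assumes the squared distances have already been computed with whatever robust MSD is chosen.

```python
import math
from statistics import median

def distance_trimmed_median(points, sq_dists, alpha):
    """Drop the ceil(alpha * n) observations with the largest squared
    distances, then return the coordinatewise median of the rest.
    `sq_dists[i]` is the (robust) Mahalanobis squared distance of
    `points[i]` from the current centre."""
    n = len(points)
    n_drop = math.ceil(alpha * n)
    keep = sorted(range(n), key=lambda i: sq_dists[i])[: n - n_drop]
    dims = len(points[0])
    return [median(points[i][d] for i in keep) for d in range(dims)]

# The gross outlier (100, 100) has a huge distance and is trimmed away:
pts = [(0, 0), (1, 1), (2, 2), (1, 0), (100, 100)]
d2 = [0.0, 2.0, 8.0, 1.0, 20000.0]
print(distance_trimmed_median(pts, d2, 0.2))  # [1, 0.5]
```

The point of using a *robust* MSD upstream is visible here: if the distances themselves were computed from a contaminated mean and covariance, the outlier could receive a small distance (masking) and survive the trimming.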

Robust multiple discriminant rule using Harrell-Davis median estimator: A distribution-free approach to cellwise-casewise outliers coexistence

AIP Conf. Proc. of The 7th International Conference on Quantitative Sciences and its Applications (ICOQSIA2022), 2023

Multivariate data may be contaminated by cellwise and/or casewise outliers. Cellwise outliers are individual extreme cells within a variable, whereas casewise outliers are observations that come from a different distribution. Like other parametric methods, the Classical Multiple Discriminant Rule (CMDR) achieves optimal performance only when the normality assumption is fulfilled. The coexistence of cellwise and casewise outliers can disrupt the data distribution of the sample. Thus, to alleviate the problem, this paper employed a distribution-free estimator, the Harrell-Davis median (θ̂_HD), together with a robust covariance (S_R) to construct a robust MDR (RMDRHD). The MDRs were evaluated based on misclassification rate via a simulation study. The simulation results show that RMDRHD consistently achieves a lower misclassification rate than CMDR. Overall, the findings confirm that using the distribution-free θ̂_HD to robustify the MDR is practical when dealing with both cellwise and casewise outliers.
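The Harrell-Davis median is an L-estimator: a weighted average of the order statistics, with weights taken from increments of a Beta((n+1)/2, (n+1)/2) distribution function. A self-contained pure-Python sketch (numerical integration stands in for an incomplete-beta routine; function names are mine):

```python
import math

def beta_pdf(t, a, b):
    """Beta(a, b) density, via log-gamma for numerical stability."""
    logB = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(t) + (b - 1) * math.log(1 - t) - logB)

def beta_cdf(x, a, b, steps=2000):
    """Regularised incomplete beta by the trapezoid rule on (0, x].
    Assumes a, b > 1 so the density vanishes at the left endpoint."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    h = x / steps
    s = 0.5 * beta_pdf(x, a, b)
    for i in range(1, steps):
        s += beta_pdf(i * h, a, b)
    return h * s

def hd_median(data):
    """Harrell-Davis median: sum of order statistics weighted by
    increments of the Beta((n+1)/2, (n+1)/2) CDF over [i/n, (i+1)/n]."""
    x = sorted(data)
    n = len(x)
    a = (n + 1) / 2
    w = [beta_cdf((i + 1) / n, a, a) - beta_cdf(i / n, a, a)
         for i in range(n)]
    return sum(wi * xi for wi, xi in zip(w, x))

print(round(hd_median([1, 2, 3, 4, 5]), 3))  # 3.0
```

Because every order statistic receives a smooth weight, the estimator is distribution-free yet more efficient than the sample median, which is what makes it attractive for robustifying the discriminant rule.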

Discriminant analysis for compositional data and robust parameter estimation

Computational Statistics, 2012

Compositional data, i.e. data carrying only relative information, need to be transformed prior to applying the standard discriminant analysis methods that are designed for the Euclidean space. Here it is investigated, for linear, quadratic, and Fisher discriminant analysis, which transformations lead to invariance of the resulting discriminant rules. Moreover, it is shown that for robust parameter estimation not only an appropriate transformation but also affine equivariant estimators of location and covariance are needed. An example and simulated data demonstrate the effects of working in an inappropriate space for discriminant analysis.
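A standard way to move compositions into a Euclidean space is a log-ratio transform. The centred log-ratio (clr) below is a minimal illustration (my own sketch, not the paper's code); note that clr coordinates sum to zero, so their covariance matrix is singular, which is one reason isometric log-ratio (ilr) coordinates are preferred when robust, affine equivariant estimators are applied.

```python
import math

def clr(composition):
    """Centred log-ratio transform: log of each part minus the mean of
    the logs.  The result is invariant to rescaling the composition,
    reflecting that only relative information is meaningful."""
    logs = [math.log(p) for p in composition]
    m = sum(logs) / len(logs)
    return [v - m for v in logs]

# Proportions and raw counts with the same relative parts map to the
# same clr coordinates:
print(clr([0.2, 0.3, 0.5]))
print(clr([2, 3, 5]))
```

Discriminant analysis is then run on the transformed coordinates rather than on the raw parts.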

Robust discriminant analysis for classification of remote sensing data

2014 International Conference on Advanced Computer Science and Information System, 2014

This paper discusses classical and robust discriminant analysis algorithms applied to the classification of rice fields, water, buildings, and bare land areas. Discriminant analysis for multiple groups relies on the sample means and covariance matrices computed from the training sample. Since these estimators are not robust, it has been proposed to use robust estimators of location and covariance instead. To obtain a robust procedure with a high breakdown point for discriminant analysis, the classical estimators are replaced by the Feasible Solution Algorithm (FSA). The input data is a time series of the Landsat 8 Normalized Difference Vegetation Index (NDVI). The classification process consists of two steps: training and classification. The purpose of the training step is to produce discriminant functions using FSA estimators, and the purpose of the classification step is to classify rice fields, water, buildings and bare land areas. The aim of this paper is to measure the accuracy of classical and robust discriminant analysis in classifying these areas from the Landsat 8 NDVI time series.

Robust classification with flexible discriminant analysis in heterogeneous data

ArXiv, 2022

Linear and Quadratic Discriminant Analysis are well-known classical methods but can suffer heavily from non-Gaussian distributions and/or contaminated datasets, mainly because the underlying Gaussian assumption is not robust. To fill this gap, this paper presents a new robust discriminant analysis where each data point is drawn from its own arbitrary Elliptically Symmetric (ES) distribution with its own arbitrary scale parameter. Such a model allows for possibly very heterogeneous, independent but non-identically distributed samples. After deriving a new decision rule, it is shown that maximum-likelihood parameter estimation and classification are very simple, fast and robust compared to state-of-the-art methods.
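When every observation carries its own scale parameter, maximum-likelihood scatter estimation leads to fixed-point iterations of the Tyler type, in which each point is down-weighted by its current squared Mahalanobis distance so that per-sample scales cancel. The following 2-D pure-Python iteration is an illustrative sketch of that general idea, not the paper's estimator.

```python
def tyler_shape_2d(X, iters=50):
    """Tyler-type fixed-point iteration for a 2-D shape matrix.
    Each observation contributes its outer product divided by its
    current squared Mahalanobis distance, making the result invariant
    to per-sample scale factors.  Assumes the data are already centred
    and contain no zero vectors.  Normalised to trace 2 (shape only)."""
    S = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(iters):
        a, b = S[0]
        c, d = S[1]
        det = a * d - b * c
        Sinv = [[d / det, -b / det], [-c / det, a / det]]
        T = [[0.0, 0.0], [0.0, 0.0]]
        for x in X:
            q = sum(x[i] * Sinv[i][j] * x[j]
                    for i in range(2) for j in range(2))
            for i in range(2):
                for j in range(2):
                    T[i][j] += x[i] * x[j] / q
        tr = T[0][0] + T[1][1]
        S = [[2.0 * T[i][j] / tr for j in range(2)] for i in range(2)]
    return S

# Points along the two axes, with wildly different magnitudes, still
# yield the identity shape: only directions matter, not scales.
S = tyler_shape_2d([(1, 0), (-2, 0), (4, 0), (0, 1), (0, -3), (0, 5)])
print(S)
```

The scale invariance visible here is exactly what makes such estimators natural for samples where each point may have an arbitrary scale.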

Comparative performance of several robust linear discriminant analysis methods

REVSTAT Statistical Journal, 2007

The problem of the non-robustness of the classical estimates in the setting of quadratic and linear discriminant analysis has been addressed by many authors: Todorov et al. [19, 20], Chork and Rousseeuw [1], Hawkins and McLachlan [4], He and Fung [5], Croux and Dehon [2], ...