High Dimensional (Penalised) Discriminant Analysis

Classification Trees as an Alternative to Linear Discriminant Analysis

Linear discriminant analysis (LDA) is frequently used for classification/prediction problems in physical anthropology, but it is unusual to find examples where researchers consider the statistical limitations and assumptions required for this technique. In these instances, it is difficult to know whether the predictions are reliable. This paper considers a nonparametric alternative to predictive LDA: binary, recursive (or classification) trees. This approach has the advantage that data transformation is unnecessary, cases with missing predictor variables do not require special treatment, prediction success is not dependent on data meeting normality conditions or covariance homogeneity, and variable selection is intrinsic to the methodology. Here I compare the efficacy of classification trees with LDA, using typical morphometric data. With data from modern hominoids, the results show that both techniques perform nearly equally well. With complete data sets, LDA may be a better choice, as is shown in this example, but with missing observations, classification trees perform outstandingly well, whereas commercial discriminant analysis programs do not predict classifications for cases with incompletely measured predictor variables and generally are not designed to address the problem of missing data. Testing of data prior to analysis is necessary, and classification trees are recommended either as a replacement for LDA or as a supplement whenever data do not meet relevant assumptions. They are highly recommended as an alternative to LDA whenever the data set contains important cases with missing predictor variables. Am J Phys Anthropol 119:257–275, 2002.
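The comparison described above can be sketched in a few lines. This is a minimal illustration, not the paper's actual analysis: it assumes scikit-learn is available and uses the Iris data as a stand-in for morphometric measurements, cross-validating LDA against a shallow classification tree.

```python
# Hypothetical sketch: compare predictive LDA with a classification tree.
# Iris is a stand-in dataset; the paper used hominoid morphometric data.
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

lda = LinearDiscriminantAnalysis()
tree = DecisionTreeClassifier(max_depth=3, random_state=0)

# 5-fold cross-validated accuracy for each classifier
lda_acc = cross_val_score(lda, X, y, cv=5).mean()
tree_acc = cross_val_score(tree, X, y, cv=5).mean()
print(f"LDA accuracy:  {lda_acc:.3f}")
print(f"Tree accuracy: {tree_acc:.3f}")
```

On complete, well-behaved data like this, the two methods typically score within a few percentage points of each other, consistent with the paper's finding; the tree's advantage only shows up once predictor values go missing.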

Linear discriminant analysis: A detailed tutorial

Linear Discriminant Analysis (LDA) is a very common technique for dimensionality reduction problems, used as a pre-processing step in machine learning and pattern classification applications. At the same time, it is often treated as a black box and is sometimes not well understood. The aim of this paper is to build a solid intuition for what LDA is and how it works, enabling readers of all levels to gain a better understanding of LDA and to learn how to apply the technique in different applications. The paper first gives the basic definitions and steps of the LDA technique, supported by visual explanations of these steps. Moreover, the two methods of computing the LDA space, i.e. the class-dependent and class-independent methods, are explained in detail. Then, in a step-by-step approach, two numerical examples demonstrate how the LDA space can be calculated in the class-dependent and class-independent cases. Furthermore, two of the most common LDA problems (i.e. the Small Sample Size (SSS) and non-linearity problems) are highlighted and illustrated, and state-of-the-art solutions to these problems are investigated and explained. Finally, a number of experiments were conducted with different datasets to (1) investigate the effect of the eigenvectors used in the LDA space on the robustness of the extracted features for classification accuracy, and (2) show when the SSS problem occurs and how it can be addressed.
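The core computation the tutorial walks through (class-independent method) can be sketched from scratch: form the within-class scatter S_W and between-class scatter S_B, then take the dominant eigenvector of S_W⁻¹S_B as the projection axis. The two Gaussian blobs below are toy data, an assumption for illustration; the paper's own numerical examples use different values.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy two-class data (an assumption; not the paper's numerical example)
X1 = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
X2 = rng.normal(loc=[2.0, 2.0], scale=0.5, size=(50, 2))

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)   # class means
m = np.vstack([X1, X2]).mean(axis=0)        # overall mean

# Within-class scatter: sum of per-class scatter matrices
S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
# Between-class scatter: class-size-weighted outer products of mean offsets
S_B = 50 * np.outer(m1 - m, m1 - m) + 50 * np.outer(m2 - m, m2 - m)

# Class-independent LDA space: eigenvectors of S_W^{-1} S_B
evals, evecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
w = evecs[:, np.argmax(evals.real)].real    # dominant eigenvector

# Project both classes onto w; a good axis separates the projected means
p1, p2 = X1 @ w, X2 @ w
sep = abs(p1.mean() - p2.mean()) / (p1.std() + p2.std())
print(f"separation along LDA axis: {sep:.2f}")
```

Note that inverting S_W directly is exactly what fails in the Small Sample Size case the paper discusses: when there are fewer samples than dimensions, S_W is singular and a regularised or pseudo-inverse variant is needed instead.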