The kurtosis coefficient and the linear discriminant function
Related papers
The kurtosis coefficient and the linear discriminant function
1999
In this note we analyze the relationship between the direction obtained from the minimization of the kurtosis coefficient of the projections of a mixture of multivariate normal distributions and the linear discriminant function. We show that both directions are closely related, and in particular that given two vector random variables having symmetric distributions with unknown means and the same covariance matrix, the direction which minimizes the kurtosis coefficient of the projection is the linear discriminant function. This result provides a way to compute the discriminant function between two normal populations in the case in which the means and the common covariance matrix are unknown.
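A minimal sketch of the idea described above, assuming a balanced 50/50 mixture of two normal populations with a common covariance matrix: the direction that minimizes the kurtosis coefficient of the projected data is compared with Fisher's discriminant direction Σ⁻¹(μ₂ − μ₁). The dimensions, parameters, and optimizer settings are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch: minimizing the kurtosis coefficient of projections of a
# balanced mixture of two normals with common covariance, then comparing the
# resulting direction with Fisher's discriminant direction.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
p = 5
Sigma = np.eye(p) + 0.3                      # common covariance matrix
mu1, mu2 = np.zeros(p), np.full(p, 2.0)      # group means (unknown to the method)
X = np.vstack([rng.multivariate_normal(mu1, Sigma, 500),
               rng.multivariate_normal(mu2, Sigma, 500)])

def kurtosis_of_projection(w):
    w = w / np.linalg.norm(w)
    y = X @ w
    z = y - y.mean()
    return np.mean(z**4) / np.mean(z**2)**2  # kurtosis coefficient of the projection

res = minimize(kurtosis_of_projection, rng.standard_normal(p), method="Nelder-Mead",
               options={"maxiter": 20000, "xatol": 1e-8, "fatol": 1e-8})
w_kurt = res.x / np.linalg.norm(res.x)

# Fisher's discriminant direction from the true parameters, for reference only
w_fisher = np.linalg.solve(Sigma, mu2 - mu1)
w_fisher /= np.linalg.norm(w_fisher)

print("cosine similarity:", abs(w_kurt @ w_fisher))
```

With equal group sizes, as assumed here, the printed cosine similarity is typically close to 1, illustrating the correspondence stated in the abstract.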
Characterization and Statistical Estimation of a Discriminant Space for Gaussian Mixtures
1999
The problem of classification of a multivariate observation X drawn from a mixture of Gaussian distributions is considered. A linear subspace of the least dimension containing all information about the cluster structure of X is called a discriminant space (DS). Estimation of DS is based on characterizations of DS via projection pursuit with an appropriate projection index. An estimator of
A comparison of linear and mixture models for discriminant analysis under nonnormality
2009
Classification is of broad interest in science because it "permeates many scientific studies and also arises in the contexts of many applications" (Panel on Discriminant Analysis, Classification, and Clustering, 1989, p. 34). Examples in the educational, social, and behavioral sciences include identifying children in kindergarten at risk for future reading difficulties (Catts, Fey, Zhang, & Tomblin, 2001), identifying individuals at risk for addiction (Flowers & Robinson, 2002), and predicting the crimes that male juvenile offenders may commit according to their personality characteristics (Glaser, Calhoun, & Petrocelli, 2002). In the biological and medical sciences, applications of classification procedures include identifying patients with chronic heart failure (Udris et al., 2001), detecting lung cancer (Phillips et al., 2003), and determining whether certain breast masses are malignant or benign (Sahiner et al., 2004). In the management sciences, methods for classification have been used for such purposes as predicting bankruptcy (Jo, Han, & Lee, 1997) and investigating the product deletion process (Avlonitis, Hart, & Tzokas, 2000). The primary goal of classification is to correctly sort objects into two or more mutually exclusive groups. Classification is often categorized into two subtypes: supervised and unsupervised (Hastie, Tibshirani, & Friedman, 2001; Panel, 1989). Supervised classification, also known as discriminant analysis (or, perhaps more appropriately, as predictive discriminant analysis; see Huberty, 1984, 1994), is used to correctly assign future objects to groups that are already known to exist (Johnson & Wichern, 2002). Unsupervised classification, also known as cluster analysis (Panel on Discriminant Analysis, Classification, and Clustering, 1989), is used to assign objects to groups that are not known a priori. We focus on methods for discriminant analysis in the present work, specifically procedures based on linear and mixture models. With regard to linear methods, we investigate linear discriminant analysis (LDA) and linear logistic discrimination (LLD; Fan & Wang, 1999), along with an extension of LDA based on ranks (LDR; see, e.g., Conover & Iman, 1980). Furthermore, we investigate a lesser-known method for discriminant analysis based on mixture models, which can be viewed as an extension of LDA (Fraley & Raftery, 2002). Mixture models are often used to model probability density functions through mixtures of normal distributions (
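As a concrete illustration of the families of methods compared above, the following sketch fits LDA, linear logistic discrimination, and a simple per-group Gaussian-mixture classifier to skewed synthetic data and reports their test error rates. The data-generating choices and mixture settings are assumptions for illustration, not the study's design.

```python
# Illustrative comparison of linear and mixture-model discriminant rules on
# nonnormal (skewed) synthetic data.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 1000
# Two groups; the first coordinate is exponentiated to break normality.
X0 = rng.normal(size=(n, 2)); X0[:, 0] = np.exp(X0[:, 0])
X1 = rng.normal(loc=[1.5, 1.0], size=(n, 2)); X1[:, 0] = np.exp(X1[:, 0])
X = np.vstack([X0, X1]); y = np.repeat([0, 1], n)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.5, random_state=0)

# Linear methods
lda = LinearDiscriminantAnalysis().fit(Xtr, ytr)
lld = LogisticRegression(max_iter=1000).fit(Xtr, ytr)

# Mixture-model classifier: one Gaussian mixture per group, classify by the
# larger log-likelihood (equal priors assumed, since the groups are balanced).
gmms = [GaussianMixture(n_components=2, random_state=0).fit(Xtr[ytr == g]) for g in (0, 1)]
scores = np.column_stack([g.score_samples(Xte) for g in gmms])
mix_pred = scores.argmax(axis=1)

print("LDA error rate:    ", np.mean(lda.predict(Xte) != yte))
print("LLD error rate:    ", np.mean(lld.predict(Xte) != yte))
print("Mixture error rate:", np.mean(mix_pred != yte))
```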
Discriminant analysis in small and large dimensions
Theory of Probability and Mathematical Statistics
We study the distributional properties of the linear discriminant function under the assumption of normality by comparing two groups with the same covariance matrix but different mean vectors. A stochastic representation for the discriminant function coefficients is derived, which is then used to obtain their asymptotic distribution under the high-dimensional asymptotic regime. We investigate the performance of classification based on the discriminant function in both small and large dimensions. A stochastic representation is established which allows the error rate to be computed efficiently. We further compare the calculated error rate with the optimal one obtained under the assumption that the covariance matrix and the two mean vectors are known. Finally, we present an analytical expression for the error rate calculated in the high-dimensional asymptotic regime. The finite-sample properties of the derived theoretical results are assessed via an extensive Monte Carlo study.
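The following Monte Carlo sketch illustrates the kind of comparison described above: the conditional error rate of the plug-in linear discriminant rule, averaged over simulated training samples, is set against the optimal error rate Φ(−Δ/2), where Δ is the Mahalanobis distance between the group means. The dimensions, sample sizes, and parameter values are illustrative assumptions, not the paper's settings.

```python
# Monte Carlo comparison of the plug-in linear discriminant rule with the
# optimal error rate for two normal groups with a common covariance matrix.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
p, n = 10, 50                                   # dimension and per-group sample size
mu1, mu2, Sigma = np.zeros(p), np.full(p, 0.5), np.eye(p)
Delta = np.sqrt((mu2 - mu1) @ np.linalg.solve(Sigma, mu2 - mu1))
optimal = norm.cdf(-Delta / 2)                  # error rate with known parameters

def plugin_error():
    X1 = rng.multivariate_normal(mu1, Sigma, n)
    X2 = rng.multivariate_normal(mu2, Sigma, n)
    m1, m2 = X1.mean(0), X2.mean(0)
    S = ((n - 1) * np.cov(X1, rowvar=False) + (n - 1) * np.cov(X2, rowvar=False)) / (2 * n - 2)
    a = np.linalg.solve(S, m1 - m2)             # estimated discriminant coefficients
    c = a @ (m1 + m2) / 2                       # classification threshold
    s = np.sqrt(a @ Sigma @ a)
    # Conditional error rate given the training sample, averaged over both groups
    e1 = norm.cdf(-(a @ mu1 - c) / s)           # P(misclassify an observation from group 1)
    e2 = norm.cdf((a @ mu2 - c) / s)            # P(misclassify an observation from group 2)
    return (e1 + e2) / 2

errors = [plugin_error() for _ in range(200)]
print("optimal error rate: %.4f" % optimal)
print("mean plug-in error: %.4f" % np.mean(errors))
```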
Robustness of the Fisher's discriminant function to skew-curved normal distribution
2014
Discriminant analysis is a widely used multivariate technique, with Fisher's discriminant analysis (FDA) being its most venerable form. FDA assumes equality of population covariance matrices, but does not require multivariate normality. Nevertheless, the latter is desirable for optimal classification. To test FDA's performance under non-normality caused by skewness, the method was assessed with a simulation based on a skew-curved normal (SCN) distribution belonging to the family of skew-generalised normal distributions; additionally, effects of sample size and rotation were evaluated. Apparent error rate (APER) was used as the measure of classification performance. The analysis was performed using ANOVA with (transformed) mean APER as the dependent variable. Results show the FDA to be highly robust to skewness introduced into the model via the SCN-distributed simulated data.
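A small simulation sketch in the spirit of the study described above, using the ordinary skew-normal distribution from scipy as a stand-in for the skew-curved normal (an assumption, since SCN is not available in standard libraries): two shifted skewed groups are generated, a linear discriminant is fitted, and the apparent error rate (APER) is recorded for increasing skewness.

```python
# Illustrative simulation: apparent error rate of a linear discriminant rule
# as the skewness of the group distributions increases.
import numpy as np
from scipy.stats import skewnorm
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(3)

def aper(skew_a, n=200, p=3, shift=1.0):
    # Independent skew-normal coordinates; group 2 is shifted by `shift`.
    X1 = skewnorm.rvs(a=skew_a, size=(n, p), random_state=rng)
    X2 = skewnorm.rvs(a=skew_a, size=(n, p), random_state=rng) + shift
    X = np.vstack([X1, X2]); y = np.repeat([0, 1], n)
    fit = LinearDiscriminantAnalysis().fit(X, y)
    return np.mean(fit.predict(X) != y)          # apparent error rate on training data

for a in (0, 2, 10):                             # increasing skewness parameter
    rates = [aper(a) for _ in range(100)]
    print(f"skewness parameter a={a:>2}: mean APER = {np.mean(rates):.3f}")
```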
Discriminant analysis in space with standard metric
The resulting solution is easily converted into the form obtained under the canonical model of discriminant analysis. The discriminant coefficient matrix can be defined as a matrix of partial regression coefficients; it is obtained by solving the problem ZW = K + E, trace(E^T E) = minimum. Since, in fact, K = Z R^{-1/2} X, it is immediately evident that E = 0 and W = R^{-1/2} X. Therefore, the vectors w_k of W are proportional to the coordinates of the discriminant functions in the oblique coordinate system formed by the vectors of Z, with the cosines of the angles between the coordinate axes equal to the elements of the correlation matrix R. Since discriminant analysis can also be interpreted as a special case of component analysis, with the principal components transformed by an admissible singular transformation so as to maximize the distances between the centroids of the E_p subsets, or the canonical correlations ρ_k (Cooley and Lohnes, 1971; Hadžigalić, 1984; Momirović and Dobrić, 1984), identification of the content of the discriminant functions is customarily based on the structural vectors f_k of the matrix F = Z^T K = RW = R^{1/2} X = (f_k) = (R w_k), analogous to the identification of the content of canonical variables obtained by Hotelling's method of biorthogonal canonical correlation analysis, because a short calculation shows that F is a factor matrix of R (Zorić and Momirović, 1996; Momirović, 1997). In this metric, the cross structure of the discriminant functions is U = Z^T L ρ^{-1} = Z^T P Z W ρ^{-1} = W ρ, because W^T Z^T P Z W = ρ^2, and it is immediately clear that U is a factor matrix of Z^T P Z, the matrix of between-group covariances defined in the space with the standard metric I. Since the elements f_{jk} of F and u_{jk} of U behave as ordinary product-moment correlation coefficients, and since they are functions of normally distributed variables and are therefore themselves asymptotically normally distributed, their asymptotic variances are σ_{jk}^2 ≈ (1 − φ_{jk}^2)^2 n^{-1} and ξ_{jk}^2 ≈ (1 − υ_{jk}^2)^2 n^{-1}, j = 1, ..., m; k = 1, ..., s. These can be used to test hypotheses of the type H_{jk}: f_{jk} = φ_{jk} or H_{jk}: u_{jk} = υ_{jk}, where φ_{jk} and υ_{jk} are hypothetical correlations between the variables of V and the discriminant functions in the population P, because the asymptotic distributions of the coefficients are f_{jk} ∼ N(φ_{jk}, σ_{jk}^2) and u_{jk} ∼ N(υ_{jk}, ξ_{jk}^2), where N denotes the normal distribution.
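A short numerical check, offered only as an illustration of the identities quoted above and not as code from the paper: with standardized variables Z whose correlation matrix is R, and weights W = R^{-1/2} X for an orthonormal X, the scores K = ZW have unit variances and the structure matrix of correlations satisfies F = Z^T K / n = RW.

```python
# Numerical illustration of F = Z'K/n = RW and W'RW = I for W = R^{-1/2} X.
import numpy as np
from scipy.linalg import fractional_matrix_power

rng = np.random.default_rng(4)
n, m, s = 5000, 4, 2
Z = rng.multivariate_normal(np.zeros(m), np.eye(m) * 0.7 + 0.3, n)
Z = (Z - Z.mean(0)) / Z.std(0)                     # standardized variables
R = Z.T @ Z / n                                    # sample correlation matrix

X, _ = np.linalg.qr(rng.standard_normal((m, s)))   # an arbitrary orthonormal basis
W = fractional_matrix_power(R, -0.5) @ X           # weights W = R^{-1/2} X
K = Z @ W                                          # scores with unit variances

F = Z.T @ K / n                                    # structure (correlation) matrix
print(np.allclose(F, R @ W))                       # True: F = R W
print(np.allclose(K.T @ K / n, np.eye(s)))         # True: W' R W = I
```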