Eigenvectors of a kurtosis matrix as interesting directions to reveal cluster structure

The kurtosis coefficient and the linear discriminant function

Statistics & Probability Letters, 2000

In this note we analyze the relationship between the direction obtained from the minimization of the kurtosis coefficient of the projections of a mixture of multivariate normal distributions and the linear discriminant function. We show that both directions are closely related, and in particular that given two vector random variables having symmetric distributions with unknown means and the same covariance matrix, the direction which minimizes the kurtosis coefficient of the projection is the linear discriminant function. This result provides a way to compute the discriminant function between two normal populations in the case in which the means and the common covariance matrix are unknown.
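The relationship described above can be checked numerically. The following is a minimal sketch, not the authors' procedure: it assumes a balanced mixture of two bivariate normal populations with a common covariance matrix, and it finds the kurtosis-minimizing direction by a brute-force search over angles. The means, covariance, sample size and grid resolution are all illustrative choices.

```python
import numpy as np

def kurtosis(z):
    """Sample kurtosis coefficient (fourth central moment over squared variance)."""
    z = z - z.mean()
    return np.mean(z**4) / np.mean(z**2) ** 2

rng = np.random.default_rng(0)
n = 20000

# Assumed toy parameters: two normal populations with the same covariance.
mu1 = np.array([0.0, 0.0])
mu2 = np.array([4.0, 1.0])
sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])

# Balanced mixture: each observation is drawn from one of the two populations.
labels = rng.integers(0, 2, size=n)
noise = rng.multivariate_normal(np.zeros(2), sigma, size=n)
x = np.where(labels[:, None] == 0, mu1, mu2) + noise

# Brute-force search over unit directions in the plane (angles in [0, pi)).
angles = np.linspace(0.0, np.pi, 720, endpoint=False)
dirs = np.column_stack([np.cos(angles), np.sin(angles)])
kurt = np.array([kurtosis(x @ d) for d in dirs])
d_kurt = dirs[np.argmin(kurt)]

# Linear discriminant direction from the true parameters: sigma^{-1} (mu2 - mu1).
d_lda = np.linalg.solve(sigma, mu2 - mu1)
d_lda /= np.linalg.norm(d_lda)

# Directions are defined up to sign, so compare the absolute cosine.
cosine = abs(d_kurt @ d_lda)
print("kurtosis-minimizing direction:", d_kurt)
print("discriminant direction:      ", d_lda)
print("absolute cosine:", cosine)
```

The search uses only the pooled sample, without the labels, the means or the covariance matrix; those appear only when computing the reference discriminant direction for comparison.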

The kurtosis coefficient and the linear discriminant function

1999

In this note we analyze the relationship between the direction obtained from the minimization of the kurtosis coefficient of the projections of a mixture of multivariate normal distributions and the linear discriminant function. We show that both directions are closely related, and in particular that given two vector random variables having symmetric distributions with unknown means and the same covariance matrix, the direction which minimizes the kurtosis coefficient of the projection is the linear discriminant function. This result provides a way to compute the discriminant function between two normal populations in the case in which the means and the common covariance matrix are unknown.

Detecting Clusters in the Data from Variance Decompositions of Its Projections

Journal of Classification, 2013

A new projection-pursuit index is used to identify clusters and other structures in multivariate data. It is obtained from the variance decompositions of the data's one-dimensional projections, without assuming a model for the data or that the number of clusters is known. The index is affine invariant and performs well on real and simulated data. A general result is obtained indicating that clusters' separation increases with the data's dimension. Simulations confirm, as expected, that the performance of the index either improves or does not deteriorate when the data's dimension increases, making it especially useful for "large dimension-small sample size" data. The index will become cheaper to compute as computing power continues to improve. Several applications are presented.

Cluster Identification Using Projections

Journal of The American Statistical Association, 2001

This article describes a procedure to identify clusters in multivariate data using information obtained from the univariate projections of the sample data onto certain directions. The directions are chosen as those that minimize and maximize the kurtosis coefficient of the projected data. It is shown that, under certain conditions, these directions provide the largest separation for the different clusters. The projected univariate data are used to group the observations according to the values of the gaps or spacings between consecutive ordered observations. These groupings are then combined over all projection directions. The behavior of the method is tested on several examples, and compared to k-means, MCLUST, and the procedure proposed by Jones and Sibson in 1987. The proposed algorithm is iterative, affine equivariant, flexible, robust to outliers, fast to implement, and seems to work well in practice.
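The grouping step based on spacings can be illustrated with a short sketch. This is only an assumption-laden toy version: the function name `split_by_gaps` and the cutoff rule (a multiple of the median gap) are hypothetical and are not the rule used in the article, which is not stated in the abstract.

```python
import numpy as np

def split_by_gaps(z, factor=10.0):
    """Group 1-D projected data by cutting at unusually large spacings.

    Hypothetical rule: a gap between consecutive ordered observations
    starts a new group when it exceeds `factor` times the median gap.
    """
    order = np.argsort(z)
    gaps = np.diff(z[order])              # spacings between ordered observations
    breaks = gaps > factor * np.median(gaps)
    # Group index along the sorted sample: increases by one after each large gap.
    group_sorted = np.concatenate([[0], np.cumsum(breaks)])
    groups = np.empty(len(z), dtype=int)
    groups[order] = group_sorted          # map back to the original ordering
    return groups

# Toy projected data: two well-separated blocks, presented in shuffled order.
rng = np.random.default_rng(1)
z = rng.permutation(np.concatenate([np.linspace(0.0, 1.0, 50),
                                    np.linspace(5.0, 6.0, 50)]))
groups = split_by_gaps(z)
print(np.unique(groups, return_counts=True))
```

With real projections, heavy tails produce large spacings near the extremes, so a practical cutoff would need to account for where in the sample the gap occurs; the fixed multiple used here is only for illustration.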

Estimation of scatter matrix based on i.i.d. sample from elliptical distributions

Acta Mathematicae Applicatae Sinica, 1995

In this paper, we consider the estimation of a scatter matrix under entropy loss and quadratic loss when the samples x_(1), ..., x_(n) are i.i.d. and x_(1) ∼ EC_p(μ, Σ, f). With respect to the entropy and quadratic losses, we obtain the best estimator of Σ within each of two classes of estimators whose forms are given in the text, and obtain the minimax estimator of Σ and the best equivariant estimator of Σ with respect to the triangular transformation group LT+(p) (the group consisting of lower triangular matrices with positive diagonal elements). Some related discussion is given by way of generalization.

Maximum likelihood clustering via normal mixture models

1996

We present the approach to clustering whereby a normal mixture model is fitted to the data by maximum likelihood. The general case of normal component densities with unrestricted covariance matrices is considered and so it extends the work of Abbas and Fahmy (1994), who imposed the restriction of diagonal component covariance matrices. Attention is also focussed on the problem of testing for the number of clusters within this mixture framework, using the likelihood ratio test.

Phase Transitions for High Dimensional Clustering and Related Problems

Submitted to the Annals of Statistics, 2015

Consider a two-class clustering problem where we observe X_i = ℓ_i μ + Z_i, with Z_i iid ∼ N(0, I_p), 1 ≤ i ≤ n. The feature vector μ ∈ R^p is unknown but is presumably sparse. The class labels ℓ_i ∈ {−1, 1} are also unknown and the main interest is to estimate them. We are interested in the statistical limits. In the two-dimensional phase space calibrating the rarity and strengths of useful features, we find the precise demarcation for the Region of Impossibility and Region of Possibility. In the former, useful features are too rare/weak for successful clustering. In the latter, useful features are strong enough to allow successful clustering. The results are extended to the case of colored noise using Le Cam’s idea on comparison of experiments. We also extend the study on statistical limits for clustering to that for signal recovery and that for global testing. We compare the statistical limits for three problems and expose some interesting insight. We propose classical PCA and Important Featur...
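The two-class model in this abstract is easy to simulate. The sketch below generates data from it and estimates the labels from the sign of the leading left singular vector of the data matrix, one simple reading of "classical PCA" for this problem. The dimensions, sparsity level and signal strength are illustrative assumptions chosen so that the features are strong; the sketch does not explore the phase boundary studied in the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed toy sizes: n observations, p features, s useful (nonzero) features.
n, p, s = 200, 50, 10
mu = np.zeros(p)
mu[:s] = 1.5                                  # sparse feature vector, strong signal

labels = rng.choice([-1, 1], size=n)          # unknown class labels
z = rng.standard_normal((n, p))               # noise rows with identity covariance
x = labels[:, None] * mu + z                  # each row is label_i * mu + noise_i

# Classical PCA step: the leading left singular vector of x tracks the labels,
# so its signs give a label estimate (identifiable only up to a global flip).
u, _, _ = np.linalg.svd(x, full_matrices=False)
estimate = np.where(u[:, 0] >= 0, 1, -1)

agree = np.mean(estimate == labels)
accuracy = max(agree, 1.0 - agree)            # account for the sign ambiguity
print("clustering accuracy:", accuracy)
```

Shrinking the entries of `mu` or the number of nonzero features makes the leading singular vector progressively less informative, which is the regime the rarity and strength calibration in the abstract is concerned with.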

Dimension reduction for data of unknown cluster structure

For numerous reasons there arises a need for dimension reduction that preserves certain characteristics of data. In this work we focus on data coming from a mixture of Gaussian distributions and we propose a method that preserves distinctness of clustering structure, although the structure is assumed to be yet unknown. The rationale behind the method is the following: (i) had one known the clusters (classes) within the data, one could facilitate further analysis and reduce space dimension by projecting the data to the Fisher's linear subspace, which, by definition, preserves the structure of the given classes best; (ii) under some reasonable assumptions, this can be done, albeit approximately, without the prior knowledge of the clusters (classes). In the paper, we show how this approach works. We present a method of preliminary data transformation that brings the directions of largest overall variability close to the directions of the best between-class separation. Hence, for...

Clustering Multivariate Normal Distributions

Lecture Notes in Computer Science, 2009

In this paper, we consider the task of clustering multivariate normal distributions with respect to the relative entropy into a prescribed number, k, of clusters using a generalization of Lloyd's k-means algorithm [1]. We revisit this information-theoretic clustering problem under the auspices of mixed-type Bregman divergences, and show that the approach of Davis and Dhillon [2] (NIPS*06) can also be derived directly, by applying the Bregman k-means algorithm, once the proper vector/matrix Legendre transformations are defined. We further explain the dualistic structure of the sided k-means clustering, and present a novel k-means algorithm for clustering with respect to the symmetrical relative entropy, the J-divergence. Our approach extends to differential entropic clustering of arbitrary members of the same exponential families in statistics.
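The basic quantity in this abstract, the relative entropy between two multivariate normal distributions, has a standard closed form. The sketch below implements that formula and a symmetrized version. The function names are hypothetical, and the J-divergence is taken here as the plain sum of the two sided divergences, which is an assumption (some authors include a factor of 1/2); the k-means algorithms of the paper are not reproduced.

```python
import numpy as np

def kl_normal(mu0, cov0, mu1, cov1):
    """Relative entropy KL(N(mu0, cov0) || N(mu1, cov1)) in nats."""
    k = mu0.shape[0]
    diff = mu1 - mu0
    trace_term = np.trace(np.linalg.solve(cov1, cov0))   # tr(cov1^{-1} cov0)
    maha_term = diff @ np.linalg.solve(cov1, diff)       # Mahalanobis term
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (trace_term + maha_term - k + logdet1 - logdet0)

def j_divergence(mu0, cov0, mu1, cov1):
    """Symmetrized relative entropy, taken here as the sum of both sides."""
    return (kl_normal(mu0, cov0, mu1, cov1)
            + kl_normal(mu1, cov1, mu0, cov0))

mu_a, cov_a = np.zeros(2), np.eye(2)
mu_b, cov_b = np.array([1.0, 0.0]), 2.0 * np.eye(2)

# The relative entropy is not symmetric, which is why "sided" clusterings differ.
print("KL(a || b):", kl_normal(mu_a, cov_a, mu_b, cov_b))
print("KL(b || a):", kl_normal(mu_b, cov_b, mu_a, cov_a))
print("J(a, b):   ", j_divergence(mu_a, cov_a, mu_b, cov_b))
```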

Flexible Generalized Mixture Model Cluster Analysis with Elliptically-Contoured Distributions

i-manager’s Journal on Pattern Recognition

Pattern Recognition is the science of making inferences based on data and is the heart of all scientific inquiry, including understanding ourselves and the real world around us. Growing numbers of applications are starting to use Pattern Recognition as the initial step towards interpreting human actions, intention, and behavior, and as a central part of Next-Generation Smart Environments.