Mixture of Bilateral-Projection Two-Dimensional Probabilistic Principal Component Analysis
Related papers
Supervised probabilistic principal component analysis
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '06, 2006
Principal component analysis (PCA) has been extensively applied in data mining, pattern recognition and information retrieval for unsupervised dimensionality reduction. When labels of data are available, e.g., in a classification or regression task, PCA is, however, not able to use this information. The problem is more interesting if only part of the input data are labeled, i.e., in a semi-supervised setting. In this paper we propose a supervised PCA model called SPPCA and a semi-supervised PCA model called S²PPCA, both of which are extensions of a probabilistic PCA model. The proposed models are able to incorporate the label information into the projection phase, and can naturally handle multiple outputs (i.e., in multi-task learning problems). We derive an efficient EM learning algorithm for both models, and also provide theoretical justifications of the model behaviors. SPPCA and S²PPCA are compared with other supervised projection methods on various learning tasks, and show not only promising performance but also good scalability.
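The EM algorithm the paper derives extends the standard EM updates for probabilistic PCA. As a point of reference, here is a minimal numpy sketch of those standard updates (the Tipping-Bishop formulation); the supervised variant couples a second loading matrix for the labels, which this sketch only approximates by running the same EM on the concatenated [x, y] vector with a single shared noise variance. The function name and toy data are illustrative, not from the paper.

```python
# Minimal EM for probabilistic PCA (standard Tipping-Bishop updates).
# SPPCA-style supervision is approximated here by running the same EM
# on the concatenation [x, y]; the paper keeps separate loading
# matrices and noise variances for inputs and labels.
import numpy as np

def ppca_em(X, q, n_iter=100, seed=0):
    """Fit loadings W (d x q) and noise variance sigma2 by EM."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    mu = X.mean(axis=0)
    Xc = X - mu
    W = rng.normal(size=(d, q))
    sigma2 = 1.0
    for _ in range(n_iter):
        # E-step: posterior moments of the latent z_n.
        M = W.T @ W + sigma2 * np.eye(q)            # q x q
        Minv = np.linalg.inv(M)
        Ez = Xc @ W @ Minv                          # rows are E[z_n]
        Ezz = N * sigma2 * Minv + Ez.T @ Ez         # sum_n E[z_n z_n^T]
        # M-step: closed-form updates for W and sigma2.
        W = Xc.T @ Ez @ np.linalg.inv(Ezz)
        sigma2 = (np.sum(Xc**2)
                  - 2 * np.sum((Xc @ W) * Ez)
                  + np.trace(Ezz @ W.T @ W)) / (N * d)
    return mu, W, sigma2

# Supervised use: stack features and (encoded) labels column-wise.
X = np.random.default_rng(1).normal(size=(200, 10))
y = X[:, :1] @ np.ones((1, 1))                      # toy 1-D target
mu, W, s2 = ppca_em(np.hstack([X, y]), q=2)
```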
Mixture Models for Exploring Local PCA Structures
2007
Principal component analysis (PCA) is one of the most popular techniques for dimensionality reduction of multivariate data. This paper discusses a new learning algorithm for exploring local PCA structure, in which the observed data follow a mixture of several PCA models, each described by a linear combination of independent Gaussian sources. The proposed method is based on a mixture of several Gaussian distributions, used to extract all local PCA structures simultaneously. Parameters are estimated by maximizing the likelihood function. The performance of the proposed method is compared with some existing PCA algorithms on synthetic datasets.
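The recipe the abstract describes, a Gaussian mixture whose components are read as local PCA models, can be sketched with off-the-shelf tools. In the sketch below, sklearn's GaussianMixture stands in for the paper's own maximum-likelihood estimator, and each component's local principal directions are recovered by eigendecomposing its fitted covariance; the synthetic two-cluster data set is an illustrative assumption.

```python
# Local PCA via a Gaussian mixture: fit a full-covariance mixture,
# then read each component's principal directions off the
# eigendecomposition of its covariance matrix.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two synthetic clusters with different dominant directions.
A = rng.normal(size=(300, 2)) @ np.array([[3.0, 0.0], [0.0, 0.3]])
B = rng.normal(size=(300, 2)) @ np.array([[0.3, 0.0], [0.0, 3.0]]) + 8.0
X = np.vstack([A, B])

gmm = GaussianMixture(n_components=2, covariance_type="full").fit(X)
for k, cov in enumerate(gmm.covariances_):
    eigvals, eigvecs = np.linalg.eigh(cov)       # ascending order
    order = np.argsort(eigvals)[::-1]
    print(f"component {k}: local PC1 = {eigvecs[:, order[0]]}, "
          f"variance = {eigvals[order[0]]:.2f}")
```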
Mixture of Probabilistic Principal Component Analyzers for Shapes from Point Sets
IEEE transactions on pattern analysis and machine intelligence, 2017
Inferring a probability density function (pdf) for shape from a population of point sets is a challenging problem. The lack of point-to-point correspondences and the non-linearity of the shape spaces undermine linear models. Methods based on manifolds model the shape variations naturally; however, statistics are often limited to a single geodesic mean and an arbitrary number of variation modes. We relax the manifold assumption and consider a piecewise-linear form, implementing a mixture of distinctive shape classes. The pdf for point sets is defined hierarchically, modeling a mixture of Probabilistic Principal Component Analyzers (PPCA) in a higher-dimensional space. A Variational Bayesian approach is designed for unsupervised learning of the posteriors of point set labels, local variation modes, and point correspondences. By maximizing the model evidence, the numbers of clusters, modes of variation, and points on the mean models are automatically selected. Using the predictive distribu...
IEEE Transactions on Neural Networks, 2000
Visual exploration has proven to be a powerful tool for multivariate data mining and knowledge discovery. Most visualization algorithms aim to find a projection from the data space down to a visually perceivable rendering space. To reveal all of the interesting aspects of multimodal data sets living in a high-dimensional space, a hierarchical visualization algorithm is introduced which allows the complete data set to be visualized at the top level, with clusters and subclusters of data points visualized at deeper levels. The methods involve hierarchical use of standard finite normal mixtures and probabilistic principal component projections, whose parameters are estimated using the expectation-maximization algorithm and principal component neural networks under information-theoretic criteria. We demonstrate the principle of the approach on several multimodal numerical data sets, and we then apply the method to the visual explanation in computer-aided diagnosis for breast cancer detection from digital mammograms.
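A rough sketch of the two-level scheme, under simplifying assumptions: plain PCA and sklearn's GaussianMixture replace the paper's probabilistic projections and neural estimators, with one global 2-D view at the top and one 2-D view per mixture component below; deeper levels would recurse in the same way. All names and the toy data are illustrative.

```python
# Two-level visualization: a global 2-D principal-component view at
# the top, then a separate 2-D projection per mixture component.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=m, size=(200, 6)) for m in (0.0, 4.0, 8.0)])

top_view = PCA(n_components=2).fit_transform(X)   # top-level plot
labels = GaussianMixture(n_components=3).fit_predict(X)
sub_views = {
    k: PCA(n_components=2).fit_transform(X[labels == k])
    for k in np.unique(labels)
}   # one child plot per cluster; deeper levels recurse the same way
print(top_view.shape, {k: v.shape for k, v in sub_views.items()})
```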
Dynamic competitive probabilistic principal components analysis
International Journal of Neural Systems, 2009
The Mixture of Probabilistic Principal Component Analyzers (MPPCA) is a multivariate analysis technique which defines a Gaussian probabilistic model at each unit. The number of units and the number of principal directions in each unit are not learned in the original approach. Variational Bayesian approaches have been proposed for this purpose; they rely on assumptions about the probability distributions of the MPPCA parameters.
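For concreteness, the density a fitted MPPCA assigns to a point can be written down directly: each unit k is a Gaussian with the PPCA-constrained covariance C_k = W_k W_k^T + sigma_k^2 I, and a point's responsibilities follow from Bayes' rule over the units. The sketch below evaluates exactly that; the parameters are illustrative placeholders, not values fitted by any of the algorithms above.

```python
# Responsibilities under a mixture of PPCA units: each unit's
# covariance is constrained to C_k = W_k W_k^T + sigma_k^2 I.
import numpy as np
from scipy.stats import multivariate_normal

def mppca_responsibilities(X, pis, mus, Ws, sigma2s):
    """p(unit k | x) for a fitted mixture of PPCA units."""
    densities = []
    for pi, mu, W, s2 in zip(pis, mus, Ws, sigma2s):
        C = W @ W.T + s2 * np.eye(W.shape[0])    # PPCA covariance
        densities.append(pi * multivariate_normal.pdf(X, mu, C))
    R = np.column_stack(densities)
    return R / R.sum(axis=1, keepdims=True)

d, q = 5, 2
rng = np.random.default_rng(0)
pis = [0.5, 0.5]
mus = [np.zeros(d), np.full(d, 3.0)]
Ws = [rng.normal(size=(d, q)) for _ in range(2)]
sigma2s = [0.1, 0.2]
print(mppca_responsibilities(rng.normal(size=(4, d)), pis, mus, Ws, sigma2s))
```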
Probabilistic Disjoint Principal Component Analysis
Multivariate Behavioral Research, 2018
One of the most relevant problems in principal component analysis and factor analysis is the interpretation of the components/factors. In this paper, the disjoint principal component analysis model is extended in a maximum-likelihood framework to allow for inference on the model parameters. A coordinate ascent algorithm is proposed to estimate the model parameters. The performance of the methodology is evaluated on simulated and real data sets.
Multiscale principal component analysis
2014
Principal component analysis (PCA) is an important tool in exploring data. The conventional approach to PCA leads to a solution which favours structures with large variances; this is sensitive to outliers and can obfuscate interesting underlying structures. One of the equivalent definitions of PCA is that it seeks the subspaces that maximize the sum of squared pairwise distances between data projections. This definition opens up more flexibility in the analysis of principal components, which is useful in enhancing PCA. In this paper we introduce scales into PCA by maximizing only the sum of pairwise distances between projections for pairs of data points whose distances lie within a chosen interval of values [l, u]. The resulting principal component decompositions in Multiscale PCA depend on the point (l, u) in the plane, and for each point we define projectors onto the principal components. Cluster analysis of these projectors reveals the structures in the data at various scales. Each structure is described by the eigenvectors at the medoid point of its cluster. We also use the distortion of projections as a criterion for choosing an appropriate scale, especially for data with outliers. The method was tested on both artificial and real data. For data with multiscale structures, it was able to reveal the different structures of the data and to reduce the effect of outliers in the principal component analysis.
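The objective described here translates into a small computation: ordinary PCA maximizes the summed squared pairwise distances between projections, which is governed by the pairwise scatter matrix S = sum over pairs (i, j) of (x_i - x_j)(x_i - x_j)^T, and the multiscale variant simply restricts the sum to pairs whose distance falls in [l, u] before taking the top eigenvectors. A minimal sketch under that reading, with illustrative names and data:

```python
# Multiscale PCA sketch: restrict the pairwise scatter matrix to
# pairs whose original distance lies in [l, u], then take the top
# eigenvectors of the restricted scatter as the components at that
# scale.
import numpy as np

def multiscale_pca(X, l, u, n_components=2):
    N, d = X.shape
    S = np.zeros((d, d))
    for i in range(N):
        diffs = X[i + 1:] - X[i]                 # pairs (i, j), j > i
        dists = np.linalg.norm(diffs, axis=1)
        keep = diffs[(dists >= l) & (dists <= u)]
        S += keep.T @ keep                       # sum of outer products
    eigvals, eigvecs = np.linalg.eigh(S)
    return eigvecs[:, np.argsort(eigvals)[::-1][:n_components]]

X = np.random.default_rng(0).normal(size=(100, 5))
V = multiscale_pca(X, l=0.5, u=2.0)              # d x n_components
print(V.shape)
```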
Lecture Notes in Computer Science, 2002
Classifying unknown objects into familiar, general categories, rather than trying to classify them into a certain known but only similar class or rejecting them altogether, is an important aspect of object recognition. Especially in tasks where it is impossible to model all possibly appearing objects in advance, generic object modeling and recognition are crucial. We present a novel approach to generic object modeling and classification based on probabilistic principal component analysis (PPCA). A data set can be separated into classes during an unsupervised learning step using the expectation-maximization algorithm. In contrast to principal component analysis, the feature space is modeled in a locally linear manner. Additionally, Bayesian classification is possible thanks to the underlying probabilistic model. The approach is applied to the COIL-20/100 databases and shows that PPCA is well suited for appearance-based generic object modeling and recognition. The automatic, unsupervised generation of categories in most cases matches the categorization done by humans. Improvements are expected if the categorization is performed in a supervised fashion.
Clustering and disjoint principal component analysis
Computational Statistics & Data Analysis, 2009
A constrained principal component analysis, which aims at a simultaneous clustering of objects and a partitioning of variables, is proposed. The new methodology makes it possible to identify components with maximum variance, each one a linear combination of a subset of variables, where all the subsets form a partition of the variables. Simultaneously, a partition of the objects is computed, maximizing the between-cluster variance. The methodology is formulated in a semi-parametric least-squares framework as a quadratic mixed continuous and integer problem. An alternating least-squares algorithm is proposed to solve the clustering and disjoint PCA. Two applications are given to show the features of the methodology.
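A loose sketch of the alternating idea, under strong simplifications: the partition of variables into blocks is fixed in advance (the paper learns it as part of the optimization), each block contributes its leading component computed on the cluster-centroid matrix, and the object partition is refreshed by k-means in the reduced space. All names and data are illustrative.

```python
# Crude alternation between a disjoint per-block component and a
# k-means object partition; the variable blocks are assumed known.
import numpy as np
from sklearn.cluster import KMeans

def cdpca_sketch(X, blocks, n_clusters, n_iter=10):
    N, d = X.shape
    labels = KMeans(n_clusters, n_init=10).fit_predict(X)
    for _ in range(n_iter):
        # (a) per-block first PC of the cluster-centroid matrix
        centroids = np.vstack([X[labels == c].mean(axis=0)
                               for c in range(n_clusters)])
        comps = []
        for b in blocks:
            block = centroids[:, b] - centroids[:, b].mean(axis=0)
            _, _, Vt = np.linalg.svd(block, full_matrices=False)
            v = np.zeros(d)
            v[b] = Vt[0]                          # loads only on block b
            comps.append(v)
        V = np.array(comps).T                     # d x Q, disjoint loadings
        # (b) reassign objects in the reduced space
        labels = KMeans(n_clusters, n_init=10).fit_predict(X @ V)
    return labels, V

X = np.random.default_rng(0).normal(size=(150, 6))
labels, V = cdpca_sketch(X, blocks=[[0, 1, 2], [3, 4, 5]], n_clusters=3)
```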