DISCOMAX: A Proximity-Preserving Distance Correlation Maximization Algorithm
Related papers
Supervised Dimensionality Reduction via Distance Correlation Maximization
Electronic Journal of Statistics, 2018
In our work, we propose a novel formulation for supervised dimensionality reduction based on a nonlinear dependency criterion called Statistical Distance Correlation [Székely et al., 2007]. We propose an objective that is free of distributional assumptions on the regression variables and of regression model assumptions. Our formulation is based on learning a low-dimensional feature representation z which maximizes the squared sum of Distance Correlations between the low-dimensional features z and the response y, and between the features z and the covariates x. We propose a novel algorithm to optimize this objective using the Generalized Minimization-Maximization method of Parizi et al. [2015]. We show superior empirical results on multiple datasets, demonstrating the effectiveness of our approach over several relevant state-of-the-art supervised dimensionality reduction methods.
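The abstract states the objective but not its estimator. As a point of reference, a minimal sketch of the standard biased sample distance correlation of Székely et al. and of the squared-sum objective is given below; the linear parameterization z = XW and all function names are illustrative assumptions, not the paper's algorithm.

    import numpy as np

    def distance_correlation(a, b):
        # Biased sample distance correlation (Szekely et al., 2007).
        a = np.atleast_2d(a.T).T                                      # promote vectors to (n, 1)
        b = np.atleast_2d(b.T).T
        A = np.linalg.norm(a[:, None, :] - a[None, :, :], axis=-1)    # pairwise distance matrices
        B = np.linalg.norm(b[:, None, :] - b[None, :, :], axis=-1)
        A = A - A.mean(axis=0) - A.mean(axis=1)[:, None] + A.mean()   # double centering
        B = B - B.mean(axis=0) - B.mean(axis=1)[:, None] + B.mean()
        dcov2_ab, dcov2_aa, dcov2_bb = (A * B).mean(), (A * A).mean(), (B * B).mean()
        if dcov2_aa * dcov2_bb == 0.0:
            return 0.0
        return np.sqrt(dcov2_ab / np.sqrt(dcov2_aa * dcov2_bb))

    def discomax_objective(X, y, W):
        # Squared sum of distance correlations for low-dimensional features z = X @ W
        # (linear map assumed here purely for illustration).
        z = X @ W
        return distance_correlation(z, y) ** 2 + distance_correlation(z, X) ** 2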
Feature Screening via Distance Correlation Learning
Journal of the American Statistical Association, 2012
This paper is concerned with screening features in ultrahigh-dimensional data analysis, which has become increasingly important in diverse scientific fields. We develop a sure independence screening procedure based on the distance correlation (DC-SIS, for short). The DC-SIS can be implemented as easily as the sure independence screening procedure based on the Pearson correlation (SIS, for short) proposed by Fan and Lv (2008), yet it can significantly improve on the SIS. Fan and Lv established the sure screening property for the SIS under linear models, whereas the sure screening property holds for the DC-SIS in more general settings, including linear models. Furthermore, the implementation of the DC-SIS does not require model specification (e.g., a linear model or generalized linear model) for the responses or predictors. This is a very appealing property in ultrahigh-dimensional data analysis.
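As described, the screening step itself is a marginal ranking. A minimal sketch, assuming the distance_correlation helper from the sketch above and a user-chosen model size d:

    import numpy as np

    def dc_sis(X, y, d):
        # Rank each predictor by its sample distance correlation with the response
        # and keep the d top-ranked columns (distance_correlation defined earlier).
        scores = np.array([distance_correlation(X[:, j], y) for j in range(X.shape[1])])
        return np.argsort(scores)[::-1][:d]        # indices of retained features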
Supervised feature learning via dependency maximization
2016
A key challenge in machine learning is to automatically extract relevant feature representations of data for a given task. This becomes an especially formidable task for structured data such as images, which are often highly complex. In this thesis, we propose frameworks for supervised feature learning for structured and unstructured data via dependency maximization. In the first part of this dissertation, we look at the problem of learning kernels for structured prediction. We present a novel framework called Twin Kernel Learning, which uses polynomial expansions of kernels to learn kernels over structured data that maximize a dependency criterion called the Hilbert-Schmidt Independence Criterion (HSIC). We also give an efficient, matrix-decomposition-based algorithm for learning these expansions and use it to learn covariance kernels of Twin Gaussian Processes. We demonstrate state-of-the-art empirical results on several synthetic and real-world datasets. In the second part of this work, we present a novel framework for supervised dimensionality reduction based on a dependency criterion called Distance Correlation. Our framework is based on learning low-dimensional features that maximize the squared sum of Distance Correlations of the low-dimensional features with both the response and the covariates. We propose a novel algorithm to maximize this objective and show superior empirical results over the state of the art on multiple datasets.
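For reference, the dependency criterion named in the first part, HSIC, has a simple biased empirical estimator; the sketch below shows that standard estimator only, not the Twin Kernel Learning algorithm itself.

    import numpy as np

    def hsic(K, L):
        # Biased empirical HSIC between two kernel matrices K and L (Gretton et al., 2005).
        n = K.shape[0]
        H = np.eye(n) - np.ones((n, n)) / n        # centering matrix
        return np.trace(K @ H @ L @ H) / (n - 1) ** 2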
Feature Selection with Distance Correlation
arXiv (Cornell University), 2022
Choosing which properties of the data to use as input to multivariate decision algorithms (a.k.a. feature selection) is an important step in solving any problem with machine learning. While there is a clear trend towards training sophisticated deep networks on large numbers of relatively unprocessed inputs (so-called automated feature engineering), for many tasks in physics, sets of theoretically well-motivated and well-understood features already exist. Working with such features can bring many benefits, including greater interpretability, reduced training and run time, and enhanced stability and robustness. We develop a new feature selection method based on Distance Correlation (DisCo) and demonstrate its effectiveness on the tasks of boosted top- and W-tagging. Using our method to select features from a set of over 7,000 energy flow polynomials, we show that we can match the performance of much deeper architectures using only ten features and two orders of magnitude fewer model parameters.
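The abstract does not spell out the selection rule, so the sketch below is only one plausible reading: greedily add the feature with the highest DisCo to the label while penalizing redundancy with features already chosen. The greedy structure and the redundancy_weight knob are assumptions for illustration, not necessarily the paper's procedure; distance_correlation is the helper defined earlier.

    import numpy as np

    def greedy_disco_selection(X, y, n_select, redundancy_weight=1.0):
        # Greedy selection: high DisCo with the label, low DisCo with already-chosen features.
        p = X.shape[1]
        label_score = [distance_correlation(X[:, j], y) for j in range(p)]
        selected, remaining = [], list(range(p))
        while len(selected) < n_select and remaining:
            def score(j):
                redundancy = max((distance_correlation(X[:, j], X[:, k]) for k in selected),
                                 default=0.0)
                return label_score[j] - redundancy_weight * redundancy
            best = max(remaining, key=score)
            selected.append(best)
            remaining.remove(best)
        return selected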
Distance Metric Learning Revisited
2020
The success of many machine learning algorithms (e.g., nearest neighbor classification and k-means clustering) depends on the representation of the data as elements in a metric space. Learning an appropriate distance metric from data is usually superior to the default Euclidean distance. In this paper, we revisit the original model proposed by Xing et al. [24] and propose a general formulation of learning a Mahalanobis distance from data. We prove that this novel formulation is equivalent to a convex optimization problem over the spectrahedron. Then, a gradient-based optimization algorithm is proposed to obtain the optimal solution, which only needs the computation of the largest eigenvalue of a matrix per iteration. Finally, experiments on various UCI datasets and a benchmark face verification dataset called Labeled Faces in the Wild (LFW) demonstrate that the proposed method compares competitively with state-of-the-art methods.
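The description of needing only the largest eigenvalue of a matrix per iteration matches a conditional-gradient (Frank-Wolfe) update over the unit-trace positive semidefinite cone; the sketch below shows that generic update with the objective's gradient left as a callback, and should be read as an assumption about the update rule, not the paper's exact algorithm.

    import numpy as np

    def frank_wolfe_spectrahedron(grad_fn, dim, n_iters=100):
        # Conditional-gradient ascent over the spectrahedron {M >= 0, trace(M) = 1}.
        M = np.eye(dim) / dim                      # feasible starting point
        for t in range(n_iters):
            G = grad_fn(M)                         # gradient of the (concave) objective at M
            eigvals, eigvecs = np.linalg.eigh(G)   # symmetric eigendecomposition
            v = eigvecs[:, -1]                     # eigenvector of the largest eigenvalue
            S = np.outer(v, v)                     # extreme point maximizing <S, G>
            step = 2.0 / (t + 2)                   # standard Frank-Wolfe step size
            M = (1 - step) * M + step * S
        return M                                   # Mahalanobis matrix: d(x, y)^2 = (x - y)^T M (x - y)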
Maximum Gradient Dimensionality Reduction
2018 24th International Conference on Pattern Recognition (ICPR), 2018
We propose a novel dimensionality reduction approach based on the gradient of the regression function. Our approach is conceptually similar to Principal Component Analysis; however, instead of seeking a low-dimensional representation of the predictors that preserves the sample variance, we project onto a basis that preserves those predictors which induce the greatest change in the response. Our approach has the benefits of being simple and easy to implement and interpret, while remaining very competitive with sophisticated state-of-the-art approaches.
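One natural way to realize this idea is to average the outer products of estimated gradients of a fitted regression function and keep the leading eigenvectors; the finite-difference gradients and the exact estimator below are assumptions for illustration, not necessarily the paper's construction.

    import numpy as np

    def max_gradient_basis(f, X, n_components, eps=1e-4):
        # Top eigenvectors of the average outer product of estimated gradients of f,
        # i.e., the directions along which the response changes the most.
        n, d = X.shape
        M = np.zeros((d, d))
        for x in X:
            g = np.array([(f(x + eps * np.eye(d)[j]) - f(x - eps * np.eye(d)[j])) / (2 * eps)
                          for j in range(d)])      # central-difference gradient estimate
            M += np.outer(g, g) / n
        eigvals, eigvecs = np.linalg.eigh(M)
        return eigvecs[:, ::-1][:, :n_components]  # projection basis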
Efficient learning and feature selection in high-dimensional regression
2010
We present a novel algorithm for efficient learning and feature selection in high-dimensional regression problems. We arrive at this model through a modification of the standard regression model, enabling us to derive a probabilistic version of the well-known statistical regression technique of backfitting. Using the expectation-maximization algorithm, along with variational approximation methods to overcome intractability, we extend our algorithm to include automatic relevance detection of the input features.
Linear Dimensionality Reduction: Survey, Insights, and Generalizations
Journal of Machine Learning Research, 2015
Linear dimensionality reduction methods are a cornerstone of analyzing high dimensional data, due to their simple geometric interpretations and typically attractive computational properties. These methods capture many data features of interest, such as covariance, dynamical structure, correlation between data sets, input-output relationships, and margin between data classes. Methods have been developed with a variety of names and motivations in many fields, and perhaps as a result the connections between all these methods have not been highlighted. Here we survey methods from this disparate literature as optimization programs over matrix manifolds. We discuss principal component analysis, factor analysis, linear multidimensional scaling, Fisher's linear discriminant analysis, canonical correlations analysis, maximum autocorrelation factors, slow feature analysis, sufficient dimensionality reduction, undercomplete independent component analysis, linear regression, distance metric learning, and more. This optimization framework gives insight into some rarely discussed shortcomings of well-known methods, such as the suboptimality of certain eigenvector solutions. Modern techniques for optimization over matrix manifolds enable a generic linear dimensionality reduction solver, which accepts as input data and an objective to be optimized, and returns, as output, an optimal low-dimensional projection of the data. This simple optimization framework further allows straightforward generalizations and novel variants of classical methods, which we demonstrate here by creating an orthogonal-projection canonical correlations analysis. More broadly, this survey and generic solver suggest that linear dimensionality reduction can move toward becoming a black-box, objective-agnostic numerical technology.
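A minimal sketch of what such a generic solver can look like, using plain Riemannian gradient ascent with a QR retraction on the Stiefel manifold of orthonormal projections; the survey's own solver is more sophisticated, and the callback interface here is an illustrative assumption.

    import numpy as np

    def stiefel_ascent(grad_fn, d, k, n_iters=500, lr=1e-2):
        # Gradient ascent over d-by-k orthonormal projections W (W.T @ W = I).
        # grad_fn(W) must return the Euclidean gradient of the chosen objective.
        rng = np.random.default_rng(0)
        W, _ = np.linalg.qr(rng.standard_normal((d, k)))   # random orthonormal start
        for _ in range(n_iters):
            G = grad_fn(W)
            G = G - W @ (W.T @ G + G.T @ W) / 2            # project onto the tangent space
            W, R = np.linalg.qr(W + lr * G)                # retract back onto the manifold
            W = W * np.sign(np.diag(R))                    # fix the sign ambiguity of QR
        return W

    # Example: PCA as variance maximization, with C the sample covariance matrix,
    # corresponds to grad_fn = lambda W: 2 * C @ W.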
Semi-supervised laplacian regularization of kernel canonical correlation analysis
Machine Learning and Knowledge …, 2008
Kernel canonical correlation analysis (KCCA) is a dimensionality reduction technique for paired data. By finding directions that maximize correlation, KCCA learns representations that are more closely tied to the underlying semantics of the data than to noise. However, meaningful directions are not only those that have high correlation with another modality, but also those that capture the manifold structure of the data. We propose a method that simultaneously finds highly correlated directions that also lie along high-variance directions of the data manifold. This is achieved through semi-supervised Laplacian regularization of KCCA. We show experimentally that Laplacian-regularized training improves class separation over KCCA with only Tikhonov regularization, while causing no degradation in the correlation between modalities. We propose a model selection criterion based on the Hilbert-Schmidt norm of the semi-supervised Laplacian-regularized cross-covariance operator, which we compute in closed form.
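A rough sketch of one common way to write regularized KCCA as a generalized eigenproblem, with a graph-Laplacian term added to the Tikhonov regularizer of each view; the exact placement and weighting of the Laplacian penalty in the paper may differ, so this is an assumption-laden illustration rather than the paper's estimator.

    import numpy as np
    from scipy.linalg import eigh

    def laplacian_kcca(Kx, Ky, L, eps=1e-3, gamma=1e-2):
        # Leading pair of dual weights (alpha, beta) for KCCA with Tikhonov (eps)
        # and graph-Laplacian (gamma) regularization applied to both views.
        n = Kx.shape[0]
        Z = np.zeros((n, n))
        A = np.block([[Z, Kx @ Ky], [Ky @ Kx, Z]])            # cross-covariance blocks
        Bx = Kx @ Kx + eps * Kx + gamma * Kx @ L @ Kx
        By = Ky @ Ky + eps * Ky + gamma * Ky @ L @ Ky
        B = np.block([[Bx, Z], [Z, By]])
        w, V = eigh(A, B + 1e-8 * np.eye(2 * n))              # jitter keeps B positive definite
        top = V[:, -1]                                        # eigenvector of the top correlation
        return top[:n], top[n:]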