Missing Data Completion Using Diffusion Maps and Laplacian Pyramids

Co-manifold learning with missing data

2019

Representation learning is typically applied to only one mode of a data matrix, either its rows or columns. Yet in many applications, there is an underlying geometry to both the rows and the columns. We propose utilizing this coupled structure to perform co-manifold learning: uncovering the underlying geometry of both the rows and the columns of a given matrix, where we focus on a missing data setting. Our unsupervised approach consists of three components. We first solve a family of optimization problems to estimate a complete matrix at multiple scales of smoothness. We then use this collection of smooth matrix estimates to compute pairwise distances on the rows and columns based on a new multi-scale metric that implicitly introduces a coupling between the rows and the columns. Finally, we construct row and column representations from these multi-scale metrics. We demonstrate that our approach outperforms competing methods in both data visualization and clustering.

Heterogeneous datasets representation and learning using diffusion maps and Laplacian pyramids

Proceedings of the 2012 SIAM International Conference on Data Mining, 2012

Diffusion maps, together with geometric harmonics, provide a method for describing the geometry of high-dimensional data and for extending these descriptions to new data points and to functions defined on the data. This method suffers from two limitations. First, even though real-life data is often heterogeneous, diffusion maps assume that the attributes of the processed dataset are comparable. Second, applying geometric harmonics requires careful tuning of the extension scale and the condition number. In this paper, we propose a method for representing and learning heterogeneous datasets by using diffusion maps to unify and embed heterogeneous datasets and by replacing the geometric harmonics with the Laplacian pyramid extension. Experimental results on three benchmark datasets demonstrate how the learning process becomes straightforward when the constructed representation smoothly parameterizes the task-related function.
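
As a rough illustration of the Laplacian pyramid extension mentioned in this abstract, the sketch below follows one common formulation: at each level a row-normalized Gaussian kernel with a halved width smooths the current residual, and the same kernel evaluated at new points extends the approximation. The abstract does not give the paper's exact construction, so the function names, the kernel form, and the parameters `sigma0` and `n_levels` are assumptions made for this example.

```python
import numpy as np

def _normalized_kernel(A, B, sigma):
    # Gaussian kernel between rows of A and rows of B, rows normalized to sum to 1.
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    K = np.exp(-np.maximum(d2, 0.0) / sigma**2)
    return K / K.sum(axis=1, keepdims=True)

def laplacian_pyramid_extend(X, f, X_new, sigma0, n_levels):
    """Extend f (known on the rows of X) to the rows of X_new.

    Hypothetical helper: sigma0 is the coarsest kernel width and is halved
    at every level; n_levels is the number of refinement levels.
    """
    residual = np.asarray(f, dtype=float).copy()
    f_new = np.zeros(len(X_new))
    for level in range(n_levels):
        sigma = sigma0 / 2**level
        # Smooth the current residual on the training points ...
        smooth_train = _normalized_kernel(X, X, sigma) @ residual
        # ... and evaluate the same smoothing at the new points.
        f_new += _normalized_kernel(X_new, X, sigma) @ residual
        # What remains is handled by the next, finer level.
        residual -= smooth_train
    return f_new, residual
```

If the kernel width becomes much smaller than the distance from a new point to its nearest training point, the row normalization can divide by a value close to zero, so in practice the number of levels would be capped or chosen by a stopping rule.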

A multi-scale approach for data imputation

2018 IEEE International Conference on the Science of Electrical Engineering in Israel (ICSEE), 2018

A common pre-processing task in machine learning is to complete missing data entries in order to form a full dataset. When the dimension of the input data is high, the rows and columns are often correlated. In this work, we construct a multi-scale model based on the dual row-column geometry of the dataset and apply it to imputation, which is carried out within the model construction. Experimental results demonstrate the efficiency of our approach on a publicly available dataset.

Invertible Manifold Learning for Dimension Reduction

2021

It is widely believed that a nonlinear dimension reduction (NLDR) process inevitably drops information in most practical scenarios, and even with the manifold assumption, most existing methods are unable to preserve the structure of data after DR due to the loss of information, especially in high-dimensional cases. In the context of manifold learning, we think a good low-dimensional representation should preserve the topological and geometric properties of the data manifold. To achieve this, the invertibility of an NLDR transformation is required such that the learned representation is reconstructible via its inverse transformation. In this paper, we propose a novel method, called invertible manifold learning (inv-ML), to tackle this problem. A locally isometric smoothness (LIS) constraint for preserving local geometry is applied to a two-stage inv-ML algorithm. Firstly, a homeomorphic sparse coordinate transformation is learned to find the low-dimensional representation without loss of topologic...

Missing Data Reconstruction based on Projection Manifolds

2000

A new method for reconstructing missing values based on topological properties of data is elaborated. The proposed technique generalizes missing data reconstruction based on ordinary principal components by taking into account nonlinearities in a learning sample. Application of the elaborated methodology to the design of metamodels yields very accurate surrogates, which can then be used for construction of

Handling Missing Data with Graph Representation Learning

ArXiv, 2020

Machine learning with missing data has been approached in two different ways, including feature imputation where missing feature values are estimated based on observed values, and label prediction where downstream labels are learned directly from incomplete data. However, existing imputation models tend to have strong prior assumptions and cannot learn from downstream tasks, while models targeting label prediction often involve heuristics and can encounter scalability issues. Here we propose GRAPE, a graph-based framework for feature imputation as well as label prediction. GRAPE tackles the missing data problem using a graph representation, where the observations and features are viewed as two types of nodes in a bipartite graph, and the observed feature values as edges. Under the GRAPE framework, the feature imputation is formulated as an edge-level prediction task and the label prediction as a node-level prediction task. These tasks are then solved with Graph Neural Networks. Expe...
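
To make the graph representation described in this abstract concrete, the following minimal sketch turns a data matrix with missing entries into a bipartite edge list: observation nodes come first, feature nodes are offset by the number of observations, and each observed value becomes an edge attribute. It covers only the graph construction, not the Graph Neural Network that GRAPE trains on top of it; the function name and the choice of `NaN` as the missing-value marker are assumptions for illustration.

```python
import numpy as np

def matrix_to_bipartite_edges(X):
    """Build a bipartite edge list from a matrix whose missing entries are NaN.

    Hypothetical helper. Node ids 0..n_obs-1 are observations (rows);
    node ids n_obs..n_obs+n_feat-1 are features (columns).
    """
    X = np.asarray(X, dtype=float)
    n_obs = X.shape[0]
    # Indices of observed (non-missing) entries, in row-major order.
    rows, cols = np.nonzero(~np.isnan(X))
    # One edge per observed entry: observation node -> feature node.
    edge_index = np.stack([rows, cols + n_obs])
    # The observed value is carried as the edge attribute.
    edge_attr = X[rows, cols]
    return edge_index, edge_attr
```

In this picture, imputing a missing entry amounts to predicting the attribute of an edge that is absent from `edge_index`, which matches the edge-level prediction task the abstract describes.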

IJERT-Improved Dimension Reduction With Modified Diffusion Maps

International Journal of Engineering Research and Technology (IJERT), 2014

https://www.ijert.org/improved-dimension-reduction-with-modified-diffusion-maps https://www.ijert.org/research/improved-dimension-reduction-with-modified-diffusion-maps-IJERTV3IS060717.pdf The task of representing higher-dimensional data in a lower dimension while preserving the relative information was previously done by principal component analysis, factor analysis, or feature selection. However, if originally lower-dimensional data is embedded in a high-dimensional space, then approaches based on manifold learning and graph theory allow the underlying geometry of the data to be learned. One such technique is Diffusion Maps. It preserves the local proximity between data points by first constructing a representation of the underlying manifold. In this paper, a binary classification problem using Diffusion Maps to embed the data with various kernel representations is targeted. Results show that specific kernels are well suited for Diffusion Map applications on some feature sets and that, in general, some kernels outperform the standard Gaussian and polynomial kernels on several of the higher-dimensional data sets.
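
For readers unfamiliar with the technique, here is a minimal sketch of a standard diffusion map with a Gaussian kernel. It is not the modified variant or the alternative kernels studied in the paper, whose details the abstract does not give. The function name and the parameters `epsilon` (kernel scale), `n_components`, and `t` (diffusion time) are assumptions for illustration.

```python
import numpy as np

def diffusion_map(X, epsilon, n_components=2, t=1):
    """Embed the rows of X with a basic Gaussian-kernel diffusion map.

    Hypothetical helper. Returns the embedding and all eigenvalues of the
    Markov matrix in descending order.
    """
    # Pairwise squared Euclidean distances.
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    # Gaussian affinity kernel and its row sums (degrees).
    K = np.exp(-d2 / epsilon)
    d = K.sum(axis=1)
    # Symmetric matrix D^{-1/2} K D^{-1/2}; it shares eigenvalues with
    # the Markov matrix P = D^{-1} K.
    A = K / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(A)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    # Right eigenvectors of P are D^{-1/2} times the eigenvectors of A.
    psi = vecs / np.sqrt(d)[:, None]
    # Skip the trivial constant eigenvector (eigenvalue 1).
    coords = psi[:, 1:n_components + 1] * vals[1:n_components + 1] ** t
    return coords, vals
```

Swapping the line that builds `K` for a different affinity function is the natural place to experiment with other kernels, provided the resulting matrix stays symmetric with positive row sums.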

Learning gradients on manifolds

Bernoulli, 2010

A common belief in high-dimensional data analysis is that data are concentrated on a low-dimensional manifold. This motivates simultaneous dimension reduction and regression on manifolds. We provide an algorithm for learning gradients on manifolds for dimension reduction for high-dimensional data with few observations. We obtain generalization error bounds for the gradient estimates and show that the convergence rate depends on the intrinsic dimension of the manifold and not on the dimension of the ambient space. We illustrate the efficacy of this approach empirically on simulated and real data and compare the method to other dimension reduction procedures.

Missing Value Imputation with Unsupervised Backpropagation

Computational Intelligence, 2014

Many data mining and data analysis techniques operate on dense matrices or complete tables of data. Real-world data sets, however, often contain unknown values. Even many classification algorithms that are designed to operate with missing values still exhibit deteriorated accuracy. One approach to handling missing values is to fill in (impute) the missing values. In this paper, we present a technique for unsupervised learning called Unsupervised Backpropagation (UBP), which trains a multi-layer perceptron to fit the manifold sampled by a set of observed point-vectors. We evaluate UBP on the task of imputing missing values in datasets, and show that UBP is able to predict missing values with significantly lower sum-squared error than other collaborative filtering and imputation techniques. We also demonstrate with 24 datasets and 9 supervised learning algorithms that classification accuracy is usually higher when randomly withheld values are imputed using UBP rather than with other methods.

Diffusion Maps for Signal Processing: A deeper look at manifold-learning techniques based on kernels and graphs

Signal processing methods have significantly changed over the last several decades. Traditional methods were usually based on parametric statistical inference and linear filters. These frameworks have helped to develop efficient algorithms that have often been suitable for implementation on digital signal processing (DSP) systems. Over the years, DSP systems have advanced rapidly, and their computational capabilities have been substantially increased. This development has enabled contemporary signal processing algorithms to incorporate more computations. Consequently, we have recently experienced a growing interaction between signal processing and machine-learning approaches, e.g., Bayesian networks, graphical models, and kernel-based methods, whose computational burden is usually high. In this article, we review manifold-learning techniques based on kernels and graphs. Our survey covers recent developments and trends and presents ways to incorporate them into signal processing. W...