Estimating the dimension of a manifold and finding local charts on it by using nonlinear singular value decomposition
Related papers
In this paper we propose a method that uses a nonlinear generalization of the Singular Value Decomposition (SVD) to arrive at an upper bound for the dimension of a manifold embedded in some R^N. We assume that data about the manifold's coordinates are available, and that there exists at least one small neighborhood containing a sufficient number of data points. Given these conditions, we show a method to compute the dimension of the manifold. We begin by looking at the simple case in which the manifold is a lower-dimensional affine subspace. In this case, we show that the well-known technique of SVD can be used (i) to calculate the dimension of the manifold and (ii) to obtain the equations which define the subspace. For the more general case, we apply a nonlinear generalization of the SVD (i) to search for an upper bound for the dimension of the manifold and (ii) to find the equations for the local charts of the manifold. We include a brief discussion of how this method is useful in the context of the Takens embedding, which is used in the analysis of time-series data from a dynamical system. We describe a specific problem that has recently been identified in applying this method. One effective solution is to develop a model based on local charts, and for this purpose a good estimate of the underlying dimension of the embedded data is required.
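The affine-subspace case described in this abstract can be sketched with a plain SVD (a minimal NumPy illustration, not the paper's own code): the number of significant singular values of the centered data matrix gives the dimension, and the remaining right singular vectors supply the linear equations defining the subspace.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample points from a 2-dimensional affine subspace of R^5:
# x = x0 + A @ t, with a random basis A and offset x0.
x0 = rng.normal(size=5)
A = rng.normal(size=(5, 2))
T = rng.normal(size=(200, 2))
X = x0 + T @ A.T                        # shape (200, 5)

# Center the data and compute the SVD of the centered matrix.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# The number of singular values above a tolerance is the
# dimension of the affine subspace.
tol = 1e-10 * s[0]
dim = int(np.sum(s > tol))              # 2 for this data

# Rows of Vt beyond the first `dim` span the normal directions;
# each gives one linear equation  n . (x - mean) = 0  that
# defines the subspace.
normals = Vt[dim:]
residual = np.abs(Xc @ normals.T).max() # ~0 for all data points
```

With noisy data, the hard tolerance would be replaced by a gap criterion on the singular-value spectrum.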
Dimension estimation of image manifolds by minimal cover approximation
Neurocomputing, 2013
Estimating the intrinsic dimension of data is an important problem in feature extraction and feature selection, since it provides an estimate of the number of features needed. Principal Components Analysis (PCA) is a powerful tool for discovering the dimension of data sets with a linear structure; it becomes ineffective, however, when the data have a nonlinear structure. In this paper, we propose a new PCA-based method to estimate the embedding dimension of data with nonlinear structure. Our method works by first finding a minimal cover of the data set, then performing PCA locally on each subset in the cover to obtain local intrinsic dimension estimates, and finally reporting the average of the local estimates. There are two main innovations in our method: (1) a novel noise-filtering procedure is applied within the PCA step for local intrinsic dimension estimation, and (2) a minimal cover is constructed over the whole data set. Because of these two innovations, our method is fast, robust to noise and outliers, converges to a stable estimate over a wide range of subregion sizes, and can be used incrementally, where a subregion is a local approximation of the underlying manifold. Experiments on synthetic and image data sets show the effectiveness of the proposed method.
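The core local-PCA idea can be sketched as follows (a toy version: random patch centers and a simple explained-variance threshold stand in for the paper's minimal-cover construction and noise filter, and `local_pca_dim` is a hypothetical helper name):

```python
import numpy as np

rng = np.random.default_rng(1)

# 2000 points on a 2-D "Swiss roll" surface embedded in R^3.
t = 1.5 * np.pi * (1 + 2 * rng.uniform(size=2000))
h = 10.0 * rng.uniform(size=2000)
X = np.column_stack([t * np.cos(t), h, t * np.sin(t)])

def local_pca_dim(X, n_centers=50, k=20, var_ratio=0.90):
    """Average the local PCA dimension over randomly chosen patches.
    (Hypothetical helper; the paper's cover construction and noise
    filtering are more elaborate than this threshold rule.)"""
    gen = np.random.default_rng(2)
    centers = gen.choice(len(X), size=n_centers, replace=False)
    dims = []
    for c in centers:
        # The k nearest neighbours of the centre form one patch.
        d2 = np.sum((X - X[c]) ** 2, axis=1)
        patch = X[np.argsort(d2)[:k]]
        patch = patch - patch.mean(axis=0)
        # Squared singular values = eigenvalues of the local covariance.
        s = np.linalg.svd(patch, compute_uv=False)
        var = s ** 2 / np.sum(s ** 2)
        # Smallest number of components explaining var_ratio of variance.
        dims.append(int(np.searchsorted(np.cumsum(var), var_ratio) + 1))
    return float(np.mean(dims))

est = local_pca_dim(X)   # typically close to 2 for this surface
```

The patch size k controls the bias: too large and curvature inflates the estimate, too small and noise dominates the local covariance.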
Geometrically local embedding in manifolds for dimension reduction
Pattern Recognition, 2012
In this paper, geometrically local embedding (GLE) is presented to discover the intrinsic structure of manifolds as a method in nonlinear dimension reduction. GLE is able to reveal the inner features of the input data in the lower dimension space while suppressing the influence of outliers in the local linear manifold. In addition to feature extraction and representation, GLE behaves as a clustering and classification method by projecting the feature data into low-dimensional separable regions. Through empirical evaluation, the performance of GLE is demonstrated by the visualization of synthetic data in lower dimension, and the comparison with other dimension reduction algorithms with the same data and configuration. Experiments on both pure and noisy data prove the effectiveness of GLE in dimension reduction, feature extraction, data visualization as well as clustering and classification.
Singular-value decomposition and embedding dimension
Physical Review, 1987
Data from dynamical experiments are often studied using results due to Shaw et al. and to Takens, which generate points in a space of relatively high dimension by embedding measurements that are typically one-dimensional. A number of questions arise from this, the most obvious being how one should choose the dimension of the embedding space. In this paper we show that a method which seems promising at first sight, estimating the rank of the matrix of embedded data, is unfortunately not useful in general. Previous encouraging results have almost certainly been due to numerical problems which can, in part, be avoided by a careful application of singular-value decomposition. We show that this process does not give useful dynamical information, though it is often useful in noise control.
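The rank-estimation procedure this abstract critiques can be reproduced in a few lines (a sketch, not the paper's code). For a clean, nearly periodic signal the singular values of the delay-embedded trajectory matrix do separate sharply, which is precisely the kind of encouraging special case the paper warns does not carry over to genuinely chaotic data:

```python
import numpy as np

rng = np.random.default_rng(3)

# A scalar time series: a clean oscillation plus slight measurement noise.
x = np.sin(0.1 * np.arange(2000)) + 1e-3 * rng.normal(size=2000)

# Takens delay embedding: each row is (x_t, x_{t+1}, ..., x_{t+m-1}).
m = 7
Y = np.lib.stride_tricks.sliding_window_view(x, m)

# Singular values of the centered trajectory matrix.
s = np.linalg.svd(Y - Y.mean(axis=0), compute_uv=False)

# A sinusoid traces an ellipse in the embedding space, so two singular
# values stand clear of the noise floor; counting them "estimates" rank 2.
rank_est = int(np.sum(s > 1e-2 * s[0]))
```

For chaotic data the spectrum decays gradually instead, so the count depends on an arbitrary threshold; this is the failure mode the paper analyzes, while noting the decomposition remains useful for noise reduction.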
Geometric Elements of Manifold Learning
Manifold learning has been widely exploited in high-dimensional data analysis, with applications in pattern recognition, data mining, and computer vision. The general problem is to extract a low-dimensional representation of a high-dimensional data set that leads to a more compact description of the data and simplifies its analysis. If we assume that the input lies on a low-dimensional manifold embedded in some high-dimensional space, this problem can be solved by computing a suitable embedding. This is the problem of manifold learning. This paper presents an introduction to the field, with a focus on the geometric aspects of manifold and subspace learning methods. We provide a review of the area, followed by a discussion that aims to motivate its study. The basic idea of manifold learning and its relationship with linear/nonlinear dimensionality reduction techniques are presented using a data set lying on a smooth curve (one-dimensional differential...
On Local Intrinsic Dimension Estimation and Its Applications
In this paper, we present multiple novel applications of local intrinsic dimension estimation. Much work has been done on estimating the global dimension of a data set, typically for the purposes of dimensionality reduction. We show that by estimating dimension locally, we are able to extend dimension estimation to many applications that are not possible with global estimation alone. Additionally, we show that local dimension estimation can be used to obtain a better global dimension estimate, alleviating the negative bias that is common to all known dimension estimation algorithms. We illustrate further uses of local dimension estimation in applications such as learning on statistical manifolds, network anomaly detection, clustering, and image segmentation.
Non-linear dimensionality reduction by locally linear isomaps
2004
Algorithms for nonlinear dimensionality reduction (NLDR) find meaningful hidden low-dimensional structure in a high-dimensional space. Prominent NLDR algorithms include Isomap, Locally Linear Embedding (LLE), and Laplacian Eigenmaps. Isomap can reliably recover low-dimensional nonlinear structure in high-dimensional data sets, but it suffers from the problem of short-circuiting, which occurs when the neighborhood distance is larger than the distance between folds of the manifold.
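The geodesic-distance machinery behind Isomap, and the role of the neighborhood size in short-circuiting, can be sketched in pure NumPy (a toy illustration, not any paper's implementation). With k small relative to the gap between folds, the k-NN graph follows the curve, so the graph shortest-path distance approximates arc length rather than the Euclidean chord:

```python
import numpy as np

# 200 points on a three-quarter circle: a 1-D manifold in R^2 whose
# endpoints are close in Euclidean distance but far along the curve.
n = 200
theta = np.linspace(0.0, 1.5 * np.pi, n)
X = np.column_stack([np.cos(theta), np.sin(theta)])

# k-NN graph with Euclidean edge weights.
k = 5
D = np.linalg.norm(X[:, None] - X[None, :], axis=2)
G = np.full((n, n), np.inf)
np.fill_diagonal(G, 0.0)
nn = np.argsort(D, axis=1)[:, 1:k + 1]   # skip self at column 0
for i in range(n):
    G[i, nn[i]] = D[i, nn[i]]
G = np.minimum(G, G.T)                   # symmetrize the graph

# Geodesic (graph shortest-path) distances via vectorized Floyd-Warshall.
for mid in range(n):
    G = np.minimum(G, G[:, mid, None] + G[None, mid, :])

# The endpoint-to-endpoint geodesic approximates the arc length
# (~1.5*pi), while the Euclidean chord is only sqrt(2): the small k
# prevents a short-circuit across the gap between the curve's ends.
geodesic = G[0, -1]
chord = D[0, -1]
```

Raising k until edges jump the gap would collapse the geodesic toward the chord, which is exactly the short-circuiting failure described above.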
Intrinsic Dimension Estimation: Relevant Techniques and a Benchmark Framework
Mathematical Problems in Engineering, 2015
When dealing with datasets comprising high-dimensional points, it is usually advantageous to discover some structure in the data. A fundamental piece of information needed for this aim is the minimum number of parameters required to describe the data while minimizing the information loss. This number, usually called the intrinsic dimension, can be interpreted as the dimension of the manifold from which the input data are supposed to be drawn. Due to its usefulness in many theoretical and practical problems, in the last decades the concept of intrinsic dimension has gained considerable attention in the scientific community, motivating the large number of intrinsic dimensionality estimators proposed in the literature. However, the problem is still open, since most techniques cannot efficiently deal with datasets drawn from manifolds of high intrinsic dimension that are nonlinearly embedded in higher-dimensional spaces. This paper surveys some of the most interesting, widely used, and advanced state-of-the-art...
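As a concrete example of the kind of estimator such surveys cover, here is a sketch of a simple two-nearest-neighbour intrinsic dimension estimate (for locally uniform data the ratio of the second to the first neighbour distance follows a Pareto law whose exponent is the intrinsic dimension; this code is illustrative and not from the survey):

```python
import numpy as np

rng = np.random.default_rng(4)

# A 3-D manifold nonlinearly embedded in R^10.
U = rng.uniform(size=(2000, 3))
u, v, w = U.T
X = np.column_stack([u, v, w, np.sin(u), u * v, np.cos(w),
                     v + w, u * w, np.exp(v / 2), u - v])

def twonn_dimension(X):
    """Two-NN estimate: d = n / sum(log(r2 / r1)), where r1, r2 are
    each point's first and second nearest-neighbour distances."""
    # Squared distance matrix without a large n x n x D temporary.
    sq = np.sum(X ** 2, axis=1)
    D2 = np.clip(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0, None)
    D = np.sqrt(D2)
    D.sort(axis=1)              # column 0 is the zero self-distance
    mu = D[:, 2] / D[:, 1]      # ratio of 2nd to 1st neighbour distance
    return len(X) / np.sum(np.log(mu))

d_est = twonn_dimension(X)      # close to 3 for this data
```

Because only ratios of tiny neighbour distances enter, the estimate is largely insensitive to the smooth nonlinear embedding, though like most estimators it degrades for high intrinsic dimensions, which is the open problem the survey highlights.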