Manifold-adaptive dimension estimation

Manifold Hypothesis in Data Analysis: Double Geometrically-Probabilistic Approach to Manifold Dimension Estimation

2021

The manifold hypothesis states that data points in high-dimensional space actually lie in close vicinity of a manifold of much lower dimension. In many cases this hypothesis has been empirically verified and used to enhance unsupervised and semi-supervised learning. Here we present a new approach to checking the manifold hypothesis and estimating the underlying manifold dimension. To do so, we use two very different methods simultaneously, one geometric and one probabilistic, and check whether they give the same result. Our geometric method is a modification, suited to sparse data, of the well-known box-counting algorithm for Minkowski dimension calculation. The probabilistic method is new. Although it exploits the standard nearest-neighbor distance, it differs from methods previously used in such situations. This method is robust and fast and includes a special preliminary data transformation. Experiments on real datasets show that the suggested approach based on the combination of the two methods ...
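The sparse-data modification itself is not detailed in the abstract, but the classical box-counting procedure it builds on is easy to sketch. The following minimal Python sketch (the function name, scale grid, and unit-cube normalization are illustrative choices, not the paper's) counts occupied grid cells at several scales and reads the dimension off the slope of log N(eps) versus log(1/eps):

```python
import numpy as np

def box_counting_dimension(points, scales=None):
    """Classical box-counting (Minkowski) dimension estimate.

    Counts the number of occupied grid cells N(eps) at several scales
    eps and fits the slope of log N(eps) against log(1/eps).
    """
    points = np.asarray(points, dtype=float)
    # Normalize to the unit hypercube so one grid works at every scale.
    mins, maxs = points.min(axis=0), points.max(axis=0)
    span = np.where(maxs > mins, maxs - mins, 1.0)
    unit = (points - mins) / span
    if scales is None:
        scales = [1.0 / 2**k for k in range(1, 7)]  # illustrative scale grid
    counts = []
    for eps in scales:
        # Assign each point to a cell of side eps; count distinct cells.
        cells = np.floor(unit / eps).astype(int)
        counts.append(len(np.unique(cells, axis=0)))
    # The slope of log N(eps) vs. log(1/eps) approximates the dimension.
    slope, _ = np.polyfit(np.log(1.0 / np.asarray(scales)), np.log(counts), 1)
    return slope
```

On a densely sampled d-dimensional set the fitted slope approaches d; sparse samples empty out small boxes and drag the slope down, which is precisely the regime the paper's modification targets.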

Dimension estimation of image manifolds by minimal cover approximation

Neurocomputing, 2013

Estimating the intrinsic dimension of data is an important problem in feature extraction and feature selection, as it provides an estimate of the number of desired features. Principal Components Analysis (PCA) is a powerful tool for discovering the dimension of data sets with a linear structure; it becomes ineffective, however, when the data have a nonlinear structure. In this paper, we propose a new PCA-based method to estimate the embedding dimension of data with nonlinear structures. Our method works by first finding a minimal cover of the data set, then performing PCA locally on each subset in the cover to obtain local intrinsic dimension estimates, and finally reporting the average of the local estimates. There are two main innovations in our method. (1) A novel noise-filtering procedure is applied within the PCA step for local intrinsic dimension estimation. (2) A minimal cover is constructed over the whole data set. Because of these two innovations, our method is fast, robust to noise and outliers, converges to a stable estimate over a wide range of sub-region sizes, and can be used incrementally, where a sub-region refers to a local approximation of the distributed manifold. Experiments on synthetic and image data sets show the effectiveness of the proposed method.
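A minimal sketch of the cover-then-local-PCA recipe, with the paper's minimal-cover construction and noise-filtering procedure replaced by simple stand-ins (random centers, fixed-size k-NN patches, and an explained-variance threshold, all of which are illustrative assumptions rather than the paper's choices):

```python
import numpy as np

def local_pca_dimension(X, n_centers=20, k=50, var_threshold=0.95, rng=None):
    """Average of local PCA dimension estimates over a crude cover.

    Illustrative stand-in for the paper's method: cover the data with
    k-NN patches around randomly chosen centers, run PCA in each patch,
    and average the local dimension estimates.
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    centers = X[rng.choice(len(X), size=min(n_centers, len(X)), replace=False)]
    local_dims = []
    for c in centers:
        # Patch: the k nearest points to this center, mean-centered.
        idx = np.argsort(np.linalg.norm(X - c, axis=1))[:k]
        patch = X[idx] - X[idx].mean(axis=0)
        # Eigenvalues of the local covariance via SVD of the centered patch.
        s = np.linalg.svd(patch, compute_uv=False) ** 2
        ratios = np.cumsum(s) / s.sum()
        # Local dimension: smallest rank explaining var_threshold of variance.
        local_dims.append(int(np.searchsorted(ratios, var_threshold) + 1))
    return float(np.mean(local_dims))
```

The variance threshold here plays the role of the paper's noise filter in the crudest possible way; the abstract's claim of stability over a wide range of sub-region sizes corresponds to insensitivity to k in this sketch.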

De-biasing local dimension estimation

Many algorithms have been proposed for estimating the intrinsic dimension of high-dimensional data. A phenomenon common to all of them is a negative bias, perceived to be the result of undersampling. We propose improved methods for estimating intrinsic dimension that take manifold boundaries into consideration. By estimating dimension locally, we are able to analyze and reduce the effect that sample data depth has on the negative bias. Additionally, we offer improvements to an existing algorithm for dimension estimation based on k-nearest neighbor graphs, and present an algorithm for adapting any dimension estimation algorithm to operate locally. Finally, we illustrate the uses of local dimension estimation on data sets consisting of multiple manifolds, including applications such as diagnosing anomalies in router networks and image segmentation.
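The "adapt any estimator to operate locally" idea admits a short generic sketch: run a user-supplied global estimator on each point's k-nearest-neighbor patch. The boundary and sampling-depth corrections the abstract describes are omitted, and the function name and parameters below are illustrative:

```python
import numpy as np

def localize(global_estimator, X, k=100):
    """Apply any global dimension estimator to each point's k-NN patch.

    Generic 'make it local' recipe: for each sample, call the supplied
    global estimator on that sample's k nearest neighbors. Boundary and
    depth corrections are deliberately left out of this sketch.
    """
    X = np.asarray(X, dtype=float)
    # Full pairwise distance matrix (fine for small n; use a KD-tree
    # or ball tree for large data sets).
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    local = np.empty(len(X))
    for i in range(len(X)):
        neighbors = X[np.argsort(D[i])[:k]]  # includes the point itself
        local[i] = global_estimator(neighbors)
    return local
```

Any global estimator, such as a box-counting or k-NN graph estimator, can be passed as `global_estimator`; the neighborhood size k trades locality against the undersampling bias discussed in the abstract.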

On Local Intrinsic Dimension Estimation and Its Applications

In this paper, we present multiple novel applications of local intrinsic dimension estimation. Much work has been done on estimating the global dimension of a data set, typically for the purposes of dimensionality reduction. We show that by estimating dimension locally, we are able to extend the uses of dimension estimation to many applications that are not possible with global dimension estimation. Additionally, we show that local dimension estimation can be used to obtain a better global dimension estimate, alleviating the negative bias that is common to all known dimension estimation algorithms. We illustrate further uses of local dimension estimation in applications such as learning on statistical manifolds, network anomaly detection, clustering, and image segmentation.
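One simple aggregation consistent with the idea of de-biasing a global estimate via local ones (an illustrative choice, not necessarily the paper's rule) is to take the mode of the per-point estimates rather than their mean, since the mean is dragged down by the negatively biased tail:

```python
import numpy as np

def debiased_global_dimension(local_dims):
    """Combine per-point local estimates into one global estimate.

    Takes the mode of the rounded local estimates, which is less
    sensitive to the downward-biased tail than the mean. The
    aggregation rule is an illustrative assumption.
    """
    dims = np.rint(np.asarray(local_dims)).astype(int)
    values, counts = np.unique(dims, return_counts=True)
    return int(values[np.argmax(counts)])
```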

Intrinsic Dimension Estimation: Relevant Techniques and a Benchmark Framework

Mathematical Problems in Engineering, 2015

When dealing with datasets comprising high-dimensional points, it is usually advantageous to discover some structure in the data. A fundamental piece of information needed for this aim is the minimum number of parameters required to describe the data while minimizing the information loss. This number, usually called the intrinsic dimension, can be interpreted as the dimension of the manifold from which the input data are supposed to be drawn. Due to its usefulness in many theoretical and practical problems, the concept of intrinsic dimension has gained considerable attention in the scientific community over recent decades, motivating the large number of intrinsic dimensionality estimators proposed in the literature. However, the problem is still open, since most techniques cannot efficiently deal with datasets drawn from manifolds of high intrinsic dimension that are nonlinearly embedded in higher-dimensional spaces. This paper surveys some of the most interesting, widely used, and advanced state-of-the-art ...

Riemannian Manifold Learning for Nonlinear Dimensionality Reduction

Computer Vision – ECCV 2006, 2006

In recent years, nonlinear dimensionality reduction (NLDR) techniques have attracted much attention in visual perception and many other areas of science. We propose an efficient algorithm called Riemannian manifold learning (RML). A Riemannian manifold can be constructed in the form of a simplicial complex, and thus its intrinsic dimension can be reliably estimated. The NLDR problem is then solved by constructing Riemannian normal coordinates (RNC). Experimental results demonstrate that our algorithm can learn the data's intrinsic geometric structure, yielding uniformly distributed and well-organized low-dimensional embeddings.

Estimating Local Intrinsic Dimension with k-Nearest Neighbor Graphs

Many high-dimensional data sets of practical interest exhibit varying complexity in different parts of the data space. This is the case, for example, for databases of images containing many samples of a few textures of differing complexity. Such phenomena can be modeled by assuming that the data lie on a collection of manifolds with different intrinsic dimensionalities. In this extended abstract, we introduce a method to estimate the local dimensionality associated with each point in a data set, without any prior information about the manifolds, their number, or their sampling distributions. The proposed method uses a global dimensionality estimator based on k-nearest neighbor (k-NN) graphs, together with an algorithm for computing neighborhoods in the data with similar topological properties.
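The k-NN graph estimator referenced here rests on a known scaling law: the total edge length L(n) of the k-NN graph over n samples from a d-dimensional manifold grows like C * n^((d-1)/d). A sketch of the resulting global estimator follows; the subsample sizes, k, and trial counts are illustrative, and the abstract's companion step of grouping points into topologically similar neighborhoods is not reproduced:

```python
import numpy as np

def knn_graph_dimension(X, k=5, sizes=(100, 200, 400, 800), trials=5, rng=None):
    """Global dimension estimate from k-NN graph length scaling.

    Regressing log L(n) against log n over subsample sizes gives a
    slope a = (d-1)/d, hence d = 1/(1-a). Needs at least two
    admissible subsample sizes to fit the slope.
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    sizes = [n for n in sizes if n <= len(X)]
    log_n, log_L = [], []
    for n in sizes:
        lengths = []
        for _ in range(trials):
            # Random subsample of size n, averaged over several trials.
            S = X[rng.choice(len(X), size=n, replace=False)]
            D = np.linalg.norm(S[:, None, :] - S[None, :, :], axis=-1)
            # Sum distances to the k nearest neighbors (skip self at column 0).
            knn = np.sort(D, axis=1)[:, 1:k + 1]
            lengths.append(knn.sum())
        log_n.append(np.log(n))
        log_L.append(np.log(np.mean(lengths)))
    slope, _ = np.polyfit(log_n, log_L, 1)
    return 1.0 / (1.0 - slope)
```

For a 2-dimensional manifold the fitted slope should sit near 0.5, giving an estimate near 2; in practice the slope is noisy, which is why averaging over trials (and, in the papers above, bootstrap resampling) matters.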

Geometrically local embedding in manifolds for dimension reduction

Pattern Recognition, 2012

In this paper, geometrically local embedding (GLE) is presented as a nonlinear dimension reduction method that discovers the intrinsic structure of manifolds. GLE is able to reveal the inner features of the input data in a lower-dimensional space while suppressing the influence of outliers in the local linear manifold. In addition to feature extraction and representation, GLE behaves as a clustering and classification method by projecting the feature data into low-dimensional separable regions. Through empirical evaluation, the performance of GLE is demonstrated by the visualization of synthetic data in lower dimensions and by comparison with other dimension reduction algorithms on the same data and configurations. Experiments on both pure and noisy data demonstrate the effectiveness of GLE in dimension reduction, feature extraction, and data visualization, as well as clustering and classification.

Variance Reduction with neighborhood smoothing for local intrinsic dimension estimation

Local intrinsic dimension estimation has been shown to be useful for many tasks, such as image segmentation, anomaly detection, and de-biasing global dimension estimates. Of particular concern with local dimension estimation algorithms is their high variance in high dimensions, which causes points lying on the same manifold to receive different dimension estimates. We propose adding adaptive 'neighborhood smoothing', filtering over the generated dimension estimates to obtain the most probable estimate for each sample, as a method to reduce variance and increase algorithm accuracy. We present a method for defining neighborhoods using a geodesic distance, which constricts each neighborhood to the manifold of concern and prevents smoothing over intersecting manifolds of differing dimension. Finally, we illustrate the benefits of neighborhood smoothing on synthetic data sets as well as for diagnosing anomalies in router networks.
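A minimal sketch of the smoothing step itself: replace each point's estimate with the majority (mode) estimate over its neighborhood. The paper constrains neighborhoods with a geodesic distance so the filter does not mix intersecting manifolds; plain Euclidean k-NN is used here purely for brevity, and the function name and parameters are illustrative:

```python
import numpy as np
from collections import Counter

def smooth_local_dims(local_dims, X, k=10):
    """Majority-vote smoothing of per-point local dimension estimates.

    Replaces each point's estimate with the most frequent (rounded)
    estimate among its k nearest neighbors. Euclidean neighborhoods
    stand in for the paper's geodesic-constrained ones.
    """
    X = np.asarray(X, dtype=float)
    local_dims = np.asarray(local_dims)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    smoothed = np.empty_like(local_dims)
    for i in range(len(X)):
        idx = np.argsort(D[i])[:k + 1]  # the point itself plus k neighbors
        # Most probable estimate within the neighborhood (majority vote).
        votes = Counter(np.rint(local_dims[idx]).astype(int))
        smoothed[i] = votes.most_common(1)[0][0]
    return smoothed
```

Mode filtering, rather than averaging, keeps the output integer-valued and prevents a few wild estimates from shifting an entire neighborhood, which matches the variance-reduction goal stated in the abstract.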

Geometric Elements of Manifold Learning

Manifold learning has been widely exploited in high-dimensional data analysis, with applications in pattern recognition, data mining, and computer vision. The general problem is to extract a low-dimensional representation of a high-dimensional data set such that it leads to a more compact description of the data and simplifies its analysis. If we assume that the input lies on a low-dimensional manifold embedded in some high-dimensional space, this problem can be solved by computing a suitable embedding. This is the problem of manifold learning. This paper presents an introduction to this field, focusing on the geometric aspects of manifold and subspace learning methods. We therefore provide a brief review of the area, followed by a discussion that aims to motivate its study. The basic idea of manifold learning and its relationship with linear/non-linear dimensionality reduction techniques are presented using a data set lying on a smooth curve (one-dimensional differential...