Some steps towards a general principle for dimensionality reduction mappings
Related papers
Dimensionality reduction mappings
2011
A wealth of powerful dimensionality reduction methods has been established which can be used for data visualization and preprocessing. These are accompanied by formal evaluation schemes, which allow a quantitative evaluation along general principles and which even lead to further visualization schemes based on these objectives. Most methods, however, provide a mapping of a previously given finite set of points only, requiring additional steps for out-of-sample extensions. We propose a general view on dimensionality reduction based on the concept of cost functions, and, based on this general principle, extend dimensionality reduction to explicit mappings of the data manifold. This offers simple out-of-sample extensions. Further, it opens a way towards a theory of data visualization that takes the perspective of its generalization ability to new data points. We demonstrate the approach based on a simple global linear mapping as well as prototype-based local linear mappings.
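The abstract only names its ingredients (a cost function plus an explicit mapping), so here is a minimal sketch of that idea: a single global linear map fitted by gradient descent on an MDS-style stress, with out-of-sample extension reduced to a matrix product. The function name, learning rate, and choice of stress are assumptions, not the authors' exact formulation.

```python
import numpy as np

def fit_linear_dr(X, dim=2, lr=1e-4, epochs=300, seed=0):
    """Fit a global linear map A by gradient descent on an MDS-style stress,
    so that distances of the projections X @ A.T match those of X.
    Because the mapping is explicit, out-of-sample points are embedded by
    X_new @ A.T with no extra optimisation. Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    A = rng.normal(scale=0.1, size=(dim, d))
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # target distances
    for _ in range(epochs):
        Y = X @ A.T
        diff = Y[:, None, :] - Y[None, :, :]
        d_low = np.linalg.norm(diff, axis=-1) + 1e-9
        coef = (d_low - D) / d_low                     # stress derivative terms
        grad_Y = 4.0 * (coef[:, :, None] * diff).sum(axis=1)
        A -= lr * grad_Y.T @ X                         # chain rule through Y = X A^T
    return A
```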
A General Framework for Dimensionality-Reducing Data Visualization Mapping
Neural Computation
In recent years a wealth of dimension reduction techniques for data visualization and preprocessing has been established. Non-parametric methods require additional effort for out-of-sample extensions, because they provide a mapping of a given finite set of points only. In this contribution we propose a general view on non-parametric dimension reduction based on the concept of cost functions and properties of the data. Based on this general principle we transfer non-parametric dimension reduction to explicit mappings of the data manifold such that direct out-of-sample extensions become possible. Furthermore, this concept offers the possibility to investigate the generalization ability of data visualization to new data points. We demonstrate the approach based on a simple global linear mapping as well as prototype-based local linear mappings. In addition, we can bias the functional form according to given auxiliary information. This leads to explicit supervised visualization mappings whose discriminative properties are comparable to state-of-the-art approaches.
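For the prototype-based local linear mappings mentioned above, a plausible (not the authors' exact) construction blends per-prototype affine maps with soft responsibilities; the Gaussian responsibilities, bandwidth, and parameter shapes below are illustrative assumptions.

```python
import numpy as np

def local_linear_embed(X, prototypes, maps, offsets, bandwidth=1.0):
    """Embed points by blending per-prototype affine maps:
    y(x) = sum_k r_k(x) * (A_k @ x + b_k), with soft responsibilities r_k
    from Gaussian distances to the prototypes.
    Shapes: X (n, d), prototypes (K, d), maps (K, 2, d), offsets (K, 2)."""
    d2 = ((X[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)
    R = np.exp(-d2 / (2 * bandwidth ** 2))
    R /= R.sum(axis=1, keepdims=True)                          # (n, K)
    local = np.einsum('kij,nj->nki', maps, X) + offsets[None]  # (n, K, 2)
    return (R[:, :, None] * local).sum(axis=1)                 # (n, 2)
```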
Improving Dimensionality Reduction Projections for Data Visualization
Applied Sciences
In data science and visualization, dimensionality reduction techniques have been extensively employed for exploring large datasets. These techniques involve the transformation of high-dimensional data into reduced versions, typically in 2D, with the aim of preserving significant properties from the original data. Many dimensionality reduction algorithms exist, and nonlinear approaches such as the t-SNE (t-Distributed Stochastic Neighbor Embedding) and UMAP (Uniform Manifold Approximation and Projection) have gained popularity in the field of information visualization. In this paper, we introduce a simple yet powerful manipulation for vector datasets that modifies their values based on weight frequencies. This technique significantly improves the results of the dimensionality reduction algorithms across various scenarios. To demonstrate the efficacy of our methodology, we conduct an analysis on a collection of well-known labeled datasets. The results demonstrate improved clustering p...
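The abstract does not spell out the weight-frequency manipulation, so the sketch below is only a guess at one such preprocessing step: rescaling each feature by how frequently its binned values occur, followed by a standard scikit-learn t-SNE. The function frequency_reweight and its binning rule are hypothetical stand-ins, not the paper's method.

```python
import numpy as np
from sklearn.manifold import TSNE

def frequency_reweight(X, bins=16):
    """Hypothetical reading of a 'weight frequency' manipulation: scale each
    feature by how often its binned values occur, so common value ranges
    carry more weight before projection. Illustrative only."""
    Xw = np.empty_like(X, dtype=float)
    for j in range(X.shape[1]):
        counts, edges = np.histogram(X[:, j], bins=bins)
        idx = np.clip(np.digitize(X[:, j], edges[1:-1]), 0, bins - 1)
        Xw[:, j] = X[:, j] * counts[idx] / len(X)
    return Xw

# Usage: Y = TSNE(n_components=2, perplexity=30).fit_transform(frequency_reweight(X))
```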
Visualization of Manifold-Valued Elements by Multidimensional Scaling
arXiv (Cornell University), 2010
The present contribution suggests the use of a multidimensional scaling (MDS) algorithm as a visualization tool for manifold-valued elements. A visualization tool of this kind is useful in signal processing and machine learning whenever learning/adaptation algorithms insist on high-dimensional parameter manifolds.
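As a reminder of how MDS can serve as such a visualization tool, here is a minimal classical (Torgerson) MDS sketch that embeds items given only a pairwise distance matrix, e.g. geodesic distances between manifold-valued parameters; this is the textbook algorithm, not necessarily the variant the paper uses.

```python
import numpy as np

def classical_mds(D, dim=2):
    """Classical (Torgerson) MDS: embed items given only pairwise distances D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centred Gram matrix
    w, V = np.linalg.eigh(B)                 # ascending eigenvalues
    idx = np.argsort(w)[::-1][:dim]          # keep the largest ones
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0))
```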
Topology Preservation Measures in the Visualization of Manifold-Type Multidimensional Data
Informatica
Most real-life data are not truly high-dimensional: the data points lie on a low-dimensional manifold embedded in a high-dimensional space. Nonlinear manifold learning methods automatically discover the low-dimensional nonlinear manifold in a high-dimensional data space and then embed the data points into a low-dimensional embedding space, preserving the underlying structure in the data. In this paper, we use the locally linear embedding method to unravel a manifold. In order to quantitatively estimate the topology preservation of a manifold after unfolding it in a low-dimensional space, a quantitative numerical measure must be used. Many different measures of topology preservation exist. We have investigated three measures: Spearman's rho, Konig's measure (KM), and mean relative rank errors (MRRE). After investigating different manifolds, it turned out that only KM and MRRE gave proper results of manifold topology preservation in all the cases. The main reason is that Spearman's rho considers distances between all pairs of points from the analysed data set, while KM and MRRE evaluate only a limited number of neighbours of each point.
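Of the three measures, Spearman's rho is the easiest to reproduce; the sketch below computes it from the two sets of pairwise distances with SciPy. Its global, all-pairs nature is exactly what the paper identifies as its weakness relative to KM and MRRE.

```python
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def spearman_rho(X_high, X_low):
    """Spearman's rho between the ranked pairwise distances of the original
    data and of its embedding. Weighs ALL pairs equally, so it can miss
    local topology violations that neighbour-based measures catch."""
    rho, _ = spearmanr(pdist(X_high), pdist(X_low))
    return rho
```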
Principal Manifolds for Data Visualization and Dimension Reduction
Lecture Notes in Computational Science and Enginee, 2008
In 1901 Karl Pearson invented Principal Component Analysis (PCA). Since then, PCA has served as a prototype for many other tools of data analysis, visualization and dimension reduction: Independent Component Analysis (ICA), Multidimensional Scaling (MDS), Nonlinear PCA (NLPCA), Self-Organizing Maps (SOM), etc. The book starts by quoting Pearson's classical definition of PCA and includes reviews of various methods: NLPCA, ICA, MDS, embedding and clustering algorithms, principal manifolds and SOM. New ...
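For reference, Pearson's PCA fits in a few lines: project the centred data onto the leading eigenvectors of its sample covariance matrix. A minimal sketch, not taken from the book itself:

```python
import numpy as np

def pca(X, dim=2):
    """PCA: project onto the directions of maximal variance, i.e. the
    leading eigenvectors of the sample covariance matrix."""
    Xc = X - X.mean(axis=0)
    C = Xc.T @ Xc / (len(X) - 1)             # sample covariance
    w, V = np.linalg.eigh(C)                 # ascending eigenvalues
    return Xc @ V[:, ::-1][:, :dim]          # scores on the top components
```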
Maps for the Visualization of high-dimensional Data Spaces
2003
The U-Matrix is a canonical tool for the display of distance structures in data space using emergent SOM (ESOM). The U-Matrix, defined originally for planar map spaces, is extended in this work to toroid neuron spaces. Embedding the neuron space in a finite but borderless space, such as a torus, avoids the border effects of planar spaces. A planar display of a toroid map space, however, disrupts coherent U-Matrix structures. Tiling multiple instances of the U-Matrix solves this problem at the cost of multiple images of data points. The P-Matrix, as defined here, is a display of the density relationships in the data space using Pareto Density Estimation. While the P-Matrix is useful for clustering, it can also be used for a non-ambiguous display of a non-planar neuron space. Centering the display on high-density regions and removing ambiguous images of data points leads to U-Maps and P-Maps. U-Maps depict the distance structure of a data space as a borderless three-dimensional landscape whose floor space is ordered according to the topology-preserving features of ESOM. P-Maps display the density structures. Both maps are especially suited for data mining and knowledge discovery.
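A minimal sketch of a U-Matrix over a toroidal neuron space, under the assumption that the trained ESOM weights are given as a (rows, cols, dim) array: np.roll wraps the grid indices, which is what makes the map borderless.

```python
import numpy as np

def u_matrix_toroid(W):
    """U-Matrix of a SOM weight grid W of shape (rows, cols, dim): each cell
    is the mean distance from a neuron's weight vector to its four grid
    neighbours, with toroidal wrap-around via np.roll."""
    dists = [np.linalg.norm(W - np.roll(W, shift, axis=ax), axis=-1)
             for ax, shift in ((0, 1), (0, -1), (1, 1), (1, -1))]
    return np.mean(dists, axis=0)            # (rows, cols) height field
```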
Information visualization by dimensionality reduction: a review
Information visualization can be considered a process of transforming similarity relationships between data points into a geometric representation in order to see unseen information. High-dimensional data sets are one of the main problems of information visualization. Dimensionality Reduction (DR) is therefore a useful strategy to project a high-dimensional space onto a low-dimensional space, which can be visualized directly. The application of this technique has several benefits. First, DR can minimize the amount of storage needed by reducing the size of the data sets. Second, it helps to understand the data sets by discarding irrelevant features and focusing on the most important ones. DR can enable the discovery of rich information, which assists the task of data analysis. Visualization of high-dimensional data sets is widely used in many fields, such as remote sensing imagery, biology, computer vision, and computer graphics. Visualization is a simple way to understand a high-dimensional space, because the relationships between the original data points are otherwise incomprehensible. A large number of DR methods exist which attempt to minimize the loss of original information. This paper discusses and analyses some DR methods to support the idea of using dimensionality reduction to obtain trustworthy visualizations.
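One common way to quantify "trustworthy visualization" is scikit-learn's trustworthiness score, which counts how many embedded neighbours were genuine neighbours in the original space. A short usage sketch; the dataset and projection here are arbitrary choices:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import trustworthiness

# Trustworthiness in [0, 1]: 1.0 means no point in the 2D view has
# "intruder" neighbours that were far away in the original space.
X = load_digits().data
X2 = PCA(n_components=2).fit_transform(X)
score = trustworthiness(X, X2, n_neighbors=5)
```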
Toward a Quantitative Survey of Dimension Reduction Techniques
IEEE Transactions on Visualization and Computer Graphics, 2019
Dimensionality reduction methods, also known as projections, are frequently used in multidimensional data exploration in machine learning, data science, and information visualization. Tens of such techniques have been proposed, aiming to address a wide set of requirements, such as the ability to show the high-dimensional data structure, distance or neighborhood preservation, computational scalability, stability to data noise and/or outliers, and practical ease of use. However, it is far from clear for practitioners how to choose the best technique for a given use context. We present a survey of a wide body of projection techniques that helps answer this question. For this, we characterize the input data space, projection techniques, and the quality of projections by several quantitative metrics. We sample these three spaces according to these metrics, aiming at good coverage with bounded effort. We describe our measurements and outline observed dependencies of the measured variables. Based on these results, we draw several conclusions that help compare projection techniques, explain their results for different types of data, and ultimately help practitioners when choosing a projection for a given context. Our methodology, datasets, projection implementations, metrics, visualizations, and results are publicly available, so interested stakeholders can examine and/or extend this benchmark.
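A typical quality metric from such benchmarks is a neighbourhood-preservation score; the sketch below (its name and exact definition are illustrative, not the survey's) measures the average overlap of each point's k-nearest-neighbour sets before and after projection.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def neighborhood_preservation(X_high, X_low, k=7):
    """Average overlap between each point's k-NN sets in the original space
    and in the projection; 1.0 means every local neighbourhood survives."""
    idx_h = (NearestNeighbors(n_neighbors=k + 1).fit(X_high)
             .kneighbors(X_high, return_distance=False)[:, 1:])  # drop self
    idx_l = (NearestNeighbors(n_neighbors=k + 1).fit(X_low)
             .kneighbors(X_low, return_distance=False)[:, 1:])
    return np.mean([len(set(a) & set(b)) / k for a, b in zip(idx_h, idx_l)])
```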