High Dimensional Data Visualization: Advances and Challenges

High Dimensional Data Visualization

6th International Symposium of Hungarian …, 2005

Visualizations that can handle flat files or simple table data are the ones most often used in data mining. Over the last few years, many techniques have been developed for visualising different types of information. This paper provides a brief background to data visualization and key references. Our goal is to review the various high-dimensional visualization techniques and to classify and summarize them in a comparative table that emphasizes the main properties of these methods.

Review on High Dimensional Data Visualization

R S. Publication (rspublication.com), 2014

Large high-dimensional datasets enclose billions of entries across many attributes and relational databases. Data cube aggregation is a well-known technique used to implement data mining on large databases. Spatial data can occupy a prohibitively large amount of space, which consequently requires disk storage; it is therefore necessary to pre-compute every possible aggregate query over the database. Modern scientific applications are generating larger and larger volumes of data at an ever increasing rate. As datasets become bulkier, exploratory data visualization becomes more difficult and complex, and data fetching turns into a time-consuming process on small devices. Nanocubes are in-memory data structures designed specifically to speed up queries over multidimensional data cubes, and could eventually be used as a backend for these types of applications. Nanocubes offer efficient storage and querying of large multidimensional, spatiotemporal, and high-dimensional datasets.
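To make the pre-aggregation idea concrete, here is a minimal sketch that pre-computes the aggregate for every subset of dimensions of a small pandas table with made-up columns (region, hour, device); Nanocubes itself uses a dedicated in-memory tree structure rather than this brute-force enumeration.

```python
# Minimal sketch of data-cube-style pre-aggregation with pandas (hypothetical
# column names); Nanocubes uses a specialised in-memory tree, not pandas.
from itertools import combinations

import pandas as pd

def precompute_aggregates(df, dims, measure):
    """Pre-compute the aggregate (sum of `measure`) for every subset of `dims`."""
    cube = {}
    for k in range(len(dims) + 1):
        for group in combinations(dims, k):
            if group:
                cube[group] = df.groupby(list(group))[measure].sum()
            else:
                cube[group] = df[measure].sum()  # grand total
    return cube

# Example with made-up spatiotemporal event data
events = pd.DataFrame({
    "region": ["north", "north", "south", "south"],
    "hour":   [9, 10, 9, 10],
    "device": ["phone", "tablet", "phone", "phone"],
    "count":  [120, 80, 200, 150],
})
cube = precompute_aggregates(events, ["region", "hour", "device"], "count")
print(cube[("region",)])          # aggregate query answered from the cube
print(cube[("region", "hour")])   # no rescan of the raw table needed
```

Once such a cube is built, any aggregate query over those dimensions is answered by a lookup instead of a scan of the raw table, which is the property that data-cube backends exploit.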

A geometry-based approach to visualize high-dimensional data

Brazilian Conference on Intelligent Systems, 2019

Big Data has attracted extensive attention from industry, academia, and governments around the world, employing various approaches from fields such as machine learning, pattern recognition, and data visualization. Data visualization is quite useful for human perception of relevant information, supporting understanding and insight into data with high dimensionality. This paper presents a novel approach to dimensionality reduction called the Polygonal Coordinate System (PCS), which is able to represent multi-dimensional data in two dimensions. For this purpose, data are represented across a regular polygon that acts as an interface between the high-dimensional space and the 2D plane. PCS can deal with massive data sets by adopting an incremental and efficient dimensionality reduction. A statistical comparison using Spearman's rho correlation highlights the utility of PCS, which outperforms the state-of-the-art t-Distributed Stochastic Neighbor Embedding (t-SNE) technique.
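The PCS formulas are not reproduced in this abstract, so the following is only a minimal sketch in the spirit of polygon-anchored projections such as RadViz, together with the kind of Spearman's rho check described above: each dimension is anchored at a vertex of a regular polygon and every record is placed at a weighted average of those anchors. The actual PCS construction may differ.

```python
# Minimal sketch of a polygon-anchored 2-D projection (RadViz-style); the
# exact PCS formulation in the paper may differ from this.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def polygon_project(X):
    """Project an (n_samples, n_dims) array onto the plane of a regular polygon."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    # Min-max normalise each dimension so the anchor weights are non-negative
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    W = (X - X.min(axis=0)) / span
    # Vertices of a regular d-gon on the unit circle, one per dimension
    angles = 2 * np.pi * np.arange(d) / d
    anchors = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # (d, 2)
    # Each point is the weighted average of the anchors it is pulled towards
    totals = W.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0
    return (W @ anchors) / totals  # (n, 2)

# Quality check in the style of the paper: rank correlation between
# high-dimensional and 2-D pairwise distances.
X = np.random.default_rng(0).normal(size=(200, 8))
Y = polygon_project(X)
rho, _ = spearmanr(pdist(X), pdist(Y))
print(f"Spearman's rho between distance sets: {rho:.3f}")
```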

Visualisation of High-Dimensional Data for Very Large Data Sets

This paper proposes a modification of the Sammon map algorithm for data visualisation. The modification, known as the Sparse Approximated Sammon Stress (SASS), allows mappings to be produced for very large data sets of the order of 10^6 points. While the technique may be useful in a variety of applications, the results presented here demonstrate its usefulness for visualising patient deterioration in vital-sign data collected from step-down unit hospital patients. A final result demonstrates an application of the SASS visualisation to drug safety analysis.
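SASS itself is not reproduced here; the following is a minimal sketch of the classic Sammon stress that it approximates, together with a hypothetical sparse variant that evaluates the stress only on a chosen subset of point pairs, which is the general idea behind scaling the method to very large data sets.

```python
# Minimal sketch of the classic Sammon stress; the sparse variant below is a
# simplified stand-in for the SASS idea of using only a subset of pairs.
import numpy as np
from scipy.spatial.distance import pdist

def sammon_stress(X_high, X_low, eps=1e-12):
    """Sammon stress between high-dimensional data and its low-dimensional map."""
    d_star = pdist(X_high)                 # input-space pairwise distances
    d_map = pdist(X_low)                   # map-space pairwise distances
    d_star = np.maximum(d_star, eps)       # guard against zero distances
    return np.sum((d_star - d_map) ** 2 / d_star) / np.sum(d_star)

def sparse_sammon_stress(X_high, X_low, pair_idx, eps=1e-12):
    """Same stress evaluated only on a sparse set of point pairs,
    where pair_idx is an (m, 2) array of row indices."""
    i, j = pair_idx[:, 0], pair_idx[:, 1]
    d_star = np.linalg.norm(X_high[i] - X_high[j], axis=1)
    d_map = np.linalg.norm(X_low[i] - X_low[j], axis=1)
    d_star = np.maximum(d_star, eps)
    return np.sum((d_star - d_map) ** 2 / d_star) / np.sum(d_star)
```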

Visual hierarchical dimension reduction for exploration of high dimensional datasets

2003

Traditional visualization techniques for multidimensional data sets, such as parallel coordinates, glyphs, and scatterplot matrices, do not scale well to high numbers of dimensions. A common approach to solving this problem is dimensionality reduction. Existing dimensionality reduction techniques usually generate lower dimensional spaces that have little intuitive meaning to users and allow little user interaction. In this paper we propose a new approach to handling high dimensional data, named Visual Hierarchical Dimension Reduction (VHDR), that addresses these drawbacks. VHDR not only generates lower dimensional spaces that are meaningful to users, but also allows user interaction in most steps of the process. In VHDR, dimensions are grouped into a hierarchy, and lower dimensional spaces are constructed using clusters of the hierarchy. We have implemented the VHDR approach in XmdvTool and extended several traditional multidimensional visualization methods to convey dimension cluster characteristics when visualizing the data set in lower dimensional spaces. Our case study of applying VHDR to a real data set supports our belief that this approach is effective in supporting the exploration of high dimensional data sets.
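As a rough sketch of the core idea (not the interactive XmdvTool implementation), the dimensions themselves can be clustered hierarchically by a correlation-based distance, with one representative column chosen per cluster to build the lower-dimensional space:

```python
# Rough sketch of the VHDR idea: cluster the *dimensions* (not the rows)
# hierarchically and keep one representative column per dimension cluster.
# The real VHDR in XmdvTool is interactive and considerably richer.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def reduce_by_dimension_clusters(X, n_groups):
    """Return one representative column per dimension cluster of X."""
    X = np.asarray(X, dtype=float)
    # Distance between dimensions: 1 - |correlation| (NaNs from constant
    # columns are treated as zero correlation in this sketch)
    corr = np.nan_to_num(np.corrcoef(X, rowvar=False))
    dist = 1.0 - np.abs(corr)
    # Condensed upper triangle feeds the hierarchical clustering
    Z = linkage(dist[np.triu_indices_from(dist, k=1)], method="average")
    labels = fcluster(Z, t=n_groups, criterion="maxclust")
    reps = []
    for g in np.unique(labels):
        members = np.where(labels == g)[0]
        # Representative = the member closest to the rest of its cluster
        reps.append(members[np.argmin(dist[np.ix_(members, members)].sum(axis=0))])
    reps = sorted(int(r) for r in reps)
    return X[:, reps], reps
```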

High Dimensional Data Visualization Using 3-D Icons

2000

In this paper we present a method to visualize large amounts of high-dimensional data. High-dimensional data visualization is very important to data analysts since it gives a direct and natural view of the data. In our method we use one icon to represent one group of dimensions, and then choose features of the icon to display the dimensions within that group.
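As a simplified illustration of the idea, the sketch below maps three hypothetical groups of dimensions to the position, size, and colour of a 2-D glyph; the paper's actual icons are 3-D shapes with richer features.

```python
# Simplified 2-D analogue of the icon idea: one glyph per record, with each
# group of dimensions mapped to a different visual feature of the glyph.
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 6))                 # 6 dimensions, grouped 2 + 2 + 2

pos = X[:, 0:2]                               # group 1 -> glyph position
group2 = X[:, 2:4].mean(axis=1)
size = 40 + 160 * (group2 - group2.min())     # group 2 -> glyph size
colour = X[:, 4:6].mean(axis=1)               # group 3 -> glyph colour

plt.scatter(pos[:, 0], pos[:, 1], s=size, c=colour, cmap="viridis")
plt.colorbar(label="mean of dims 5-6")
plt.title("One glyph per record; dimension groups drive position, size, colour")
plt.show()
```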

Analysis of Current Visualization Techniques and Main Challenges for the Future

Journal of Information Systems Engineering & Management, 2017

The vast amount of data generated nowadays is being used by Big Data tools to generate knowledge and to facilitate decision-making. However, this situation creates a new challenge: how to visualize all these data without losing crucial mid/long-term information. The purpose of this article is to analyze the state of the art in massive data visualization, the main problems and challenges of current information-representation techniques, as well as the evolution of these tools and their future, in other words, the new functionalities they should offer.

How to visualize high-dimensional data: a roadmap

Journal of Data Mining & Digital Humanities, Special issue on Visualisations in Historical Linguistics, 2020

Discovery of the chronological or geographical distribution of collections of historical text can be more reliable when based on multivariate rather than on univariate data because multivariate data provide a more complete description. Where the data are high-dimensional, however, their complexity can defy analysis using traditional philological methods. The first step in dealing with such data is to visualize it using graphical methods in order to identify any latent structure. If found, such structure facilitates formulation of hypotheses which can be tested using a range of mathematical and statistical methods. Where, however, the dimensionality is greater than 3, direct graphical investigation is impossible. The present discussion presents a roadmap of how this obstacle can be overcome, and is in three main parts: the first part presents some fundamental data concepts, the second describes an example corpus and a high-dimensional data set derived from it, and the third outlines two approaches to visualization of that data set: dimensionality reduction and cluster analysis.

Keywords: data visualization, multivariate data, high dimensionality, dimensionality reduction, cluster analysis.

INTRODUCTION

Discovery of the chronological or geographical distribution of collections of historical text can be more reliable when based on multivariate rather than on univariate data because, assuming that the variables describe different aspects of the texts in question, multivariate data provide a more complete description. Where the multivariate data are high-dimensional, however, their complexity can defy analysis using traditional philological methods. The first step in dealing with such data is to visualize it using graphical methods in order to identify any latent structure. If found, such structure facilitates formulation of hypotheses which can be tested using a range of mathematical and statistical methods. Where, however, the dimensionality is greater than 3, direct graphical investigation is impossible. The present discussion presents a roadmap of how this obstacle can be overcome. Exemplification is based on data abstracted from a corpus of English historical texts with a known temporal distribution, allowing the efficacy of the methods covered in the discussion to be readily verified by the reader. The discussion is in three main parts. The first part presents some fundamental data concepts: the nature of the data, its representation using vectors and matrices, and its interpretation in terms of concepts of vector space and manifold. The second part describes the corpus and a high-dimensional data set abstracted from it. The third outlines approaches to visualization of that data set using the concepts from (1) applied to (2). These approaches are of two types:

- The first, dimensionality reduction, reduces high-dimensional data to dimensionality 3 or less to enable graphical representation; the methods presented are (i) variable selection based on variance and (ii) principal component analysis.
- The second, cluster analysis, represents the structure of data in high-dimensional space directly without dimensionality reduction.
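As a minimal sketch of the two dimensionality-reduction steps named in the first approach, assuming a synthetic document-by-variable matrix in place of the corpus data:

```python
# Minimal sketch of (i) variable selection by variance and (ii) PCA down to
# 3 dimensions for plotting; the random matrix stands in for corpus data.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.poisson(2.0, size=(100, 500)).astype(float)   # documents x variables

# (i) Variable selection: keep the k highest-variance variables
k = 50
keep = np.argsort(X.var(axis=0))[::-1][:k]
X_sel = X[:, keep]

# (ii) PCA: project onto the first 3 principal components
X_3d = PCA(n_components=3).fit_transform(X_sel)
print(X_3d.shape)   # (100, 3) -> can now be shown in a 3-D scatter plot
```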