M4: A visualization-oriented time series data aggregation

VDDA: automatic visualization-driven data aggregation in relational databases

The VLDB Journal, 2015

Visual analysis of high-volume numerical data is ubiquitous in many industries, including finance, banking, and discrete manufacturing. Contemporary RDBMS-based systems for visualization of high-volume numerical data have difficulty coping with the hard latency requirements and high ingestion rates of interactive visualizations. Existing solutions for lowering the volume of large data sets disregard the spatial properties of visualizations and result in visualization errors. In this work, we introduce VDDA, a Visualization-Driven Data Aggregation approach that provides high-quality to error-free visualizations of high-volume data sets, at high data reduction rates. Based on the M4 aggregation for producing error-free line charts, we develop a complete set of visualization-driven data aggregation operators for the most common chart types. We describe how to model aggregation-based data reduction at the query level in a visualization-driven query rewriting system. Our approach is generic and applicable to any visualization system that consumes data stored in relational databases. Using real-world data sets from high-tech manufacturing, stock markets, and sports analytics domains, we demonstrate that our visualization-driven data aggregation can reduce data volumes by up to two orders of magnitude, while preserving pixel-perfect visualizations of the raw data.
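The M4 aggregation named in this abstract can be illustrated with a minimal sketch. It assumes the commonly described form of M4: the visible time range is split into one group per pixel column, and each group keeps only the tuples holding the first timestamp, the last timestamp, the minimum value, and the maximum value. The function name `m4_aggregate`, its parameters, and the in-memory Python setting are illustrative assumptions, not the authors' query-rewriting implementation, which operates inside the relational database.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (timestamp, value)


def m4_aggregate(points: List[Point], t_start: float, t_end: float,
                 width: int) -> List[Point]:
    """Keep at most four tuples (first, last, min, max) per pixel column.

    Hypothetical helper for illustration; `width` is the chart width in pixels.
    """
    if width <= 0 or t_end <= t_start:
        raise ValueError("width must be positive and t_end must exceed t_start")

    # Group points by the pixel column their timestamp falls into.
    groups: Dict[int, List[Point]] = {}
    for t, v in points:
        if not (t_start <= t <= t_end):
            continue  # outside the visible time range
        # Clamp so that t == t_end lands in the last column.
        col = min(int(width * (t - t_start) / (t_end - t_start)), width - 1)
        groups.setdefault(col, []).append((t, v))

    result: List[Point] = []
    for col in sorted(groups):
        g = groups[col]
        first = min(g, key=lambda p: p[0])   # earliest timestamp
        last = max(g, key=lambda p: p[0])    # latest timestamp
        lowest = min(g, key=lambda p: p[1])  # minimum value
        highest = max(g, key=lambda p: p[1])  # maximum value
        # A tuple may play several roles; emit it once, in time order.
        result.extend(sorted({first, last, lowest, highest}))
    return result


if __name__ == "__main__":
    data = [(0, 1), (1, 5), (2, -2), (3, 3), (4, 0), (5, 7), (6, 2), (7, 4)]
    print(m4_aggregate(data, t_start=0, t_end=8, width=2))
```

The output size is bounded by four tuples per pixel column, so it depends on the chart width rather than on the number of raw records.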

The Semantics of Sketch: A Visual Query System For Time Series Data

2016

Sketching allows analysts to specify complex and free-form patterns of interest. Visual query systems can make use of sketches to locate these patterns of interest in large datasets. However, sketching is ambiguous: the same drawing could represent a multitude of potential queries. In this work, we investigate these ambiguities as they apply to visual query systems for time series data. We define a class of “invariants” — the properties of a time series that the analyst wishes to ignore when performing a sketch-based query. We present the results of a crowd-sourced study, showing that these invariants are key components of how people rate the strength of match between sketch and target. We adapt a number of algorithms for time series matching to support invariants in sketches. Lastly, we present a web-deployed prototype sketch-based visual query system that relies on these invariants. We apply the prototype to example datasets from finance, the digital humanities, and political scie...

Faster visual analytics through pixel-perfect aggregation

Proceedings of the VLDB Endowment, 2014

State-of-the-art visual data analysis tools ignore bandwidth limitations. They fetch millions of records of high-volume time series data from an underlying RDBMS to eventually draw only a few thousand pixels on the screen.

Representing Unevenly-Spaced Time Series Data for Visualization and Interactive Exploration

Human-Computer …, 2005

Visualizing time series data is useful to support discovery of relations and patterns in financial, genomic, medical and other applications. In most time series, measurements are equally spaced over time. This paper discusses the challenges for unevenly-spaced time series data and presents four methods to represent them: sampled events, aggregated sampled events, event index and interleaved event index. We developed these methods while studying eBay auction data with TimeSearcher. We describe the advantages, disadvantages, choices for algorithms and parameters, and compare the different methods. Since each method has its advantages, this paper provides guidance for choosing the right combination of methods, algorithms, and parameters to solve a given problem for unevenly-spaced time series. Interaction issues such as screen resolution, response time for dynamic queries, and meaning of the visual display are governed by these decisions.

Iris: Amortized, Resource Efficient Visualizations of Voluminous Spatiotemporal Datasets

2020 IEEE/ACM International Conference on Big Data Computing, Applications and Technologies (BDCAT), 2020

The growth in observational data volumes over the past decade has occurred alongside a need to make sense of the phenomena that underpin them. Visualization is a key component of the data wrangling process that precedes the analyses that inform these insights. The crux of this study is interactive visualizations of spatiotemporal phenomena from voluminous datasets. Spatiotemporal visualizations of voluminous datasets introduce challenges relating to interactivity, overlaying multiple datasets and dynamic feature selection, resource capacity constraints, and scaling. In this study we describe our methodology to address these challenges. We rely on a novel mix of algorithms and systems innovations working in concert to ensure effective apportioning and amortization of workloads and enable interactivity during visualizations. In particular, our research prototype, Iris, leverages sketching algorithms, effective query predicate generation and evaluation, avoids performance hotspots, har...

Time-series Bitmaps: a Practical Visualization Tool for Working with Large Time Series Databases

2005

The increasing interest in time series data mining in the last decade has resulted in the introduction of a variety of similarity measures, representations, and algorithms. Surprisingly, this massive research effort has had little impact on real-world applications. Real-world practitioners who work with time series on a daily basis rarely take advantage of the wealth of tools that the data mining community has made available. In this work, we attempt to address this problem by introducing a simple parameter-light tool that allows users to efficiently navigate through large collections of time series. Our system has the unique advantage that it can be embedded directly into any standard graphical user interface, such as Microsoft Windows, thus making deployment easier. Our approach extracts features from a time series of arbitrary length and uses information about the relative frequency of its features to color a bitmap in a principled way. By visualizing the similarities and differences within a collection of bitmaps, a user can quickly discover clusters, anomalies, and other regularities within their data collection. We demonstrate the utility of our approach with a set of comprehensive experiments on real datasets from a variety of domains.

Task-Driven Evaluation of Aggregation in Time Series Visualization

Many visualization tasks require the viewer to make judgments about aggregate properties of data. Recent work has shown that viewers can perform such tasks effectively, for example to efficiently compare the maximums or means over ranges of data. However, this work also shows that such effectiveness depends on the designs of the displays. In this paper, we explore this relationship between aggregation task and visualization design to provide guidance on matching tasks with designs. We combine prior results from perceptual science and graphical perception to suggest a set of design variables that influence performance on various aggregate comparison tasks. We describe how choices in these variables can lead to designs that are matched to particular tasks. We use these variables to assess a set of eight different designs, predicting how they will support a set of six aggregate time series comparison tasks. A crowd-sourced evaluation confirms these predictions. These results not only provide evidence for how the specific visualizations support various tasks, but also suggest using the identified design variables as a tool for designing visualizations well suited for various types of tasks.

An augmented visual query mechanism for finding patterns in time series data

2002

Relatively few query tools exist for data exploration and pattern identification in time series data sets. In previous work we introduced Timeboxes. Timeboxes are rectangular, direct-manipulation queries for studying time-series datasets. We demonstrated how Timeboxes can be used to support interactive exploration via dynamic queries, along with overviews of query results and drag-and-drop support for query-by-example. In this paper, we extend our work by introducing Variable Time Timeboxes (VTT).
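As a rough illustration of a rectangular timebox query, the sketch below assumes that a series satisfies a timebox when every value inside the box's time range lies within the box's value range, and that several timeboxes combine conjunctively. These semantics, the `Timebox` class, and the `matches` and `query` helpers are assumptions made for illustration; the abstract does not spell them out, and Variable Time Timeboxes are not modeled here.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Timebox:
    """Hypothetical rectangle: inclusive index range and inclusive value range."""
    t_min: int
    t_max: int
    v_min: float
    v_max: float


def matches(series: Sequence[float], box: Timebox) -> bool:
    """True if all values at indices t_min..t_max fall within [v_min, v_max]."""
    window = series[box.t_min:box.t_max + 1]
    # An empty window (box outside the series) is treated as a non-match.
    return len(window) > 0 and all(box.v_min <= v <= box.v_max for v in window)


def query(dataset: Dict[str, Sequence[float]], boxes: List[Timebox]) -> List[str]:
    """Return names of the series that satisfy every timebox (assumed AND semantics)."""
    return [name for name, series in dataset.items()
            if all(matches(series, box) for box in boxes)]


if __name__ == "__main__":
    dataset = {"a": [1, 2, 3, 4], "b": [1, 5, 3, 4], "c": [0, 2, 2, 9]}
    print(query(dataset, [Timebox(1, 2, 1.5, 3.5)]))
    print(query(dataset, [Timebox(1, 2, 1.5, 3.5), Timebox(3, 3, 0, 5)]))
```

Because each box is checked independently, adding or moving a box only requires re-evaluating that box, which suits the dynamic-query style of interaction the abstract describes.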

Visualizing and discovering non-trivial patterns in large time series databases

2005

Data visualization techniques are very important for data analysis, since the human eye has been frequently advocated as the ultimate data-mining tool. However, there has been surprisingly little work on visualizing massive time series data sets. To this end, we developed VizTree, a time series pattern discovery and visualization system based on augmenting suffix trees. VizTree visually summarizes both the global and local structures of time series data at the same time.