Shunan Guo | East China Normal University

Papers by Shunan Guo

WhatsNext: Guidance-enriched Exploratory Data Analysis with Interactive, Low-Code Notebooks

2023 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC)

A Hypergraph Neural Network Framework for Learning Hyperedge-Dependent Node Embeddings

arXiv (Cornell University), Dec 28, 2022

In this work, we introduce a hypergraph representation learning framework called Hypergraph Neural Networks (HNN) that jointly learns hyperedge embeddings along with a set of hyperedge-dependent embeddings for each node in the hypergraph. HNN derives multiple embeddings per node in the hypergraph where each embedding for a node is dependent on a specific hyperedge of that node. Notably, HNN is accurate, data-efficient, flexible with many interchangeable components, and useful for a wide range of hypergraph learning tasks. We evaluate the effectiveness of the HNN framework for hyperedge prediction and hypergraph node classification. We find that HNN achieves an overall mean gain of 7.72% and 11.37% across all baseline models and graphs for hyperedge prediction and hypergraph node classification, respectively.

Socrates: Data Story Generation via Adaptive Machine-Guided Elicitation of User Feedback

IEEE Transactions on Visualization and Computer Graphics

Survey on Visual Analysis of Event Sequence Data

arXiv (Cornell University), Jun 25, 2020

Event sequence data record series of discrete events in the time order of occurrence. They are commonly observed in a variety of applications ranging from electronic health records to network logs, with the characteristics of large-scale, high-dimensional and heterogeneous. This high complexity of event sequence data makes it difficult for analysts to manually explore and find patterns, resulting in ever-increasing needs for computational and perceptual aids from visual analytics techniques to extract and communicate insights from event sequence datasets. In this paper, we review the state-of-the-art visual analytics approaches, characterize them with our proposed design space, and categorize them based on analytical tasks and applications. From our review of relevant literature, we have also identified several remaining research challenges and future research opportunities.

A ML-based Approach for HTML-based Style Recommendation

Companion Proceedings of the ACM Web Conference 2023

De-Stijl: Facilitating Graphics Design with Interactive 2D Color Palette Recommendation

Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems

DataPilot: Utilizing Quality and Usage Information for Subset Selection during Visual Data Preparation

Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems

Graph Learning with Localized Neighborhood Fairness

arXiv (Cornell University), Dec 22, 2022

Learning fair graph representations for downstream applications is becoming increasingly important, but existing work has mostly focused on improving fairness at the global level by either modifying the graph structure or objective function without taking into account the local neighborhood of a node. In this work, we formally introduce the notion of neighborhood fairness and develop a computational framework for learning such locally fair embeddings. We argue that the notion of neighborhood fairness is more appropriate since GNN-based models operate at the local neighborhood level of a node. Our neighborhood fairness framework has two main components that are flexible for learning fair graph representations from arbitrary data: the first aims to construct fair neighborhoods for any arbitrary node in a graph and the second enables adaption of these fair neighborhoods to better capture certain application or data-dependent constraints, such as allowing neighborhoods to be more biased towards certain attributes or neighbors in the graph. Furthermore, while link prediction has been extensively studied, we are the first to investigate the graph representation learning task of fair link classification. We demonstrate the effectiveness of the proposed neighborhood fairness framework for a variety of graph machine learning tasks including fair link prediction, link classification, and learning fair graph embeddings. Notably, our approach achieves not only better fairness but also increases the accuracy in the majority of cases across a wide variety of graphs, problem settings, and metrics.

Let's Get Personal: Exploring the Design of Personalized Visualizations

2022 IEEE Visualization and Visual Analytics (VIS)

ARShopping: In-Store Shopping Decision Support Through Augmented Reality and Immersive Visualization

2022 IEEE Visualization and Visual Analytics (VIS)

Online shopping gives customers boundless options to choose from, backed by extensive product details and customer reviews, all from the comfort of home; yet, no amount of detailed, online information can outweigh the instant gratification and hands-on understanding of a product that is provided by physical stores. However, making purchasing decisions in physical stores can be challenging due to a large number of similar alternatives and limited accessibility of the relevant product information (e.g., features, ratings, and reviews). In this work, we present ARShopping: a web-based prototype to visually communicate detailed product information from an online setting on portable smart devices (e.g., phones, tablets, glasses), within the physical space at the point of purchase. This prototype uses augmented reality (AR) to identify products and display detailed information to help consumers make purchasing decisions that fulfill their needs while decreasing the decision-making time. In particular, we use a data fusion algorithm to improve the precision of the product detection; we then integrate AR visualizations into the scene to facilitate comparisons across multiple products and features. We designed our prototype based on interviews with 14 participants to better understand the utility and ease of use of the prototype.

Direct Embedding of Temporal Network Edges via Time-Decayed Line Graphs

arXiv (Cornell University), Sep 30, 2022

Temporal networks model a variety of important phenomena involving timed interactions between entities. Existing methods for machine learning on temporal networks generally exhibit at least one of two limitations. First, time is assumed to be discretized, so if the time data is continuous, the user must determine the discretization and discard precise time information. Second, edge representations can only be calculated indirectly from the nodes, which may be suboptimal for tasks like edge classification. We present a simple method that avoids both shortcomings: construct the line graph of the network, which includes a node for each interaction, and weigh the edges of this graph based on the difference in time between interactions. From this derived graph, edge representations for the original network can be computed with efficient classical methods. The simplicity of this approach facilitates explicit theoretical analysis: we can constructively show the effectiveness of our method's representations for a natural synthetic model of temporal networks. Empirical results on real-world networks demonstrate our method's efficacy and efficiency on both edge classification and temporal link prediction.
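
The line-graph construction described in this abstract can be illustrated with a minimal sketch. It assumes that two interactions are linked in the line graph when they share an endpoint, and that the link weight decays exponentially with the time gap between them. The abstract only states that the weights depend on the time difference, so the exponential form, the `decay` parameter, and the function name `time_decayed_line_graph` are illustrative assumptions, not the authors' implementation.

```python
import math
from itertools import combinations


def time_decayed_line_graph(interactions, decay=1.0):
    """Build a weighted line graph from timed interactions.

    interactions: list of (u, v, t) tuples, one per interaction.
    Returns a dict mapping (i, j) with i < j (indices into `interactions`)
    to a weight in (0, 1]. The exponential decay is an assumed choice.
    """
    weights = {}
    indexed = enumerate(interactions)
    for (i, (u1, v1, t1)), (j, (u2, v2, t2)) in combinations(indexed, 2):
        # Two interactions are adjacent in the line graph if they share an entity.
        if {u1, v1} & {u2, v2}:
            # Interactions close in time get a weight near 1, distant ones near 0.
            weights[(i, j)] = math.exp(-abs(t1 - t2) / decay)
    return weights


# Hypothetical toy data: four timed interactions among entities a, b, c, d.
interactions = [("a", "b", 0.0), ("b", "c", 1.0), ("c", "d", 5.0), ("a", "d", 5.0)]
line_graph = time_decayed_line_graph(interactions, decay=2.0)
```

Each key in `line_graph` is a pair of interaction indices, so every interaction of the original network is a node of the derived graph; any standard embedding method for weighted graphs could then be applied to it to obtain edge representations.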

Evaluating the Use of Uncertainty Visualisations for Imputations of Data Missing At Random in Scatterplots

Most real-world datasets contain missing values yet most exploratory data analysis (EDA) systems only support visualising data points with complete cases. This omission may potentially lead the user to biased analyses and insights. Imputation techniques can help estimate the value of a missing data point, but introduce additional uncertainty. In this work, we investigate the effects of visualising imputed values in charts using different ways of representing data imputations and imputation uncertainty: no imputation, mean, 95% confidence intervals, probability density plots, gradient intervals, and hypothetical outcome plots. We focus on scatterplots, which are a commonly used chart type, and conduct a crowdsourced study with 202 participants. We measure users’ bias and precision in performing two tasks (estimating average and detecting trend) and their self-reported confidence in performing these tasks. Our results suggest that, when estimating averages, uncertainty representations ma...

Detecting Email Components with Constraints: Expressive and Extensible Models in Answer Set Programming

Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems

Detecting structures and components in business emails is vital for editor software to convert third-party emails so that designers can edit them without needing to know how HTML works. In a production environment, the challenge is to make the model easy to be understood and maintained by different stakeholders. We propose detecting email components with a collection of constraints written in Answer Set Programming (ASP). Hard constraints can detect well-defined components like email layouts, and soft constraints can incorporate ML to detect custom components like buttons and titles in emails. Using constraints, developers can apply their domain knowledge to the model and express them in a concrete, extensible, and deterministic form. We demonstrate the effectiveness with a prototype and evaluations from real datasets.

ARShopping: In-Store Shopping Decision Support Through Augmented Reality and Immersive Visualization

arXiv (Cornell University), Jul 15, 2022

Online shopping gives customers boundless options to choose from, backed by extensive product details and customer reviews, all from the comfort of home; yet, no amount of detailed, online information can outweigh the instant gratification and hands-on understanding of a product that is provided by physical stores. However, making purchasing decisions in physical stores can be challenging due to a large number of similar alternatives and limited accessibility of the relevant product information (e.g., features, ratings, and reviews). In this work, we present ARShopping: a web-based prototype to visually communicate detailed product information from an online setting on portable smart devices (e.g., phones, tablets, glasses), within the physical space at the point of purchase. This prototype uses augmented reality (AR) to identify products and display detailed information to help consumers make purchasing decisions that fulfill their needs while decreasing the decision-making time. In particular, we use a data fusion algorithm to improve the precision of the product detection; we then integrate AR visualizations into the scene to facilitate comparisons across multiple products and features. We designed our prototype based on interviews with 14 participants to better understand the utility and ease of use of the prototype.

VisGNN: Personalized Visualization Recommendation via Graph Neural Networks

Proceedings of the ACM Web Conference 2022

In this work, we develop a Graph Neural Network (GNN) framework for the problem of personalized visualization recommendation. The GNN-based framework first represents the large corpus of datasets and visualizations from users as a large heterogeneous graph. Then, it decomposes a visualization into its data and visual components, and then jointly models each of them as a large graph to obtain embeddings of the users, attributes (across all datasets in the corpus), and visual-configurations. From these user-specific embeddings of the attributes and visual-configurations, we can predict the probability of any visualization arising from a specific user. Finally, the experiments demonstrated the effectiveness of using graph neural networks for automatic and personalized recommendation of visualizations to specific users based on their data and visual (design choice) preferences. To the best of our knowledge, this is the first such work to develop and leverage GNNs for this problem.

Cicero: A Declarative Grammar for Responsive Visualization

CHI Conference on Human Factors in Computing Systems

Figure 1: Thirteen responsive visualization use cases reproduced using Cicero. The blue- and gray-bordered views are the desktop and mobile versions, respectively. The mobile versions of the Oil Spills case are from (1) the original article and (2) the version suggested by Hoffswell et al. [13]. Full size images are included in the Supplemental Material (https://osf.io/eg4xq).

Visual Anomaly Detection in Event Sequence Data

2019 IEEE International Conference on Big Data (Big Data)

Anomaly detection is a common analytical task that aims to identify rare cases that differ from the typical cases that make up the majority of a dataset. When applied to the analysis of event sequence data, the task of anomaly detection can be complex because the sequential and temporal nature of such data results in diverse definitions and flexible forms of anomalies. This, in turn, increases the difficulty in interpreting detected anomalies. In this paper, we propose an unsupervised anomaly detection algorithm based on Variational AutoEncoders (VAE) to estimate underlying normal progressions for each given sequence represented as occurrence probabilities of events along the sequence progression. Events in violation of their occurrence probability are identified as abnormal. We also introduce a visualization system, EventThread3 (ET3), to support interactive exploration and interpretations of anomalies within the context of normal sequence progressions in the dataset through comprehensive one-to-many sequence comparison. Finally, we quantitatively evaluate the performance of our anomaly detection algorithm and demonstrate the effectiveness of our system through a case study.
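
As a rough illustration of the flagging step only: assuming a model of normal progressions already supplies, for each position in a sequence, a probability per event type, events whose probability falls below a cutoff can be marked abnormal. The fixed `threshold`, the dictionary layout, the toy event names, and the function name `flag_abnormal_events` are all hypothetical; the paper's VAE and its actual decision rule are not reproduced here.

```python
def flag_abnormal_events(sequence, occurrence_probs, threshold=0.1):
    """Return (position, event) pairs whose estimated probability is below threshold.

    sequence: list of event labels in time order.
    occurrence_probs: one dict per position mapping event label -> probability,
        assumed to come from a separately trained model of normal progressions.
    """
    flagged = []
    for position, (event, probs) in enumerate(zip(sequence, occurrence_probs)):
        # An event the model never expects at this position is treated as probability 0.
        if probs.get(event, 0.0) < threshold:
            flagged.append((position, event))
    return flagged


# Hypothetical toy input: the third event is unlikely under the assumed model output.
sequence = ["visit", "add_to_cart", "refund"]
occurrence_probs = [
    {"visit": 0.9, "add_to_cart": 0.1},
    {"add_to_cart": 0.6, "purchase": 0.4},
    {"purchase": 0.95, "refund": 0.05},
]
abnormal = flag_abnormal_events(sequence, occurrence_probs)
```

A single global cutoff is the simplest possible rule; a per-event or per-position cutoff would be an equally plausible reading of the abstract.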

Survey on Visual Analysis of Event Sequence Data

IEEE Transactions on Visualization and Computer Graphics, 2021

Event sequence data record series of discrete events in the time order of occurrence. They are commonly observed in a variety of applications ranging from electronic health records to network logs, with the characteristics of large-scale, high-dimensional and heterogeneous. This high complexity of event sequence data makes it difficult for analysts to manually explore and find patterns, resulting in ever-increasing needs for computational and perceptual aids from visual analytics techniques to extract and communicate insights from event sequence datasets. In this paper, we review the state-of-the-art visual analytics approaches, characterize them with our proposed design space, and categorize them based on analytical tasks and applications.

Interactive Event Sequence Prediction for Marketing Analysts

CHI’20 Extended Abstracts, April 25–30, 2020, Honolulu, HI, USA. ACM 978-1-4503-6819-3/20/04. https://doi.org/10.1145/3334480.3382971

Timestamped event sequences are analyzed to tackle varied problems but have unique challenges in interpretation and analysis. Especially in event sequence prediction, it is difficult to convey the results due to the added uncertainty and complexity introduced by predictive models. In this work, we design and develop ProFlow, a visual analytics system for supporting analysts’ workflow of exploring and predicting event sequences. Through an evaluation conducted with four data analysts in a real-world marketing scenario, we discuss the applicability and usefulness of ProFlow as well as its limitations and future directions.

Interpretable Anomaly Detection in Event Sequences via Sequence Matching and Visual Comparison

IEEE Transactions on Visualization and Computer Graphics, 2021

Anomaly detection is a common analytical task that aims to identify rare cases that differ from the typical cases that make up the majority of a dataset. When analyzing event sequence data, the task of anomaly detection can be complex because the sequential and temporal nature of such data results in diverse definitions and flexible forms of anomalies. This, in turn, increases the difficulty in interpreting detected anomalies. In this paper, we propose a visual analytic approach for detecting anomalous sequences in an event sequence dataset via an unsupervised anomaly detection algorithm based on Variational AutoEncoders. We further compare the anomalous sequences with their reconstructions and with the normal sequences through a sequence matching algorithm to identify event anomalies. A visual analytics system is developed to support interactive exploration and interpretations of anomalies through novel visualization designs that facilitate the comparison between anomalous sequence...

Research paper thumbnail of WhatsNext: Guidance-enriched Exploratory Data Analysis with Interactive, Low-Code Notebooks

2023 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC)

Research paper thumbnail of A Hypergraph Neural Network Framework for Learning Hyperedge-Dependent Node Embeddings

arXiv (Cornell University), Dec 28, 2022

In this work, we introduce a hypergraph representation learning framework called Hypergraph Neura... more In this work, we introduce a hypergraph representation learning framework called Hypergraph Neural Networks (HNN) that jointly learns hyperedge embeddings along with a set of hyperedgedependent embeddings for each node in the hypergraph. HNN derives multiple embeddings per node in the hypergraph where each embedding for a node is dependent on a specific hyperedge of that node. Notably, HNN is accurate, data-efficient, flexible with many interchangeable components, and useful for a wide range of hypergraph learning tasks. We evaluate the effectiveness of the HNN framework for hyperedge prediction and hypergraph node classification. We find that HNN achieves an overall mean gain of 7.72% and 11.37% across all baseline models and graphs for hyperedge prediction and hypergraph node classification, respectively.

Research paper thumbnail of Socrates: Data Story Generation via Adaptive Machine-Guided Elicitation of User Feedback

IEEE Transactions on Visualization and Computer Graphics

Research paper thumbnail of Survey on Visual Analysis of Event Sequence Data

arXiv (Cornell University), Jun 25, 2020

Event sequence data record series of discrete events in the time order of occurrence. They are co... more Event sequence data record series of discrete events in the time order of occurrence. They are commonly observed in a variety of applications ranging from electronic health records to network logs, with the characteristics of large-scale, high-dimensional and heterogeneous. This high complexity of event sequence data makes it difficult for analysts to manually explore and find patterns, resulting in ever-increasing needs for computational and perceptual aids from visual analytics techniques to extract and communicate insights from event sequence datasets. In this paper, we review the state-of-the-art visual analytics approaches, characterize them with our proposed design space, and categorize them based on analytical tasks and applications. From our review of relevant literature, we have also identified several remaining research challenges and future research opportunities.

Research paper thumbnail of A ML-based Approach for HTML-based Style Recommendation

Companion Proceedings of the ACM Web Conference 2023

Research paper thumbnail of De-Stijl: Facilitating Graphics Design with Interactive 2D Color Palette Recommendation

Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems

Research paper thumbnail of DataPilot: Utilizing Quality and Usage Information for Subset Selection during Visual Data Preparation

Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems

Research paper thumbnail of Graph Learning with Localized Neighborhood Fairness

arXiv (Cornell University), Dec 22, 2022

Learning fair graph representations for downstream applications is becoming increasingly importan... more Learning fair graph representations for downstream applications is becoming increasingly important, but existing work has mostly focused on improving fairness at the global level by either modifying the graph structure or objective function without taking into account the local neighborhood of a node. In this work, we formally introduce the notion of neighborhood fairness and develop a computational framework for learning such locally fair embeddings. We argue that the notion of neighborhood fairness is more appropriate since GNN-based models operate at the local neighborhood level of a node. Our neighborhood fairness framework has two main components that are flexible for learning fair graph representations from arbitrary data: the first aims to construct fair neighborhoods for any arbitrary node in a graph and the second enables adaption of these fair neighborhoods to better capture certain application or data-dependent constraints, such as allowing neighborhoods to be more biased towards certain attributes or neighbors in the graph. Furthermore, while link prediction has been extensively studied, we are the first to investigate the graph representation learning task of fair link classification. We demonstrate the effectiveness of the proposed neighborhood fairness framework for a variety of graph machine learning tasks including fair link prediction, link classification, and learning fair graph embeddings. Notably, our approach achieves not only better fairness but also increases the accuracy in the majority of cases across a wide variety of graphs, problem settings, and metrics.

Research paper thumbnail of Let's Get Personal: Exploring the Design of Personalized Visualizations

2022 IEEE Visualization and Visual Analytics (VIS)

Research paper thumbnail of ARShopping: In-Store Shopping Decision Support Through Augmented Reality and Immersive Visualization

2022 IEEE Visualization and Visual Analytics (VIS)

Online shopping gives customers boundless options to choose from, backed by extensive product det... more Online shopping gives customers boundless options to choose from, backed by extensive product details and customer reviews, all from the comfort of home; yet, no amount of detailed, online information can outweigh the instant gratification and hands-on understanding of a product that is provided by physical stores. However, making purchasing decisions in physical stores can be challenging due to a large number of similar alternatives and limited accessibility of the relevant product information (e.g., features, ratings, and reviews). In this work, we present ARShopping: a web-based prototype to visually communicate detailed product information from an online setting on portable smart devices (e.g., phones, tablets, glasses), within the physical space at the point of purchase. This prototype uses augmented reality (AR) to identify products and display detailed information to help consumers make purchasing decisions that fulfill their needs while decreasing the decision-making time. In particular, we use a data fusion algorithm to improve the precision of the product detection; we then integrate AR visualizations into the scene to facilitate comparisons across multiple products and features. We designed our prototype based on interviews with 14 participants to better understand the utility and ease of use of the prototype.

Research paper thumbnail of Direct Embedding of Temporal Network Edges via Time-Decayed Line Graphs

arXiv (Cornell University), Sep 30, 2022

Temporal networks model a variety of important phenomena involving timed interactions between ent... more Temporal networks model a variety of important phenomena involving timed interactions between entities. Existing methods for machine learning on temporal networks generally exhibit at least one of two limitations. First, time is assumed to be discretized, so if the time data is continuous, the user must determine the discretization and discard precise time information. Second, edge representations can only be calculated indirectly from the nodes, which may be suboptimal for tasks like edge classification. We present a simple method that avoids both shortcomings: construct the line graph of the network, which includes a node for each interaction, and weigh the edges of this graph based on the difference in time between interactions. From this derived graph, edge representations for the original network can be computed with efficient classical methods. The simplicity of this approach facilitates explicit theoretical analysis: we can constructively show the effectiveness of our method's representations for a natural synthetic model of temporal networks. Empirical results on real-world networks demonstrate our method's efficacy and efficiency on both edge classification and temporal link prediction.

Research paper thumbnail of Evaluating the Use of Uncertainty Visualisations for Imputations of Data Missing At Random in Scatterplots

Most real-world datasets contain missing values yet most exploratory data analysis (EDA) systems ... more Most real-world datasets contain missing values yet most exploratory data analysis (EDA) systems only support visualising data points with complete cases. This omission may potentially lead the user to biased analyses and insights. Imputation techniques can help estimate the value of a missing data point, but introduces additional uncertainty. In this work, we investigate the effects of visualising imputed values in charts using different ways of representing data imputations and imputation uncertainty—no imputation, mean, 95% confidence intervals, probability density plots, gradient intervals, and hypothetical outcome plots. We focus on scatterplots, which is a commonly used chart type, and conduct a crowdsourced study with 202 participants. We measure users’ bias and precision in performing two tasks—estimating average and detecting trend—and their self-reported confidence in performing these tasks. Our results suggest that, when estimating averages, uncertainty representations ma...

Research paper thumbnail of Detecting Email Components with Constraints: Expressive and Extensible Models in Answer Set Programming

Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems

Detecting structures and components in business emails is vital for editor software to convert th... more Detecting structures and components in business emails is vital for editor software to convert third-party emails so that designers can edit them without needing to know how HTML works. In a production environment, the challenge is to make the model easy to be understood and maintained by different stakeholders. We propose detecting email components with a collection of constraints written in Answer Set Programming (ASP). Hard constraints can detect well-defined components like email layouts, and soft constraints can incorporate ML to detect custom components like buttons and titles in emails. Using constraints, developers can apply their domain knowledge to the model and express them in a concrete, extensible, and deterministic form. We demonstrate the effectiveness with a prototype and evaluations from real datasets.

Research paper thumbnail of ARShopping: In-Store Shopping Decision Support Through Augmented Reality and Immersive Visualization

arXiv (Cornell University), Jul 15, 2022

Online shopping gives customers boundless options to choose from, backed by extensive product det... more Online shopping gives customers boundless options to choose from, backed by extensive product details and customer reviews, all from the comfort of home; yet, no amount of detailed, online information can outweigh the instant gratification and hands-on understanding of a product that is provided by physical stores. However, making purchasing decisions in physical stores can be challenging due to a large number of similar alternatives and limited accessibility of the relevant product information (e.g., features, ratings, and reviews). In this work, we present ARShopping: a web-based prototype to visually communicate detailed product information from an online setting on portable smart devices (e.g., phones, tablets, glasses), within the physical space at the point of purchase. This prototype uses augmented reality (AR) to identify products and display detailed information to help consumers make purchasing decisions that fulfill their needs while decreasing the decision-making time. In particular, we use a data fusion algorithm to improve the precision of the product detection; we then integrate AR visualizations into the scene to facilitate comparisons across multiple products and features. We designed our prototype based on interviews with 14 participants to better understand the utility and ease of use of the prototype.

Research paper thumbnail of VisGNN: Personalized Visualization Recommendationvia Graph Neural Networks

Proceedings of the ACM Web Conference 2022

In this work, we develop a Graph Neural Network (GNN) framework for the problem of personalized v... more In this work, we develop a Graph Neural Network (GNN) framework for the problem of personalized visualization recommendation. The GNN-based framework first represents the large corpus of datasets and visualizations from users as a large heterogeneous graph. Then, it decomposes a visualization into its data and visual components, and then jointly models each of them as a large graph to obtain embeddings of the users, attributes (across all datasets in the corpus), and visual-configurations. From these user-specific embeddings of the attributes and visual-configurations, we can predict the probability of any visualization arising from a specific user. Finally, the experiments demonstrated the effectiveness of using graph neural networks for automatic and personalized recommendation of visualizations to specific users based on their data and visual (design choice) preferences. To the best of our knowledge, this is the first such work to develop and leverage GNNs for this problem.

Cicero: A Declarative Grammar for Responsive Visualization

CHI Conference on Human Factors in Computing Systems

Figure 1: Thirteen responsive visualization use cases reproduced using Cicero. The blue- and gray-bordered views are the desktop and mobile versions, respectively. The mobile versions of the Oil Spills case are from (1) the original article and (2) the version suggested by Hoffswell et al. [13]. Full size images are included in the Supplemental Material (https://osf.io/eg4xq).

Visual Anomaly Detection in Event Sequence Data

2019 IEEE International Conference on Big Data (Big Data)

Anomaly detection is a common analytical task that aims to identify rare cases that differ from the typical cases that make up the majority of a dataset. When applied to the analysis of event sequence data, the task of anomaly detection can be complex because the sequential and temporal nature of such data results in diverse definitions and flexible forms of anomalies. This, in turn, increases the difficulty in interpreting detected anomalies. In this paper, we propose an unsupervised anomaly detection algorithm based on Variational AutoEncoders (VAE) to estimate underlying normal progressions for each given sequence represented as occurrence probabilities of events along the sequence progression. Events in violation of their occurrence probability are identified as abnormal. We also introduce a visualization system, EventThread3 (ET 3), to support interactive exploration and interpretations of anomalies within the context of normal sequence progressions in the dataset through comprehensive one-to-many sequence comparison. Finally, we quantitatively evaluate the performance of our anomaly detection algorithm and demonstrate the effectiveness of our system through a case study.

Survey on Visual Analysis of Event Sequence Data

IEEE Transactions on Visualization and Computer Graphics, 2021

Event sequence data record series of discrete events in the time order of occurrence. They are commonly observed in a variety of applications ranging from electronic health records to network logs, and are typically large-scale, high-dimensional, and heterogeneous. This high complexity makes it difficult for analysts to manually explore the data and find patterns, resulting in an ever-increasing need for computational and perceptual aids from visual analytics techniques to extract and communicate insights from event sequence datasets. In this paper, we review the state-of-the-art visual analytics approaches, characterize them with our proposed design space, and categorize them based on analytical tasks and applications.

Interactive Event Sequence Prediction for Marketing Analysts

CHI'20 Extended Abstracts, April 25–30, 2020, Honolulu, HI, USA. ACM 978-1-4503-6819-3/20/04. https://doi.org/10.1145/3334480.3382971

Timestamped event sequences are analyzed to tackle varied problems but have unique challenges in interpretation and analysis. Especially in event sequence prediction, it is difficult to convey the results due to the added uncertainty and complexity introduced by predictive models. In this work, we design and develop ProFlow, a visual analytics system for supporting analysts’ workflow of exploring and predicting event sequences. Through an evaluation conducted with four data analysts in a real-world marketing scenario, we discuss the applicability and usefulness of ProFlow as well as its limitations and future directions.

Interpretable Anomaly Detection in Event Sequences via Sequence Matching and Visual Comparison

IEEE Transactions on Visualization and Computer Graphics, 2021

Anomaly detection is a common analytical task that aims to identify rare cases that differ from the typical cases that make up the majority of a dataset. When analyzing event sequence data, the task of anomaly detection can be complex because the sequential and temporal nature of such data results in diverse definitions and flexible forms of anomalies. This, in turn, increases the difficulty in interpreting detected anomalies. In this paper, we propose a visual analytic approach for detecting anomalous sequences in an event sequence dataset via an unsupervised anomaly detection algorithm based on Variational AutoEncoders. We further compare the anomalous sequences with their reconstructions and with the normal sequences through a sequence matching algorithm to identify event anomalies. A visual analytics system is developed to support interactive exploration and interpretations of anomalies through novel visualization designs that facilitate the comparison between anomalous sequence...