Shunan Guo | East China Normal University
Papers by Shunan Guo
2023 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC)
arXiv (Cornell University), Dec 28, 2022
In this work, we introduce a hypergraph representation learning framework called Hypergraph Neural Networks (HNN) that jointly learns hyperedge embeddings along with a set of hyperedge-dependent embeddings for each node in the hypergraph. HNN derives multiple embeddings per node in the hypergraph where each embedding for a node is dependent on a specific hyperedge of that node. Notably, HNN is accurate, data-efficient, flexible with many interchangeable components, and useful for a wide range of hypergraph learning tasks. We evaluate the effectiveness of the HNN framework for hyperedge prediction and hypergraph node classification. We find that HNN achieves an overall mean gain of 7.72% and 11.37% across all baseline models and graphs for hyperedge prediction and hypergraph node classification, respectively.
IEEE Transactions on Visualization and Computer Graphics
arXiv (Cornell University), Jun 25, 2020
Event sequence data record series of discrete events in the time order of occurrence. They are commonly observed in a variety of applications ranging from electronic health records to network logs, and are characteristically large-scale, high-dimensional, and heterogeneous. This high complexity makes it difficult for analysts to manually explore event sequence data and find patterns, resulting in an ever-increasing need for computational and perceptual aids from visual analytics techniques to extract and communicate insights from event sequence datasets. In this paper, we review the state-of-the-art visual analytics approaches, characterize them with our proposed design space, and categorize them based on analytical tasks and applications. From our review of relevant literature, we have also identified several remaining research challenges and future research opportunities.
Companion Proceedings of the ACM Web Conference 2023
Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems
Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems
arXiv (Cornell University), Dec 22, 2022
Learning fair graph representations for downstream applications is becoming increasingly important, but existing work has mostly focused on improving fairness at the global level by either modifying the graph structure or objective function without taking into account the local neighborhood of a node. In this work, we formally introduce the notion of neighborhood fairness and develop a computational framework for learning such locally fair embeddings. We argue that the notion of neighborhood fairness is more appropriate since GNN-based models operate at the local neighborhood level of a node. Our neighborhood fairness framework has two main components that are flexible for learning fair graph representations from arbitrary data: the first aims to construct fair neighborhoods for any arbitrary node in a graph and the second enables adaptation of these fair neighborhoods to better capture certain application or data-dependent constraints, such as allowing neighborhoods to be more biased towards certain attributes or neighbors in the graph. Furthermore, while link prediction has been extensively studied, we are the first to investigate the graph representation learning task of fair link classification. We demonstrate the effectiveness of the proposed neighborhood fairness framework for a variety of graph machine learning tasks including fair link prediction, link classification, and learning fair graph embeddings. Notably, our approach not only achieves better fairness but also increases accuracy in the majority of cases across a wide variety of graphs, problem settings, and metrics.
2022 IEEE Visualization and Visual Analytics (VIS)
2022 IEEE Visualization and Visual Analytics (VIS)
Online shopping gives customers boundless options to choose from, backed by extensive product details and customer reviews, all from the comfort of home; yet, no amount of detailed, online information can outweigh the instant gratification and hands-on understanding of a product that is provided by physical stores. However, making purchasing decisions in physical stores can be challenging due to a large number of similar alternatives and limited accessibility of the relevant product information (e.g., features, ratings, and reviews). In this work, we present ARShopping: a web-based prototype to visually communicate detailed product information from an online setting on portable smart devices (e.g., phones, tablets, glasses), within the physical space at the point of purchase. This prototype uses augmented reality (AR) to identify products and display detailed information to help consumers make purchasing decisions that fulfill their needs while decreasing the decision-making time. In particular, we use a data fusion algorithm to improve the precision of the product detection; we then integrate AR visualizations into the scene to facilitate comparisons across multiple products and features. We designed our prototype based on interviews with 14 participants to better understand the utility and ease of use of the prototype.
arXiv (Cornell University), Sep 30, 2022
Temporal networks model a variety of important phenomena involving timed interactions between entities. Existing methods for machine learning on temporal networks generally exhibit at least one of two limitations. First, time is assumed to be discretized, so if the time data is continuous, the user must determine the discretization and discard precise time information. Second, edge representations can only be calculated indirectly from the nodes, which may be suboptimal for tasks like edge classification. We present a simple method that avoids both shortcomings: construct the line graph of the network, which includes a node for each interaction, and weight the edges of this graph based on the difference in time between interactions. From this derived graph, edge representations for the original network can be computed with efficient classical methods. The simplicity of this approach facilitates explicit theoretical analysis: we can constructively show the effectiveness of our method's representations for a natural synthetic model of temporal networks. Empirical results on real-world networks demonstrate our method's efficacy and efficiency on both edge classification and temporal link prediction.
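The line-graph construction described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `temporal_line_graph`, the treatment of interactions as undirected, the rule that two interactions are adjacent when they share an endpoint, and the exponential decay `exp(-|t_i - t_j| / tau)` with its `tau` parameter are all assumptions chosen for the example; the abstract only states that edges are weighted by the time difference between interactions.

```python
import math
from itertools import combinations


def temporal_line_graph(interactions, tau=1.0):
    """Build a weighted line graph from timed interactions.

    interactions: list of (u, v, t) tuples, one per interaction.
    Returns a dict mapping index pairs (i, j), i < j, to a weight.
    Each interaction becomes one node of the line graph.
    """
    weights = {}
    for (i, (u1, v1, t1)), (j, (u2, v2, t2)) in combinations(
        enumerate(interactions), 2
    ):
        # Assumed adjacency rule: interactions sharing an endpoint are linked.
        if {u1, v1} & {u2, v2}:
            # Assumed weighting: closer in time means a larger weight.
            weights[(i, j)] = math.exp(-abs(t1 - t2) / tau)
    return weights


# Hypothetical toy network with continuous timestamps.
interactions = [("a", "b", 0.0), ("b", "c", 1.0), ("c", "d", 5.0), ("a", "d", 5.5)]
line_graph = temporal_line_graph(interactions, tau=2.0)
```

Because timestamps enter only through the weights, no discretization of time is needed, and any classical method that operates on a weighted graph could then be applied to `line_graph` to obtain one representation per interaction.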
Most real-world datasets contain missing values, yet most exploratory data analysis (EDA) systems only support visualising data points with complete cases. This omission may lead the user to biased analyses and insights. Imputation techniques can help estimate the value of a missing data point, but introduce additional uncertainty. In this work, we investigate the effects of visualising imputed values in charts using different ways of representing data imputations and imputation uncertainty: no imputation, mean, 95% confidence intervals, probability density plots, gradient intervals, and hypothetical outcome plots. We focus on scatterplots, which are a commonly used chart type, and conduct a crowdsourced study with 202 participants. We measure users’ bias and precision in performing two tasks (estimating average and detecting trend) and their self-reported confidence in performing these tasks. Our results suggest that, when estimating averages, uncertainty representations ma...
Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems
Detecting structures and components in business emails is vital for editor software to convert third-party emails so that designers can edit them without needing to know how HTML works. In a production environment, the challenge is to make the model easy for different stakeholders to understand and maintain. We propose detecting email components with a collection of constraints written in Answer Set Programming (ASP). Hard constraints can detect well-defined components like email layouts, and soft constraints can incorporate ML to detect custom components like buttons and titles in emails. Using constraints, developers can apply their domain knowledge to the model and express it in a concrete, extensible, and deterministic form. We demonstrate the effectiveness with a prototype and evaluations from real datasets.
arXiv (Cornell University), Jul 15, 2022
Online shopping gives customers boundless options to choose from, backed by extensive product details and customer reviews, all from the comfort of home; yet, no amount of detailed, online information can outweigh the instant gratification and hands-on understanding of a product that is provided by physical stores. However, making purchasing decisions in physical stores can be challenging due to a large number of similar alternatives and limited accessibility of the relevant product information (e.g., features, ratings, and reviews). In this work, we present ARShopping: a web-based prototype to visually communicate detailed product information from an online setting on portable smart devices (e.g., phones, tablets, glasses), within the physical space at the point of purchase. This prototype uses augmented reality (AR) to identify products and display detailed information to help consumers make purchasing decisions that fulfill their needs while decreasing the decision-making time. In particular, we use a data fusion algorithm to improve the precision of the product detection; we then integrate AR visualizations into the scene to facilitate comparisons across multiple products and features. We designed our prototype based on interviews with 14 participants to better understand the utility and ease of use of the prototype.
Proceedings of the ACM Web Conference 2022
In this work, we develop a Graph Neural Network (GNN) framework for the problem of personalized visualization recommendation. The GNN-based framework first represents the large corpus of datasets and visualizations from users as a large heterogeneous graph. Then, it decomposes a visualization into its data and visual components, and then jointly models each of them as a large graph to obtain embeddings of the users, attributes (across all datasets in the corpus), and visual-configurations. From these user-specific embeddings of the attributes and visual-configurations, we can predict the probability of any visualization arising from a specific user. Finally, the experiments demonstrated the effectiveness of using graph neural networks for automatic and personalized recommendation of visualizations to specific users based on their data and visual (design choice) preferences. To the best of our knowledge, this is the first such work to develop and leverage GNNs for this problem.
CHI Conference on Human Factors in Computing Systems
Figure 1: Thirteen responsive visualization use cases reproduced using Cicero. The blue- and gray-bordered views are the desktop and mobile versions, respectively. The mobile versions of the Oil Spills case are from (1) the original article and (2) the version suggested by Hoffswell et al. [13]. Full size images are included in the Supplemental Material (https://osf.io/eg4xq).
2019 IEEE International Conference on Big Data (Big Data)
Anomaly detection is a common analytical task that aims to identify rare cases that differ from the typical cases that make up the majority of a dataset. When applied to the analysis of event sequence data, the task of anomaly detection can be complex because the sequential and temporal nature of such data results in diverse definitions and flexible forms of anomalies. This, in turn, increases the difficulty in interpreting detected anomalies. In this paper, we propose an unsupervised anomaly detection algorithm based on Variational AutoEncoders (VAE) to estimate underlying normal progressions for each given sequence represented as occurrence probabilities of events along the sequence progression. Events in violation of their occurrence probability are identified as abnormal. We also introduce a visualization system, EventThread3 (ET 3), to support interactive exploration and interpretations of anomalies within the context of normal sequence progressions in the dataset through comprehensive one-to-many sequence comparison. Finally, we quantitatively evaluate the performance of our anomaly detection algorithm and demonstrate the effectiveness of our system through a case study.
IEEE Transactions on Visualization and Computer Graphics, 2021
Event sequence data record series of discrete events in the time order of occurrence. They are commonly observed in a variety of applications ranging from electronic health records to network logs, and are characteristically large-scale, high-dimensional, and heterogeneous. This high complexity makes it difficult for analysts to manually explore event sequence data and find patterns, resulting in an ever-increasing need for computational and perceptual aids from visual analytics techniques to extract and communicate insights from event sequence datasets. In this paper, we review the state-of-the-art visual analytics approaches, characterize them with our proposed design space, and categorize them based on analytical tasks and applications.
CHI’20 Extended Abstracts, April 25–30, 2020, Honolulu, HI, USA. ACM 978-1-4503-6819-3/20/04. https://doi.org/10.1145/3334480.3382971
Timestamped event sequences are analyzed to tackle varied problems but have unique challenges in interpretation and analysis. Especially in event sequence prediction, it is difficult to convey the results due to the added uncertainty and complexity introduced by predictive models. In this work, we design and develop ProFlow, a visual analytics system for supporting analysts’ workflow of exploring and predicting event sequences. Through an evaluation conducted with four data analysts in a real-world marketing scenario, we discuss the applicability and usefulness of ProFlow as well as its limitations and future directions.
IEEE Transactions on Visualization and Computer Graphics, 2021
Anomaly detection is a common analytical task that aims to identify rare cases that differ from the typical cases that make up the majority of a dataset. When analyzing event sequence data, the task of anomaly detection can be complex because the sequential and temporal nature of such data results in diverse definitions and flexible forms of anomalies. This, in turn, increases the difficulty in interpreting detected anomalies. In this paper, we propose a visual analytic approach for detecting anomalous sequences in an event sequence dataset via an unsupervised anomaly detection algorithm based on Variational AutoEncoders. We further compare the anomalous sequences with their reconstructions and with the normal sequences through a sequence matching algorithm to identify event anomalies. A visual analytics system is developed to support interactive exploration and interpretations of anomalies through novel visualization designs that facilitate the comparison between anomalous sequence...