Vicinity Exploration: Enabling User-Driven Visual Search of Multiple Machine Learning Models for Precision Medicine

ML-MEDIC: A Preliminary Study of an Interactive Visual Analysis Tool Facilitating Clinical Applications of Machine Learning for Precision Medicine

Applied Sciences, 2020

Accessible interactive tools that integrate machine learning methods with clinical research and reduce the programming experience required are needed to move science forward. Here, we present Machine Learning for Medical Exploration and Data-Inspired Care (ML-MEDIC), a point-and-click, interactive tool with a visual interface for facilitating machine learning and statistical analyses in clinical research. We deployed ML-MEDIC in the American Heart Association (AHA) Precision Medicine Platform to provide secure internet access and facilitate collaboration. ML-MEDIC’s efficacy for facilitating the adoption of machine learning was evaluated through two case studies in collaboration with clinical domain experts. A domain expert review was also conducted to obtain an impression of the usability and potential limitations.

Interactive Exploration of Medical Data Sets

2008 Fifth International Conference BioMedical Visualization: Information Visualization in Medical and Biomedical Informatics, 2008

This paper describes an interactive data exploration system for molecular and clinical data in the field of personalized medicine. It addresses the essential but to date unsolved problem of how to identify connections between genetic variants and their corresponding diseases or the response to certain drugs and treatments, respectively. It is therefore necessary to connect genetic with clinical data in order to categorize specific subgroups of patients with certain disease features. The huge amount of data provided by molecular analytical methods (e.g. data on genetic alterations, proteomic or metabolomic data) can only be analyzed by applying statistical methods and bioinformatics. However, even standard methods of statistics and bioinformatics fail when the data is inhomogeneous, as is the case with clinical data, and when data structures are obscured by noise and dominant patterns. The structure of large medical data sets is made visible by using so-called object- and attribute-glyphs, which can be arranged in a two-dimensional space and synchronized with a set of visualization views.

Clustering approaches for visual knowledge exploration in molecular interaction networks

BMC Bioinformatics, 2018

Background: Biomedical knowledge grows in complexity, and becomes encoded in network-based repositories, which include focused, expert-drawn diagrams, networks of evidence-based associations and established ontologies. Combining these structured information sources is an important computational challenge, as large graphs are difficult to analyze visually. Results: We investigate knowledge discovery in manually curated and annotated molecular interaction diagrams. To evaluate similarity of content we use: i) Euclidean distance in expert-drawn diagrams, ii) shortest path distance using the underlying network and iii) ontology-based distance. We employ clustering with these metrics used separately and in pairwise combinations. We propose a novel bi-level optimization approach together with an evolutionary algorithm for informative combination of distance metrics. We compare the enrichment of the obtained clusters between the solutions and with expert knowledge. We calculate the number of Gene and Disease Ontology terms discovered by different solutions as a measure of cluster quality. Our results show that combining distance metrics can improve clustering accuracy, based on the comparison with expert-provided clusters. Also, the performance of specific combinations of distance functions depends on the clustering depth (number of clusters). By employing a bi-level optimization approach, we evaluated the relative importance of distance functions and found that the order in which they are combined indeed affects clustering performance. Next, with the enrichment analysis of clustering results, we found that both hierarchical and bi-level clustering schemes discovered more Gene and Disease Ontology terms than expert-provided clusters for the same knowledge repository. Moreover, bi-level clustering found more enriched terms than the best hierarchical clustering solution for three distinct distance metric combinations in three different instances of disease maps.
Conclusions: In this work we examined the impact of different distance functions on clustering of a visual biomedical knowledge repository. We found that combining distance functions may be beneficial for clustering, and improve exploration of such repositories. We proposed bi-level optimization to evaluate the importance of the order in which the distance functions are combined. Both combination and order of these functions affected clustering quality and knowledge recognition in the considered benchmarks. We propose that multiple dimensions can be utilized simultaneously for visual knowledge exploration.
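The idea of clustering on a pairwise combination of distance metrics can be illustrated with a small sketch. It is not the bi-level optimization or evolutionary algorithm from the abstract: it swaps in a plain weighted sum of two normalized distance matrices followed by single-linkage agglomerative clustering. The function names, the `weight` parameter, and the toy matrices are all hypothetical.

```python
from itertools import combinations


def normalize(dist):
    """Scale a symmetric distance matrix so that its largest entry is 1."""
    peak = max(max(row) for row in dist)
    return [[d / peak if peak else 0.0 for d in row] for row in dist]


def combine(dist_a, dist_b, weight=0.5):
    """Weighted sum of two normalized distance matrices.

    `weight` is a hypothetical mixing parameter, not taken from the paper.
    """
    a, b = normalize(dist_a), normalize(dist_b)
    n = len(a)
    return [
        [weight * a[i][j] + (1 - weight) * b[i][j] for j in range(n)]
        for i in range(n)
    ]


def single_linkage(dist, k):
    """Repeatedly merge the two closest clusters until k clusters remain."""
    clusters = [{i} for i in range(len(dist))]
    while len(clusters) > k:
        # Distance between clusters = smallest distance between their members.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: min(
                dist[i][j] for i in clusters[p[0]] for j in clusters[p[1]]
            ),
        )
        clusters[x] |= clusters[y]
        del clusters[y]  # safe because x < y for every pair from combinations
    return sorted(sorted(c) for c in clusters)


# Toy example: four diagram elements with made-up layout and network distances.
layout = [[0, 1, 8, 9], [1, 0, 7, 8], [8, 7, 0, 2], [9, 8, 2, 0]]
network = [[0, 1, 3, 4], [1, 0, 2, 3], [3, 2, 0, 1], [4, 3, 1, 0]]
groups = single_linkage(combine(layout, network), k=2)
print(groups)
```

Varying `k` in this sketch corresponds loosely to the clustering depth mentioned in the abstract, and varying `weight` shows how much each metric contributes to the result.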

Visual analysis of biological data-knowledge networks

BMC Bioinformatics, 2015

Background: The interpretation of the results from genome-scale experiments is a challenging and important problem in contemporary biomedical research. Biological networks that integrate experimental results with existing knowledge from biomedical databases and published literature can provide a rich resource and powerful basis for hypothesizing about mechanistic explanations for observed gene-phenotype relationships. However, the size and density of such networks often impede their efficient exploration and understanding. Results: We introduce a visual analytics approach that integrates interactive filtering of dense networks based on degree-of-interest functions with attribute-based layouts of the resulting subnetworks. The comparison of multiple subnetworks representing different analysis facets is facilitated through an interactive super-network that integrates brushing-and-linking techniques for highlighting components across networks. An implementation is freely available as a Cytoscape app. Conclusions: We demonstrate the utility of our approach through two case studies using a dataset that combines clinical data with high-throughput data for studying the effect of β-blocker treatment on heart failure patients. Furthermore, we discuss our team-based iterative design and development process as well as the limitations and generalizability of our approach.

A User-Centered Visual Approach to Data Mining

Intelligent Information Processing, 2002

We present a human-centered approach to model selection in machine learning and data mining that emphasizes and facilitates the active participation of the user in the knowledge discovery process with quantitative and qualitative evaluation of patterns/models. The key idea is that model selection results from combining a quantitative evaluation of model characteristics and performance metrics with a qualitative evaluation of patterns/models by the user. We develop data mining methods integrated with visualization tools in the user-centered visual system D2MS (Data Mining with Model Selection). We finally present a case study of D2MS in mining stomach cancer data.

Morphing projections: a new visual technique for fast and interactive large-scale analysis of biomedical datasets

Bioinformatics, 2020

Motivation: Biomedical research entails analyzing high-dimensional records of biomedical features with hundreds or thousands of samples each. This often also involves using complementary clinical metadata, as well as broad user domain knowledge. Common data analytics software makes use of machine learning algorithms or data visualization tools. However, they are frequently one-way analyses, providing little room for the user to reconfigure the steps in light of the observed results. In other cases, reconfigurations involve large latencies, requiring a retraining of algorithms or a large pipeline of actions. The complex and multiway nature of the problem, nonetheless, suggests that user interaction feedback is a key element to boost the cognitive process of analysis, and must be both broad and fluid. Results: In this article, we present a technique for biomedical data analytics, based on blending meaningful views in an efficient manner, allowing to provide a natural smooth way to tra...

Coordinating computational and visual approaches for interactive feature selection and multivariate clustering

Information Visualization, 2003

Unknown (and unexpected) multivariate patterns lurking in high-dimensional datasets are often very hard to find. This paper describes a human-centered exploration environment, which incorporates a coordinated suite of computational and visualization methods to explore high-dimensional data for uncovering patterns in multivariate spaces. Specifically, it includes: (1) an interactive feature selection method for identifying potentially interesting, multidimensional subspaces from a high-dimensional data space, (2) an interactive, hierarchical clustering method for searching multivariate clusters of arbitrary shape, and (3) a suite of coordinated visualization and computational components centered around the above two methods to facilitate a human-led exploration. The implemented system is used to analyze a cancer dataset and shows that it is efficient and effective for discovering unknown and unexpected multivariate patterns from high-dimensional data.

Interactive exploration of a global clinical network from a large breast cancer cohort

npj Digital Medicine

Despite the unprecedented amount of information now available in medical records, health data remain underexploited due to their heterogeneity and complexity. Simple charts and hypothesis-driven statistics can no longer apprehend the content of information-rich clinical data. There is, therefore, a clear need for powerful interactive visualization tools enabling medical practitioners to perceive the patterns and insights gained by state-of-the-art machine learning algorithms. Here, we report an interactive graphical interface for use as the front end of a machine learning causal inference server (MIIC), to facilitate the visualization and comprehension by clinicians of relationships between clinically relevant variables. The widespread use of such tools, facilitating the interactive exploration of datasets, is crucial both for data visualization and for the generation of research hypotheses. We demonstrate the utility of the MIIC interactive interface, by exploring the clinical network ...

The Challenges of Data Visualization for Precision Medicine

Proceedings of the International Symposium of Human Factors and Ergonomics in Healthcare, 2019

Precision medicine is driving medicine towards a new era where technology and large amounts of data come together to play an essential role in treatment. Data needed to empower and inform decision-makers can be overwhelming to interpret and poses unique challenges related to the visualization of data generated by machine learning and deep learning algorithms. Therefore, the present study aims to provide an in-depth understanding of the challenges, current trends, and opportunities concerning data visualization for precision medicine.

VICTOR: A visual analytics web application for comparing cluster sets

Computers in Biology and Medicine

Clustering is the process of grouping together different data objects based on similar properties. Clustering has applications in various case studies from several fields such as graph theory, image analysis, pattern recognition, statistics and others. Nowadays, there are numerous algorithms and tools able to generate clustering results. However, different algorithms or parameterizations may result in very different clusters. As a result, the user is often forced to manually filter and compare these results in order to decide which of them produces the ideal clusters. To automate this process, in this study, we present VICTOR, the first fully interactive and dependency-free visual analytics web application which allows the comparison and visualization of various clustering algorithms. VICTOR can handle multiple clustering results simultaneously and compare them using ten different metrics. Clustering results can be filtered and compared to each other with the use of interactive heatmaps, bar plots, correlation networks, Sankey and Circos plots. We demonstrate VICTOR's functionality using three examples. In the first case, we compare five different algorithms on a protein-protein interaction dataset whereas in the second example, we test four different parameters of the same clustering algorithm applied on the same dataset. Finally, as a third example, we compare four different meta-analyses with hierarchically clustered differentially expressed genes found to be involved in myocardial infarction.
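Comparing several clustering results pairwise with a similarity metric can be sketched as follows. The abstract does not list VICTOR's ten metrics, so the Rand index is used here only as a familiar pair-counting example. The function names and the toy label lists are hypothetical and not part of VICTOR.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree.

    A pair counts as an agreement when both clusterings put the two items
    in the same cluster, or both put them in different clusters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")
    agree = total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agree += same_a == same_b
        total += 1
    return agree / total if total else 1.0


def pairwise_scores(results):
    """Score every pair of named clusterings, e.g. as input for a heatmap."""
    return {
        (name_a, name_b): rand_index(results[name_a], results[name_b])
        for name_a, name_b in combinations(sorted(results), 2)
    }


# Toy example: three made-up clusterings of the same four items.
results = {
    "algo_1": [0, 0, 1, 1],
    "algo_2": [1, 1, 0, 0],  # same grouping as algo_1, different label names
    "algo_3": [0, 1, 0, 1],
}
scores = pairwise_scores(results)
for pair, score in scores.items():
    print(pair, round(score, 3))
```

Because the Rand index counts pairs rather than matching label names, `algo_1` and `algo_2` score as identical even though their cluster labels differ, which is the behavior one generally wants when comparing outputs from different algorithms.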