Kristin Bennett - Academia.edu

Papers by Kristin Bennett

Making Study Populations Visible Through Knowledge Graphs

Lecture Notes in Computer Science, 2019

Ontology-enabled Analysis of Study Populations

We address the problem of modeling study populations in research studies in a declarative manner. Research studies often have a great degree of variability in the reporting of population descriptions. To make study populations easily accessible for decision making related to study applicability, we will show the usage of our ontology-enabled prototype system in different applications. Our system leverages our Study Cohort Ontology and the related cohort Knowledge Graph (as described in our accepted resource track paper). We aim to address three retrospective population analysis scenarios, designed specifically to determine the study match and study limitations and to evaluate the study quality. We also provide visualizations of a patient (or patient population) to a treatment arm. In addition, for each guideline recommendation that depends upon a study, we provide a summary of the relevant study’s cohort description. We describe some of our applications and their potential impacts. Resou...

Developing Scientific Knowledge Graphs Using Whyis

We present Whyis, the first framework for creating custom provenance-driven knowledge graphs. Whyis knowledge graphs are based on nanopublications, which simplify and standardize the production of structured, provenance-supported knowledge in knowledge graphs. To demonstrate Whyis, we created BioKG, a probabilistic biology knowledge graph, and populated it with well-used drug and protein content from DrugBank, UniProt, and OBO Foundry ontologies. As shown with BioKG, knowledge graph developers can use Whyis to configure custom knowledge curation pipelines using data importers and semantic extract, transform, and load scripts. Whyis also contains a knowledge meta-analysis capability for use in customizable graph exploration. The flexible, nanopublication-based architecture of Whyis lets knowledge graph developers integrate, extend, and publish knowledge from heterogeneous sources on the web.

The Whyis Knowledge Graph Framework in Action

We will demonstrate a reusable framework for developing knowledge graphs that supports general, open-ended development of knowledge curation, interaction, and inference. Knowledge graphs need to be easily maintainable and usable in sometimes complex application settings. Often, scaling knowledge graph updates can require developing a knowledge curation pipeline that either replaces the graph wholesale whenever updates are made, or requires detailed tracking of knowledge provenance across multiple data sources. Fig. 1 shows how Whyis provides a semantic analysis ecosystem: an environment that supports research and development of semantic analytics for which we previously had to build custom applications [3,4]. Users interact through a suite of knowledge graph views driven by the node type and view requested in the URL. Knowledge curation methods include Semantic ETL, external linked data mapping, and Natural Language Processing (NLP). Autonomous inference agents expand the available k...

Ontology-enabled Breast Cancer Characterization

We address the problem of characterizing breast cancer, which today is done using staging guidelines. Our demo will show different breast cancer staging results that leverage the Whyis semantic nanopublication knowledge graph framework [8]. The system we developed is able to ingest breast cancer characterization guidelines in a semi-automated manner and then use our deductive inferencer to generate new information based on those guidelines as described in our ISWC resource track paper ‘Knowledge Integration for Disease Characterization: A Breast Cancer Example’ [11]. In this paper we demonstrate the versatility of our framework using a synthetic patient profile.

Knowledge Integration for Disease Characterization: A Breast Cancer Example

Lecture Notes in Computer Science, 2018

Semantically-aware population health risk analyses

arXiv (Cornell University), Nov 27, 2018

Cadre Modeling: Simultaneously Discovering Subpopulations and Predictive Models

2018 International Joint Conference on Neural Networks (IJCNN), 2018

We consider the problem in regression analysis of identifying subpopulations that exhibit different patterns of response, where each subpopulation requires a different underlying model. Unlike statistical cohorts, these subpopulations are not known a priori; thus, we refer to them as cadres. When the cadres and their associated models are interpretable, modeling leads to insights about the subpopulations and their associations with the regression target. We introduce a discriminative model that simultaneously learns cadre assignment and target-prediction rules. Sparsity-inducing priors are placed on the model parameters, under which independent feature selection is performed for both the cadre assignment and target-prediction processes. We learn models using adaptive step size stochastic gradient descent, and we assess cadre quality with bootstrapped sample analysis. We present simulated results showing that, when the true clustering rule does not depend on the entire set of features...

Identifying Windows of Susceptibility by Temporal Gene Analysis

Scientific Reports, 2019

Increased understanding of developmental disorders of the brain has shown that genetic mutations, environmental toxins and biological insults typically act during developmental windows of susceptibility. Identifying these vulnerable periods is a necessary and vital step for safeguarding women and their fetuses against disease-causing agents during pregnancy and for developing timely interventions and treatments for neurodevelopmental disorders. We analyzed developmental time-course gene expression data derived from human pluripotent stem cells, with disease association, pathway, and protein interaction databases to identify windows of disease susceptibility during development and the time periods for productive interventions. The results are displayed as interactive Susceptibility Windows Ontological Transcriptome (SWOT) Clocks illustrating disease susceptibility over developmental time. Using this method, we determine the likely windows of susceptibility for multiple neurological d...

SemNExT: A Framework for Semantically Integrating and Exploring Numeric Analyses

Combining statistical techniques with semantic data representations holds the potential to enhance understandability of scientific results. It can augment scientific findings with existing data sources in a reproducible manner through provenance capture, as well as enable further analysis and deduction through computer- and human-understandable definitions of terms. We present a framework for semantically integrating and exploring numerical analyses. We call our work SemNExT for Semantic Numeric Exploration Technology. We apply our approach to data analysis aimed at improving understanding of human brain development that leverages the Cortecon RNA-Seq data repository. Our approach supports enrichment of Cortecon data through combinations with structured data sources available via SQL or SPARQL from the web to provide semantically enhanced analyses combined with statistical analyses. Our results are encoded as RDF graphs that may be used as input to reasoners and may drive provenance-...

Special Issue: Hierarchical Optimization

Mathematical Programming

Should we tweet this? Generative response modeling for predicting reception of public health messaging on Twitter

14th ACM Web Science Conference 2022

Downstream Fairness Caveats with Synthetic Healthcare Data

Cornell University - arXiv, Mar 8, 2022

Encore

Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, 2019

Synthetic Event Time Series Health Data Generation

ArXiv, 2019

Synthetic medical data that preserves privacy while maintaining utility can be used as an alternative to real medical data, which has privacy costs and resource constraints associated with it. At present, most models focus on generating cross-sectional health data, which is not necessarily representative of real data. In reality, medical data is longitudinal in nature, with a single patient having multiple health events, non-uniformly distributed throughout their lifetime. These events are influenced by patient covariates such as comorbidities, age group, gender, etc., as well as external temporal effects (e.g. flu season). While there exist seminal methods to model time series data, it becomes increasingly challenging to extend these methods to medical event time series data. Due to the complexity of the real data, in which each patient visit is an event, we transform the data by using summary statistics to characterize the events for a fixed set of time intervals, to facilitate anal...

Assessing privacy and quality of synthetic health data

Proceedings of the Conference on Artificial Intelligence for Data Discovery and Reuse, 2019

Temporal analysis of social determinants associated with COVID-19 mortality

Proceedings of the 12th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, 2021

Generation and evaluation of privacy preserving synthetic health data

Neurocomputing, 2020

Social Determinants Associated with COVID-19 Mortality in the United States

This study examines social determinants associated with disparities in COVID-19 mortality rates in the United States. Using county-level data, 42 negative binomial mixed models were used to evaluate the impact of social determinants on COVID-19 outcome. First, to identify proper controls, the effect of 24 high-risk factors on COVID-19 mortality rate was quantified. Then, the high-risk terms found to be significant were controlled for in an association study between 41 social determinants and COVID-19 mortality rates. The results indicate that ethnic minorities, immigrants, socioeconomic inequalities, and early exposure to COVID-19 are associated with increased COVID-19 mortality, while the prevalence of asthma, suicide, and excessive drinking is associated with decreased mortality. Overall, we recognize that social inequality places disadvantaged groups at risk, which must be addressed through future policies and programs. Additionally, we reveal possible relationships between lung ...

Unmasking the conversation on masks: Natural language processing for topical sentiment analysis of COVID-19 Twitter discourse

In this exploratory study, we scrutinize a database of over one million tweets collected from March to July 2020 to illustrate public attitudes towards mask usage during the COVID-19 pandemic. We employ natural language processing, clustering and sentiment analysis techniques to organize tweets relating to mask-wearing into high-level themes, then relay narratives for each theme using automatic text summarization. In recent months, a body of literature has highlighted the robustness of trends in online activity as proxies for the sociological impact of COVID-19. We find that topic clustering based on mask-related Twitter data offers revealing insights into societal perceptions of COVID-19 and techniques for its prevention. We observe that the volume and polarity of mask-related tweets have greatly increased. Importantly, the analysis pipeline presented may be leveraged by the health community for qualitative assessment of public response to health intervention techniques in real time.
