Albert M Lai - Academia.edu
Papers by Albert M Lai
JAMIA Open
Background: Synthetic data may provide a solution to researchers who wish to generate and share data in support of precision healthcare. Recent advances in data synthesis enable the creation and analysis of synthetic derivatives as if they were the original data; this process has significant advantages over data deidentification. Objectives: To assess a big-data platform with data-synthesizing capabilities (MDClone Ltd., Beer Sheva, Israel) for its ability to produce data that can be used for research purposes while obviating privacy and confidentiality concerns. Methods: We explored three use cases and tested the robustness of synthetic data by comparing the results of analyses using synthetic derivatives to analyses using the original data using traditional statistics, machine learning approaches, and spatial representations of the data. We designed these use cases with the purpose of conducting analyses at the observation level (Use Case 1), patient cohorts (Use Case 2), and populat...
Contemporary Clinical Trials Communications
Proceedings of the 15th Workshop on Biomedical Natural Language Processing, 2016
Gradable adjectives are inherently vague and are used by clinicians to document medical interpretations (e.g., severe reaction, mild symptoms). We present a comprehensive study of gradable adjectives used in the clinical domain. We automatically identify gradable adjectives and demonstrate that they have a substantial presence in clinical text. Further, we show that there is a specific pattern associated with their usage, where certain medical concepts are more likely to be described using these adjectives than others. Interpretation of statements using such adjectives is a barrier in medical decision making. Therefore, we use a simple probabilistic model to ground their meaning based on their usage in context.
Learning representations for knowledge base entities and concepts is becoming increasingly important for NLP applications. However, recent entity embedding methods have relied on structured resources that are expensive to create for new domains and corpora. We present a distantly-supervised method for jointly learning embeddings of entities and text from an unannotated corpus, using only a list of mappings between entities and surface forms. We learn embeddings from open-domain and biomedical corpora, and compare against prior methods that rely on human-annotated text or large knowledge graph structure. Our embeddings capture entity similarity and relatedness better than prior work, both in existing biomedical datasets and a new Wikipedia-based dataset that we release to the community. Results on analogy completion and entity sense disambiguation indicate that entities and words capture complementary information that can be effectively combined for downstream use.
EGEMS (Washington, DC), 2017
With the growing use of electronic medical records, electronic health records (EHRs), and personal health records (PHRs) for health care delivery, new opportunities have arisen for population health researchers. Our objective was to characterize PHR users and examine sample representativeness and nonresponse bias in a study of pregnant women recruited via the PHR. Demographic characteristics were examined for PHR users and nonusers. Enrolled study participants (responders, n=187) were then compared with nonresponders and a representative sample of the target population. PHR patient portal users (34 percent of eligible persons) were older and more likely to be White, have private health insurance, and develop gestational diabetes than nonusers. Of eligible persons (all PHR users), 11 percent (187/1,713) completed a self-administered PHR-based questionnaire. Participants in the research study were more likely to be non-Hispanic White (90 percent versus 79 percent) and married (85 perc...
AMIA ... Annual Symposium proceedings. AMIA Symposium, 2016
Clinical trial coordinators refer to both structured and unstructured sources of data when evaluating a subject for eligibility. While some eligibility criteria can be resolved using structured data, some require manual review of clinical notes. An important step in automating the trial screening process is to be able to identify the right data source for resolving each criterion. In this work, we discuss the creation of an eligibility criteria dataset for clinical trials for patients with two disparate diseases, annotated with the preferred data source for each criterion (i.e., structured or unstructured) by annotators with medical training. The dataset includes 50 heart-failure trials with a total of 766 eligibility criteria and 50 trials for chronic lymphocytic leukemia (CLL) with 677 criteria. Further, we developed machine learning models to predict the preferred data source: kernel methods outperform simpler learning models when used with a combination of lexical, syntactic, se...
AMIA Annual Symposium Proceedings, 2012
The manual annotation of clinical narratives is an important step for training and validating the performance of automated systems that utilize these clinical narratives. We build an annotation specification to capture medical events, and coreferences and temporal relations between medical events in clinical text. Unfortunately, the process of clinical data annotation is both time consuming and costly. Many annotation efforts have used physicians to annotate the data. We investigate using annotators that are current students or graduates from diverse clinical backgrounds with varying levels of clinical experience. In spite of this diversity, the annotation agreement across our team of annotators is high; the average inter-annotator kappa statistic for medical events, coreferences, temporal relations, and medical event concept unique identifiers was 0.843, 0.859, 0.833, and 0.806, respectively. We describe methods towards leveraging the annotations to support temporal reasoning with medical events.
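As a rough illustration of the kind of agreement measure reported above, the sketch below computes Cohen's kappa for two annotators. The abstract does not say which kappa variant was used or how pairwise values were averaged, so the choice of Cohen's kappa, the function name `cohen_kappa`, and the toy labels are all assumptions made for illustration only.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")

    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each annotator's marginal label rates.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy example (made-up labels): the annotators agree on 3 of 4 items.
annotator_1 = ["event", "event", "none", "none"]
annotator_2 = ["event", "none", "none", "none"]
print(cohen_kappa(annotator_1, annotator_2))
```

With more than two annotators, one common convention is to average the pairwise values; whether the study did so is not stated in the abstract.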
AMIA Annual Symposium Proceedings, 2013
Understanding clinical workflow is critical for researchers and healthcare decision makers. Current workflow studies tend to oversimplify and underrepresent the complexity of clinical workflow. Continuous observation time motion studies (TMS) could enhance clinical workflow studies by providing rich quantitative data required for in-depth workflow analyses. However, methodological inconsistencies have been reported in continuous observation TMS, potentially reducing the validity of TMS data and limiting their contribution to the general state of knowledge. We believe that a cornerstone in standardizing TMS is to ensure the reliability of the human observers. In this manuscript we review the approaches for inter-observer reliability assessment (IORA) in a representative sample of TMS focusing on clinical workflow. We found that IORA is an uncommon practice, inconsistently reported, and often uses methods that provide partial and overestimated measures of agreement. Since a comprehensive approach to IORA is yet to be proposed and validated, we provide initial recommendations for IORA reporting in continuous observation TMS.
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Jun 3, 2012
We investigate the task of medical concept coreference resolution in clinical text using two semi-supervised methods, co-training and multi-view learning with posterior regularization. By extracting semantic and temporal features of medical concepts found in clinical text, we create conditionally independent data views; co-training MaxEnt classifiers on this data works almost as well as supervised learning for the task of pairwise coreference resolution of medical concepts. We also train MaxEnt models with expectation constraints, using posterior regularization, and find that posterior regularization performs comparably to or slightly better than co-training. We describe the process of semantic and temporal feature extraction and demonstrate our methods on a corpus of case reports from the New England Journal of Medicine and a corpus of patient narratives obtained from The Ohio State University Wexner Medical Center.
Studies in Health Technology and Informatics, 2013
Electronic health records capture patient information using structured controlled vocabularies and unstructured narrative text. While structured data typically encodes lab values, encounters and medication lists, unstructured data captures the physician's interpretation of the patient's condition, prognosis, and response to therapeutic intervention. In this paper, we demonstrate that information extraction from unstructured clinical narratives is essential to most clinical applications. We perform an empirical study to validate the argument and show that structured data alone is insufficient in resolving eligibility criteria for recruiting patients onto clinical trials for chronic lymphocytic leukemia (CLL) and prostate cancer. Unstructured data is essential to solving 59% of the CLL trial criteria and 77% of the prostate cancer trial criteria. More specifically, for resolving eligibility criteria with temporal constraints, we show the need for temporal reasoning and information integration with medical events within and across unstructured clinical narratives and structured data.
Telemedicine Journal and e-Health: The Official Journal of the American Telemedicine Association, May 20, 2008
The objective of the study was to develop and implement an architecture for remote training that can be used in the narrowband home telemedicine environment. A remote training architecture, the REmote Patient Education in a Telemedicine Environment (REPETE) architecture, using a remote control protocol (RCP) was developed. A set of design criteria was specified. The developed architecture was integrated into the IDEATel home telemedicine unit (HTU) and evaluated against these design criteria using a combination of technical and expert evaluations. Technical evaluation of the architecture demonstrated that remote cursor movements and positioning displayed on the HTU were smooth and effectively real-time. The trainers were able to observe, with a lag of approximately 2 seconds, what the patient sees on their HTU screen. Evaluation of the architecture by experts was favorable. Responses to a Likert scale questionnaire regarding audio quality and remote control performance indicated that the expert evaluators thought that the audio quality and remote control performance were adequate for remote training. All evaluators strongly agreed that the system would be useful for training patients. The REPETE architecture supports basic training needs over a narrowband dial-up connection. We were able to maintain an audio chat simultaneously with performing a remote training session, while maintaining both acceptable audio quality and remote control performance. The RCP provides a mechanism to provide training without requiring a trainer to go to the patient's home and effectively supports deictic referencing to on-screen objects.
AMIA Annual Symposium Proceedings, Feb 1, 2005
In spite of efforts to develop easy-to-use devices, patients may require multiple training sessions to achieve mastery of advanced telehealth devices, especially those incorporating web access. In geographically distributed projects, such repeat training can be costly. A software architecture for simultaneous voice conferencing and remote device control over a single telephone line is presented. Evaluation of the pilot implementation is favorable.
AMIA Annual Symposium Proceedings, Feb 1, 2008
Modeling the temporal information in the medical record is an important area of research. This paper describes an extension of TimeText, a temporal reasoning system designed to represent, extract, and reason about temporal information in clinical text, to include the use of fuzzy temporal constraints. The addition of fuzzy temporal constraints increases TimeText’s ability to handle uncertainty in temporal relations. We use a three-state, staircase possibility distribution function in conjunction with earlier methods of finding solutions to fuzzy temporal constraint networks. We perform analysis to determine the complexity of using this staircase in conjunction with finding solutions to fuzzy temporal constraint satisfaction problems and show that these solutions can be efficiently computed in O(n^3).
Circulation, Mar 25, 2014
AMIA Annual Symposium Proceedings, 2011
The increasing availability of personal genome data has led to escalating needs by consumers to understand the implications of their gene sequences. At present, poorly integrated genetic knowledge has not met these needs. This proof-of-concept study proposes a similarity-based approach to assess the disease risk predisposition for personal genomes. We hypothesize that the semantic similarity between a personal genome and a disease can indicate the disease risks in the person. We developed a knowledge network that integrates existing knowledge of genes, diseases, and symptoms from six sources using the Semantic Web standard, Resource Description Framework (RDF). We then used latent relationships between genes and diseases derived from our knowledge network to measure the semantic similarity between a personal genome and a genetic disease. For demonstration, we showed the feasibility of assessing the disease risks in one personal genome and discussed related methodology issues.
AMIA Annual Symposium Proceedings, 2011
Temporal constraints are present in 38% of clinical research eligibility criteria and are crucial for screening patients. However, eligibility criteria are often written as free text, which is not amenable to computer processing. In this paper, we present an ontology-based approach to extracting temporal information from clinical research eligibility criteria. We generated temporal labels using a frame-based temporal ontology. We manually annotated 150 free-text eligibility criteria using the temporal labels and trained a parser using Conditional Random Fields (CRFs) to automatically extract temporal expressions from eligibility criteria. An evaluation of an additional 60 randomly selected eligibility criteria using manual review achieved an overall precision of 83%, a recall of 79%, and an F-score of 80%. We illustrate the application of temporal extraction with the use cases of question answering and free-text criteria querying.
Journal of Biomedical Informatics, 2016
Community-level factors have been clearly linked to health outcomes, but are challenging to incorporate into medical practice. Increasing use of electronic health records (EHRs) makes patient-level data available for researchers in a systematic and accessible way, but these data remain siloed from community-level data relevant to health. This study sought to link community and EHR data from an older female patient cohort participating in an ongoing intervention at the Ohio State University Wexner Medical Center to associate community-level data with patient-level cardiovascular health (CVH) as well as to assess the utility of this EHR integration methodology. CVH was characterized among patients using available EHR data collected May through July of 2013. EHR data for 153 patients were linked to United States census-tract level data to explore feasibility and insights gained from combining these disparate data sources. Analyses were conducted in 2014. Using the linked data, weekly per capita expenditure on fruits and vegetables was found to be significantly associated with CVH at the p<0.05 level and three other community-level attributes (median income, average household size, and unemployment rate) were associated with CVH at the p<0.10 level. This work paves the way for future integration of community and EHR-based data into patient care as a novel methodology to gain insight into multi-level factors that affect CVH and other health outcomes. Further, our findings demonstrate the specific architectural and functional challenges associated with integrating decision support technologies and geographic information to support tailored and patient-centered decision making therein.
AMIA Annual Symposium Proceedings, 2012
Time Motion Studies (TMS) have proved to be the gold standard method to measure and quantify clinical workflow, and have been widely used to assess the impact of health information systems implementation. Although there are tools available to conduct TMS, they provide different approaches for multitasking, interruptions, inter-observer reliability assessment and task taxonomy, making results across studies not comparable. We postulate that a significant contributing factor towards the standardization and spread of TMS would be the availability and spread of an accessible, scalable and dynamic tool. We present the development of a comprehensive Time Capture Tool (TimeCaT): a web application developed to support data capture for TMS. Ongoing and continuous development of TimeCaT includes the development and validation of a realistic inter-observer reliability scoring algorithm, the creation of an online clinical tasks ontology, and a novel quantitative workflow comparison method.
JAMIA Open
Background Synthetic data may provide a solution to researchers who wish to generate and share da... more Background Synthetic data may provide a solution to researchers who wish to generate and share data in support of precision healthcare. Recent advances in data synthesis enable the creation and analysis of synthetic derivatives as if they were the original data; this process has significant advantages over data deidentification. Objectives To assess a big-data platform with data-synthesizing capabilities (MDClone Ltd., Beer Sheva, Israel) for its ability to produce data that can be used for research purposes while obviating privacy and confidentiality concerns. Methods We explored three use cases and tested the robustness of synthetic data by comparing the results of analyses using synthetic derivatives to analyses using the original data using traditional statistics, machine learning approaches, and spatial representations of the data. We designed these use cases with the purpose of conducting analyses at the observation level (Use Case 1), patient cohorts (Use Case 2), and populat...
Contemporary Clinical Trials Communications
Proceedings of the 15th Workshop on Biomedical Natural Language Processing, 2016
Gradable adjectives are inherently vague and are used by clinicians to document medical interpret... more Gradable adjectives are inherently vague and are used by clinicians to document medical interpretations (e.g., severe reaction, mild symptoms). We present a comprehensive study of gradable adjectives used in the clinical domain. We automatically identify gradable adjectives and demonstrate that they have a substantial presence in clinical text. Further, we show that there is a specific pattern associated with their usage, where certain medical concepts are more likely to be described using these adjectives than others. Interpretation of statements using such adjectives is a barrier in medical decision making. Therefore, we use a simple probabilistic model to ground their meaning based on their usage in context.
Learning representations for knowledge base entities and concepts is becoming increasingly import... more Learning representations for knowledge base entities and concepts is becoming increasingly important for NLP applications. However, recent entity embedding methods have relied on structured resources that are expensive to create for new domains and corpora. We present a distantly-supervised method for jointly learning embeddings of entities and text from an unnanotated corpus, using only a list of mappings between entities and surface forms. We learn embeddings from open-domain and biomedical corpora, and compare against prior methods that rely on human-annotated text or large knowledge graph structure. Our embeddings capture entity similarity and relatedness better than prior work, both in existing biomed-ical datasets and a new Wikipedia-based dataset that we release to the community. Results on analogy completion and entity sense disambiguation indicate that entities and words capture complementary information that can be effectively combined for downstream use.
EGEMS (Washington, DC), 2017
With the growing use of electronic medical records, electronic health records (EHRs), and persona... more With the growing use of electronic medical records, electronic health records (EHRs), and personal health records (PHRs) for health care delivery, new opportunities have arisen for population health researchers. Our objective was to characterize PHR users and examine sample representativeness and nonresponse bias in a study of pregnant women recruited via the PHR. Demographic characteristics were examined for PHR users and nonusers. Enrolled study participants (responders, n=187) were then compared with nonresponders and a representative sample of the target population. PHR patient portal users (34 percent of eligible persons) were older and more likely to be White, have private health insurance, and develop gestational diabetes than nonusers. Of eligible persons (all PHR users), 11 percent (187/1,713) completed a self-administered PHR based questionnaire. Participants in the research study were more likely to be non-Hispanic White (90 percent versus 79 percent) and married (85 perc...
AMIA ... Annual Symposium proceedings. AMIA Symposium, 2016
Clinical trial coordinators refer to both structured and unstructured sources of data when evalua... more Clinical trial coordinators refer to both structured and unstructured sources of data when evaluating a subject for eligibility. While some eligibility criteria can be resolved using structured data, some require manual review of clinical notes. An important step in automating the trial screening process is to be able to identify the right data source for resolving each criterion. In this work, we discuss the creation of an eligibility criteria dataset for clinical trials for patients with two disparate diseases, annotated with the preferred data source for each criterion (i.e., structured or unstructured) by annotators with medical training. The dataset includes 50 heart-failure trials with a total of 766 eligibility criteria and 50 trials for chronic lymphocytic leukemia (CLL) with 677 criteria. Further, we developed machine learning models to predict the preferred data source: kernel methods outperform simpler learning models when used with a combination of lexical, syntactic, se...
Amia Annual Symposium Proceedings Amia Symposium Amia Symposium, 2012
The manual annotation of clinical narratives is an important step for training and validating the... more The manual annotation of clinical narratives is an important step for training and validating the performance of automated systems that utilize these clinical narratives. We build an annotation specification to capture medical events, and coreferences and temporal relations between medical events in clinical text. Unfortunately, the process of clinical data annotation is both time consuming and costly. Many annotation efforts have used physicians to annotate the data. We investigate using annotators that are current students or graduates from diverse clinical backgrounds with varying levels of clinical experience. In spite of this diversity, the annotation agreement across our team of annotators is high; the average inter-annotator kappa statistic for medical events, coreferences, temporal relations, and medical event concept unique identifiers was 0.843, 0.859, 0.833, and 0.806, respectively. We describe methods towards leveraging the annotations to support temporal reasoning with medical events.
Amia Annual Symposium Proceedings Amia Symposium Amia Symposium, 2013
Understanding clinical workflow is critical for researchers and healthcare decision makers. Curre... more Understanding clinical workflow is critical for researchers and healthcare decision makers. Current workflow studies tend to oversimplify and underrepresent the complexity of clinical workflow. Continuous observation time motion studies (TMS) could enhance clinical workflow studies by providing rich quantitative data required for in-depth workflow analyses. However, methodological inconsistencies have been reported in continuous observation TMS, potentially reducing the validity of TMS' data and limiting their contribution to the general state of knowledge. We believe that a cornerstone in standardizing TMS is to ensure the reliability of the human observers. In this manuscript we review the approaches for inter-observer reliability assessment (IORA) in a representative sample of TMS focusing on clinical workflow. We found that IORA is an uncommon practice, inconsistently reported, and often uses methods that provide partial and overestimated measures of agreement. Since a comprehensive approach to IORA is yet to be proposed and validated, we provide initial recommendations for IORA reporting in continuous observation TMS.
Proceedings of the 2012 Conference of the North American Chapter of the Association For Computational Linguistics Human Language Technologies, Jun 3, 2012
ABSTRACT We investigate the task of medical concept coreference resolution in clinical text using... more ABSTRACT We investigate the task of medical concept coreference resolution in clinical text using two semi-supervised methods, co-training and multi-view learning with posterior regularization. By extracting semantic and temporal features of medical concepts found in clinical text, we create conditionally independent data views; co-training MaxEnt classifiers on this data works almost as well as supervised learning for the task of pairwise coreference resolution of medical concepts. We also train MaxEnt models with expectation constraints, using posterior regularization, and find that posterior regularization performs comparably to or slightly better than co-training. We describe the process of semantic and temporal feature extraction and demonstrate our methods on a corpus of case reports from the New England Journal of Medicine and a corpus of patient narratives obtained from The Ohio State University Wexner Medical Center.
Studies in Health Technology and Informatics, 2013
Electronic health records capture patient information using structured controlled vocabularies an... more Electronic health records capture patient information using structured controlled vocabularies and unstructured narrative text. While structured data typically encodes lab values, encounters and medication lists, unstructured data captures the physician's interpretation of the patient's condition, prognosis, and response to therapeutic intervention. In this paper, we demonstrate that information extraction from unstructured clinical narratives is essential to most clinical applications. We perform an empirical study to validate the argument and show that structured data alone is insufficient in resolving eligibility criteria for recruiting patients onto clinical trials for chronic lymphocytic leukemia (CLL) and prostate cancer. Unstructured data is essential to solving 59% of the CLL trial criteria and 77% of the prostate cancer trial criteria. More specifically, for resolving eligibility criteria with temporal constraints, we show the need for temporal reasoning and information integration with medical events within and across unstructured clinical narratives and structured data.
Telemedicine Journal and E Health the Official Journal of the American Telemedicine Association, May 20, 2008
The objective of the study was to develop and implement an architecture for remote training that ... more The objective of the study was to develop and implement an architecture for remote training that can be used in the narrowband home telemedicine environment. A remote training architecture, the REmote Patient Education in a Telemedicine Environment (REPETE) architecture, using a remote control protocol (RCP) was developed. A set of design criteria was specified. The developed architecture was integrated into the IDEATel home telemedicine unit (HTU) and evaluated against these design criteria using a combination of technical and expert evaluations. Technical evaluation of the architecture demonstrated that remote cursor movements and positioning displayed on the HTU were smooth and effectively real-time. The trainers were able to observe within approximately 2 seconds lag what the patient sees on their HTU screen. Evaluation of the architecture by experts was favorable. Responses to a Likert scale questionnaire regarding audio quality and remote control performance indicated that the expert evaluators thought that the audio quality and remote control performance were adequate for remote training. All evaluators strongly agreed that the system would be useful for training patients. The REPETE architecture supports basic training needs over a narrowband dial-up connection. We were able to maintain an audio chat simultaneously with performing a remote training session, while maintaining both acceptable audio quality and remote control performance. The RCP provides a mechanism to provide training without requiring a trainer to go to the patient's home and effectively supports deictic referencing to on screen objects.
Amia Annual Symposium Proceedings Amia Symposium Amia Symposium, Feb 1, 2005
In spite of efforts to develop easy-to-use devices, patients may require multiple training sessio... more In spite of efforts to develop easy-to-use devices, patients may require multiple training sessions to achieve mastery of advanced telehealth devices, especially those incorporating web-access. In geographically-distributed projects, such repeat training can be costly. A software architecture for simultaneous voice conferencing and remote device control over a single telephone line is presented. Evaluation of the pilot implementation is favorable.
AMIA Annual Symposium Proceedings, Feb 1, 2008
Modeling the temporal information in the medical record is an important area of research. This paper describes an extension of TimeText, a temporal reasoning system designed to represent, extract, and reason about temporal information in clinical text, to include the use of fuzzy temporal constraints. The addition of fuzzy temporal constraints increases TimeText’s ability to handle uncertainty in temporal relations. We use a three-state, staircase possibility distribution function in conjunction with earlier methods of finding solutions to fuzzy temporal constraint networks. We perform analysis to determine the complexity of using this staircase in conjunction with finding solutions to fuzzy temporal constraint satisfaction problems and show that these solutions can be efficiently computed in O(n^3).
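As a rough illustration of the idea, a three-state staircase possibility distribution can be read as a function that returns one of three levels depending on where a time difference falls. The sketch below is an assumption-laden example, not TimeText's implementation: the interval bounds, the intermediate level of 0.5, the use of the minimum for conjunction, and all function names are hypothetical.

```python
def staircase_possibility(delta, core, support, mid_level=0.5):
    """Possibility that a time difference `delta` satisfies a fuzzy constraint.

    `core` and `support` are (low, high) tuples, with core nested in support.
    Returns 1.0 inside the core, `mid_level` inside the support but outside
    the core, and 0.0 everywhere else (three states in total).
    The value of `mid_level` is an assumption chosen for illustration.
    """
    core_lo, core_hi = core
    sup_lo, sup_hi = support
    if not (sup_lo <= core_lo <= core_hi <= sup_hi):
        raise ValueError("core must be nested inside support")
    if core_lo <= delta <= core_hi:
        return 1.0
    if sup_lo <= delta <= sup_hi:
        return mid_level
    return 0.0


def combine(*degrees):
    """Conjunction of several fuzzy constraints using the minimum,
    a common (assumed) choice in possibility theory."""
    return min(degrees)


# Hypothetical constraint: "about two days later", measured in hours.
core, support = (36, 60), (24, 72)
for hours in (48, 30, 100):
    print(hours, staircase_possibility(hours, core, support))
```

With only three levels, each constraint can be stored as two nested intervals, which keeps the representation close to that of a crisp temporal constraint.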
Circulation, Mar 25, 2014
AMIA Annual Symposium Proceedings, 2011
The increasing availability of personal genome data has led to escalating needs by consumers to understand the implications of their gene sequences. At present, poorly integrated genetic knowledge has not met these needs. This proof-of-concept study proposes a similarity-based approach to assess the disease risk predisposition for personal genomes. We hypothesize that the semantic similarity between a personal genome and a disease can indicate the disease risks in the person. We developed a knowledge network that integrates existing knowledge of genes, diseases, and symptoms from six sources using the Semantic Web standard, Resource Description Framework (RDF). We then used latent relationships between genes and diseases derived from our knowledge network to measure the semantic similarity between a personal genome and a genetic disease. For demonstration, we showed the feasibility of assessing the disease risks in one personal genome and discussed related methodology issues.
AMIA Annual Symposium Proceedings, 2011
Temporal constraints are present in 38% of clinical research eligibility criteria and are crucial for screening patients. However, eligibility criteria are often written as free text, which is not amenable to computer processing. In this paper, we present an ontology-based approach to extracting temporal information from clinical research eligibility criteria. We generated temporal labels using a frame-based temporal ontology. We manually annotated 150 free-text eligibility criteria using the temporal labels and trained a parser using Conditional Random Fields (CRFs) to automatically extract temporal expressions from eligibility criteria. An evaluation of an additional 60 randomly selected eligibility criteria using manual review achieved an overall precision of 83%, a recall of 79%, and an F-score of 80%. We illustrate the application of temporal extraction with the use cases of question answering and free-text criteria querying.
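For readers less familiar with the metrics reported above, the F-score is commonly defined as the weighted harmonic mean of precision and recall. The short sketch below uses that standard definition; the function name and the `beta` parameter are illustrative and not taken from the paper. Applied to the rounded figures of 0.83 and 0.79, it gives roughly 0.81, so the reported 80% presumably reflects unrounded values or a different averaging scheme, which the abstract does not specify.

```python
def f_score(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F-beta).

    beta=1.0 gives the balanced F1 score. Returns 0.0 when both
    inputs are zero to avoid dividing by zero.
    """
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Rounded values from the abstract; the result is only approximate.
print(round(f_score(0.83, 0.79), 3))
```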
Journal of Biomedical Informatics, 2016
Community-level factors have been clearly linked to health outcomes, but are challenging to incorporate into medical practice. Increasing use of electronic health records (EHRs) makes patient-level data available for researchers in a systematic and accessible way, but these data remain siloed from community-level data relevant to health. This study sought to link community and EHR data from an older female patient cohort participating in an ongoing intervention at the Ohio State University Wexner Medical Center to associate community-level data with patient-level cardiovascular health (CVH) as well as to assess the utility of this EHR integration methodology. CVH was characterized among patients using available EHR data collected May through July of 2013. EHR data for 153 patients were linked to United States census-tract level data to explore feasibility and insights gained from combining these disparate data sources. Analyses were conducted in 2014. Using the linked data, weekly per capita expenditure on fruits and vegetables was found to be significantly associated with CVH at the p<0.05 level and three other community-level attributes (median income, average household size, and unemployment rate) were associated with CVH at the p<0.10 level. This work paves the way for future integration of community and EHR-based data into patient care as a novel methodology to gain insight into multi-level factors that affect CVH and other health outcomes. Further, our findings demonstrate the specific architectural and functional challenges associated with integrating decision support technologies and geographic information to support tailored and patient-centered decision making therein.
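The linkage step described above can be pictured as a join between patient records and community attributes keyed on a census-tract identifier. The sketch below is a minimal illustration under stated assumptions: the field names (`patient_id`, `tract_id`, `cvh_score`, `median_income`), the tract identifiers, and all values are hypothetical toy data, and the study's actual linkage procedure is not described in the abstract.

```python
def link_patients_to_tracts(patients, tract_attributes):
    """Attach community-level attributes to each patient record.

    `patients` is a list of dicts that each carry a 'tract_id' key;
    `tract_attributes` maps tract_id -> dict of community attributes.
    Returns (linked_records, unmatched_patient_ids).
    """
    linked, unmatched = [], []
    for patient in patients:
        tract = tract_attributes.get(patient.get("tract_id"))
        if tract is None:
            # Keep track of patients whose tract has no community data.
            unmatched.append(patient["patient_id"])
            continue
        linked.append({**patient, **tract})
    return linked, unmatched


# Hypothetical toy records, not data from the study.
patients = [
    {"patient_id": "P1", "tract_id": "T-001", "cvh_score": 9},
    {"patient_id": "P2", "tract_id": "T-999", "cvh_score": 6},
]
tract_attributes = {"T-001": {"median_income": 52000}}

linked, unmatched = link_patients_to_tracts(patients, tract_attributes)
print(linked)
print(unmatched)
```

Reporting unmatched patients separately, rather than silently dropping them, makes it easier to judge how complete the linkage is before any association analysis.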
AMIA Annual Symposium Proceedings, 2012
Time Motion Studies (TMS) have proved to be the gold standard method to measure and quantify clinical workflow, and have been widely used to assess the impact of health information systems implementation. Although there are tools available to conduct TMS, they provide different approaches for multitasking, interruptions, inter-observer reliability assessment and task taxonomy, making results across studies not comparable. We postulate that a significant contributing factor towards the standardization and spread of TMS would be the availability and spread of an accessible, scalable and dynamic tool. We present the development of a comprehensive Time Capture Tool (TimeCaT): a web application developed to support data capture for TMS. Ongoing and continuous development of TimeCaT includes the development and validation of a realistic inter-observer reliability scoring algorithm, the creation of an online clinical tasks ontology, and a novel quantitative workflow comparison method.