Mining Heterogeneous Information Graph for Health Status Classification

Mining health knowledge graph for health risk prediction

World Wide Web

Classification models have been widely adopted in healthcare to support practitioners in disease diagnosis and to reduce human error. The challenge is to utilise effective methods to mine real-world data in the medical domain, as many different models have been proposed with varying results. A large number of researchers focus on the diversity problem of real-time data sets in classification models. Some previous works developed methods based on homogeneous graphs for knowledge representation and then knowledge discovery. However, such approaches are weak in discovering different relationships among elements. In this paper, we propose an innovative classification model for knowledge discovery from patients' personal health repositories. The model discovers medical domain knowledge from the massive data in the National Health and Nutrition Examination Survey (NHANES). The knowledge is conceptualised in a heterogeneous knowledge graph. On the basis of the model, an innovative method is developed to help uncover potential diseases suffered by people and, furthermore, to classify patients' health risk. The proposed model is evaluated by comparison to a baseline model also built on the NHANES data set in an empirical experiment. The performance of the proposed model is promising. The paper makes significant contributions to the advancement of knowledge in data mining with an innovative classification model specifically crafted for domain-based data. In addition, by accessing the...

Constructing a knowledge-based heterogeneous information graph for medical health status classification

Health Information Science and Systems, 2020

Applying Pearson correlation and semantic relations to build a heterogeneous information graph (HIG) for a classification model has notably improved the accuracy of predicting health risk status. The approach used in this study integrated medical domain knowledge and took advantage of Pearson correlation and semantic relations in building a classification model for diagnosis. The research mined knowledge extracted from the titles and abstracts of MEDLINE to discover how to assess the links between objects relating to medical concepts. A knowledge-based HIG model was then developed to predict a patient's health status. The results of the experiment showed that the knowledge-based model was superior to the baseline model and demonstrated that the knowledge base could help improve the performance of the classification model. This study contributes a framework for applying a knowledge base in classification models, which helps these models improve their prediction performance. It also contributes a model to medical practice that can help practitioners become more confident in making final decisions when diagnosing illness. Moreover, the study affirms that biomedical literature can assist in building a classification model. This contribution will be advantageous for future researchers mining the knowledge base to develop different kinds of classification models.
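As a rough illustration of the correlation step mentioned in this abstract, the sketch below links pairs of patient attributes whose Pearson correlation is strong enough. It is not the authors' implementation: the attribute names, the sample values, the 0.8 threshold, and the helper `build_correlation_edges` are all hypothetical, and the semantic relations mined from MEDLINE are not modelled here.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant column has no defined correlation; treat it as unrelated.
        return 0.0
    return cov / sqrt(var_x * var_y)


def build_correlation_edges(columns, threshold=0.5):
    """Return {(attr_a, attr_b): r} for attribute pairs with |r| >= threshold.

    `columns` maps an attribute name to its list of per-patient values.
    The threshold is an assumption chosen for illustration only.
    """
    edges = {}
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            edges[(a, b)] = r
    return edges


# Hypothetical toy data: five patients, three attributes.
columns = {
    "bmi": [22.0, 27.5, 31.0, 24.0, 35.5],
    "systolic_bp": [110.0, 125.0, 138.0, 118.0, 150.0],
    "height_cm": [170.0, 165.0, 180.0, 158.0, 172.0],
}
edges = build_correlation_edges(columns, threshold=0.8)
```

With this toy data only the `bmi` and `systolic_bp` pair clears the threshold, so the resulting graph has a single weighted edge. A heterogeneous graph would additionally attach a type to each node and edge.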

Mining Electronic Health Records to Guide and Support Clinical Decision Support Systems

Improving Health Management through Clinical Decision Support Systems, 2016

Clinical decision support systems require well-designed electronic health record (EHR) systems and vice versa. The data stored or captured in EHRs are diverse, including demographics, billing, medications, and laboratory reports, and can be categorized as structured, semi-structured, and unstructured data. Various data and text mining techniques have been used to extract these data from EHRs for use in decision support, quality improvement, and research. Mining EHRs has been used to identify cohorts, correlated phenotypes in genome-wide association studies, disease correlations and risk factors, and drug-drug interactions, and to improve health services. However, mining EHR data remains challenging, with many issues and barriers. The aim of this chapter is to discuss how data and text mining techniques may guide and support the building of improved clinical decision support systems.

An Introduction to Data Mining Applied to Health-Oriented Databases

The application of data mining (DM) in healthcare is increasing. Healthcare organizations generate and collect large volumes of heterogeneous information daily, and DM helps to uncover interesting patterns, which can eliminate manual tasks, ease data extraction directly from records, save lives, reduce the cost of medical services, and enable early detection of diseases. These patterns can help healthcare specialists to make forecasts, reach diagnoses, and set treatments for patients in health facilities. This work gives an overview of DM methods and their main issues. Three case studies illustrate DM in healthcare applications: (i) In-Vitro Fertilization; (ii) Content-Based Image Retrieval (CBIR); and (iii) Organ transplantation.

Data Mining in Electronic Health Records – A Survey

An electronic health record (EHR) is an evolving concept defined as a systematic collection of electronic health information about individual patients or human beings. It is a digitalized record that is theoretically capable of being shared across different health care settings. In a minority of cases this sharing can occur by way of network-connected, enterprise-wide information systems and other information networks or exchanges. The purpose of this paper is to review and summarize the literature on the benefits of EHR, advantages of EHR, drawbacks of EHR, and the role of data mining in EHR. This paper also describes potential research problems in EHR, which include disease prediction, finding similar symptoms or patients, and privacy-preserving and efficient access and retrieval systems.

Health prediction system using Data Mining

International Journal of Advance Research, Ideas and Innovations in Technology, 2019

We propose a framework that gives users instant guidance on their medical problems through an intelligent, social online health care system. The framework is supplied with various symptoms and the diseases or illnesses associated with those symptoms. The system also allows users to share their symptoms and issues. Data mining as a field of research has well-proven capabilities for identifying hidden patterns and for analysis and knowledge discovery across research domains, and it is gaining popularity among researchers and scientists for generating novel and deep insights from large biomedical datasets. Uncovering new biomedical and healthcare-related knowledge to support clinical decision making is another dimension of data mining. An extensive literature survey finds that early disease prediction is the most in-demand area of research in the health care sector.

Inferring disease correlation from healthcare data

ArXiv, 2015

Electronic Health Records maintained in health care settings are a potential source of substantial clinical knowledge. The massive volume of data, the unstructured nature of the records, and the need for domain knowledge together pose a challenge for extracting knowledge from them. The aim of this study is to overcome this challenge with a methodical analysis, abstraction and summarization of such data. This is an attempt to explain clinical observations through biomedical and genomic data. Discharge summaries of obesity patients were processed to extract coherent patterns. This was supported by Machine Learning and Natural Language Processing based technologies and a concept mapping tool, along with biomedical, clinical and genomic knowledge bases. Semantic relations between diseases were extracted and filtered through a Chi-square test to remove spurious relations. The remaining relations were validated against biomedical literature and gene interaction networks. A collection of b...
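To make the Chi-square filtering step concrete, here is a minimal sketch that tests whether two diseases co-occur more often than independence would predict, using a 2x2 contingency table. It is an assumption-laden illustration rather than the study's pipeline: the function names, the disease pairs, and the counts are hypothetical, and the cutoff 3.841 is the standard critical value for one degree of freedom at the 0.05 significance level.

```python
CRITICAL_VALUE_DF1_P05 = 3.841  # chi-square critical value, 1 d.o.f., alpha = 0.05


def chi_square_2x2(both, only_a, only_b, neither):
    """Pearson chi-square statistic for a 2x2 table of disease co-occurrence.

    both    : records mentioning disease A and disease B
    only_a  : records mentioning A but not B
    only_b  : records mentioning B but not A
    neither : records mentioning neither
    """
    n = both + only_a + only_b + neither
    row_a, row_not_a = both + only_a, only_b + neither
    col_b, col_not_b = both + only_b, only_a + neither
    if 0 in (row_a, row_not_a, col_b, col_not_b):
        # A degenerate margin gives no evidence of association.
        return 0.0
    numerator = n * (both * neither - only_a * only_b) ** 2
    return numerator / (row_a * row_not_a * col_b * col_not_b)


def filter_relations(tables, critical=CRITICAL_VALUE_DF1_P05):
    """Keep only the disease pairs whose statistic exceeds the critical value."""
    return {
        pair: chi_square_2x2(*counts)
        for pair, counts in tables.items()
        if chi_square_2x2(*counts) > critical
    }


# Hypothetical counts over 100 discharge summaries per pair.
tables = {
    ("obesity", "hypertension"): (30, 10, 10, 50),  # strongly associated
    ("obesity", "fracture"): (10, 10, 10, 10),      # looks independent
}
kept = filter_relations(tables)
```

In this toy input the second pair has a statistic of zero and is discarded as spurious, while the first pair is retained. The statistic is unreliable when expected cell counts are small, so an exact test may be preferable for rare diseases.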

Data mining and clinical data repositories: Insights from a 667,000 patient data set

Computers in Biology and Medicine, 2006

Clinical repositories containing large amounts of biological, clinical, and administrative data are increasingly becoming available as health care systems integrate patient information for research and utilization objectives. To investigate the potential value of searching these databases for novel insights, we applied a new data mining approach, HealthMiner, to a large cohort of 667,000 inpatient and outpatient digital records from an academic medical system. HealthMiner approaches knowledge discovery using three unsupervised methods: CliniMiner, Predictive Analysis, and Pattern Discovery. The initial results from this study suggest that these approaches have the potential to expand research capabilities through identification of potentially novel clinical disease associations.

Personal health indexing based on medical examinations: A data mining approach

We design a method called MyPHI that predicts personal health index (PHI), a new evidence-based health indicator to explore the underlying patterns of a large collection of geriatric medical examination (GME) records using data mining techniques. We define PHI as a vector of scores, each reflecting the health risk in a particular disease category. The PHI prediction is formulated as an optimization problem that finds the optimal soft labels as health scores based on medical records that are infrequent, incomplete, and sparse. Our method is compared with classification models commonly used in medical applications. The experimental evaluation has demonstrated the effectiveness of our method based on a real-world GME data set collected from 102,258 participants.

IJERT-Effective Chronic Disease Progression Model using Frequent Subgraph Mining Algorithm

International Journal of Engineering Research and Technology (IJERT), 2019

https://www.ijert.org/effective-chronic-disease-progression-model-using-frequent-subgraph-mining-algorithm https://www.ijert.org/research/effective-chronic-disease-progression-model-using-frequent-subgraph-mining-algorithm-IJERTCONV7IS01014.pdf

Public healthcare funds around the world lose billions of dollars to healthcare insurance fraud. Understanding disease progression can help investigators detect healthcare insurance fraud early on. Existing disease progression methods often ignore complex relations, such as the time gap and pattern of disease occurrence. They also do not take into account the different medication stages of the same chronic disease, which is of great help when detecting healthcare insurance fraud and reducing healthcare costs. This project proposes a heterogeneous network-based chronic disease progression mining method to improve the current understanding of the progression of chronic diseases, including orphan diseases. The method also considers the different medication stages of the same chronic disease. Combining automated methods with statistical knowledge led to the emergence of a new interdisciplinary branch of science named Knowledge Discovery from Databases (KDD).