Using Published Medical Results and Non-homogenous Data in Rule Learning

Towards application of rule learning to the meta-analysis of clinical data: An example of the metabolic syndrome

International Journal of Medical Informatics, 2009

Purpose: Systematic reviews and meta-analyses of published clinical datasets are an important part of medical research. By combining the results of multiple studies, meta-analysis can increase confidence in its conclusions, validate the results of individual studies, and sometimes lead to new findings. An extensive theory has been built on how to aggregate results from multiple studies and arrive at statistically valid conclusions. Surprisingly, very little has been done to adopt advanced machine learning methods to support meta-analysis.
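One classical way of aggregating results from multiple studies is fixed-effect, inverse-variance pooling: each study's effect estimate is weighted by the reciprocal of its squared standard error. The sketch below illustrates that standard formula only. It is not the rule-learning method of the paper, and the function name `fixed_effect_pool` and the three studies' numbers are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fixed_effect_pool(effects: Sequence[float], std_errors: Sequence[float]) -> Tuple[float, float]:
    """Pool per-study effect estimates with inverse-variance weights.

    Returns (pooled_estimate, pooled_standard_error).
    """
    if not effects or len(effects) != len(std_errors):
        raise ValueError("need one standard error per effect estimate")
    weights = [1.0 / se ** 2 for se in std_errors]  # w_i = 1 / SE_i^2
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total
    pooled_se = math.sqrt(1.0 / total)  # SE of the weighted mean
    return pooled, pooled_se


# Three hypothetical studies reporting log odds ratios and their standard errors
est, se = fixed_effect_pool([0.30, 0.10, 0.25], [0.10, 0.20, 0.15])
low, high = est - 1.96 * se, est + 1.96 * se
print(f"pooled={est:.3f}, 95% CI=({low:.3f}, {high:.3f})")
```

Precise studies dominate the pooled value, which is why a single large study can outweigh several small ones. A random-effects model would add a between-study variance term. That model is not shown here.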

Bayesian rule learning for biomedical data mining

Bioinformatics, 2010

Motivation: Disease state prediction from biomarker profiling studies is an important problem because more accurate classification models will potentially lead to the discovery of better, more discriminative markers. Data mining methods are routinely applied to analyze biomedical datasets generated by high-throughput 'omic' technologies applied to clinical samples from tissues or bodily fluids. Past work has demonstrated that rule models can be successfully applied to this problem, since they produce understandable models that let biomedical scientists review discriminative biomarkers. While many rule-based methods produce rules that make predictions under uncertainty, they typically do not quantify the uncertainty in the validity of the rule itself. This article describes an approach that uses a Bayesian score to evaluate rule models. Results: We have combined the expressiveness of rules with the mathematical rigor of Bayesian networks (BNs) to develop and evaluate a Bayesian rule learning (BRL) system. This system uses a novel variant of the K2 algorithm for building BNs from training data to provide probabilistic scores for IF-antecedent-THEN-consequent rules, using heuristic best-first search. We then apply rule-based inference to evaluate the learned models during two runs of 10-fold cross-validation. BRL is evaluated on 24 published 'omic' datasets and, on average, performs on par with or better than other readily available rule learning methods. Moreover, BRL produces models that contain on average 70% fewer variables, so the biomarker panels for disease prediction contain fewer markers for further verification and validation by bench scientists.
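To see how a Bayesian score can rank rule models, consider the standard Cooper-Herskovits (K2) metric for a class variable with a single discrete parent. Each parent value acts as the antecedent of one rule "IF parent = j THEN class". This is a minimal sketch under that assumption. It is not BRL's own K2 variant, and the helper names `k2_log_score` and `rules_from_parent` and the toy marker data are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence


def k2_log_score(parent_values: Sequence[Hashable], class_values: Sequence[Hashable]) -> float:
    """Log K2 score of the class variable given one discrete parent.

    Parent value j contributes log[(r-1)! / (N_j + r - 1)!] + sum_k log(N_jk!),
    where r is the number of class values (here taken from the data).
    """
    r = len(set(class_values))
    counts: Dict[Hashable, Counter] = defaultdict(Counter)
    for x, y in zip(parent_values, class_values):
        counts[x][y] += 1
    score = 0.0
    for class_counts in counts.values():
        n_j = sum(class_counts.values())
        # lgamma(n + 1) == log(n!), so lgamma(r) == log((r - 1)!)
        score += math.lgamma(r) - math.lgamma(n_j + r)
        # zero counts would add lgamma(1) == 0, so omitting them is harmless
        score += sum(math.lgamma(n_jk + 1) for n_jk in class_counts.values())
    return score


def rules_from_parent(parent_values, class_values) -> Dict[Hashable, Hashable]:
    """Map each antecedent value to its majority class (the rule's consequent)."""
    counts: Dict[Hashable, Counter] = defaultdict(Counter)
    for x, y in zip(parent_values, class_values):
        counts[x][y] += 1
    return {x: c.most_common(1)[0][0] for x, c in counts.items()}


labels = ["case"] * 4 + ["control"] * 4   # hypothetical class labels
marker_a = ["high"] * 4 + ["low"] * 4      # perfectly separates the classes
marker_b = ["high", "low"] * 4             # carries no class information
print(k2_log_score(marker_a, labels), k2_log_score(marker_b, labels))
print(rules_from_parent(marker_a, labels))
```

The more informative marker receives the higher (less negative) log score. This is the property a best-first search over candidate rule models can exploit.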

Selecting the best machine learning algorithm to support the diagnosis of Non-Alcoholic Fatty Liver Disease: A meta learner study

PLOS ONE

Background & aims: The use of liver ultrasound (US) scans to diagnose Non-Alcoholic Fatty Liver Disease (NAFLD) increases costs and overloads waiting lists. We aimed to compare various machine learning algorithms using a meta-learner approach to find the best predictor of NAFLD. Methods: The study included 2970 subjects: 2920 formed the training set and 50, randomly selected, were used in the test phase; cross-validation was performed. The best predictors were combined to create three models: 1) FLI + GLUCOSE + SEX + AGE; 2) AVI + GLUCOSE + GGT + SEX + AGE; 3) BRI + GLUCOSE + GGT + SEX + AGE. Eight machine learning algorithms were trained on the predictors of each of the three models, and their percent accuracy, variance, and percent weight were compared. Results: The SVM algorithm performed best with all models. Model 1 had 68% accuracy, 1% variance, and an algorithm weight of 27.35; Model 2 had 68% accuracy, 1% variance, and an algorithm weight of 33.62; and Model 3 had 77% accuracy, 1% variance, and an algorithm weight of 34.70. Model 2, composed of AVI + GLUCOSE + GGT + SEX + AGE, was the best-performing model despite its lower accuracy.

Unsupervised Machine Learning Application to Perform a Systematic Review and Meta-Analysis in Medical Research

Computación y Sistemas, 2016

When synthesizing information from multiple sources and comparing it statistically, particularly in medical research, several statistical tools are available; the most common are the systematic review and the meta-analysis. These techniques allow the effectiveness or success of a group of studies to be compared. However, when the information to be compared is incomplete or mismatched between two or more studies, the comparison becomes an arduous task. In parallel, machine learning methods have proven to be a reliable resource: such software is built to classify variables and to learn from previous experience to improve the classification. In this paper, we use unsupervised machine learning methods to describe a simple yet effective algorithm that, given a dataset with missing data, completes it. This leads to a more complete systematic review and meta-analysis, capable of presenting a final effectiveness or success rating across studies. Our method is first validated on a movie-ranking database and then applied to a real-life systematic review and meta-analysis of scientific papers on obesity prevention, in which 66.6% of the outcomes are missing.
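The general idea of completing a studies-by-outcomes matrix can be illustrated with k-nearest-neighbour imputation. This is a common unsupervised technique, but it is a swapped-in example, not necessarily the paper's algorithm. Each missing entry is filled with the mean of that column over the k rows closest on their co-observed columns. The function name `knn_impute` and the matrix values are hypothetical.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]


def knn_impute(rows: List[Row], k: int = 2) -> List[List[float]]:
    """Fill None entries with the mean of that column over the k nearest rows.

    Distances use only columns observed in both rows; imputation always reads
    the original rows, so filled values never feed back into later fills.
    """
    def distance(a: Row, b: Row) -> float:
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        # root-mean-square difference over co-observed columns
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    completed: List[List[float]] = []
    for i, row in enumerate(rows):
        filled: List[float] = []
        for j, value in enumerate(row):
            if value is not None:
                filled.append(value)
                continue
            # candidate donors: other rows that observe column j
            donors = sorted(
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )[:k]
            if not donors:
                raise ValueError(f"column {j} has no observed values")
            filled.append(sum(v for _, v in donors) / len(donors))
        completed.append(filled)
    return completed


# Hypothetical outcome rates: one row per study, one column per outcome
studies = [
    [0.80, 0.60, None],
    [0.70, 0.50, 0.40],
    [0.20, 0.10, 0.90],
    [0.75, None, 0.45],
]
for row in knn_impute(studies, k=2):
    print([round(v, 3) for v in row])
```

Note that with 66.6% of outcomes missing, as in the obesity-prevention review, many rows share few columns. The choice of distance and of k then matters much more than in this small example.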

Integrating knowledge-driven and data-driven approaches for the derivation of clinical prediction rules

2005

Clinical prediction rules are created by medical researchers and practitioners based on their knowledge and clinical experience. Such expert-generated rules are then evaluated and refined in clinical tests. Once verified, these knowledge-driven rules are used to expedite diagnosis and treatment of serious cases and to limit unnecessary tests for low-probability cases. Alternatively, machine learning techniques can be used to automatically induce comprehensible, data-driven rules from vast amounts of existing clinical data. This paper investigates how rules generated by clinical experts compare with data-driven rules and describes three outcomes: rule confirmation, contradiction, and expansion. The study concentrates on prediction rules for the diagnosis of obstructive sleep apnea, using three clinical datasets with 1,318 records. The prototype system, Hypnos, includes both a framework for rule definition and a mechanism for rule induction.
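Comparing expert rules with data-driven rules requires scoring both on the same records. A minimal sketch follows. It assumes rules are Python predicates and that coverage and accuracy are the comparison metrics; the paper's own criteria are not given in this abstract. The `Rule` class, the field names, the thresholds, and the four records are hypothetical placeholders, not clinical guidance.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass
class Rule:
    name: str
    condition: Callable[[Record], bool]  # the IF part
    predicted: Any                       # the THEN part


def evaluate_rule(rule: Rule, records: List[Record], outcome: str) -> Dict[str, float]:
    """Coverage = share of records the rule fires on; accuracy = share of those it gets right."""
    covered = [r for r in records if rule.condition(r)]
    correct = sum(1 for r in covered if r[outcome] == rule.predicted)
    return {
        "coverage": len(covered) / len(records) if records else 0.0,
        "accuracy": correct / len(covered) if covered else 0.0,
    }


# Hypothetical records and rules, for illustration only
records = [
    {"snoring": True, "bmi": 32, "osa": True},
    {"snoring": True, "bmi": 24, "osa": False},
    {"snoring": False, "bmi": 36, "osa": True},
    {"snoring": True, "bmi": 31, "osa": False},
]
expert_rule = Rule("expert", lambda r: r["snoring"] and r["bmi"] >= 30, True)
induced_rule = Rule("induced", lambda r: r["bmi"] >= 35, True)
for rule in (expert_rule, induced_rule):
    print(rule.name, evaluate_rule(rule, records, outcome="osa"))
```

Scoring both kinds of rule with the same function puts them on a common footing. This is a prerequisite for deciding whether a data-driven rule confirms, contradicts, or expands an expert rule.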

RULE LEARNING OVER MEDICAL DATA WITH MACHINE LEARNING ALGORITHMS

In this paper, the OneR, Naive Bayes, JRip, Ridor, SMO, J48, LMT, Conjunctive Rule, Decision Tables, NNge, KStar, IBk, and PART machine learning algorithms, together with fuzzy logic, are used for classification analysis of instances in a medical dataset. In addition, rules were constructed with the JRip, PART, and OneR algorithms and with fuzzy logic. Machine learning is all about learning rules from data. The OneR algorithm is used to confirm the fuzzy logic classification, which uses the L_O2 and ADM_DECS attributes. The OneR algorithm gives a limit value of 95.5 for L_O2. The L_O2(A) and L_O2(S) fuzzy membership functions in Figure 1 intersect at approximately 96.07, and this value is accepted as the limit value. The two results are close to each other, and the results calculated with different fuzzy membership functions are more sensitive than the result of the OneR algorithm.
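A OneR-style limit value for a single numeric attribute such as L_O2 can be sketched as a search for the one cut point that minimizes training errors when each side predicts its majority class. This is a simplification. Weka's OneR bins numeric attributes into several intervals subject to a minimum bucket size, so a single-cut search only approximates it for a two-class problem. The function name `best_single_threshold` and the toy values are hypothetical.

```python
from collections import Counter
from typing import Sequence, Tuple


def best_single_threshold(values: Sequence[float], labels: Sequence[str]) -> Tuple[float, str, str, int]:
    """Return (threshold, class_below, class_at_or_above, training_errors).

    Candidate cuts are midpoints between consecutive distinct values; the first
    cut with the fewest errors wins.
    """
    pairs = sorted(zip(values, labels))
    distinct = sorted(set(values))
    best = None
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        below = Counter(y for x, y in pairs if x < t)
        above = Counter(y for x, y in pairs if x >= t)
        cls_below, n_below = below.most_common(1)[0]  # majority class on each side
        cls_above, n_above = above.most_common(1)[0]
        errors = len(pairs) - n_below - n_above
        if best is None or errors < best[3]:
            best = (t, cls_below, cls_above, errors)
    if best is None:
        raise ValueError("need at least two distinct attribute values")
    return best


# Toy oxygen-saturation readings with two decision classes "A" and "S"
print(best_single_threshold([91, 93, 94, 96, 98, 99], ["A", "A", "A", "S", "S", "S"]))
```

A fuzzy approach instead places the limit where two membership functions cross. It can therefore land at a different value, as the 95.5 versus 96.07 comparison above shows.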

A Systematic Machine Learning Based Approach for the Diagnosis of Non-Alcoholic Fatty Liver Disease Risk and Progression

Scientific reports, 2018

Prevention and diagnosis of NAFLD are an ongoing area of interest in the healthcare community. Screening is complicated by the fact that non-invasive tests lack the specificity and sensitivity needed to make and stage the diagnosis. Currently, no non-invasive prediction method based on the ATP III criteria is available to diagnose NAFLD risk. The first objective of this research is to develop a machine-learning-based method to identify individuals at increased risk of developing NAFLD, using the risk factors of the ATP III clinical criteria for Metabolic Syndrome (MetS), updated in 2005. The second is to validate the relative ability of the quantitative score defined by the Italian Association for the Study of the Liver (IASF), and of a guideline based on triglyceride thresholds defined explicitly for the Canadian population, to predict NAFLD risk. We proposed a Decision Tree based method to evaluate the risk of developing NAFLD and its progression in the Canadian population, using Electronic Medical...
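The ATP III risk factors can be turned into binary inputs for a decision tree. A minimal sketch follows, assuming the 2005 revised NCEP ATP III cut-offs: waist at least 102 cm (men) or 88 cm (women), triglycerides at least 150 mg/dL, HDL below 40 mg/dL (men) or 50 mg/dL (women), blood pressure at least 130/85 mmHg, and fasting glucose at least 100 mg/dL. MetS is present when three or more criteria are met. The clauses for patients on drug treatment are omitted. The `Patient` fields and the feature encoding are assumptions; the paper's exact inputs are not given in this abstract.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Patient:
    sex: str                       # "male" or "female"
    waist_cm: float
    triglycerides_mg_dl: float
    hdl_mg_dl: float
    systolic_mm_hg: float
    diastolic_mm_hg: float
    fasting_glucose_mg_dl: float


def atp3_flags(p: Patient) -> Dict[str, bool]:
    """One boolean per 2005 ATP III criterion (drug-treatment clauses omitted)."""
    male = p.sex == "male"
    return {
        "waist": p.waist_cm >= (102 if male else 88),
        "triglycerides": p.triglycerides_mg_dl >= 150,
        "low_hdl": p.hdl_mg_dl < (40 if male else 50),
        "blood_pressure": p.systolic_mm_hg >= 130 or p.diastolic_mm_hg >= 85,
        "glucose": p.fasting_glucose_mg_dl >= 100,
    }


def has_mets(p: Patient) -> bool:
    """MetS when at least three of the five criteria are met."""
    return sum(atp3_flags(p).values()) >= 3


# Hypothetical patient: the same measurements meet different criteria by sex
female = Patient("female", 92, 160, 45, 125, 80, 95)
male = Patient("male", 92, 160, 45, 125, 80, 95)
print(atp3_flags(female), has_mets(female))
print(atp3_flags(male), has_mets(male))
```

Keeping the five flags as separate features, rather than only the MetS yes/no label, lets a decision tree learn which individual criteria matter most for NAFLD risk.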

Machine Learning to Predict Cardiovascular Disease: Systematic Meta-Analysis

In today's fast-paced society, people are so busy that they have little time to think about their health. Overscheduled lives and a widespread disregard for health are two factors that contribute to the rising incidence of disease. Additionally, many people live with conditions, such as cardiovascular disease, that prevent them from functioning normally. According to the World Health Organization (WHO), cardiovascular disease is responsible for more than one-third of all deaths. It is therefore essential for medical professionals to be able to evaluate the likelihood that a patient will develop cardiovascular disease. However, hospitals and the healthcare industry produce such a vast amount of data that research can be difficult owing to its sheer volume. Machine Learning (ML) methods have the potential to reduce the time and effort medical professionals spend formulating predictions and organising data. We therefore discuss the factors that lead to heart disease as well as the machine learning methods involved. We evaluated and compared the efficacy of a wide variety of well-known ML algorithms in an experiment and used them to draw inferences about cardiovascular disease. Ultimately, the purpose of this research is to determine whether an ML system can reliably predict cardiac illness.

An Evaluation of Discretization Methods for Learning Rules from Biomedical Datasets

Rule learning has the major advantage of being understandable by human experts when performing knowledge discovery in the biomedical domain. Many rule learning algorithms require discrete data to learn IF-THEN rule sets, which makes the choice of discretization technique an important step in rule learning. We compare the performance of one standard technique, Fayyad and Irani's Minimum Description Length Principle Criterion, which is the de facto discretization method in many machine learning packages, with that of a new Efficient Bayesian Discretization (EBD) method, and show that EBD leads to significant gains in performance, especially as the complexity of the rule learner increases.
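For reference, the Fayyad-Irani baseline can be sketched as recursive entropy-based splitting. A cut T on a set S of N examples is accepted only if its information gain exceeds (log2(N - 1) + Δ) / N, where Δ = log2(3^k - 2) - [k·Ent(S) - k1·Ent(S1) - k2·Ent(S2)] and k, k1, and k2 count the classes present in S and in the two halves. This sketch evaluates every cut between distinct values, which reaches the same minimum-entropy cut as restricting to class-boundary points. EBD itself is not shown, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mdlp_cut_points(values: Sequence[float], labels: Sequence[str]) -> List[float]:
    """Recursive entropy-based discretization with the Fayyad-Irani MDL stopping rule."""
    cuts: List[float] = []

    def split(segment: List[Tuple[float, str]]) -> None:
        n = len(segment)
        ys = [y for _, y in segment]
        ent_s = entropy(ys)
        best = None
        for i in range(1, n):
            if segment[i - 1][0] == segment[i][0]:
                continue  # a cut must fall between distinct attribute values
            left, right = ys[:i], ys[i:]
            e = (len(left) * entropy(left) + len(right) * entropy(right)) / n
            if best is None or e < best[0]:
                best = (e, i)
        if best is None:
            return
        e, i = best
        left, right = ys[:i], ys[i:]
        gain = ent_s - e
        k, k1, k2 = len(set(ys)), len(set(left)), len(set(right))
        delta = math.log2(3 ** k - 2) - (k * ent_s - k1 * entropy(left) - k2 * entropy(right))
        if gain <= (math.log2(n - 1) + delta) / n:
            return  # MDL criterion rejects the cut; stop recursing here
        cuts.append((segment[i - 1][0] + segment[i][0]) / 2)
        split(segment[:i])
        split(segment[i:])

    split(sorted(zip(values, labels)))
    return sorted(cuts)


# A cleanly separable attribute gets one cut; a class-alternating one gets none
print(mdlp_cut_points(range(1, 9), ["A"] * 4 + ["B"] * 4))
print(mdlp_cut_points(range(1, 9), ["A", "B"] * 4))
```

Because the MDL test is conservative on small samples, an attribute may end up with no cut at all. This is one reason the choice of discretizer affects which rules a downstream learner can express.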