Knowledge-Based Data Analysis: First Step Toward the Creation of Clinical Prediction Rules Using a New Typicality Measure
Related papers
WIT Transactions on Information and Communication Technologies, 2005
Researchers and practitioners in medicine use various clinical prediction rules to estimate the probability and severity of a disease. Based on a limited number of factors from medical history, physical examination, and laboratory tests, the practitioners use these rules to expedite diagnosis and treatment for serious cases and limit unnecessary tests for low-probability cases. However, before the rules can be used in clinics, they must be validated on large and diversified populations and evaluated in clinical settings. A computer system providing intelligent data analysis and data mining techniques can facilitate this lengthy and costly process. This paper describes a conceptual framework and a prototype system for the rule evaluation process. The study concentrates on predictive rules used in the diagnosis of obstructive sleep apnea (OSA), a common and serious respiratory disorder. The prototype system, Hypnos, includes (1) a framework for rule definition, and (2) a mechanism for rule evaluation. The rule definition framework is based on a semiotic and a fuzzy logic approach. The semiotic description incorporates a rule's syntax, semantics, and pragmatics. The fuzzy logic models the imprecise features. The rule evaluation mechanism supports first the validation and then comparison of the rules built by the domain experts and the rules generated by the data-driven method. The results from both methods are compared based on rule accuracy, interpretability, generality, and clinical utility. The prediction rules for OSA are described and evaluated using four datasets (1,300 records) from two clinics. The results show that rules obtained from data mining can confirm, contradict, or expand the rules created by medical experts. Therefore, the paper suggests a combination of knowledge-driven and data-driven methods for rule derivation and validation.
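The abstract above says fuzzy logic is used to model imprecise features. As a minimal sketch of that general idea (not the Hypnos implementation), a ramp-shaped membership function can map a raw measurement to a degree of membership between 0 and 1. The function name, the example feature, and the thresholds are all hypothetical and chosen only for illustration.

```python
def ramp_membership(x: float, low: float, high: float) -> float:
    """Degree (0..1) to which x belongs to a fuzzy set that starts at
    `low` and is fully satisfied at `high` (simple linear ramp)."""
    if high <= low:
        raise ValueError("high must be greater than low")
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


# Hypothetical example: membership of a measurement of 42.0 in a fuzzy
# set whose (assumed) thresholds are 38.0 and 46.0.
degree = ramp_membership(42.0, low=38.0, high=46.0)
print(degree)
```

A graded value like this lets a rule fire partially for borderline measurements instead of switching abruptly at a single cut-off.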
2005
Clinical prediction rules are created by medical researchers and practitioners based on their knowledge and clinical experience. Such expert-generated rules are then evaluated and refined in clinical tests. Once verified, these knowledge-driven rules are used to expedite diagnosis and treatment for serious cases and to limit unnecessary tests for low-probability cases. Alternatively, machine learning techniques can be used for automated induction of comprehensible data-driven rules from vast amounts of existing clinical data. This paper investigates how the rules generated by clinical experts compare with the data-driven rules. The paper describes three outcomes: rule confirmation, contradiction, and expansion. The study concentrates on prediction rules for the diagnosis of obstructive sleep apnea using three clinical data sets with 1,318 records. The prototype system, Hypnos, includes both a framework for rule definition and a mechanism for rule induction.
Proceedings Workshop 'Intelligent Data Analysis in Medicine and Pharmacology' (IDAMAP '99), Washington DC, United States, pp. 91-102, 1999
In this article, joint medical and data analysis expertise is brought to bear using fuzzy knowledge representation and ‘intelligent’ aggregation techniques to solve a difficult medical diagnosis problem, that of sleep apnea syndrome screening. Key Words: fuzzy representation, sleep apnea diagnosis, questionnaire, aggregation
Descriptive Modelling of Clinical Conditions with Data-driven Rule Mining in Physiological Data
Proceedings of the International Conference on Health Informatics, 2015
This paper presents an approach to automatically mine rules in time series data representing physiological parameters in clinical conditions. The approach is fully data driven: prototypical patterns are mined for each physiological time series. The rules generated from these prototypical patterns are then described in a textual representation that captures trends in each physiological parameter and their relation to the other physiological data. The paper also introduces a method for measuring the similarity of rule sets in order to validate the uniqueness of rule sets. This method is evaluated on physiological records from clinical classes in the MIMIC online database, such as angina, sepsis, and respiratory failure. The results show that the rule mining technique is able to acquire a distinctive model for each clinical condition and to represent the generated rules in a human-understandable textual representation.
Obstructive sleep apnea (OSA) is a common sleep disorder, especially in middle-aged and obese patients. However, the diagnosis and treatment of OSA is complex, particularly in patients with mild or moderate symptoms and co-occurring medical problems. Furthermore, sleep disorders are interconnected with diverse factors, such as lifestyle, shift work, and psychological problems. Case-based reasoning (CBR) is a natural approach in such a highly complex domain. CBR allows for the retrieval and analysis of successful outcomes as well as failures in treatment. To address the complexity of the domain knowledge, the authors applied a semio-fuzzy framework, an approach combining fuzzy logic and semiotics, in the development of a prototype CBR system. The system has been implemented in an educational sleep disorders clinic to assist respiratory students in the retrieval of prototypical cases, and has been particularly helpful in the analysis of co-occurring medical problems and the treatment process.
Proceedings / the ... Annual Symposium on Computer Application [sic] in Medical Care. Symposium on Computer Applications in Medical Care, 1991
In clinical research, data is often studied by a particular method without previous analysis of quality or semantic contents which could link the clinical database and data analytical (e.g. statistical) procedures. In order to avoid bias caused by this situation, we propose that the analysis of medical data should be divided into two main steps. In the first one we concentrate on conducting the quality, semantic and structure analyses. In the second step our aim is to build an appropriate dictionary of data analysis methods for further knowledge extraction. Methods like robust statistical techniques, procedures for mixed continuous and discrete data, fuzzy linguistic approach, machine learning and neural networks can be included. The results may be evaluated both using test samples and applying other relevant data-analytical techniques to the particular problem under study.
Clinical Data Mining: Problems, Pitfalls and Solutions
2013 24th International Workshop on Database and Expert Systems Applications, 2013
The high growth of clinical data sets originated from the introduction of electronic health records collected in medical environments. Nowadays, the computerization of clinical data is supported by many international projects, and national integrated clinical record systems are frequently adopted in both public and private health-care facilities. The study of diseases and the discovery of effective therapies demand collection, management, integration and analysis of clinical data. The main target is to obtain valuable knowledge from large sets of clinical data. Novel methods for their analysis are therefore required in order to extract relevant information and compact models. A candidate discipline is data mining, an interdisciplinary field that comprises computer science, statistics and artificial intelligence, and is dedicated to automating the knowledge discovery process. Data mining methods are able to consider different types of data, structured and unstructured, from disparate sources. The objectives of clinical data mining are to: recognize the trials that characterize a particular disease; deal with often incomplete (missing values) and noisy data sets (e.g. different measure scales); integrate patient samples from different sources; recognize trial names with the same meaning; integrate diverse patient data collection procedures. The management of clinical data, the discovery of patient interactions, and the integration of the disparate data sources are the hardest problems to solve. Finally, after an adequate handling of these issues, a compact and human-interpretable data model has to be extracted. In this work, consolidated data mining methods able to manage and analyze clinical data sets are studied and applied to a real case study. Demented patient samples collected in different Italian health care facilities are investigated, providing a practical example of clinical data mining.
Classification through logical rules, a supervised data mining technique whose aim is to extract a data model in terms of propositional formulas ("if-then rules"), is considered and compared with other state-of-the-art classification algorithms. It is shown that rule-based classification is a promising technique for extracting significant models from clinical patient sample data sets in medical knowledge discovery.
The Prediction of Obstructive Sleep Apnea Using Data Mining Approaches
Archives of Iranian medicine, 2018
BACKGROUND Obstructive sleep apnea (OSA), which is the most common form of sleep-disordered breathing (SDB), imposes heavy costs on health and the economy. The aim of this study was to provide models based on data mining approaches (C5.0 decision tree and logistic regression model [LRM]) and choose a top model for predicting OSA without polysomnography (PSG) devices that is a standard method for diagnosis of this disease, to identify patients with this syndrome payment. METHODS In this cross-sectional study, data were extracted from the medical records of 333 patients with sleep disorders who were referred to the sleep disorders research center of Kermanshah University of Medical Sciences during the years 2012-2016. All patients underwent one night of PSG. A stepwise LRM was fitted and its performance was compared with that of the C5.0 decision tree using the criteria of accuracy, sensitivity and specificity. RESULTS For the C5.0 decision tree, accuracy was 0.757 with a 95% confidence interval of (0.661, 0.838)...
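The three comparison criteria named in this abstract (accuracy, sensitivity, specificity) all follow from a binary confusion matrix. The sketch below shows the standard definitions only; the function name is an assumption and the counts are hypothetical illustration values, not data from the study.

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute accuracy, sensitivity and specificity from confusion-matrix
    counts (tp/fp = true/false positives, tn/fn = true/false negatives)."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,     # share of all cases classified correctly
        "sensitivity": tp / (tp + fn),     # share of diseased cases detected
        "specificity": tn / (tn + fp),     # share of healthy cases cleared
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = diagnostic_metrics(tp=40, fp=10, tn=30, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting sensitivity and specificity alongside accuracy matters for screening models, because a model can reach high accuracy on an imbalanced sample while still missing most positive cases.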
Expert Systems with Applications, 1997
Precise relationships between symptoms and diseases are rarely documented in the literature, and yet it is essential that the physician establishes a diagnostic label that will lead to the appropriate therapy. In obstructive sleep apnoea (a breathing disorder during sleep which is characterised by intermittent pauses in respiration) the borderline between normal and pathological is arbitrary. As such, fuzzy sets provide an intuitively appealing framework for representing the medical knowledge. In this paper, the expert system CADOSA for decision support in a hospital sleep clinic is presented. The medical knowledge in the system is stored in the form of fuzzy logical relationships between symptoms and diseases, between symptoms themselves and between symptom combinations and diseases. The symptoms present in the patient are confirmed by an interview, a physical examination, a questionnaire and laboratory tests, including an overnight oximetry study which measures oxygen saturation and heart rate. The fuzzy inference in the system is performed using the max-min compositional rule to calculate indication relations expressing occurrence and confirmability. These lead to confirmed and excluded diagnoses as well as diagnostic hypotheses. In an evaluation of CADOSA using 21 patients, the proportion diagnosed correctly as confirmed (excluded) obstructive sleep apnoea was 0.95 (1.00).
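The fuzzy inference step mentioned in this abstract can be illustrated with the standard max-min composition from fuzzy relational inference, assuming that is the compositional rule intended. The sketch below is not CADOSA's implementation: the function name, the symptom degrees, and the relation matrix are hypothetical. Each diagnosis score is the maximum, over symptoms, of the minimum of the patient's symptom degree and the symptom-to-diagnosis relation strength.

```python
def max_min_composition(symptoms, relation):
    """Max-min composition of a fuzzy symptom vector with a fuzzy
    symptom-by-diagnosis relation matrix.

    symptoms: list of membership degrees, one per symptom
    relation: list of rows (one per symptom), each row holding the
              relation strength to every diagnosis
    Returns one indication degree per diagnosis.
    """
    n_diagnoses = len(relation[0])
    return [
        max(min(s, row[j]) for s, row in zip(symptoms, relation))
        for j in range(n_diagnoses)
    ]


# Hypothetical degrees for three symptoms and two candidate diagnoses.
symptoms = [0.8, 0.3, 0.6]
relation = [
    [0.9, 0.1],
    [0.4, 0.7],
    [0.5, 0.2],
]
indications = max_min_composition(symptoms, relation)
print(indications)  # [0.8, 0.3]
```

In this toy example the first diagnosis is driven by the first symptom (min(0.8, 0.9) = 0.8), while the second is limited by the weak presence of its most indicative symptom (min(0.3, 0.7) = 0.3).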
International Journal of Medical Informatics, Vol. 63, Issue 1-2, pp. 77-89, Elsevier, 2001
In this article, we revise and try to resolve some of the problems inherent in questionnaire screening of sleep apnea cases and apnea diagnosis based on attributes which are relevant and reliable. We present a way of learning information about the relevance of the data, comparing this with the definition of the information by the medical expert. We generate a predictive data model using a data aggregation operator which takes relevance and reliability information about the data into account to produce a diagnosis for each case. We also introduce a grade of membership for each question response which allows the patient to indicate a level of confidence or doubt in their own judgement. The method is tested with data collected from patients in a Sleep Clinic using questionnaires specially designed for the study. Other artificial intelligence predictive modeling algorithms are also tested on the same data and their predictive accuracy compared to that of the aggregation operator.