Decision Trees Research Papers - Academia.edu

Spinal cord stimulation (SCS) is an effective method of relieving chronic intractable pain, and one of its key indications is failed back surgery syndrome (FBSS). The objective of the current study was to evaluate the cost-effectiveness of 10 kHz high-frequency SCS (HF10 SCS) compared to conventional medical management (CMM), reoperation, traditional nonrechargeable SCS (TNR-SCS), and traditional rechargeable SCS (TR-SCS). A health economic model of SCS in the United Kingdom was reproduced from the perspective of the health care system to simulate costs and quality-adjusted life years (QALYs) over 15 years. In the model, both a decision tree and a Markov model were used to describe the health outcomes of the evaluated therapies. HF10 SCS therapy showed a favorable incremental cost-effectiveness ratio (ICER) of £3,153 per QALY gained compared to CMM and established dominance (less costly, more QALYs) over TNR-SCS (£8,802 per QALY vs. CMM) and TR-SCS (£5,101 per QALY vs. CMM). This first...
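
The ICER and dominance terms used in this abstract can be illustrated with a minimal sketch. The function names and all cost and QALY figures below are hypothetical and are not taken from the study; the sketch only shows how an incremental ratio and a dominance check are usually computed.

```python
def icer(cost_new, qaly_new, cost_ref, qaly_ref):
    """Incremental cost-effectiveness ratio: extra cost per extra QALY."""
    delta_qaly = qaly_new - qaly_ref
    if delta_qaly == 0:
        raise ValueError("ICER is undefined when both strategies yield equal QALYs")
    return (cost_new - cost_ref) / delta_qaly


def dominates(cost_a, qaly_a, cost_b, qaly_b):
    """True if strategy A is both less costly and more effective than B."""
    return cost_a < cost_b and qaly_a > qaly_b


# Hypothetical per-patient totals (cost in GBP, QALYs), for illustration only.
cmm = (80_000.0, 5.0)
hf10 = (86_000.0, 7.0)
tnr = (95_000.0, 6.5)

hf10_vs_cmm = icer(hf10[0], hf10[1], cmm[0], cmm[1])  # (86000 - 80000) / (7.0 - 5.0)
hf10_dominates_tnr = dominates(hf10[0], hf10[1], tnr[0], tnr[1])
```

With these made-up inputs the ratio is 3,000 per QALY gained, and the dominance check is true because the first strategy costs less and yields more QALYs than the third.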

The concepts of causation and prediction are different, and have different implications for practice. This distinction is applied here to studies of the problem of student attrition (although it is more widely applicable). Studies of attrition from nursing courses have tended to concentrate on causation, trying, largely unsuccessfully, to elicit what causes drop out. However, the problem may more fruitfully be cast in terms of predicting who is likely to drop out. One powerful method for attempting to make predictions is rule induction. This paper reports the use of the Answer Tree package from SPSS for that purpose. The main data set consisted of 3978 records on 528 nursing students, split into a training set and a test set. The source was standard university student records. The method obtained 84% sensitivity, 70% specificity, and 94% accuracy on previously unseen cases. The method requires large amounts of high quality data. When such data are available, rule induction offers a ...
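
As a reminder of how the three reported measures are defined, here is a small sketch that derives them from confusion-matrix counts. The counts are hypothetical and do not come from the paper, and treating "dropout" as the positive class is an assumption.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # share of actual positives that were flagged
    specificity = tn / (tn + fp)   # share of actual negatives that were cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical test set: 50 students who dropped out, 150 who did not.
sens, spec, acc = classification_metrics(tp=42, fn=8, tn=105, fp=45)
```

Accuracy computed this way is a prevalence-weighted average of sensitivity and specificity, so it depends on the proportion of positive cases in the test set.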

The use of modern electronic commerce methods in everyday transactions is increasing as internet and social network access grows and becomes more convenient. Electronic payment systems are among the most important electronic commerce methods, and electronic payment fraud is a major problem. For example, credit card fraud losses increase every year and are regarded as an important issue by credit card institutions and corporations. Fraud detection is therefore considered an important research challenge. Fraud reduction is a complicated process requiring a body of knowledge from many scientific fields. Depending on the kind of fraud that banks or credit card institutions face, different measures may be taken. This paper compares and analyzes recent findings on credit card fraud detection techniques. The objectives of the present study are, first, to identify the different types of credit card and electronic commerce fraud and, second, to investigate the strategies used to detect them.

Many studies worldwide have examined the prevalence of Hepatitis C Virus (HCV) in human populations. Spatial data analysis and cluster detection are vital in HCV monitoring for discovering high-risk areas and helping decision makers form hypotheses about the cause of disease. Egypt is reported to have one of the highest HCV prevalence rates worldwide. The anomalous distribution of HCV infection in Egypt has prompted several studies to identify the reasons for such a wide spread of HCV in the country. One way to help identify the areas with the highest disease burden is to provide detailed knowledge of the geographical distribution of HCV in Egypt. To that end, data mining analytical tools integrated with GIS can help visualize the distribution. The main purpose of this paper is therefore to present the spatial distribution of HCV in Egypt using case data obtained from an Egyptian health institute, the National Hepatology Tropical Medicine Research Institute (NHTMR). Visualizing the spatial analysis with GIS allows us to investigate statistical results in a form that is easily interpreted by non-experts.

Prosody has been widely used in many speech-related applications including speaker and word recognition, emotion and accent identification, topic and sentence segmentation, and text-to-speech applications. An important application we investigate is that of identifying question sentences in Arabic monologue lectures. Languages other than Arabic have received a lot of attention in this regard. We approach this problem by first segmenting the sentences from the continuous speech using intensity and duration features. Prosodic features are then extracted from each sentence. These features are used as input to decision trees that classify each sentence as either a question or a non-question. Our results suggest that questions are cued by more than one type of prosodic feature in natural Arabic speech. We used C4.5 decision trees for classification and achieved 75.7% accuracy. Feature-specific analysis further reveals that energy and fundamental frequency features are mainly responsible for discriminating between question and non-question sentences.
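
C4.5 chooses splits by gain ratio, that is, information gain divided by the entropy of the split itself. The sketch below computes it for already-discretized features; the feature names (`pitch_rise`, `energy`) and the six toy sentences are hypothetical and are not the paper's data. C4.5 also handles continuous features by thresholding, which is omitted here.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def gain_ratio(feature_values, labels):
    """Information gain of splitting on the feature, divided by split information."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Expected label entropy after the split, weighted by branch size.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(feature_values)
    return info_gain / split_info if split_info > 0 else 0.0


# Toy data: "q" = question, "nq" = non-question (hypothetical).
labels = ["q", "q", "nq", "nq", "q", "nq"]
pitch_rise = ["high", "high", "low", "low", "high", "low"]
energy = ["hi", "lo", "hi", "lo", "hi", "lo"]

best_feature = max(
    {"pitch_rise": pitch_rise, "energy": energy}.items(),
    key=lambda kv: gain_ratio(kv[1], labels),
)[0]
```

In this toy set `pitch_rise` separates the two classes perfectly, so it receives the higher gain ratio and would be chosen as the root split.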

In modern computational science, the interplay between machine learning and optimization marks some of the most vital developments. Optimization plays an important role in mechanical industries because it reduces material cost and time consumption and increases production rate. The present work focuses on optimizing the Friction Stir Welding process to obtain the maximum Ultimate Tensile Strength (UTS) of friction stir welded joints. Two machine learning algorithms, an Artificial Neural Network (ANN) and a Decision Tree regression model, are selected for the purpose. The input variables are Tool Rotational Speed (RPM), Tool Traverse Speed (mm/min), and Axial Force (kN), while the output variable is Ultimate Tensile Strength (MPa). For the Artificial Neural Network, the Root Mean Square Errors on the training and testing sets are 0.842 and 0.808 respectively, while for the Decision Tree regression model they are 11.72 and 14.61. It can therefore be concluded that the ANN algorithm gives more accurate results than the Decision Tree regression algorithm.
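
The comparison above rests on root mean square error. A minimal sketch of the metric follows; the UTS values and the two sets of predictions are hypothetical and are not the paper's data.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error, in the same units as the target."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical measured UTS values (MPa) and predictions from two models.
uts_measured = [200.0, 210.0, 220.0]
model_a_pred = [201.0, 209.0, 221.0]
model_b_pred = [190.0, 225.0, 205.0]

rmse_a = rmse(uts_measured, model_a_pred)
rmse_b = rmse(uts_measured, model_b_pred)
```

A lower RMSE on the held-out test set, rather than on the training set, is what supports a claim that one model generalizes better than the other.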

Naive Bayes is one of the most effective classification algorithms. In many applications, however, a ranking of examples is more desirable than just classification. How to extend naive Bayes to improve its ranking performance is an interesting and useful question in practice. Weighted naive Bayes is an extension of naive Bayes in which attributes have different weights. This paper investigates how to learn a weighted naive Bayes with accurate ranking from data, or more precisely, how to learn the weights of a weighted naive Bayes to produce accurate ranking. We explore various methods: the gain ratio method, the hill climbing method, the Markov chain Monte Carlo method, the hill climbing method combined with the gain ratio method, and the Markov chain Monte Carlo method combined with the gain ratio method. Our experiments show that a weighted naive Bayes trained to produce accurate ranking outperforms naive Bayes.
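
A common way to write weighted naive Bayes is to raise each conditional probability to the power of its attribute weight, which becomes a weighted sum in log space. The sketch below assumes that formulation with Laplace smoothing and takes the weights as given; it does not implement any of the weight-learning methods listed in the abstract. All names and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(rows, labels):
    """Collect class counts, per-attribute value counts, and observed values."""
    class_counts = Counter(labels)
    cond = defaultdict(Counter)   # (attribute index, class) -> value counts
    values = defaultdict(set)     # attribute index -> set of observed values
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            cond[(i, y)][v] += 1
            values[i].add(v)
    return class_counts, cond, values


def log_score(row, c, model, weights):
    """log P(c) + sum_i w_i * log P(a_i | c), with Laplace smoothing."""
    class_counts, cond, values = model
    score = math.log(class_counts[c] / sum(class_counts.values()))
    for i, v in enumerate(row):
        p = (cond[(i, c)][v] + 1) / (class_counts[c] + len(values[i]))
        score += weights[i] * math.log(p)
    return score


def posterior(row, positive, model, weights):
    """Normalized probability of the positive class, usable as a ranking score."""
    scores = {c: log_score(row, c, model, weights) for c in model[0]}
    top = max(scores.values())
    exp = {c: math.exp(s - top) for c, s in scores.items()}
    return exp[positive] / sum(exp.values())


# Toy data: two categorical attributes, class "pos" or "neg" (hypothetical).
rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
labels = ["pos", "pos", "pos", "neg"]
model = train(rows, labels)

# Rank examples by posterior under hypothetical weights.
weights = [1.0, 0.5]
ranking = sorted(rows, key=lambda r: posterior(r, "pos", model, weights), reverse=True)
```

Setting every weight to 1 recovers ordinary naive Bayes, and setting every weight to 0 reduces the score to the class prior, which gives two easy sanity checks for any weight-learning procedure built on top.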

In this study, electroencephalogram (EEG) signals obtained by a single-electrode device from 24 subjects - 10 with Alzheimer's disease (AD) and 14 age-matched Controls (CN) - were analyzed using Discrete Wavelet Transform (DWT). The focus of the study is to determine the discriminating EEG features of AD patients while they are subjected to cognitive and auditory tasks, since AD is characterized by progressive impairments in cognition and memory. At each recording block, DWT extracts EEG features corresponding to major brain frequency bands. T-test and Kruskal-Wallis methods were used to determine the statistically significant features of EEG signals from AD patients compared to Controls. A decision tree algorithm was then used to identify the dominant features for AD patients. It was determined that the mean value of the low-δ (1 - 2 Hz) frequency band during the Paced Auditory Serial Addition Test with 2.0 s interval and the mean value of the β frequency band (12 - 30 Hz) during 6 Hz...

OBJECT: Cervicomedullary tumors (CMTs) represent a heterogeneous group of intrinsic neoplasms that are typically low grade and generally carry a good prognosis. This single-institution study was undertaken to document the outcomes and current treatment philosophy for these challenging neoplasms. METHODS: The charts of all pediatric patients with CMTs who received treatment at St. Jude Children's Research Hospital between January 1988 and May 2013 were retrospectively reviewed. Demographic, surgical, clinical, radiological, pathological, and survival data were collected. Treatment-free survival and overall survival were estimated, and predictors of recurrence were analyzed. RESULTS: Thirty-one children (16 boys, 15 girls) with at least 12 months of follow-up data were identified. The median age at diagnosis was 6 years (range 7 months-17 years) and the median follow-up was 4.3 years. Low-grade tumors (Grade I or II) were present in 26 (84%) patients. Thirty patients underwent either...

BACKGROUND AND SIGNIFICANCE: Falls are among the most common and serious problems facing elderly persons. Falling is associated with considerable mortality, morbidity, reduced functioning, and premature nursing home admissions [1–5]. Falls generally result from an interaction of multiple and diverse risk factors and situations, many of which can be corrected. This interaction is modified by age, disease, and the presence of hazards in the environment [6]. Frequently, older people are not aware of their risks of falling, and neither recognize risk factors nor report these issues to their physicians. Consequently, opportunities for prevention of falling are often overlooked, with risks becoming evident only after injury and disability have already occurred [7–9]. Both the incidence of falls and the severity of fall-related complications rise steadily after age 60. In the age 65-and-over population as a whole, approximately 35% to 40% of community-dwelling, generally healthy older persons fall a...

Neuropsychologists routinely rely on response validity measures to evaluate the authenticity of test performances. However, the relationship between cognitive and psychological response validity measures is not clearly understood. It remains to be seen whether psychological test results can predict the outcome of response validity testing in clinical and civil forensic samples. The present analysis applied a unique statistical approach, classification tree methodology (Optimal Data Analysis: ODA), in a sample of 307 individuals who had completed the MMPI-2 and a variety of cognitive effort measures. One hundred ninety-eight participants were evaluated in a secondary gain context, and 109 had no identifiable secondary gain. Through recurrent dichotomous discriminations, ODA provided optimized linear decision trees to classify either sufficient effort (SE) or insufficient effort (IE) according to various MMPI-2 scale cutoffs. After “pruning” of an initial, complex classification tree,...

When computer security violations are detected, computer forensic analysts attempting to determine the relevant causes and effects are forced to perform the tedious tasks of finding and preserving useful clues in large networks of operational machines. To augment a computer crime ...

The need for comparative effectiveness (CE) data continues to grow, fuelled by market demand as well as health reform. There may be an assumption that new drugs result in improved efficacy compared with the standard of care, therefore warranting premium prices. Gout treatment has recently become controversial, as expensive new drugs enter the market with limited CE data. The authors reviewed published clinical trials and conducted a cost-effectiveness analysis on a new drug (febuxostat) versus the standard (allopurinol) to illustrate the limitations in using these data to inform evidence-based decision-making. Although febuxostat trials included allopurinol as a comparator, methodological limitations make comparative effectiveness evaluations difficult. However, when available trial data were input to a decision analytic model, the authors found that a significant reduction in febuxostat cost would be required in order for it to dominate allopurinol in cost-effectiveness analysis. T...

Gradient Boosting Decision Trees (GBDT) algorithms have proven to be among the best algorithms in machine learning. XGBoost, the most popular GBDT algorithm, has won many competitions on websites like Kaggle. However, XGBoost is not the only GBDT algorithm with state-of-the-art performance. Other GBDT algorithms, such as LightGBM and CatBoost, offer advantages over XGBoost and are sometimes even more powerful. This paper aims to compare the performance of the CPU implementations of the top three gradient boosting algorithms. We start by explaining how the three algorithms work and the hyperparameter similarities between them. Then we use a variety of performance criteria to evaluate their performance. We divide the performance criteria into four categories: accuracy, speed, reliability, and ease of use. The performance of the three algorithms has been tested with five classification and regression problems. Our findings show that the LightGBM algorithm has the best performance of the ...
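
The core idea shared by these libraries, fitting each new tree to the residuals of the current ensemble, can be sketched in a few lines. This is a deliberately simplified illustration with one feature, depth-1 trees, and squared-error loss; it is not how XGBoost, LightGBM, or CatBoost are implemented, and the function names, data, and default hyperparameters are hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree: one threshold, one mean per side."""
    best = None
    for t in sorted(set(xs))[:-1]:          # candidate thresholds
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    if best is None:                        # all x equal: predict the mean residual
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def fit_gbdt(xs, ys, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared error: each stump fits the current residuals."""
    base = sum(ys) / len(ys)                # initial constant prediction
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    return predict


# Hypothetical one-dimensional regression data with a step near x = 3.5.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
model = fit_gbdt(xs, ys)
```

The number of rounds and the learning rate in this sketch correspond to hyperparameters that all three libraries expose under their own names, which is the kind of similarity the paper sets out to explain.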