Decision Tree Research Papers - Academia.edu

Abstract: We propose and test two methods for predicting a system's ability to answer a factoid question. Such a prediction makes it possible to determine whether a dialogue should be initiated to clarify or reformulate the question asked by the user. The first approach we propose is an adaptation of a prediction method from the field of document retrieval, based either on support vector machines (SVM) or on decision trees, with criteria such as the ...

Data mining is a fertile field for researchers because of its many approaches to knowledge discovery in enormous volumes of data stored in different formats. At present, data are widely used all over the world, in areas such as education, industry, medicine, banking, insurance, research laboratories, business, and the military. The major gain from applying data mining techniques is the discovery of unknown patterns and relations in the data, which can further help in decision-making processes. Two forms of data analysis are used to extract models that describe important classes or predict future data trends: classification and prediction. In this paper, the authors present a comparative study of classification algorithms (i.e., Decision Tree, Naïve Bayes, and Random Forest) applied to demographic data on death statistics using the KNIME Analytics Platform. Our study was based on statistical data provided by the Nation...

Incremental cost per QALY figures are often grouped in league tables, which imply that interventions at the top (with lower cost per QALY figures) should take priority over those further down (see table). Many commentators have cautioned against the unthinking use of league tables ...
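
As a rough illustration of how such a league table is built, the sketch below computes an incremental cost per QALY for each intervention and sorts by it, lowest first. The `Intervention` class, the function names, and all figures are hypothetical and invented for this example; none come from the paper.

```python
from dataclasses import dataclass


@dataclass
class Intervention:
    name: str
    incremental_cost: float   # extra cost versus the comparator
    incremental_qalys: float  # extra QALYs gained versus the comparator


def cost_per_qaly(item: Intervention) -> float:
    """Incremental cost divided by incremental QALYs gained."""
    if item.incremental_qalys <= 0:
        raise ValueError("incremental QALYs must be positive for this ratio")
    return item.incremental_cost / item.incremental_qalys


def league_table(items):
    """Return (name, cost per QALY) pairs, lowest ratio first."""
    rows = ((i.name, cost_per_qaly(i)) for i in items)
    return sorted(rows, key=lambda row: row[1])


# Hypothetical figures, invented purely for illustration.
EXAMPLE = [
    Intervention("A", 12000.0, 0.5),
    Intervention("B", 3000.0, 1.5),
    Intervention("C", 20000.0, 2.0),
]

if __name__ == "__main__":
    for name, ratio in league_table(EXAMPLE):
        print(f"{name}: {ratio:.0f} per QALY")
```

The ranking reflects the ratio only, which is one reason the abstract notes that commentators caution against reading priority directly off the table.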

The use of modern electronic commerce methods in everyday transactions is increasing because of the growth of, and people's easy access to, the internet and social networks. Electronic payment systems are among the most important electronic commerce methods, and electronic payment fraud is a major problem. For example, credit card fraud losses increase every year and are regarded as an important issue by credit card institutes and corporations. Fraud detection is therefore considered an important research challenge. Fraud reduction is a complicated process requiring a body of knowledge from many scientific fields. Depending on the kind of fraud that banks or credit card institutes face, different measures may be taken. This paper compares and analyzes recent findings on credit card fraud detection techniques. The objectives of the present study are, first, to identify different types of credit card and electronic commerce fraud and, then, to investigate the strategies used to detect them.

In this paper, we propose a hybrid approach to language identification for Arabic-script web pages, based on decision tree and ARTMAP approaches. We use the decision tree approach to find the general identity of a web document, whether Arabic script-based or non-Arabic script-based. Then, we use the selected representations of identified pages from the decision tree approach as ...

Prosody has been widely used in many speech-related applications, including speaker and word recognition, emotion and accent identification, topic and sentence segmentation, and text-to-speech. An important application we investigate is identifying question sentences in Arabic monologue lectures. Languages other than Arabic have received a lot of attention in this regard. We approach this problem by first segmenting the sentences from the continuous speech using intensity and duration features. Prosodic features are then extracted from each sentence. These features are used as input to decision trees that classify each sentence as either a question or a non-question. Our results suggest that questions are cued by more than one type of prosodic feature in natural Arabic speech. We used C4.5 decision trees for classification and achieved 75.7% accuracy. Feature-specific analysis further reveals that energy and fundamental frequency features are mainly responsible for discriminating between questions and non-questions.

The study of optical font recognition has become more popular in recent years. In line with that, the global analysis approach is extensively used to identify font types and to classify writer identity. The objective of this paper is to propose an enhanced global analysis method. Based on a statistical analysis of edge pixel relationships, a novel feature extraction method for binary images is proposed. We test the proposed method on images of Arabic calligraphy script for an optical font recognition application. We classify those images using Multilayer Network, Bayes network, and Decision Tree classifiers to identify the Arabic calligraphy type. The experimental results show that the proposed method boosts the overall performance of optical font recognition.

In 1999, T. R. Golub first presented an idea for classifying cancer at the molecular level, which took research in cancer diagnosis to a whole new level. Researchers began to analyze the disease at the genetic level with the help of microarray databases, and many new algorithms were then designed to classify different types of cancer. The objective of this paper is to present a tool designed exclusively to predict leukemia and classify it into its types. The leukemia dataset published by Golub is used for this purpose. The first step is to identify the most significant genes causing cancer from the training set. These selected genes are then used to build a classifier based on decision rules, and eventually to predict the type of leukemia. This rule-based classifier is found to work with an accuracy of 94%. The algorithm is quite simple in terms of complexity. It is possible to use a minimal number of genes for classification rather than a large set of genes. The genes that are responsible for the prognosis of cancer are the ones mainly selected for designing the classifier.

Naive Bayes is one of the most effective classification algorithms. In many applications, however, a ranking of examples is more desirable than just a classification. How to extend naive Bayes to improve its ranking performance is an interesting and useful question in practice. Weighted naive Bayes is an extension of naive Bayes in which attributes have different weights. This paper investigates how to learn a weighted naive Bayes with accurate ranking from data or, more precisely, how to learn the weights of a weighted naive Bayes to produce accurate ranking. We explore various methods: the gain ratio method, the hill climbing method, the Markov chain Monte Carlo method, the hill climbing method combined with the gain ratio method, and the Markov chain Monte Carlo method combined with the gain ratio method. Our experiments show that a weighted naive Bayes trained to produce accurate ranking outperforms naive Bayes.
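
To make the gain ratio idea concrete, here is a minimal sketch that computes the gain ratio of each discrete attribute and turns the ratios into attribute weights. The abstract does not state how weights are derived from gain ratios, so the normalization in `attribute_weights` (scaling the weights to average 1) is an assumption, as are all function names.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    total = len(labels)
    counts = Counter(labels).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def gain_ratio(values, labels):
    """Information gain of an attribute divided by its split information."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected class entropy after splitting on the attribute.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


def attribute_weights(columns, labels):
    """Scale gain ratios so they average to 1 (an assumed normalization)."""
    ratios = [gain_ratio(col, labels) for col in columns]
    total = sum(ratios)
    if total == 0:
        return [1.0] * len(columns)  # fall back to unweighted naive Bayes
    return [r * len(ratios) / total for r in ratios]
```

In a weighted naive Bayes, each weight would typically act as an exponent on the corresponding conditional probability, so an attribute with weight 0 is ignored and one with weight 1 behaves as in plain naive Bayes.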

The use of morphing engines in metamorphic and polymorphic malware, together with virus creation kits, helps malware authors produce a large number of variants of a virus. These variants belong to one family and share behavioral and some statistical characteristics. However, they are not detectable via a single common string signature. Several statistical analyses have been tested in recent years to fight this type of multi-variant malware family. In this research, we introduce and examine an opcode statistics-based classifier using a decision tree. The method is very simple to implement. Our experimental results show that executable files from different malware families can be classified using their opcode statistical features with a high degree of reliability.
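
The abstract does not specify which opcode statistics are used. As one plausible reading, the sketch below turns an executable's opcode sequence into a vector of relative opcode frequencies, which could then be fed to any decision tree learner. The function name, the vocabulary, and the sample opcodes are assumptions made for illustration.

```python
from collections import Counter


def opcode_frequencies(opcodes, vocabulary):
    """Relative frequency of each vocabulary opcode in one opcode sequence."""
    counts = Counter(op.lower() for op in opcodes)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(vocabulary)
    # Opcodes outside the vocabulary still count toward the total.
    return [counts[op] / total for op in vocabulary]


if __name__ == "__main__":
    # Hypothetical disassembly output for a single file.
    sample = ["MOV", "mov", "push", "call"]
    print(opcode_frequencies(sample, ["mov", "push", "jmp"]))
```

Because the vector depends on opcode proportions, not on exact byte strings, variants that reorder or pad their code may still produce similar features.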

Improving students' academic performance is not an easy task for the academic community of higher learning. The academic performance of engineering and science students during their first year at university is a turning point in their educational path and usually affects their Grade Point Average (GPA) decisively. Student evaluation factors such as class quizzes, mid-term and final exams, assignments, and lab work are studied. It is recommended that all this correlated information be conveyed to the class teacher before the final exam is conducted. This study will help teachers reduce the dropout ratio to a significant level and improve student performance. In this paper, we present a hybrid procedure based on the decision tree data mining method and data clustering that enables academicians to predict students' GPA, on the basis of which instructors can take the necessary steps to improve student academic performance. Grade Point Average (GPA) is a co...

This paper details the application of a decision tree algorithm to the classification of soil types. Agricultural productivity depends on environmental conditions and soil types. A soil dataset for a particular area is downloaded from the Kaggle website in order to classify soil types; based on the predicted soil type, the agriculturist will sow seed accordingly. In this paper, soil types are classified by applying a Bayesian approach to the Decision Tree algorithm. The idea behind it is rather simple but powerful. The proposed algorithm offers some unique features not found in any other tree inducer, while at the same time it can produce better results for many difficult problems. Experimental results are presented that illustrate its performance in generating the best decision tree for classifying soil type from the given soil dataset. The Bayesian approach to Decision Tree helps classify soil types more accurately than the existing algorithms (KNN, SVM, and Decision Tree) selected for this research paper.

In the polytechnic system, a student must take at least three elective subjects to complete their studies. Elective subjects are chosen based on student interest and on a first-come, first-served basis. The results obtained for elective subjects in the final examination will affect students' futures. It is therefore important to predict whether they will pass or fail the final examination. A literature survey (LS) was used to obtain information about students' profiles and current approaches to predicting student performance using data mining. In this paper, the researcher uses a data mining technique, the decision tree method, to predict students' performance in an elective subject. The aim of this research is to evaluate students' results in choosing the correct elective subjects. This research focuses on ICT students who select DBM3033 as an elective subject. Two phases are involved: data preprocessing and data mining. RapidMiner software is used in the data mining process. The classification technique is applied for the decision tree method. The research findings show that students whose results are weak in both SPM Mathematics and DBM1033 are predicted to fail the final examination for DBM3033.

In this paper, several experiments on video categorization using a supervised learning approach are presented. To this end, the VideoCLEF 2008 evaluation forum was chosen as the experimental framework. After an analysis of the VideoCLEF corpus, it was found that video transcriptions are not the best source of information for identifying the theme of video streams. Therefore, two ...

Modern web search engines are federated: a user query is sent to numerous specialized search engines, called verticals, such as Web (text documents), News, Image, and Video, and the results returned by these engines are then aggregated and composed into a search result page (SERP) that is presented to the user. For a specific query, multiple verticals could be relevant, which makes the placement of these vertical results within blocks of textual web results challenging: how do we represent, assess, and compare the relevance of these heterogeneous entities? In this paper we present a machine-learning framework for SERP composition in the presence of multiple relevant verticals. First, instead of using the traditional label generation method of human judgment guidelines and trained judges, we use a randomized online auditioning system that allows us to evaluate triples of the form ...

What are the advantages to managers in using the design decision tree? There appear to be several: 1. It provides a broad framework for identifying the key factors a manager should think about in considering an organizational design. For example: What is our environment? ...

When computer security violations are detected, computer forensic analysts attempting to determine the relevant causes and effects are forced to perform the tedious tasks of finding and preserving useful clues in large networks of operational machines. To augment a computer crime ...

Mobile operators around the world are switching their technology from 2G (second generation) to 3G (third generation) networks. Operators are keen to analyze the CDRs (call detail records) of past usage to predict the behavior of their customers. They want to mine knowledge from real-world datasets that reveals patterns in user attitudes in this changing world. To identify the usage of 2G and 3G services, classification models were trained on the PAKDD 2006 dataset. To estimate prediction accuracy, the classifiers were evaluated using 10-fold cross-validation. Comparing the results of the experiment, J48 was more accurate, while random tree consumed less time.
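
For readers unfamiliar with the evaluation protocol, the sketch below shows one simple way to run k-fold cross-validation (k = 10 by default) using only the standard library. It is a generic illustration, not the authors' setup: the function names, the shuffling seed, and the `fit`/`predict` callables are assumptions, and tools such as Weka (where J48 is implemented) additionally stratify the folds by class.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validated_accuracy(fit, predict, X, y, k=10):
    """Average test accuracy over k folds for any fit/predict pair."""
    scores = []
    for train, test in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)
```

Each sample is used for testing exactly once and for training in the remaining k - 1 rounds, so the averaged accuracy is less sensitive to a single lucky or unlucky split.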