Comparative Analysis of Various Decision Tree Classification Algorithms using WEKA

Decision Tree Classification Using Weka

For this exercise, you will use the WEKA Explorer interface to run the J48 decision tree classification algorithm. J48 is a supervised learning algorithm: the class of each instance in the training set is known. We use the training data to build a decision model that assigns instances to classes, and we then use the model to predict the classes of the instances in the test data. In this exercise, you will explore the algorithm parameters and test options and compare the results of the runs. Interpreting the results involves the classification model, the confusion matrix, and the detailed accuracy by class.
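To make the last two outputs concrete, here is a minimal Python sketch (the exercise itself uses the WEKA GUI, not Python). It builds a confusion matrix with rows as actual classes and columns as predicted classes, which we believe matches the layout WEKA prints, and then computes per-class TP rate, FP rate, precision, recall, and F-measure, the kinds of figures reported under detailed accuracy by class. The function names and the small yes/no prediction lists are hypothetical illustrations, not WEKA output.

```python
def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def per_class_metrics(matrix, labels):
    """TP rate (= recall), FP rate, precision and F-measure for each class."""
    total = sum(sum(row) for row in matrix)
    results = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        actual_k = sum(matrix[k])                    # instances that really are class k
        predicted_k = sum(row[k] for row in matrix)  # instances predicted as class k
        fp = predicted_k - tp
        negatives = total - actual_k                 # instances of all other classes
        tp_rate = tp / actual_k if actual_k else 0.0
        fp_rate = fp / negatives if negatives else 0.0
        precision = tp / predicted_k if predicted_k else 0.0
        denom = precision + tp_rate
        f_measure = 2 * precision * tp_rate / denom if denom else 0.0
        results[label] = {"tp_rate": tp_rate, "fp_rate": fp_rate,
                          "precision": precision, "recall": tp_rate,
                          "f_measure": f_measure}
    return results


# Hypothetical predictions for a two-class problem.
labels = ["yes", "no"]
actual = ["yes"] * 9 + ["no"] * 5
predicted = ["yes"] * 7 + ["no"] * 2 + ["yes"] * 1 + ["no"] * 4
matrix = confusion_matrix(actual, predicted, labels)   # [[7, 2], [1, 4]]
metrics = per_class_metrics(matrix, labels)
```

For the yes class this gives a TP rate of 7/9 and a precision of 7/8; reading across a row of the matrix shows where the instances of that actual class were sent.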

COMPARATIVE ANALYSIS OF CLASSIFICATION TECHNIQUES USING WEKA

Data mining is the process of extracting interesting, non-trivial, implicit, previously unknown, and potentially useful patterns or knowledge from various data sources with the help of various techniques. Classification is the process of finding a model that describes and distinguishes data classes or concepts. Several algorithms exist for classification in data mining; each has its strengths and weaknesses, and no single algorithm is most suitable for all kinds of data. This project evaluates the performance of three classification algorithms: the decision tree, naïve Bayes, and k-nearest neighbour (K-NN). The Waikato Environment for Knowledge Analysis (WEKA) was used to analyze the algorithms; the performance parameters include classification accuracy, error rate, execution time, confusion matrix, and area under the curve. Five datasets were used for the analysis: the Iris, chronic kidney disease, breast cancer, diabetes, and hypothyroid datasets. The datasets were obtained from the UCI Machine Learning Repository and split into training and test sets using 60%/40% and 70%/30% splits. The decision tree algorithm was found to be more accurate than the naïve Bayes and K-NN algorithms. In terms of execution time, K-NN outperforms naïve Bayes and the decision tree on the five datasets. However, K-NN recorded a higher percentage of error on average across the five datasets. Therefore, no single algorithm is best suited to every situation: the performance of classification algorithms depends on the type and size of the datasets, so an algorithm that is appropriate for one dataset may not be appropriate for another.
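The percentage-split protocol and the accuracy and error-rate measures described above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not WEKA's implementation: the helper names are hypothetical, the shuffle uses a fixed seed so that runs are repeatable, and a range of 150 integers merely stands in for the 150 instances of the Iris dataset.

```python
import random


def percentage_split(instances, train_percent, seed=1):
    """Shuffle a copy of the data with a fixed seed, then cut it into train/test parts."""
    data = list(instances)
    random.Random(seed).shuffle(data)
    cut = round(len(data) * train_percent / 100)
    return data[:cut], data[cut:]


def accuracy_and_error(actual, predicted):
    """Classification accuracy and error rate, both as percentages."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    accuracy = 100.0 * correct / len(actual)
    return accuracy, 100.0 - accuracy


# 150 integers stand in for the 150 Iris instances.
for percent in (60, 70):
    train, test = percentage_split(range(150), percent)
    print(percent, len(train), len(test))
```

A 60% split of 150 instances yields 90 training and 60 test instances, and a 70% split yields 105 and 45. The error rate is simply the complement of accuracy, so the two always sum to 100%.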

Classification Based on Decision Tree Algorithm for Machine Learning

Journal of Applied Science and Technology Trends, 2021

Decision tree classifiers are regarded as one of the best-known methods for representing classifiers in data classification. Researchers from various fields and backgrounds, such as machine learning, pattern recognition, and statistics, have considered the problem of constructing a decision tree from available data. Decision tree classifiers have been proposed in many ways across fields such as medical disease analysis, text classification, smartphone user classification, image classification, and many more. This paper provides a detailed overview of decision trees. Furthermore, the specifics of the reviewed papers, such as the algorithms/approaches used, the datasets, and the outcomes achieved, are evaluated and outlined comprehensively. In addition, all of the analyzed approaches are discussed to illustrate the authors' themes and to identify the most accurate classifiers. Finally, the use of different types of datasets is discussed and the findings are analyzed.

Evaluation of Various Classification Techniques of Weka Using Different Datasets

International Journal of Advance Research and Innovative Ideas in Education, 2016

In this paper, we compare various classification methods on UCI machine learning datasets using WEKA. Three measures, namely accuracy, the kappa statistic, and mean absolute error, were recorded for each technique during the experiments. This work was carried out to evaluate the performance of the J48, MultilayerPerceptron, Naive Bayes, and SMO classifiers. For this work, we used four types of secondary data.
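Two of the three measures above can be computed directly. The Python sketch below computes Cohen's kappa from a confusion matrix, and a probability-based mean absolute error for nominal classes in which each instance's actual class is treated as a 0/1 vector and the absolute differences are averaged over classes and then over instances. We believe WEKA reports MAE for nominal classes in this way, but treat that as an assumption; the function names and example numbers are hypothetical.

```python
def kappa(matrix):
    """Cohen's kappa for a square confusion matrix (rows actual, columns predicted)."""
    n = sum(sum(row) for row in matrix)
    size = len(matrix)
    observed = sum(matrix[i][i] for i in range(size)) / n
    # Chance agreement: sum over classes of (row total * column total) / n^2.
    expected = sum(sum(matrix[i]) * sum(row[i] for row in matrix)
                   for i in range(size)) / (n * n)
    return (observed - expected) / (1 - expected)


def mean_absolute_error(prob_rows, actual_indices):
    """Mean |predicted probability - actual 0/1 value| over classes, then instances."""
    total = 0.0
    for probs, true_k in zip(prob_rows, actual_indices):
        errors = [abs(p - (1.0 if k == true_k else 0.0)) for k, p in enumerate(probs)]
        total += sum(errors) / len(probs)
    return total / len(prob_rows)


print(kappa([[20, 5], [10, 15]]))                              # approximately 0.4
print(mean_absolute_error([[0.8, 0.2], [0.4, 0.6]], [0, 0]))   # approximately 0.4
```

For the matrix [[20, 5], [10, 15]], observed agreement is 35/50 = 0.7 and chance agreement is (25 × 30 + 25 × 20)/50² = 0.5, so kappa is (0.7 − 0.5)/(1 − 0.5) = 0.4. Kappa is undefined when chance agreement is exactly 1, which the sketch does not guard against.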

Comparison of Different Datasets Using Various Classification Techniques with Weka

2014

Data mining refers to mining or extracting knowledge from huge volumes of data. Classification assigns each item in a dataset to one of a predefined set of classes. Classification is an important data mining technique with broad applications, used to classify many kinds of data. In this paper, different datasets from the University of California, Irvine (UCI) repository are compared using different classification techniques. Each technique has been evaluated with respect to accuracy and execution time, and the performance evaluation has been carried out with the J48, Simple CART (Classification and Regression Trees), BayesNet, and NaiveBayesUpdatable classification algorithms.
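NaiveBayesUpdatable is WEKA's incremental variant of naive Bayes, which learns one instance at a time. The Python sketch below illustrates only that incremental idea for nominal attributes: class and attribute-value counts are incremented per instance, and prediction picks the class with the largest log posterior under add-one (Laplace) smoothing. The class name, the smoothing choice, and the toy weather rows are our own assumptions; WEKA's classifier also handles numeric attributes and differs in its details.

```python
import math
from collections import defaultdict


class UpdatableNaiveBayes:
    """Naive Bayes for nominal attributes, trained one instance at a time."""

    def __init__(self):
        self.total = 0
        self.class_counts = defaultdict(int)                        # class -> count
        self.value_counts = defaultdict(lambda: defaultdict(int))   # (attr, class) -> value -> count
        self.attribute_values = defaultdict(set)                    # attr -> values seen so far

    def update(self, features, label):
        """Incorporate one labelled instance by incrementing counts."""
        self.total += 1
        self.class_counts[label] += 1
        for i, value in enumerate(features):
            self.value_counts[(i, label)][value] += 1
            self.attribute_values[i].add(value)

    def predict(self, features):
        """Return the class with the highest log posterior (add-one smoothing)."""
        n_classes = len(self.class_counts)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log((count + 1) / (self.total + n_classes))
            for i, value in enumerate(features):
                n_values = len(self.attribute_values[i] | {value})
                seen = self.value_counts[(i, label)][value]
                score += math.log((seen + 1) / (count + n_values))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


model = UpdatableNaiveBayes()
for features, label in [(("sunny", "hot"), "no"), (("sunny", "mild"), "no"),
                        (("rainy", "mild"), "yes"), (("overcast", "hot"), "yes"),
                        (("rainy", "cool"), "yes")]:
    model.update(features, label)
print(model.predict(("sunny", "hot")), model.predict(("rainy", "cool")))  # no yes
```

Because training only increments counters, new instances can be added at any time without revisiting earlier ones, which is what makes an updatable classifier suitable for streaming or very large data.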

Comparative Analysis of Decision Tree Classification Algorithms

At present, the amount of data stored in educational databases is increasing rapidly. These databases contain hidden information that can be used to improve students' performance. Classification of data objects is a data mining and knowledge management technique used to group similar data objects together. There are many classification algorithms in the literature, but the decision tree is the most commonly used because it is easy to execute and easier to understand than other classification algorithms. The ID3, C4.5, and CART decision tree algorithms were previously applied to student data to predict performance. However, all of these are suitable only for small datasets and require that all or part of the dataset remain permanently in memory. This limits their suitability for mining large databases. This problem is solved by the SPRINT and SLIQ decision tree algorithms. In the serial implementations of SPRINT and SLIQ, the training dataset is recursively partitioned using a breadth-first technique. In this paper, each of the algorithms is explained in turn. The performance and results of all the algorithms are compared, and the evaluation is done on existing datasets. All the algorithms perform satisfactorily, but the highest accuracy is observed for the SPRINT algorithm.
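One idea shared by SLIQ and SPRINT is presorting: each numeric attribute is sorted once, and a single scan then scores every candidate split by moving records from a "right" class histogram to a "left" one. The Python sketch below shows only that scan for one attribute, using the Gini index, which to our knowledge is the split criterion both algorithms use. The class list, the attribute lists kept outside memory, and breadth-first tree growth are not shown, and the function names are hypothetical.

```python
def gini(counts):
    """Gini impurity of a class histogram {label: count}."""
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_numeric_split(values, labels):
    """Return (weighted_gini, threshold) for the best test `value <= threshold`."""
    attribute_list = sorted(zip(values, labels))  # sort once (presorting)
    n = len(attribute_list)
    left, right = {}, {}
    for _, label in attribute_list:               # initially every record is on the right
        right[label] = right.get(label, 0) + 1
    best = (float("inf"), None)
    for i in range(n - 1):
        value, label = attribute_list[i]
        left[label] = left.get(label, 0) + 1       # move one record to the left side
        right[label] -= 1
        if value == attribute_list[i + 1][0]:
            continue                               # cannot split between equal values
        n_left = i + 1
        score = (n_left * gini(left) + (n - n_left) * gini(right)) / n
        if score < best[0]:
            best = (score, value)
    return best


print(best_numeric_split([1, 2, 3, 4, 5, 6], ["a", "a", "a", "b", "b", "b"]))
```

On this toy attribute, the scan finds that `value <= 3` separates the two classes perfectly (weighted Gini 0.0). The design point is that the cost after sorting is linear in the number of records, so no re-sorting is needed at each tree node.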

A Comparative Study On Decision Tree Classification Algorithms In Data Mining

2008

In data mining, classifying objects into predefined categories based on their features is a widely studied problem with numerous applications in fraud detection, artificial intelligence, and many other fields. Among the various classification algorithms available in the literature, the decision tree is one of the most practical and effective methods, and it is based on inductive learning. In this paper, we review various decision tree algorithms and their limitations, and we evaluate their performance through an experimental analysis on sample data.

Analysis of Various Decision Tree Algorithms for Classification in Data Mining

International Journal of Computer Applications, 2017

Computer technology and computer network technology have developed enormously and are still developing rapidly. As a result, the amount of data in the information industry grows day by day. This large amount of data can be analyzed to extract useful knowledge. The hidden patterns in the data are analyzed and then categorized into useful knowledge; this process is known as data mining [4]. Among the various data mining techniques, the decision tree is one of the most popular. Decision tree learning uses a divide-and-conquer technique as its basic learning strategy. A decision tree is a flowchart-like structure in which each internal node represents a "test" on an attribute, each branch represents an outcome of the test, and each leaf node represents a class label. This paper discusses various decision tree algorithms (ID3, C4.5, CART) and their features, advantages, and disadvantages.
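The node, branch, and leaf structure just described maps directly onto a small data type. In this Python sketch (all names hypothetical), an internal `Node` tests one attribute and maps each test outcome to a subtree, a `Leaf` holds a class label, and classification follows the matching branch from the root until it reaches a leaf. The tree is hand-built for the classic weather ("play tennis") example rather than learned by any of the algorithms discussed.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Leaf:
    label: str  # class label predicted at this leaf


@dataclass
class Node:
    attribute: str  # attribute tested at this internal node
    branches: Dict[str, Union["Node", Leaf]] = field(default_factory=dict)  # outcome -> subtree


def classify(tree, instance):
    """Follow the branch for each test outcome until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.branches[instance[tree.attribute]]
    return tree.label


# Hand-built (not learned) tree for the classic weather example.
weather_tree = Node("outlook", {
    "sunny": Node("humidity", {"high": Leaf("no"), "normal": Leaf("yes")}),
    "overcast": Leaf("yes"),
    "rainy": Node("windy", {"true": Leaf("no"), "false": Leaf("yes")}),
})

print(classify(weather_tree, {"outlook": "sunny", "humidity": "high"}))  # no
```

Only the attributes on the path from root to leaf are consulted, which is one reason decision trees are considered easy to interpret.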

A Survey on Decision Tree Algorithm for Classification

International Journal of Engineering Development and Research (IJEDR) (ISSN: 2321-9939), 2014

Data mining is the process of discovering or extracting new patterns from large datasets using methods from statistics and artificial intelligence. Classification and prediction are techniques used to identify important data classes and predict probable trends. The decision tree is an important classification method in data mining. It is commonly used in marketing, surveillance, fraud detection, and scientific discovery. As the classical decision tree algorithms, ID3, C4.5, and C5.0 have the merits of high classification speed, strong learning ability, and simple construction. However, these algorithms are also unsatisfactory in practical applications: when used for classification, they tend to favor attributes with more values and overlook attributes with fewer values. This paper focuses on the various decision tree algorithms and their characteristics, challenges, advantages, and disadvantages.
Keywords: Decision tree algorithms, ID3, C4.5, C5.0, classification techniques

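The bias toward many-valued attributes described in the preceding abstract is easiest to see with ID3's information gain. In the Python sketch below, a hypothetical instance-ID attribute with a unique value per row achieves the maximum possible information gain (1 bit here) while being useless for predicting new instances. C4.5's gain ratio divides the gain by the split information (the entropy of the attribute's own values), which lowers the ID attribute to 1/3 and ranks a genuinely predictive attribute above it (about 0.575). The data and attribute names are invented for illustration.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (bits) of the distribution of items in a list."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on values."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """Information gain divided by split information (entropy of the attribute itself)."""
    split_info = entropy(values)
    return information_gain(values, labels) / split_info if split_info else 0.0


labels = ["yes"] * 4 + ["no"] * 4
instance_id = list(range(8))                        # unique value per row, useless for new data
outlook = ["a", "a", "a", "a", "b", "b", "b", "a"]  # predictive, but not perfectly
for name, attr in [("instance_id", instance_id), ("outlook", outlook)]:
    print(name, round(information_gain(attr, labels), 3), round(gain_ratio(attr, labels), 3))
```

Information gain prefers `instance_id` (1.0 versus about 0.549), while gain ratio prefers `outlook`; gain ratio reduces this bias but does not remove every preference for many-valued attributes.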
Implementation and Comparison of Decision Tree Based Algorithms

Machine learning is often used to predict the outcomes of events by learning from a dataset. Decision trees are one of the most commonly used methods for representing those outcomes. The most common algorithms are ID3, C4.5, and CART. The ID3 and C4.5 algorithms were introduced by J. R. Quinlan, while CART was developed by Leo Breiman et al. CART is based on Gini impurity, whereas ID3 and C4.5 are based on information gain and gain ratio, respectively. In this paper, we implement these algorithms in Python and compare them on the basis of accuracy, time taken, and the nature of the attributes.
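The three algorithms share the same divide-and-conquer skeleton and differ mainly in the split criterion. The Python sketch below is an ID3-style builder with a pluggable impurity function, either entropy or Gini impurity. Since the parent's impurity is the same for every candidate attribute, choosing the attribute with the lowest weighted child impurity is equivalent to choosing the largest impurity reduction (information gain in the entropy case). It is a simplification under our own assumptions, not the implementation from the paper: real CART grows binary trees, C4.5 uses gain ratio and also handles numeric attributes, missing values, and pruning, and the function name and toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(rows, labels, attributes, impurity=entropy):
    """Pick the attribute whose multiway split leaves the least weighted impurity,
    partition the rows on it, and recurse on each partition."""
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class

    def weighted_impurity(attr):
        groups = {}
        for row, y in zip(rows, labels):
            groups.setdefault(row[attr], []).append(y)
        return sum(len(g) / len(labels) * impurity(g) for g in groups.values())

    best = min(attributes, key=weighted_impurity)
    remaining = [a for a in attributes if a != best]
    tree = {best: {}}
    for value in sorted({row[best] for row in rows}):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        tree[best][value] = build_tree([rows[i] for i in idx],
                                       [labels[i] for i in idx], remaining, impurity)
    return tree


rows = [{"outlook": "sunny", "windy": "false"}, {"outlook": "sunny", "windy": "true"},
        {"outlook": "overcast", "windy": "false"}, {"outlook": "rainy", "windy": "false"},
        {"outlook": "rainy", "windy": "true"}]
labels = ["no", "no", "yes", "yes", "no"]
print(build_tree(rows, labels, ["outlook", "windy"]))
print(build_tree(rows, labels, ["outlook", "windy"], impurity=gini))
```

On this toy data both criteria choose `outlook` at the root and `windy` under the rainy branch, so the two trees coincide; on real datasets the criteria can disagree, which is one reason comparisons such as the one above report different accuracies.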