Classifiers' Accuracy Prediction based on Data Characterization, Multimedia Analysis and Data Mining Competence Center, German Research Center for Artificial Intelligence (DFKI GmbH)

Accuracy Measures for the Comparison of Classifiers

2011

The selection of the best classification algorithm for a given dataset is a very widespread problem. It is also a complex one, in the sense that it requires making several important methodological choices. Among them, in this work we focus on the measure used to assess classification performance and rank the algorithms. We present the most popular measures and discuss their properties. Despite the numerous measures proposed over the years, many of them turn out to be equivalent in this specific case. They can also lead to interpretation problems and be unsuitable for our purpose. Consequently, the classic overall success rate or marginal rates should be preferred for this specific task.
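As a minimal sketch of the two kinds of measure the abstract recommends, the following computes the overall success rate and per-class rates from a confusion matrix. It assumes that "marginal rates" means the diagonal count divided by the corresponding row or column total; the function names and the example matrix are illustrative and do not come from the paper.

```python
def overall_success_rate(cm):
    """Fraction of instances that lie on the diagonal of a square confusion matrix.

    cm[i][j] is assumed to count instances of true class i predicted as class j.
    """
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def marginal_rates(cm):
    """Per-class rates normalised by the row total and by the column total."""
    k = len(cm)
    row_totals = [sum(cm[i]) for i in range(k)]
    col_totals = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    # Diagonal over row total: share of each true class that was recognised.
    per_true = [cm[i][i] / row_totals[i] if row_totals[i] else 0.0 for i in range(k)]
    # Diagonal over column total: share of each predicted class that was right.
    per_pred = [cm[j][j] / col_totals[j] if col_totals[j] else 0.0 for j in range(k)]
    return per_true, per_pred


# Illustrative two-class confusion matrix (made-up counts).
cm = [[40, 10],
      [5, 45]]
rate = overall_success_rate(cm)
per_true, per_pred = marginal_rates(cm)
print(rate, per_true, per_pred)
```

Ranking several classifiers then amounts to computing the same measure for each classifier's confusion matrix on the same test data and sorting by it.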

A Comparative Framework for Evaluating Classification Algorithms

Data mining methods have been widely used for extracting valuable knowledge from large amounts of data. Classification algorithms are the most popular models. The model is selected with respect to its classification accuracy; therefore, the performance of each classifier plays a crucial role. This paper discusses the application of some classification models to multiple datasets and compares the accuracy of the results. The relationship between dataset characteristics and accuracy is also discussed, and finally, a regression model is introduced for predicting the classifier accuracy on a given dataset.
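The idea of regressing classifier accuracy on dataset characteristics can be sketched as follows. This is not the paper's model: it assumes a single hypothetical meta-feature (here called class_entropy), ordinary least squares with one predictor, and made-up numbers chosen only to show the mechanics.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x with one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Predicted accuracy for a dataset whose meta-feature value is x."""
    return intercept + slope * x


# Hypothetical meta-feature value per dataset and the accuracy one
# classifier reached on it (illustrative values, not experimental results).
class_entropy = [0.2, 0.4, 0.6, 0.8]
accuracy = [0.9, 0.8, 0.7, 0.6]

intercept, slope = fit_line(class_entropy, accuracy)
predicted = predict(intercept, slope, 0.5)
print(intercept, slope, predicted)
```

A practical version would use several dataset characteristics and a multiple regression, fitted on accuracies actually measured across many datasets.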

Accuracy Prediction Using Analysis Methods and F-Measures

Journal of Physics: Conference Series

Accuracy prediction is used in machine learning to evaluate the accuracy of data and so obtain better results when analyzing data for purposes such as financial analysis, credit card fraud detection and sales prediction. Predicting the accuracy of data is necessary for making better decisions in business, engineering, medical science and analytics. We introduce an analysis methodology that improves the accuracy of data while also improving the performance of the algorithm, so that it supports better decision making and can be used in real-world applications. The analysis involves three phases. The first is the product analysis phase, which involves product analysis and SWOT analysis. The second is the analysis phase, where we use techniques such as the straight-line method of depreciation, the moving average technique, simple linear regression and multiple linear regression. These methods are used for analyzing the trend in the data and for comparison. In the third phase we calculate accuracy and find the optimal value. To do so, we first add more data and then select the essential features needed for accurate results, using multiple algorithms. These comprise algorithms for clustering, classification and comparison, and they are combined into a better machine learning model by an ensemble method. An ensemble method combines several weak algorithms to create a more accurate algorithm that gives better performance. To check performance and obtain an accurate value we use algorithm tuning, which yields an improved algorithm with a lower error percentage and assists in making predictions. This gives an accurate and optimized model for training the data.
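The abstract does not say which ensemble method is used, so the following is only a sketch of one common choice, majority voting over the labels predicted by several models. The three prediction lists and the ground truth are made-up values for illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one list by taking the most common label.

    predictions: list of lists, one per model, all of the same length.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def accuracy(predicted, truth):
    """Share of positions where the predicted label equals the true label."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


# Illustrative outputs of three weak models on four instances.
truth = [1, 0, 1, 1]
model_a = [1, 0, 1, 0]
model_b = [1, 1, 1, 1]
model_c = [0, 0, 1, 1]

ensemble = majority_vote([model_a, model_b, model_c])
print(accuracy(model_a, truth), accuracy(ensemble, truth))
```

In this toy case each model makes one mistake on a different instance, so the vote corrects all of them. Real models often make correlated errors, and the gain is then smaller.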

Empirical Evaluation of Classifiers' Performance Using Data Mining Algorithm

2015

The field of data mining and knowledge discovery in databases (KDD) has been growing by leaps and bounds and has shown great potential for the future [10]. Data classification is an important task in the KDD process. It has several potential applications. The performance of a classifier is strongly dependent on the learning algorithm. In this paper, we describe our experiment on data classification considering several classification models. We tabulate the experimental results and present a comparative analysis thereof. Keywords: knowledge discovery in databases, classifier, data classification.

Classifier quality definition on the basis of the estimation calculation approach

The generalized problem of training classifiers using training sets is considered. Emphasis is placed on the principal factors that cause classifier overtraining, on ways of estimating the degree of these effects, and on methods for controlling them. An approach that makes it possible to estimate the effect of training set reduction on the basis of a probabilistic-combinatorial method is considered. Estimating the effect of training set reduction makes it possible to establish the redundancy of the training data and to determine the relative share of negative training objects such as spikes and spurious objects.

Using accuracy analysis to find the best classifier for Intelligent

An Intelligent Personal Assistant (IPA) is an agent whose purpose is to help the user with daily tasks. This paper focuses on IPAs for Internet of Things (IoT) environments. In this sense, a good IPA is capable of observing its user's behaviour and suggesting tasks or making decisions with the intention of simplifying the user's interaction with his or her surroundings. With this in mind, this paper studies the accuracy of various classifiers, with the objective of finding the one that best suits the needs of an IPA for IoT. The aim is to test each algorithm with a dataset of events that relate to past behaviours of the user, and to find whether there is an opportunity to notify the user that he or she may want to take an action or create an automation based on the learned behaviour.

Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness & Correlation

Commonly used evaluation measures including Recall, Precision, F-Measure and Rand Accuracy are biased and should not be used without a clear understanding of the biases and a corresponding identification of chance or base case levels of the statistic. A system that performs worse in the objective sense of Informedness can appear to perform better under any of these commonly used measures. We discuss several concepts and measures that reflect the probability that prediction is informed versus chance (Informedness), and introduce Markedness as a dual measure for the probability that prediction is marked versus chance. Finally, we demonstrate elegant connections between the concepts of Informedness, Markedness, Correlation and Significance as well as their intuitive relationships with Recall and Precision, and outline the extension from the dichotomous case to the general multi-class case.
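For the dichotomous case, the measures named above can be sketched from the four cells of a confusion matrix. The sketch uses the standard definitions: Informedness is Recall plus Inverse Recall minus 1, Markedness is Precision plus Inverse Precision minus 1, and the Matthews correlation is taken as the Correlation measure. The function names and the counts are illustrative assumptions.

```python
import math


def informedness(tp, fn, fp, tn):
    """Recall (true positive rate) plus inverse recall (true negative rate) minus 1."""
    return tp / (tp + fn) + tn / (tn + fp) - 1


def markedness(tp, fn, fp, tn):
    """Precision plus inverse precision (negative predictive value) minus 1."""
    return tp / (tp + fp) + tn / (tn + fn) - 1


def matthews_correlation(tp, fn, fp, tn):
    """Matthews correlation coefficient of the 2x2 contingency table."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom


# Illustrative counts: 50 positives and 50 negatives.
tp, fn, fp, tn = 40, 10, 5, 45
inf = informedness(tp, fn, fp, tn)
mark = markedness(tp, fn, fp, tn)
mcc = matthews_correlation(tp, fn, fp, tn)
print(inf, mark, mcc)

# A system that labels everything positive on 90 positives and 10 negatives
# reaches recall 1.0 and accuracy 0.9, yet its informedness is zero.
always_positive = informedness(tp=90, fn=0, fp=10, tn=0)
print(always_positive)
```

In the two-class case the squared correlation equals the product of Informedness and Markedness, which is one of the connections the abstract refers to. Markedness is undefined when a system never predicts one of the classes, so the second example computes Informedness only.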