Classifiers' Accuracy Prediction based on Data Characterization
Multimedia Analysis and Data Mining Competence Center, German Research Center for Artificial Intelligence (DFKI GmbH)
Accuracy Measures for the Comparison of Classifiers
2011
The selection of the best classification algorithm for a given dataset is a very widespread problem. It is also a complex one, in the sense that it requires making several important methodological choices. Among them, in this work we focus on the measure used to assess the classification performance and rank the algorithms. We present the most popular measures and discuss their properties. Among the numerous measures proposed over the years, many turn out to be equivalent in this specific context, while others can lead to interpretation problems and are unsuitable for our purpose. Consequently, the classic overall success rate or marginal rates should be preferred for this specific task.
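As a minimal sketch of the measures recommended above, the snippet below computes the overall success rate and the marginal (per-class) rates from a confusion matrix; the counts are illustrative only and do not come from the paper.

```python
import numpy as np

# Illustrative 3-class confusion matrix: rows = true class, columns = predicted class.
cm = np.array([
    [50,  3,  2],
    [ 4, 40,  6],
    [ 1,  5, 44],
])

# Overall success rate: fraction of correctly classified instances.
overall_success_rate = np.trace(cm) / cm.sum()

# Marginal rates: row-wise recall (per true class) and column-wise precision (per predicted class).
recall = np.diag(cm) / cm.sum(axis=1)
precision = np.diag(cm) / cm.sum(axis=0)

print(f"overall success rate: {overall_success_rate:.3f}")
print("per-class recall:   ", np.round(recall, 3))
print("per-class precision:", np.round(precision, 3))
```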
A Comparative Framework for Evaluating Classification Algorithms
Data mining methods have been widely used for extracting valuable knowledge from large amounts of data, and classification algorithms are among the most popular such models. A model is usually selected with respect to its classification accuracy; the performance of each classifier therefore plays a crucial role. This paper discusses the application of several classification models to multiple datasets and compares the accuracy of the results. The relationship between dataset characteristics and accuracy is also examined, and finally a regression model is introduced for predicting classifier accuracy on a given dataset.
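The abstract does not specify which dataset characteristics feed the regression model; the sketch below is only a hypothetical illustration of the idea, with meta-feature choices, datasets, and the base classifier all assumed for demonstration.

```python
# Hypothetical sketch: characterize each dataset with simple meta-features and
# regress a classifier's observed accuracy on them. Not the paper's actual setup.
import numpy as np
from sklearn.datasets import load_breast_cancer, load_digits, load_iris, load_wine
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def meta_features(X, y):
    """A few cheap dataset characteristics (illustrative choice)."""
    n, d = X.shape
    n_classes = len(np.unique(y))
    return [np.log(n), d, n_classes, float(np.mean(np.std(X, axis=0)))]

rows, targets = [], []
for loader in (load_iris, load_wine, load_breast_cancer, load_digits):
    X, y = loader(return_X_y=True)
    rows.append(meta_features(X, y))
    # Observed accuracy of a fixed classifier on this dataset.
    targets.append(cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5).mean())

# Regression model mapping dataset characteristics to expected accuracy.
reg = LinearRegression().fit(np.array(rows), np.array(targets))
print("predicted accuracies:", np.round(reg.predict(np.array(rows)), 3))
```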
Accuracy Prediction Using Analysis Methods and F-Measures
Journal of Physics: Conference Series
Accuracy prediction is used in machine learning to evaluate the accuracy of data and obtain better results when analyzing it for purposes such as financial analysis, credit card fraud detection, and sales prediction. Predicting the accuracy of data is necessary for making better decisions in business, engineering, medical science, and analytics. We introduce an analysis methodology that improves the accuracy of the data while also improving the performance of the algorithm, thereby supporting decision making in real-world applications. The analysis involves three phases. The first is the product analysis phase, which covers product analysis and SWOT analysis. The second is the analysis phase, in which we apply techniques such as the straight-line method of depreciation, the moving-average technique, simple linear regression, and multiple linear regression; these methods are used to analyze trends in the data and for comparison. In the final phase we calculate accuracy and find the optimal value: we first add more data, then select the essential features needed for accurate results, using multiple algorithms for clustering, classification, and comparison. These algorithms are combined into a better machine learning model through an ensemble method, that is, a method that combines several weak algorithms into a more accurate one with better performance. To check performance and obtain an accurate value we use algorithm tuning, which yields an improved algorithm with a lower error percentage and assists in making predictions. The result is an accurate and optimized model for training on the data.
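The ensemble and tuning steps described above are sketched below in a hedged form: a voting ensemble of weak learners whose hyper-parameters are tuned by grid search. The models, parameter grid, and synthetic data are placeholders, not the paper's pipeline.

```python
# Rough sketch of the ensemble + algorithm-tuning phase (illustrative only).
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Ensemble method: combine several weaker learners by majority vote.
ensemble = VotingClassifier(estimators=[
    ("lr", LogisticRegression(max_iter=1000)),
    ("dt", DecisionTreeClassifier()),
    ("knn", KNeighborsClassifier()),
])

# Algorithm tuning: search over hyper-parameters of the base learners.
grid = GridSearchCV(ensemble, param_grid={
    "dt__max_depth": [3, 5, None],
    "knn__n_neighbors": [3, 5, 7],
}, cv=5)
grid.fit(X_train, y_train)
print("best parameters:", grid.best_params_)
print("cv / test accuracy:", round(grid.best_score_, 3), round(grid.score(X_test, y_test), 3))
```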
Empirical Evaluation of Classifiers' Performance Using Data Mining Algorithm
2015
The field of data mining and knowledge discovery in databases (KDD) has been growing in leaps and bounds and has shown great potential for the future [10]. Data classification is an important task in the KDD process and has several potential applications. The performance of a classifier is strongly dependent on the learning algorithm. In this paper, we describe our experiment on data classification considering several classification models. We tabulate the experimental results and present a comparative analysis thereof. Keywords: knowledge discovery in databases, classifier, data classification.
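A minimal sketch of this kind of experiment is shown below: several classification models are trained on the same data and their cross-validated accuracies tabulated. The chosen models and dataset are assumptions for illustration, not those of the paper.

```python
# Compare several classifiers on one dataset and tabulate accuracy (illustrative).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
models = {
    "naive Bayes": GaussianNB(),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "k-NN": KNeighborsClassifier(),
    "SVM": SVC(),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10)
    print(f"{name:15s} accuracy = {scores.mean():.3f} +/- {scores.std():.3f}")
```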
Classifier quality definition on the basis of the estimation calculation approach
The generalized problem of training classifiers from training sets is considered. Emphasis is placed on the principal factors responsible for classifier overtraining, on ways to estimate the degree of these effects, and on methods for controlling them. An approach is considered that makes it possible to estimate the effect of training set reduction on the basis of a probabilistic-combinatorial method. This estimate makes it possible to establish redundancy in the training data and to identify negative training objects such as spikes (outliers) and spurious objects.
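The paper relies on a probabilistic-combinatorial estimation method; as a loose empirical analogue only (not the paper's technique), the sketch below measures how cross-validated accuracy changes as the training set is reduced, which can hint at redundancy in the training data.

```python
# Learning-curve sketch: effect of training set reduction on accuracy (illustrative).
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
sizes, train_scores, test_scores = learning_curve(
    DecisionTreeClassifier(random_state=0), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5)

for n, score in zip(sizes, test_scores.mean(axis=1)):
    print(f"training examples: {n:4d}  cross-validated accuracy: {score:.3f}")
```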
Beyond Accuracy, F-Score and ROC: A Family of Discriminant Measures for Performance Evaluation
2006
Different evaluation measures assess different characteristics of machine learning algorithms. The empirical evaluation of algorithms and classifiers is a matter of ongoing debate between researchers. Although most measures in use today focus on a classifier's ability to identify classes correctly, we suggest that, in certain cases, other properties, such as failure avoidance or class discrimination, may also be useful. We suggest the application of measures which evaluate such properties. These measures, namely Youden's index, likelihood, and discriminant power, are used in medical diagnosis. We show that these measures are interrelated, and we apply them to a case study from the field of electronic negotiations. We also list other learning problems which may benefit from the application of the proposed measures.
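As a hedged worked example, the snippet below computes the measures named above from their standard definitions in terms of sensitivity and specificity (the paper's exact notation may differ); the confusion counts are illustrative.

```python
# Youden's index, likelihood ratios, and discriminant power for a binary classifier.
from math import log, pi, sqrt

tp, fn, tn, fp = 80, 20, 70, 30           # illustrative binary confusion counts

sensitivity = tp / (tp + fn)               # true positive rate
specificity = tn / (tn + fp)               # true negative rate

youden_index = sensitivity + specificity - 1
positive_likelihood = sensitivity / (1 - specificity)
negative_likelihood = (1 - sensitivity) / specificity
discriminant_power = (sqrt(3) / pi) * (
    log(sensitivity / (1 - sensitivity)) + log(specificity / (1 - specificity)))

print(f"Youden's index:     {youden_index:.3f}")
print(f"likelihood ratios:  {positive_likelihood:.3f} / {negative_likelihood:.3f}")
print(f"discriminant power: {discriminant_power:.3f}")
```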
OINOS, an application suite for the performance evaluation of classifiers
2019
The last few years have been characterized by rapid development of machine learning (ML) techniques, and their application has spread across many fields. The success of their use on a specific problem strongly depends on the approach used and the dataset formatting, not only on the type of ML algorithm employed. Tools that allow the user to evaluate different classification approaches on the same problem, and their efficacy with different ML algorithms, are therefore becoming crucial. In this paper we present OINOS, a suite written in Python and Bash aimed at evaluating the performance of different ML algorithms. This tool allows the user to tackle a classification problem with different classifiers and dataset formatting strategies, and to extract the related performance metrics. The tool is presented and then tested on the classification of two diagnostic species from a public electroencephalography (EEG) database. The flexibility and ease of use of this tool allowed us to easily compare...
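OINOS itself is a Python/Bash suite; the snippet below is not its API, only a hand-rolled illustration of the workflow it automates, namely trying several dataset formatting strategies and ML algorithms on one problem and collecting the resulting metrics.

```python
# Evaluate combinations of preprocessing and classifiers on one problem (illustrative).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
formattings = {"standardized": StandardScaler(), "min-max scaled": MinMaxScaler()}
algorithms = {"SVM": SVC(), "random forest": RandomForestClassifier(random_state=0)}

for f_name, scaler in formattings.items():
    for a_name, clf in algorithms.items():
        pipeline = make_pipeline(scaler, clf)
        acc = cross_val_score(pipeline, X, y, cv=5).mean()
        print(f"{f_name:15s} + {a_name:13s} accuracy = {acc:.3f}")
```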
Reliability estimation of a statistical classifier
Pattern Recognition Letters, 2008
Pattern classification techniques derived from statistical principles have been widely studied and have proven powerful in addressing practical classification problems. In real-world applications, the challenge is often to cope with unseen patterns, i.e., patterns which are very different from those examined during the training phase. The issue with unseen patterns is the lack of accuracy of the classifier output in the regions of pattern space where the density of training data is low, which could lead to a false classification output. This paper proposes a method for estimating the reliability of a classifier to cope with these situations. While existing methods for quantifying reliability are often based on the class membership probability estimated from global approximations, the proposed method takes into account the local density of training data in the neighborhood of a test pattern. The calculations are further simplified by using a Gaussian mixture model (GMM) to calculate the local density of the training data. The reliability of a classifier output is defined in terms of a confidence interval on the class membership probability. The lower bound of the confidence interval or the local density of training data may be used to detect unseen patterns. The effectiveness of the proposed method is demonstrated using real data sets, and its performance is compared with other reliability estimation methods.
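The sketch below is a simplified analogue of the idea described above, not the paper's exact estimator: a Gaussian mixture is fitted to the training data, and the local density at a test pattern is used to flag possibly unseen patterns whose classification output should be trusted less. The rejection threshold is an assumption for illustration.

```python
# GMM-based local density check for "unseen" test patterns (illustrative only).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.mixture import GaussianMixture

X_train, _ = load_iris(return_X_y=True)

# Local density model of the training data.
gmm = GaussianMixture(n_components=3, random_state=0).fit(X_train)

# Reject threshold: e.g. the 5th percentile of training log-densities (assumption).
threshold = np.percentile(gmm.score_samples(X_train), 5)

x_test = np.array([[9.0, 5.0, 1.0, 0.2]])   # a pattern far from the training data
log_density = gmm.score_samples(x_test)[0]
print("log-density:", round(log_density, 2), "unseen pattern:", log_density < threshold)
```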
Evaluation of Performance Measures for Classifiers Comparison
ubicc.org
The selection of the best classification algorithm for a given dataset is a very widespread problem, occurring each time one has to choose a classifier to solve a real-world problem. It is also a complex task with many important methodological decisions to make. Among those, one of the most crucial is the choice of an appropriate measure in order to properly assess the classification performance and rank the algorithms. In this article, we focus on this specific task. We present the most popular measures and compare their behavior through discrimination plots. We then discuss their properties from a more theoretical perspective. It turns out that several of them are equivalent for classifier comparison purposes. Furthermore, they can also lead to interpretation problems. Among the numerous measures proposed over the years, it appears that the classical overall success rate and marginal rates are the most suitable for the classifier comparison task.
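As an illustration only (not a reproduction of the paper's analysis), the snippet below ranks a few classifiers on one dataset under several measures, making it easy to check which measures induce the same ordering; for instance, the success rate and the error rate always agree by construction.

```python
# Rank classifiers under several measures and compare the induced orderings (illustrative).
from sklearn.datasets import load_wine
from sklearn.metrics import accuracy_score, cohen_kappa_score, f1_score
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
classifiers = {
    "naive Bayes": GaussianNB(),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "k-NN": KNeighborsClassifier(),
}
for name, clf in classifiers.items():
    y_pred = cross_val_predict(clf, X, y, cv=5)
    print(f"{name:13s} success rate = {accuracy_score(y, y_pred):.3f}  "
          f"error rate = {1 - accuracy_score(y, y_pred):.3f}  "
          f"macro F1 = {f1_score(y, y_pred, average='macro'):.3f}  "
          f"kappa = {cohen_kappa_score(y, y_pred):.3f}")
```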