
The Use of Decision Threshold Adjustment in Classification for Cancer Prediction

Standard classification algorithms are generally designed to maximize the number of correct predictions (concordance). Maximizing concordance, however, may not be the appropriate criterion in certain applications: some emphasize high sensitivity (e.g., clinical diagnostic tests) and others high specificity (e.g., epidemiological screening studies). This paper considers the effects of the decision threshold on sensitivity, specificity, and concordance for four classification methods: logistic regression, classification trees, Fisher's linear discriminant analysis, and weighted k-nearest neighbors. We investigated the use of decision threshold adjustment to improve either the sensitivity or the specificity of a classifier under specific conditions. A Monte Carlo simulation shows that as the decision threshold increases, sensitivity decreases and specificity increases, while the concordance values in an interval around the maximum concordance remain similar. For specified sensitivity and specificity levels, an optimal decision threshold can therefore be determined within that interval so that the specified requirement is met.
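The threshold effect summarized above can be sketched with toy data (the scores, labels, and thresholds below are illustrative, not from the paper): sweeping the cut-off trades sensitivity against specificity while concordance stays roughly flat near its maximum.

```python
# Illustrative sketch: how the decision threshold trades sensitivity
# against specificity for a probabilistic classifier (toy data).

def metrics(scores, labels, threshold):
    """Classify score >= threshold as positive; return (sens, spec, concordance)."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(labels)

scores = [0.1, 0.2, 0.35, 0.4, 0.55, 0.6, 0.7, 0.8, 0.9, 0.95]
labels = [0,   0,   0,    1,   0,    1,   1,   1,   1,   1]

for t in (0.3, 0.5, 0.7):
    sens, spec, conc = metrics(scores, labels, t)
    print(f"t={t}: sensitivity={sens:.2f} specificity={spec:.2f} concordance={conc:.2f}")
```

On this toy set, raising the threshold from 0.3 to 0.7 lowers sensitivity and raises specificity while concordance is unchanged, mirroring the paper's qualitative finding.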

The Optimal Cut-off for Maximizing the Proportion of Correct Classification

2011

Introduction: The binary logistic regression model is widely used to classify a qualitative dependent variable into one of two categories of the characteristic of interest, using independent variables as predictors. Almost all research conducted to classify an individual subject or object into one of two categories has used a cut-off point of 0.5 to minimize the classification error rate. For instance, a researcher in finance might use a group of financial-distress measures to predict the chance of a corporate acquisition, a medical researcher might predict the chance of Down's syndrome in young pregnant women using maternal serum biomarkers, a cardiologist might predict the disappearance of left atrial thrombi among cardiovascular-disease patients, and a pediatrician might predict the outcome of pregnancies of unknown location. Almost all problems or studies concerned with predicting one of two categories using a group of i...
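The idea of choosing a cut-off other than the default 0.5 can be sketched as follows (the fitted probabilities and labels are hypothetical, not data from the paper): search over candidate cut-offs and keep the one minimizing the training classification error rate.

```python
# Hedged sketch: pick the cut-off minimizing the classification error rate
# instead of defaulting to 0.5. Probabilities are hypothetical fitted
# values from a logistic model, not real data.

def error_rate(probs, labels, cutoff):
    preds = [1 if p >= cutoff else 0 for p in probs]
    return sum(p != y for p, y in zip(preds, labels)) / len(labels)

probs  = [0.30, 0.40, 0.45, 0.55, 0.60, 0.70]
labels = [0,    1,    1,    1,    1,    1]

# The observed probabilities themselves serve as candidate cut-offs.
best = min(probs, key=lambda c: error_rate(probs, labels, c))
print(f"default 0.5 error: {error_rate(probs, labels, 0.5):.3f}")
print(f"best cut-off {best}: {error_rate(probs, labels, best):.3f}")
```

On this imbalanced toy sample the default 0.5 misclassifies two cases, while the cut-off 0.40 classifies all of them correctly, which is exactly the kind of gain the paper's optimal cut-off targets.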

Performance and Interpretation of Classification Models

arXiv (Cornell University), 2015

Classification is a common statistical task in many areas, and new classification procedures are continually proposed to improve on existing methods. These procedures, especially those arising in the machine-learning and data-mining literature, are usually complicated, so extra effort is required to understand them and the impact of individual variables within them. However, in some applications, for example pharmaceutical and medical research, future developments and/or research plans rely on the interpretation of the classification rule, such as the role of individual variables in a diagnostic rule/model. Hence, in such research, despite the optimal performance of complicated models, a model that balances ease of interpretation with satisfactory performance is preferred: the complexity of a classification rule may diminish its performance advantage and become an obstacle to its use. In this paper, we study how to improve the classification performance, in terms of the area under the receiver operating characteristic (ROC) curve, of a conventional logistic model while retaining its ease of interpretation. The proposed method increases the sensitivity over the whole range of specificity and hence is especially useful when performance in the high-specificity range of a ROC curve is of interest. Theoretical justification is presented, and numerical results using both simulated data and two real data sets are reported.
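The area-under-the-ROC-curve criterion used above can be made concrete with a small sketch (toy scores, not the paper's method): the AUC equals the probability that a randomly chosen positive case scores higher than a randomly chosen negative case.

```python
# Sketch: AUC computed as the Mann-Whitney rank probability that a
# positive case scores above a negative case (toy scores).

def auc(pos_scores, neg_scores):
    pairs = [(p, n) for p in pos_scores for n in neg_scores]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in pairs)
    return wins / len(pairs)

pos = [0.80, 0.70, 0.60, 0.55]   # scores of positive cases
neg = [0.65, 0.50, 0.40, 0.30]   # scores of negative cases
print(f"AUC = {auc(pos, neg):.3f}")  # 0.875
```

A method that lifts sensitivity at every specificity, as the paper proposes, raises this pairwise win probability uniformly.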

ON ASSESSING EFFICIENCY OF AN ALTERNATIVE CLASSIFIER THROUGH SENSITIVITY AND SPECIFICITY

Journal of Mathematical Sciences, 2017

We examine the efficiency of a competing classifier through its sensitivity and specificity, using a Monte Carlo study. We observed that when sensitivity, specificity, or both are low, the efficiency of such a classifier is poor and undesirable, and that even with large sample sizes the empirical efficiency does not show any appreciable difference. Our results suggest that estimation of efficiency is poor for small sample sizes (n ≤ 30). We found that if sensitivity, specificity, or both are high (≥ 0.75), the classifier has good efficiency. This is slightly more relaxed than the results of other researchers, who recommended a sensitivity and specificity of 0.80 or higher.

An Empirical Evaluation of the Classification Error of Two Thresholding Methods for Fisher's Classifier

International Conference on Artificial Intelligence and …

In this paper, we empirically analyze two methods for computing the threshold in Fisher's classifiers. The first, which we call FC∧ (the traditional Fisher's classifier), obtains the threshold by computing the middle point between the two class means in the projected space. The second, which we call FC+, obtains the threshold by computing the optimal classifier in the transformed space. We conduct the analysis on widely used public datasets for cancer detection and protein classification. The empirical results show that FC+ leads to smaller classification error than FC∧. The results on the cancer data demonstrate that minimizing the classification error in the transformed space leads to smaller classification error in the original multi-dimensional space. In contrast, the results on protein classification show that selecting the threshold that minimizes the error in the transformed space, assuming the data are normally distributed, does not necessarily lead to the best classifier.
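A minimal sketch of the two thresholding rules, applied to hypothetical one-dimensional projections w·x (toy values; the actual comparison uses the cited datasets): the midpoint rule can misclassify points when the projected classes have unequal spread, while the error-minimizing rule adapts.

```python
# Toy comparison of the two Fisher thresholding rules on hypothetical
# 1-D projections. Class A is tight, class B is spread out.

proj_a = [0.9, 1.0, 1.1]                  # class A projections
proj_b = [1.4, 2.0, 3.0, 4.0, 5.0]        # class B projections

# FC^ : midpoint between the projected class means.
mid = (sum(proj_a) / len(proj_a) + sum(proj_b) / len(proj_b)) / 2

# FC+ : threshold minimizing training error in the projected space
# (predict class B when projection >= t).
def err(t):
    return sum(p >= t for p in proj_a) + sum(p < t for p in proj_b)

opt = min(sorted(proj_a + proj_b), key=err)

print(f"midpoint threshold FC^ = {mid:.2f}, training errors = {err(mid)}")
print(f"optimal  threshold FC+ = {opt:.2f}, training errors = {err(opt)}")
```

Here the midpoint 2.04 misclassifies two class-B points, whereas the optimal threshold 1.4 separates the toy classes perfectly, matching the paper's observation that FC+ tends to achieve smaller error.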

Accuracy, Sensitivity and Specificity Measurement of Various Classification Techniques on Healthcare Data

The healthcare industry produces data that are both very large and sensitive, and they must be handled carefully, without any mismanagement. Various data-mining techniques have been applied in the healthcare industry, but research is still needed on the relative performance of the different classification techniques, so that the best one can be chosen. In this paper, we compare techniques by their accuracy, sensitivity, and specificity percentages and report the results.
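The three measures named above all derive from the 2×2 confusion matrix; a minimal sketch (the counts below are hypothetical, not from the paper):

```python
# Accuracy, sensitivity, and specificity from a 2x2 confusion matrix
# (hypothetical counts).

tp, fn, fp, tn = 45, 5, 10, 40   # true/false positives and negatives

accuracy    = (tp + tn) / (tp + fn + fp + tn) * 100
sensitivity = tp / (tp + fn) * 100           # true positive rate
specificity = tn / (tn + fp) * 100           # true negative rate

print(f"accuracy    = {accuracy:.1f}%")
print(f"sensitivity = {sensitivity:.1f}%")
print(f"specificity = {specificity:.1f}%")
```

With these counts the classifier scores 85.0% accuracy, 90.0% sensitivity, and 80.0% specificity; comparing several techniques on the same matrix layout is what the paper proposes.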

Comparison of different classification algorithms in clinical decision-making

Expert Systems, 2007

This paper gives an integrated view of implementing automated diagnostic systems for clinical decision-making. Because of the importance of making the right decision, better classification procedures are necessary for clinical decisions. The major objective of the paper is to be a guide for readers who want to develop an automated decision support system for clinical practice. The purpose was to determine an optimum classification scheme with high diagnostic accuracy for this problem. Several different classification algorithms were tested and benchmarked for their performance. The performance of the classification algorithms is illustrated on two data sets: the Pima Indians diabetes and the Wisconsin breast cancer. The present research demonstrates that the support vector machines achieved diagnostic accuracies which were higher than those of other automated diagnostic systems.

The effect of threshold values on association rule based classification accuracy

2007

Classification Association Rule Mining (CARM) systems operate by applying an Association Rule Mining (ARM) method to obtain classification rules from a training set of previously classified data. The rules thus generated will be influenced by the choice of ARM parameters employed by the algorithm (typically support and confidence threshold values). In this paper we examine the effect that this choice has on the predictive accuracy of CARM methods.
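The support and confidence thresholds that drive a CARM system can be illustrated on a toy training set (the records and items below are hypothetical; the class label plays the role of the rule consequent):

```python
# Toy illustration of support/confidence for a classification association
# rule "antecedent -> class label" (hypothetical records).

records = [
    ({"a", "b"}, "yes"),
    ({"a", "b"}, "yes"),
    ({"a"},      "no"),
    ({"b"},      "yes"),
    ({"a", "b"}, "no"),
]

def support_confidence(antecedent, label):
    """Support and confidence of the rule: antecedent -> label."""
    covered = [r for r in records if antecedent <= r[0]]
    correct = [r for r in covered if r[1] == label]
    return len(correct) / len(records), len(correct) / len(covered)

s, c = support_confidence({"a", "b"}, "yes")
print(f"rule {{a,b}} -> yes: support={s:.2f}, confidence={c:.2f}")
```

A CARM algorithm keeps this rule only if both values clear the chosen thresholds, which is precisely why the threshold choice studied in the paper shapes the final rule set and its predictive accuracy.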

Two New Parameters Based on Distances in a Receiver Operating Characteristic Chart for the Selection of Classification Models

Journal of Chemical Information and Modeling, 2011

Abstract: There are several indices that indicate different aspects of the performance of QSAR classification models, the area under a Receiver Operating Characteristic (ROC) curve still being the most powerful overall test of such performance. All ROC-related parameters can be calculated for both the training and test sets, but none of them constitutes an absolute indicator of classification performance by itself. Moreover, one of the biggest drawbacks is the computing time needed to obtain the area under the ROC curve, which naturally slows down any calculation algorithm. The present study proposes two new parameters based on distances in a ROC chart for the selection of classification models with an appropriate balance in both training and test sets, namely: the ROC graph Euclidean distance (ROCED) and the ROC graph Euclidean distance corrected with the fitness function FIT(λ) (ROCFIT). The behavior of these indices was examined in a study of the mutagenicity of a number of nonaromatic halogenated derivatives across four genotoxicity end points. The ROCED parameter achieved a better balance between sensitivity and specificity for both the training and prediction sets than indices such as the Matthews correlation coefficient, Wilks' lambda, or the area under the ROC curve. However, when the ROCED parameter was used, the resulting linear discriminant models showed lower statistical significance. The other parameter, ROCFIT, maintains the capabilities of ROCED while improving the significance of the models through the inclusion of FIT(λ).
Traditionally, the techniques used to measure the performance of a discriminant-analysis QSAR model are derived from Wilks' lambda or from indices based on the confusion matrix [1-5]. It is well known that choosing a model by Wilks' lambda involves the same problems as choosing the best regression model based only on the coefficient of determination R² [6]. Among the indices based on the confusion matrix, accuracy, sensitivity, specificity, precision, the enrichment factor, and the Matthews correlation coefficient should be noted [7]. In the field of toxicology, Benigni et al. [8] quite successfully introduced the ROC chart, in which the true positive rate (sensitivity) is plotted against the false positive rate (1 - specificity). This chart has the advantage of simultaneously comparing different aspects of the performance of several systems or models [8]. It has also been observed that ROC curves visually convey the same information as the confusion matrix in a much more intuitive and robust fashion. The area under the ROC curve (AUC) can be computed directly for any classification model that outputs a probability, such as discriminant analysis [10-14], and is widely used in many disciplines [15-20]. The AUC is the probability that an active compound is ranked earlier than a decoy compound; it takes values between 1 (perfect classifier) and 0.5 (useless random classifier). The AUC is not sensitive to early recognition (the ability to recognize positives quickly). Truchon and Bayly [21] discussed several methods to address this problem using the Boltzmann-Enhanced Discrimination of ROC (BEDROC) parameter, based on the Robust Initial Enhancement (RIE) [22], which provides good early recognition of actives. Recently, McGaughey et al. used the enrichment factor (EF) [24], the AUC, the RIE, and the BEDROC parameters to evaluate different virtual screening (VS) methods. The RIE and BEDROC ...
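The underlying idea of a distance-based ROC parameter can be sketched as follows. This is an illustration only: the exact ROCED/ROCFIT formulas are defined in the paper, and the `roc_distance` function, the model names, and all performance numbers below are hypothetical. The sketch scores a model by its Euclidean distance in the ROC chart from the ideal point (FPR = 0, TPR = 1), summed over training and test sets.

```python
# Illustrative only: score classifiers by Euclidean distance from the
# perfect corner of the ROC chart, combined over training and test sets.
# This mimics the spirit, not the exact definition, of ROCED.

import math

def roc_distance(sens, spec):
    """Distance from the ROC point (1 - spec, sens) to the ideal (0, 1)."""
    return math.hypot(1 - spec, 1 - sens)

# Hypothetical (sensitivity, specificity) pairs for two candidate models.
models = {
    "model_1": {"train": (0.95, 0.70), "test": (0.93, 0.68)},
    "model_2": {"train": (0.85, 0.84), "test": (0.83, 0.82)},
}

for name, perf in models.items():
    d = sum(roc_distance(*perf[s]) for s in ("train", "test"))
    print(f"{name}: combined ROC distance = {d:.3f}")
```

The better-balanced model_2 obtains the smaller combined distance even though model_1 has higher sensitivity, which is the sensitivity/specificity balance that ROCED is designed to reward.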

A risk-based comparison of classification systems

Signal Processing, Sensor Fusion, and Target Recognition XVIII, 2009

Performance measures for families of classification systems that rely on the analysis of receiver operating characteristics (ROCs), such as the area under the ROC curve (AUC), often fail to fully address the issue of risk, especially for classification systems involving more than two classes. For the general case, we define matrices of class prevalences, costs, and class-conditional probabilities, and we assume that the costs are subjectively fixed, that acceptable estimates of the expected values of the class-conditional probabilities exist, and that any variable in one such matrix is mutually independent of the variables in any other matrix.
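The risk computation implied by these three matrices can be sketched directly (all numbers below are hypothetical): the expected risk is the prevalence-weighted, cost-weighted sum of the class-conditional decision probabilities.

```python
# Sketch: expected risk of a 3-class classifier from prevalence, cost,
# and class-conditional decision-probability matrices (hypothetical values).

prevalence = [0.6, 0.3, 0.1]    # P(class i)
cost = [                        # cost[i][j]: cost of deciding j when truth is i
    [0, 1, 2],
    [5, 0, 1],
    [10, 4, 0],
]
p_decide = [                    # p_decide[i][j]: P(decide j | class i)
    [0.90, 0.08, 0.02],
    [0.10, 0.85, 0.05],
    [0.05, 0.15, 0.80],
]

risk = sum(prevalence[i] * cost[i][j] * p_decide[i][j]
           for i in range(3) for j in range(3))
print(f"expected risk = {risk:.3f}")
```

Unlike the AUC, this scalar changes when the cost matrix changes, which is why a risk-based comparison can rank the same family of classifiers differently than a ROC-based one.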