An Analysis on the Performance of a Classification based Outlier Detection System using Feature Selection

An Analysis on the Performance of a K-Nearest-Neighbor Classification Based Outlier Detection System using Feature Selection and Dimensionality Reduction Techniques

The general idea of a classification-based outlier detection method is to train a classification model that can distinguish normal data from outliers. In previous work, we implemented and evaluated three classification-based outlier detection algorithms and found that the k-neighborhood algorithm identified and classified outliers better than the other two algorithms in terms of accuracy, F-score, sensitivity/recall, and error rate. The CPU time of the k-neighborhood algorithm was also the lowest. In this work, the performance of outlier detection is evaluated using dimensionality reduction algorithms. The results show that applying dimensionality reduction to the cancer dataset considerably improved the overall classification performance.
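
To illustrate the idea in this abstract, the following is a minimal sketch, not the authors' implementation. A k-nearest-neighbor classifier labels a query as normal or outlier, and an optional `features` argument stands in for the output of a feature selection or dimensionality reduction step. The function name `knn_predict`, the toy data, and the choice of `k=3` are assumptions made for the example.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3, features=None):
    """Label `query` by majority vote among its k nearest training points.

    `features` is an optional list of column indices to use; it is a
    hypothetical stand-in for a feature selection step.
    """
    cols = list(features) if features is not None else list(range(len(query)))

    def dist(a, b):
        # Euclidean distance restricted to the chosen columns.
        return math.sqrt(sum((a[i] - b[i]) ** 2 for i in cols))

    ranked = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy labeled data (assumed): columns 0 and 1 separate the classes,
# column 2 is an irrelevant feature with a large scale.
train_X = [
    [1.0, 1.1, 1.0], [0.9, 1.0, 0.5], [1.2, 0.8, 2.0], [1.1, 1.2, 1.5],
    [5.0, 5.2, 20.0], [5.5, 4.8, 25.0],
]
train_y = ["normal", "normal", "normal", "normal", "outlier", "outlier"]
query = [5.1, 5.0, 1.0]

label_all = knn_predict(train_X, train_y, query, k=3)
label_selected = knn_predict(train_X, train_y, query, k=3, features=[0, 1])
print(label_all, label_selected)
```

On this toy data the irrelevant third column dominates the distance, so the query is labeled differently once that column is dropped. This only shows the mechanism; it says nothing about the results reported for the cancer dataset.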

Improving the Performance of a Classification Based Outlier Detection System Using Dimensionality Reduction Techniques

International Journal of Advanced Research in Computer Science, 2017

In data mining, outlier detection can be treated as a classification problem when a training data set with class labels is available. A classification-based outlier detection method can be applied if the samples of the cancer data set are available with class information. The general idea of a classification-based outlier detection method is to train a classification model that can distinguish normal data from outliers [7]. Previous work shows that some classification-based outlier detection algorithms provide better sensitivity and others provide better specificity. By combining the stronger parts of these classification algorithms, a hybrid classification algorithm can be designed. This work proposes a KNN-DT hybrid classification algorithm and evaluates its outlier detection performance. The results show that this hybridization considerably improved the overall classification performance.

AN EVALUATION OF CLUSTER BASED OUTLIER DETECTION STRATEGY BY FEATURE SELECTION TECHNIQUE IN DIABETES DATA SET

Cluster-based outlier detection is an important task in data mining research. In this work, a feature selection method is used to reduce irrelevant data points and eliminate redundant data instances before clustering. After clustering, outliers are identified and discarded based on a threshold value. A genetic algorithm (GA) is used to reduce large data sets to their relevant attributes and to find the optimal set of parameters for the clustering process. Features selected from biomedical data can be essential in disease diagnostics, and there are a number of features that can be tested. The work uses both Euclidean and Mahalanobis distance for identifying outliers. Outlier rejection during the clustering process is necessary for avoiding loss of life and improving the efficiency of diagnostic work. The Pima Indian diabetes data set is taken from the UCI Machine Learning Repository. Various experiments are conducted and compared for selecting relevant subsets, clustering, and removing outliers with less computational time.
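
The difference between the two distances named in this abstract can be sketched as follows. This is a hedged illustration, not the paper's procedure: it is restricted to two features so the covariance matrix can be inverted by hand, and the toy cluster, the candidate points, and the threshold of 3.0 are all assumptions.

```python
import math

def mean_and_cov_2d(points):
    """Return the centroid and sample covariance (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def euclidean(p, center):
    return math.hypot(p[0] - center[0], p[1] - center[1])

def mahalanobis(p, center, cov):
    """Mahalanobis distance using the closed-form inverse of a 2x2 covariance."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = p[0] - center[0], p[1] - center[1]
    q = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(q)

# Assumed toy cluster: wide along the first feature, narrow along the second.
cluster = [(0.0, 0.0), (2.0, 0.1), (4.0, -0.1), (6.0, 0.0), (8.0, 0.1), (10.0, -0.1)]
center, cov = mean_and_cov_2d(cluster)

a = (9.0, 0.0)   # far from the centroid, but along the cluster's spread
b = (5.0, 2.0)   # close to the centroid, but off the cluster's spread

THRESHOLD = 3.0  # hypothetical cut-off on the Mahalanobis distance
flagged = [p for p in (a, b) if mahalanobis(p, center, cov) > THRESHOLD]
print(euclidean(a, center), euclidean(b, center))
print(mahalanobis(a, center, cov), mahalanobis(b, center, cov))
print(flagged)
```

Euclidean distance ranks `a` as the more distant point, while Mahalanobis distance, which accounts for the cluster's shape, ranks `b` as more distant and is the only one to flag it under the assumed threshold.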

Improving the Performance of a Classification Based Outlier Detection System using KNN-DT Hybrid Algorithm

2017

In data mining, outlier detection can be treated as a classification problem when a training data set with class labels is available. A classification-based outlier detection method can be applied if the samples of the cancer data set are available with class information. The general idea of a classification-based outlier detection method is to train a classification model that can distinguish normal data from outliers [7]. Previous work shows that some classification-based outlier detection algorithms provide better sensitivity and others provide better specificity. By combining the stronger parts of these classification algorithms, a hybrid classification algorithm can be designed. This work proposes a KNN-DT hybrid classification algorithm and evaluates its outlier detection performance. The results show that this hybridization considerably improved the overall classification performance.

A Performance Analysis of the Innovative Methods Employed for Outlier Detection using Data Mining Algorithms with Three Different Applications

2016

Data mining refers to the extraction of interesting patterns from massive data sets. Outlier detection is one of the important tasks of data mining. It finds objects that are considerably dissimilar, incomparable, or inconsistent with respect to the remaining data. Outlier detection has wide applications, including data analysis, network intrusion detection, financial fraud detection, and clinical diagnosis of diseases. This paper proposes three outlier detection models, OFWDT (Outlier Finding with Decision Tree), OFWNB (Outlier Finding with Naïve Bayes), and OFWQR (Outlier Finding With Quartile Range), with three different applications. The OFWDT model has three steps. In the first step, the data is grouped into a number of clusters using the Farthest First clustering algorithm. Because this reduces the size of the dataset, the computation time is greatly reduced. In the second step, outliers are detected from Wisconsin breast cancer datas...
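
The abstract is cut off before it explains how OFWQR uses the quartile range, so the following is only a sketch of one common quartile-based rule, assumed here rather than taken from the paper: values outside Q1 - 1.5 * IQR and Q3 + 1.5 * IQR are flagged. The function name, the multiplier 1.5, and the sample values are assumptions.

```python
import statistics

def iqr_outliers(values, multiplier=1.5):
    """Return the values lying outside the interquartile-range fences."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower = q1 - multiplier * iqr
    upper = q3 + multiplier * iqr
    return [v for v in values if v < lower or v > upper]

# Assumed toy measurements with one extreme reading.
readings = [10, 12, 11, 13, 12, 11, 14, 12, 95]
outliers = iqr_outliers(readings)
print(outliers)
```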

A hybrid filter based outlier detection machine learning models on medical databases

2020

Feature selection techniques play a vital role in real-time medical databases. Since most medical databases have high dimensionality and large data size, it is difficult to find essential key features using traditional feature subset selection approaches. Also, conventional medical data filtering techniques fail to find the essential outliers due to the large data size and feature space. In this work, hybrid outlier detection and data transformation approaches are implemented to remove noise from medical databases. The proposed data filtering module is applicable to large data sizes and high-dimensional feature spaces for classification problems. Experiments are run on different medical datasets, such as tonsil and trauma databases, with different feature space sizes and data sizes. The results show that the proposed outlier detection approach has a better noise detection rate than the conventional approaches.

Outlier detection in test samples and supervised training set selection

2021

Outlier detection is a technique for recognizing samples that lie outside the main population of a data set. Outliers have a negative impact on classification, and recognized outliers are generally deleted to improve classification power. This paper proposes a method for outlier detection in test samples together with supervised training set selection. Training set selection is based on the intersection of three well-known similarity measures, namely Jaccard, cosine, and Dice. Each test sample is evaluated against the selected training set for possible outlier detection. The selected training set is used for a two-stage classification. The accuracy of the classifiers increases after outlier deletion. A majority voting function is used to further improve the classifiers.
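
The three similarity measures named in this abstract can be written down for set-valued samples as below. How the paper forms the "intersection" is not stated here, so the selection step is an assumed interpretation: keep the training samples that rank in the top `k` under all three measures. The names `select_training`, `k`, and the toy samples are hypothetical.

```python
import math

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def dice(a, b):
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0

def cosine(a, b):
    # Cosine similarity of the binary indicator vectors of two sets.
    denom = math.sqrt(len(a) * len(b))
    return len(a & b) / denom if denom else 0.0

def select_training(query, training, k=2):
    """Keep training names ranked in the top k by every measure (assumed rule)."""
    selected = None
    for measure in (jaccard, cosine, dice):
        ranked = sorted(training, key=lambda name: measure(query, training[name]),
                        reverse=True)
        top = set(ranked[:k])
        selected = top if selected is None else selected & top
    return selected

# Assumed toy samples described by sets of present attributes.
query = {"a", "b", "c", "d"}
training = {
    "t1": {"a", "b", "c"},
    "t2": {"a", "b", "c", "d", "e", "f", "g", "h"},
    "t3": {"x", "y"},
    "t4": {"a"},
}
selected = select_training(query, training, k=2)
print(sorted(selected))
```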

Comparative Analysis with Implementation of Cluster Based, Distance Based and Density Based Outlier Detection Techniques Using Different Healthcare Datasets

Outliers are viewed as erroneous data, and handling them has become an important problem that has been investigated in various research areas and application domains. Several outlier detection methods have been developed for specific application domains, whereas others are more general. Some application areas, such as the study of criminal and terrorist activity, are also investigated under strict confidentiality. With advances in information technology, the number of records, along with their dimensionality and complexity, grows quickly, which creates the need for automated analysis of large quantities of variously structured data. For this purpose, different data mining systems are used. The objective of these systems is to detect hidden dependencies in the records. Outlier detection in data mining is the detection of objects or observations that do not match an expected pattern in a data set. This technique is beneficial in several areas such as healthcare, crime detection, fraud detection, and public safety. In this paper we study different outlier detection algorithms: cluster-based, distance-based, and density-based outlier detection. Experiments are performed on four different datasets to identify outliers. The comparative results show that the cluster-based methods are efficient for computing clusters and that the density-based outlier detection algorithm offers better accuracy and faster execution in identifying outliers than the other two algorithms.
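
As a small illustration of the distance-based family mentioned in this abstract, the sketch below scores each point by the distance to its k-th nearest neighbor, so isolated points receive large scores. It is one common distance-based formulation chosen for the example, not necessarily the algorithm compared in the paper; the function name, `k=2`, and the toy points are assumptions.

```python
import math

def knn_distance_scores(points, k=2):
    """Score each point by the distance to its k-th nearest other point."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

# Assumed toy data: four points in a tight square and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (8.0, 8.0)]
scores = knn_distance_scores(points, k=2)
most_isolated = points[scores.index(max(scores))]
print(most_isolated)
```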

Advanced Filter Based Machine Learning Models on Clinical Databases for Outlier Detection

Revista Gestão Inovação e Tecnologias, 2021

Feature selection approaches are used to improve the efficiency of machine learning classification on clinical databases. Most conventional feature selection and classification approaches have difficulty handling high dimensionality for pattern evaluation. These models also have difficulty filtering noise across heterogeneous features. In this work, hybrid data transformation and outlier detection methods are developed on clinical databases to improve classification accuracy. Experimental results show that the present model achieves better accuracy than the conventional models on clinical databases.

Local Kernel Density Ratio-Based Feature Selection for Outlier Detection

2012

Selecting features is an important step of any machine learning task, though most of the focus has been on choosing features relevant for classification and regression. In this work, we present a novel non-parametric evaluation criterion for filter-based feature selection which enhances outlier detection. Our proposed method seeks the subset of features that represents the inherent characteristics of the normal dataset while forcing outliers to stand out, making them more easily distinguished by outlier detection algorithms.