A Novel Fuzzy Clustering Method for Outlier Detection in Data Mining
Related papers
New outlier detection method based on fuzzy clustering
WSEAS Transactions on …, 2010
In this paper, a new efficient method for outlier detection is proposed. The proposed method is based on fuzzy clustering techniques. The c-means algorithm is run first; small clusters are then identified and treated as outlier clusters. Remaining outliers are found by temporarily removing points from the data set and computing the resulting change in the objective function value. If the change is noticeable, the points are considered outliers. The method was tested on several well-known data sets from the data mining literature and gave good results.
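For reference, the objective function minimized by the standard fuzzy c-means algorithm is usually written as follows. The abstract does not state which variant the authors use, so the symbols here (N data points x_i, c cluster centers v_j, memberships u_ij, fuzzifier m) are assumptions.

```latex
J_m(U, V) = \sum_{i=1}^{N} \sum_{j=1}^{c} u_{ij}^{m} \, \lVert x_i - v_j \rVert^{2},
\qquad \sum_{j=1}^{c} u_{ij} = 1 \ \text{for each } i, \quad m > 1
```

Under this reading, the removal test for a point x_k would compare J_m on the full data set with J_m recomputed after x_k is taken out.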
Fuzzy clustering-based approach for outlier detection
2010
Outlier detection is an important task in a wide variety of application areas. This paper presents a method for outlier detection based on fuzzy clustering. We first run the fuzzy c-means clustering algorithm. Small clusters are then identified and treated as outlier clusters. Any remaining outliers are detected in the other clusters by temporarily removing a point from the data set and recalculating the objective function (OF). If the OF changes noticeably, the point is considered an outlier. Experimental results show that the proposed approach gave good results when applied to different data sets.
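The removal test described in this abstract can be sketched roughly as below. This is a minimal illustration, not the authors' implementation: it assumes the standard fuzzy c-means updates with fuzzifier m = 2, takes the initial centers as an explicit argument, and restarts each reduced run from the full-data centers. It covers only the second step (the small-cluster step is omitted), and the function names `fcm` and `objective_drops` are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def memberships(points, centers, m):
    """Standard FCM membership update for fixed centers."""
    u = []
    for p in points:
        # Clamp the distance to avoid division by zero on a center.
        inv = [max(sq_dist(p, v), 1e-12) ** (-1.0 / (m - 1.0)) for v in centers]
        total = sum(inv)
        u.append([w / total for w in inv])
    return u


def objective(points, centers, u, m):
    """FCM objective: sum of u_ij^m times squared distance."""
    return sum(u[i][j] ** m * sq_dist(p, v)
               for i, p in enumerate(points)
               for j, v in enumerate(centers))


def fcm(points, init_centers, m=2.0, iters=50):
    """Run a fixed number of FCM iterations; return (centers, objective)."""
    centers = [tuple(v) for v in init_centers]
    dim = len(points[0])
    for _ in range(iters):
        u = memberships(points, centers, m)
        centers = []
        for j in range(len(init_centers)):
            w = [row[j] ** m for row in u]
            centers.append(tuple(
                sum(wi * p[k] for wi, p in zip(w, points)) / sum(w)
                for k in range(dim)))
    u = memberships(points, centers, m)
    return centers, objective(points, centers, u, m)


def objective_drops(points, init_centers, m=2.0):
    """Drop in the objective when each point is temporarily removed."""
    centers, j_full = fcm(points, init_centers, m)
    drops = []
    for k in range(len(points)):
        reduced = points[:k] + points[k + 1:]
        # Assumption: warm-start the reduced run from the full-data centers.
        _, j_reduced = fcm(reduced, centers, m)
        drops.append(j_full - j_reduced)
    return drops


# Two tight groups plus one distant point (index 8).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11), (5, 25)]
drops = objective_drops(points, [(0.5, 0.5), (10.5, 10.5)])
suspect = max(range(len(points)), key=lambda i: drops[i])
```

What counts as a "noticeable" change is not specified in the abstract, so the sketch only ranks points by their drop and leaves the threshold to the reader.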
A New Fuzzy Clustering by Outliers
Journal of Engineering and Applied Sciences
This study presents a new approach for partitioning data sets affected by outliers. The proposed scheme consists of two main stages. The first stage is a preprocessing technique that flags data values as possible outliers by introducing the notion of an object's proximity degree. The second stage is a new procedure based on the Fuzzy C-Means (FCM) algorithm and the concept of outlier clusters: it introduces clusters for outliers in addition to the regular clusters. The proposed algorithm initializes the centers of these outlier clusters with the detected possible outliers. The final decision about these possible outliers is made during the clustering process. The performance of this approach is illustrated through real and artificial examples.
A Review Paper on Comparison of Clustering Algorithms based on Outliers
Data mining, in general, deals with the discovery of non-trivial, hidden and interesting knowledge from different types of data. With the development of information technologies, the number of databases, as well as their dimension and complexity, grows rapidly, so automated analysis of large amounts of information is needed. The analysis results are then used by a human or a program to make decisions. One of the basic problems of data mining is outlier detection. In some cases the outlier detection problem is similar to the classification problem. For example, the main concern of clustering-based outlier detection algorithms is to find clusters and outliers; the outliers are often regarded as noise that should be removed to make the clustering more reliable. In this thesis, the ability to detect outliers can be improved by combining the perspectives of outlier detection and cluster identification. The proposed work compares four methods: K-Means, K-Medoids, iterative K-Means and a density-based method. Unlike traditional clustering-based methods, the proposed algorithm provides much more efficient outlier detection and data clustering in the presence of outliers, which is why the comparison is made. The purpose of our method is not only to cluster the data but also to find outliers in the resulting clusters. The goal is to model an unknown nonlinear function based on observed input-output pairs. All simulations for the proposed work were carried out in MATLAB.
Outlier Reduction using Hybrid Approach in Data Mining
International Journal of Modern Education and Computer Science, 2015
Outlier detection is a very active area of research in data mining, where an outlier is a data item that is mismatched with the other data in the data set. In existing approaches, outlier detection is done only on numeric data sets. Clustering-based outlier detection mainly treats elements lying outside the clusters as outliers, but some unknown elements may, for whatever reason, become part of a cluster, so these must be considered as well. The proposed method uses a hybrid approach to reduce the number of outliers. The number of outliers can only be reduced by improving the cluster formation method. The proposed method uses two data mining techniques for cluster formation: weighted k-means and a neural network. Weighted k-means is a clustering technique that can be applied to text and date data sets as well as numeric data sets; it assigns a weight to each element in the data set. The output of weighted k-means becomes the input to the neural network, which is a classification and clustering technique in data mining. The neural network is trained, and the trained neurons then perform the testing. The neural network tests the clusters formed by weighted k-means to ensure that they are grouped appropriately. Many outlier detection methods exist in data mining. The proposed method uses Integrating Semantic Knowledge (SOF) for outlier detection. This method detects semantic outliers, where a semantic outlier is a data point that behaves differently from the other data points in the same class or cluster. The main aim of this research is to reduce the number of outliers by improving the cluster formation methods, so that the outlier rate falls, the mean square error decreases and the accuracy improves. The simulation results show that the proposed method works well, as it significantly reduces the number of outliers.
An Analysis of Outlier Detection through clustering method
This research paper deals with outliers, where an outlier is an unusual behavior of an item in a data set. Outlier detection can be used for both anomaly detection and the detection of abnormal observations. An outlier is identified by comparison with the other members of the data set. The deviation of an outlier can be assessed by measuring quantities such as range, size, activity, etc. By detecting outliers, one can remove their negative effect in the field of application. For instance, in healthcare, the health condition of a person can be determined from their latest health report or their regular activity. If the person is found to be inactive, there is a chance that the person is sick. Two approaches are used in this paper for detecting outliers: 1) a centroid-based approach based on the K-Means and hierarchical clustering algorithms, and 2) a clustering-based approach. The latter may help in detecting outliers by grouping all similar elements in the same group, which is what clustering does. This research paper is based on these two approaches.
An Outlier Detection Method Based on Clustering
2011 Second International Conference on Emerging Applications of Information Technology, 2011
In this paper we propose a clustering-based method to capture outliers. We apply the K-means clustering algorithm to divide the data set into clusters. Points lying near the centroid of a cluster are unlikely to be outliers, so we prune such points from each cluster. Next, we calculate a distance-based outlier score for the remaining points. The pruning considerably reduces the number of computations needed to calculate the outlier scores. Based on the outlier score, we declare the top n points with the highest scores as outliers. Experimental results on a real data set demonstrate that, even though it requires fewer computations, the proposed method performs better than the existing method.
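The cluster, prune, score and rank pipeline in this abstract might look like the sketch below. The abstract does not give the pruning rule or the exact score, so two assumptions are made and marked in the code: a point is pruned when its distance to its centroid is at most the mean distance within its cluster, and the score is the distance to the k-th nearest neighbour. The function names are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points]


def kmeans(points, centroids, iters=20):
    """Plain Lloyd iterations from the given initial centroids."""
    for _ in range(iters):
        labels = assign(points, centroids)
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                updated.append(tuple(sum(col) / len(members)
                                     for col in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid
        centroids = updated
    return assign(points, centroids), centroids


def top_n_outliers(points, init_centroids, n, k=2):
    """Indices of the n highest-scoring points that survive pruning."""
    labels, centroids = kmeans(points, init_centroids)
    to_centroid = [dist(p, centroids[l]) for p, l in zip(points, labels)]
    candidates = []
    for j in range(len(centroids)):
        idx = [i for i, l in enumerate(labels) if l == j]
        if not idx:
            continue
        mean_d = sum(to_centroid[i] for i in idx) / len(idx)
        # Assumed pruning rule: drop points no farther than the cluster mean.
        candidates += [i for i in idx if to_centroid[i] > mean_d]

    def score(i):
        # Assumed score: distance to the k-th nearest neighbour.
        others = sorted(dist(points[i], q)
                        for t, q in enumerate(points) if t != i)
        return others[k - 1]

    return sorted(candidates, key=score, reverse=True)[:n]


points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11), (5, 25)]
outliers = top_n_outliers(points, [(0, 0), (10, 10)], n=1)
```

Only the surviving candidates are scored, which is where the saving in computation described in the abstract would come from.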
Two novel outlier detection approaches based on unsupervised possibilistic and fuzzy clustering
PeerJ Computer Science
Outliers are data points that significantly deviate from other data points in a data set because of different mechanisms or unusual processes. Outlier detection is one of the intensively studied research topics for identification of novelties, frauds, anomalies, deviations or exceptions in addition to its use for data cleansing in data science. In this study, we propose two novel outlier detection approaches using the typicality degrees which are the partitioning result of unsupervised possibilistic clustering algorithms. The proposed approaches are based on finding the atypical data points below a predefined threshold value, a possibilistic level for evaluating a point as an outlier. The experiments on the synthetic and real data sets showed that the proposed approaches can be successfully used to detect outliers without considering the structure and distribution of the features in multidimensional data sets.
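Thresholding typicality degrees can be illustrated with a short sketch. It is not the paper's algorithm: it assumes the cluster centers and scale parameters are already fitted and uses the classical possibilistic c-means typicality formula, t = 1 / (1 + (d^2 / eta)^(1/(m-1))). The function names, the threshold of 0.1 and the eta values are hypothetical.

```python
def typicality(point, center, eta, m=2.0):
    """Classical PCM typicality of a point with respect to one cluster."""
    d2 = sum((x - c) ** 2 for x, c in zip(point, center))
    return 1.0 / (1.0 + (d2 / eta) ** (1.0 / (m - 1.0)))


def atypical_points(points, centers, etas, threshold=0.1, m=2.0):
    """Indices of points whose best typicality is below the threshold."""
    flagged = []
    for i, p in enumerate(points):
        best = max(typicality(p, c, e, m) for c, e in zip(centers, etas))
        if best < threshold:
            flagged.append(i)
    return flagged


# Assumed inputs: centers and etas would normally come from a fitted
# possibilistic clustering run; here they are set by hand.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11), (5, 25)]
centers = [(0.5, 0.5), (10.5, 10.5)]
etas = [1.0, 1.0]
flagged = atypical_points(points, centers, etas)
```

Unlike fuzzy memberships, typicalities are not forced to sum to one across clusters, so a point far from every center can score low for all of them, which is what makes a single threshold usable.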
Outlier Removal Approach as a Continuous Process in Basic K-Means Clustering Algorithm
Research Journal of Applied Sciences, Engineering and Technology, 2014
Clustering is used to put similar data items in the same group. K-means clustering is a commonly used clustering approach based on randomly selected initial centroids. However, the existing method does not consider data preprocessing, which is an important task before clustering different databases. This study proposes a new approach to the k-means clustering algorithm. Experimental analysis shows that the proposed method performs well on an infectious disease data set when compared with the conventional k-means clustering method.
An efficient clustering algorithm in the presence on outlier and doubtful data
2015
The presence of outlying observations is a common problem in most statistical analyses. This is also true when using cluster analysis techniques. Cluster analysis basically detects homogeneous clusters with large heterogeneity among them. To deal with outliers, a correct procedure in cluster analysis is needed, because outliers often appear joined together, which may lead to a wrong cluster structure. A new method of trimming in clustering (TCLUST), known as RTCLUST, is proposed in this research; it uses some information from TCLUST, partitioning around medoids (PAM), doubtful clusters and the local outlier factor (LOF). TCLUST is a clustering method with a constraint on the covariance matrices. In this case the constraint is placed on the eigenvalues. The spurious outlier model explains how to use the eigenvalue ratio, c, for a good clustering method. Good clustering is obtained using mean of discriminant. The value of c = 50 is obtained as a better value compared to the previo...