HRFuzzy: Holoentropy-enabled rough fuzzy classifier for evolving data streams

HRNeuro-fuzzy: Adapting neuro-fuzzy classifier for recurring concept drift of evolving data streams using rough set theory and holoentropy

Journal of King Saud University - Computer and Information Sciences, 2016

Data stream classification plays a vital role in data mining, where it extracts the most important patterns from real-world databases. Many applications, such as sensor networks, video surveillance and network traffic monitoring, now generate huge volumes of streaming data. Ambiguous input data, imprecise input information and concept drift all make such streams difficult to classify. To address these problems, this paper proposes the HRNeuro-fuzzy system, which is based on rough set theory and the holoentropy function. First, the input data is passed through principal component analysis (PCA) to reduce its dimensionality. An adaptive neuro-fuzzy classifier is then applied, whose two key design aspects are the membership functions and the rule base. The neuro-fuzzy system is updated whenever a change is detected in the data stream; the membership functions and rules are updated using rough set theory and the holoentropy function. The system is evaluated experimentally on the datasets, and its performance is analysed with several metrics and compared with existing systems, namely the JIT adaptive K-NN and HRFuzzy systems. The results show that the proposed fuzzy classifier attains a higher accuracy of 96%, indicating that the data stream classification algorithm performs efficiently.

HRFuzzy Holoentropy enabled rough fuzzy classifier.pdf

Due to the continuous growth of applications such as telecommunication, sensor data and financial applications, analyzing data streams, which are conceptually endless sequences of data records that frequently arrive at high rates, is an important task in data mining. Among the various data mining tasks, classifying data streams poses more challenging issues than conventional data classification. Since the classification algorithm runs endlessly, it must be able to adapt the classification model to changes of concept or of the boundaries between classes. To handle these issues, we have developed a new fuzzy system called HRFuzzy for the classification of evolving data streams. Rough set theory and the holoentropy function are used to construct the dynamic classification model. In the fuzzy system, the rules are generated using k-means clustering and the membership functions are dynamically updated using the holoentropy function. The proposed HRFuzzy is evaluated on two datasets, the skin segmentation dataset and the localization data. Its performance is compared with the adaptive k-NN classifier in terms of accuracy and time. The results show that the proposed HRFuzzy outperforms the adaptive k-NN classifier on both metrics.
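The abstract above does not define the holoentropy function, so the following is only a minimal sketch of the quantity as it is commonly defined in the outlier-detection literature: the sum of per-attribute Shannon entropies, optionally weighted by a reverse-sigmoid factor. The function names, the base-2 logarithm and the use of categorical attributes are assumptions made for illustration. It does not reproduce how HRFuzzy uses the value to update membership functions.

```python
import math
from collections import Counter


def attribute_entropy(values):
    """Shannon entropy (base 2, an assumption) of one categorical attribute."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def holoentropy(rows, weighted=True):
    """Sum of per-attribute entropies over a window of records.

    rows is a list of equal-length tuples of categorical values.
    With weighted=True each entropy h is scaled by the reverse-sigmoid
    weight 2 * (1 - 1 / (1 + exp(-h))), so high-entropy attributes
    contribute less. This is a sketch, not the paper's exact formula.
    """
    if not rows:
        return 0.0
    total = 0.0
    for column in zip(*rows):  # iterate attribute by attribute
        h = attribute_entropy(column)
        w = 2.0 * (1.0 - 1.0 / (1.0 + math.exp(-h))) if weighted else 1.0
        total += w * h
    return total


window = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
unweighted = holoentropy(window, weighted=False)
weighted = holoentropy(window)
```

In a streaming setting, a value like this could be recomputed on each new window and compared with the previous one, but how the result feeds the membership-function update is specific to the paper and is not shown here.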

THRFuzzy Tangential holoentropy enabled rough fuzzy classifier.pdf

Rapid developments in fields such as telecommunication, sensor data and financial applications have increased the rate of data arrival, making data stream analysis a vital data mining task. Among data analysis tasks, data stream classification faces more challenges than other commonly used techniques. Because classification runs continuously, it requires a design that can adapt the classification model to concept changes or changes in the boundaries between classes. Hence, we design a novel fuzzy classifier, THRFuzzy, to classify newly arriving data streams. Rough set theory, together with the tangential holoentropy function, is used to design the dynamic classification model. The approach uses kernel fuzzy c-means (FCM) clustering to generate the rules and the tangential holoentropy function to update the membership functions. THRFuzzy is evaluated on three datasets (skin segmentation, localization and breast cancer) in terms of accuracy and time, and compared with the HRFuzzy and adaptive k-NN classifiers. The experimental results indicate that the THRFuzzy classifier gives better classification results, achieving the highest accuracy in the least time among the compared classifiers.

Classification model based on rough and fuzzy sets theory

… on Computational Intelligence Man-Machine Systems …, 2007

The paper reflects the trend of recent years in which various traditional approaches and methods are combined to tackle new problems. Two components of computational intelligence, rough sets and fuzzy sets, are applied in a classification model, and a hybrid data classification model is proposed on their basis. The model can also operate on uncertain data. It is implemented in MATLAB, tested on several data files, and compared with other, already known classification methods.

Info-fuzzy algorithms for mining dynamic data streams

2008

Most data mining algorithms assume static behavior of the incoming data. In the real world, the situation is different: most continuously collected data streams are generated by dynamic processes, which may change over time, in some cases even drastically. The change in the underlying concept, also known as concept drift, causes the data mining model generated from past examples to become less accurate and relevant for classifying the current data. Most online learning algorithms deal with concept drift by generating a new model every time a concept drift is detected. On one hand, this solution ensures accurate and relevant models at all times, thus implying an increase in the classification accuracy. On the other hand, this approach suffers from a major drawback, which is the high computational cost of generating new models. The problem worsens when concept drift is detected more frequently; hence, a compromise between computational effort and accuracy is needed. This work describes a series of incremental algorithms that are shown empirically to produce more accurate classification models than the batch algorithms in the presence of concept drift while being computationally cheaper than existing incremental methods. The proposed incremental algorithms are based on an advanced decision-tree learning methodology called "info-fuzzy network" (IFN), which is capable of inducing compact and accurate classification models. The algorithms are evaluated on real-world streams of traffic and intrusion detection data.
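The baseline strategy described above, building a new model each time drift is detected, can be illustrated with a small sketch. It is not the IFN algorithm: the majority-class "model", the sliding-window error-rate detector, and the `window` and `threshold` parameters are all hypothetical choices made only to show why frequent drift translates into frequent (and costly) rebuilds.

```python
from collections import Counter, deque


def majority_label(examples):
    """Return the most frequent label among (x, y) pairs."""
    return Counter(y for _, y in examples).most_common(1)[0][0]


def count_model_builds(stream, window=20, threshold=0.5):
    """Count how many times a model is (re)built over a labelled stream.

    The model here is just the majority label of the last `window`
    examples. Drift is flagged when the error rate over the last
    `window` predictions exceeds `threshold` (both are assumptions).
    """
    recent = deque(maxlen=window)   # most recent labelled examples
    errors = deque(maxlen=window)   # most recent prediction outcomes
    model = None
    builds = 0
    for x, y in stream:
        if model is not None:
            errors.append(model != y)  # test-then-train: predict first
        recent.append((x, y))
        if model is None:
            if len(recent) == window:
                model = majority_label(recent)  # initial build
                builds += 1
        elif len(errors) == window and sum(errors) / window > threshold:
            model = majority_label(recent)      # rebuild on detected drift
            errors.clear()
            builds += 1
    return builds


# One abrupt drift: the label switches from 0 to 1 halfway through.
stream = [(i, 0) for i in range(50)] + [(i, 1) for i in range(50, 100)]
builds = count_model_builds(stream)
```

Each rebuild in this sketch is cheap, but with a real learner every increment of `builds` stands for a full retraining pass, which is the cost that incremental approaches try to avoid.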

Complex Class Classification for Gradually Novel Classes in Data Stream Mining

Data stream mining is the process of extracting knowledge structures from continuous, rapid data records. Nowadays huge amounts of data are processed and analyzed, so it is very important to classify data and information properly. The information is essentially unstructured and continuous: a huge volume of continuous data that has multidimensional features and often changes fast. A model is required that adapts to such changes and responds quickly. Examples of such information flows are network traffic, sensor data and call center records. Class evolution, which deals with such data, is now an important topic in data stream mining. Previous work proposed a model, Class-Based ensemble for Class Evolution (CBCE), to handle such large streams, but for complex and massive data the results can differ. A complex class ensemble model (CCEM) is therefore proposed so that huge and complex classes can be handled and classified, together with a model dedicated to class disappearance, which places more emphasis on class disappearance than on class reoccurrence and novel classes.

An Incremental Classifier from Data Streams

Lecture Notes in Computer Science, 2014

A novel evolving fuzzy rule-based classifier, namely the parsimonious classifier (pClass), is proposed in this paper. pClass can start its learning process either from scratch with an empty rule base or from an initially trained fuzzy model. Importantly, pClass not only adopts the open structure concept, in which knowledge is built automatically during training and which is well known as a main pillar of learning from streaming examples, but also incorporates the so-called plug-and-play principle, in which all learning modules are coupled in the training process, in order to reduce the need for pre- or post-processing steps that would undermine the logic of an online classifier. pClass is equipped with rule growing, pruning, recall and input weighting techniques, all performed on the fly during training. The viability of pClass has been tested on real-world and synthetic data streams containing several kinds of concept drift and compared with state-of-the-art classifiers; pClass delivers the most encouraging numerical results in terms of classification rate, number of fuzzy rules, number of rule base parameters and runtime. More specifically, the input weight of a particular input attribute is elicited when that attribute is masked. This helps to gauge the discriminative power of each input attribute.

Evolving Ensemble Fuzzy Classifier

IEEE Transactions on Fuzzy Systems, 2018

The concept of ensemble learning offers a promising avenue for learning from data streams in complex environments because it addresses the bias-variance dilemma better than its single-model counterpart and features a reconfigurable structure, which is well suited to the given context. While various extensions of ensemble learning for mining nonstationary data streams can be found in the literature, most of them are built on static base classifiers and revisit preceding samples in a sliding window for a retraining step. This causes computationally prohibitive complexity and is not flexible enough to cope with rapidly changing environments. Their complexity is often demanding because they involve a large collection of offline classifiers, owing to the absence of structural complexity reduction mechanisms and of an online feature selection mechanism. A novel evolving ensemble classifier, namely Parsimonious Ensemble (pENsemble), is proposed in this paper. pENsemble differs from existing architectures in that it is built upon an evolving classifier for data streams, termed Parsimonious Classifier (pClass). pENsemble is equipped with an ensemble pruning mechanism, which estimates a localized generalization error of a base classifier. A dynamic online feature selection scenario is integrated into pENsemble, allowing input features to be selected and deselected on the fly. pENsemble adopts a dynamic ensemble structure to output a final classification decision and features a novel drift detection scenario to grow the ensemble's structure. The efficacy of pENsemble has been demonstrated through rigorous numerical studies with dynamic and evolving data streams, where it delivers the most encouraging performance in attaining a tradeoff between accuracy and complexity.

An online rule weighting method to classify data streams

The 16th CSI International Symposium on Artificial Intelligence and Signal Processing (AISP 2012), 2012

Evolving fuzzy rule-based structures are extremely powerful methods for online classification of data streams. In these systems, the fuzzy rules are generated, modified and removed automatically. One of the simplest yet efficient algorithms of this type is the evolving classifier (eClass), which constructs the rules without any prior knowledge, starting "from scratch". However, this algorithm cannot cope properly with drift and shift in the concept of the data. In this paper, we propose a new efficient online method that improves the performance of this algorithm by setting a suitable weight for each rule, so that it can handle drift and shift in the concept of the data. By adjusting the weights, the zone of influence of each rule can be easily controlled and changed in response to changes in the environment. Our weight adjusting algorithm is based on an efficient batch-mode weight adjusting method, adapted to be consistent with the characteristics of data streams. The proposed algorithm is evaluated on some standard data sets from the UCI Repository and some real-world data streams, and compared with the eClass algorithm. The results show that the proposed algorithm outperforms the eClass approach, with significant improvement in most cases.
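How a rule weight changes a rule's zone of influence can be seen in a small sketch. This is not the eClass formulation or the paper's weight adjusting algorithm: the Gaussian firing strength, the single-winner decision, and the dictionary layout of a rule are assumptions chosen for brevity.

```python
import math


def firing_strength(x, center, spread):
    """Gaussian firing strength of a rule centred at `center` (assumed form)."""
    d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-d2 / (2.0 * spread ** 2))


def classify(x, rules):
    """Return the label of the rule with the highest weighted firing strength."""
    best = max(
        rules,
        key=lambda r: r["weight"] * firing_strength(x, r["center"], r["spread"]),
    )
    return best["label"]


rules = [
    {"center": (0.0,), "spread": 1.0, "weight": 1.0, "label": "A"},
    {"center": (2.0,), "spread": 1.0, "weight": 1.0, "label": "B"},
]
x = (0.9,)
before = classify(x, rules)   # equal weights: the nearer rule wins

rules[1]["weight"] = 2.0      # enlarge rule B's zone of influence
after = classify(x, rules)    # the same point now falls to rule B
```

With equal weights the decision boundary sits midway between the two centres; raising one weight pushes the boundary toward the other rule, which is the effect an online weight update would exploit when the concept drifts.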