Data Stream Mining Algorithms in Big Data: A Survey
Abstract—The infrastructure built on big data platforms challenges both commercial and non-commercial IT development communities to model data streams over high-dimensional data clusters. Conventionally, data are accumulated and then mined with batch-mode model induction algorithms, an approach that is not feasible for real-time applications [8]. This work introduces supervised machine learning methods for dynamic resource allocation, in which a user-defined learning method identifies workload patterns, and feature selection searches the loaded data for an optimal feature subset sized to the computational demand. The main theme is to feed streaming data through lightweight feature selection driven by Accelerated Particle Swarm Optimization (APSO), a swarm search (sketched below, after the introduction) that iteratively improves query-dependent performance in process scheduling and data accuracy. The new feature selection algorithm is then put to the test on big data for performance evaluation.

I. INTRODUCTION

Recently, much media coverage has advocated the hype of Big Data, which manifests in three problematic issues known as the 3V challenges: the Velocity problem, in which a huge amount of data must be handled at an escalating speed; the Variety problem, which makes data processing and integration difficult because the data come from various sources and are formatted differently; and the Volume problem, which makes storing, processing, and analyzing the data challenging both computationally and in terms of archiving. In view of these 3V challenges, traditional data mining approaches based on full batch-mode learning may fall short of the demand for analytic efficiency. This is because traditional model construction techniques require loading the full set of data, which is then partitioned according to some divide-and-conquer strategy; two classical algorithms are Classification And Regression Tree (CART) for decision tree induction and rough-set discrimination [8]. Each time fresh data arrive, which is typical of a data collection process that inflates big data into bigger data, the traditional induction method needs to be rerun and the model rebuilt with the new data included; a minimal sketch contrasting the two regimes follows.
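To make the batch-versus-stream distinction concrete, the following is a minimal sketch, not taken from any surveyed system. It assumes scikit-learn and a synthetic stream of mini-batches; the batch side rebuilds a CART-style decision tree from all accumulated data on every arrival, while the incremental side updates a single model in place via `partial_fit`.

```python
# Contrast: batch retraining vs. incremental (stream) learning.
# Illustrative sketch only; data and models are synthetic stand-ins.
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

def stream_batches(n_batches=5, batch_size=200, n_features=10):
    """Yield mini-batches, simulating fresh data arriving over time."""
    for _ in range(n_batches):
        X = rng.normal(size=(batch_size, n_features))
        y = (X[:, 0] + X[:, 1] > 0).astype(int)  # simple hidden concept
        yield X, y

# Batch-mode induction (e.g., CART): every arrival forces a full rebuild.
seen_X, seen_y = [], []
for X, y in stream_batches():
    seen_X.append(X)
    seen_y.append(y)
    tree = DecisionTreeClassifier().fit(np.vstack(seen_X),
                                        np.concatenate(seen_y))
    # Cost grows with the accumulated data: infeasible for real-time streams.

# Incremental induction: the model is updated in place per mini-batch.
clf = SGDClassifier(loss="log_loss")
for X, y in stream_batches():
    clf.partial_fit(X, y, classes=np.array([0, 1]))
    # Constant work per batch; historical data never needs to be revisited.
```

The rebuild cost of the batch loop scales with everything seen so far, which is exactly the inefficiency the 3V discussion above points at; the incremental loop does a fixed amount of work per arriving batch.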
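Returning to the swarm search named in the abstract: the paper does not give APSO pseudocode, so the following is a minimal sketch under the assumption of Yang's standard accelerated PSO update, x ← (1 − β)x + βg* + αε, which drops per-particle velocities and personal bests. The sigmoid binarization, the `fitness` function, and all hyperparameters (`alpha`, `beta`, `decay`, the subset-size penalty) are illustrative choices, not the authors' settings.

```python
# A sketch of APSO-driven wrapper feature selection (assumed variant).
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)

def fitness(mask, X, y):
    """Score a candidate feature subset (higher is better)."""
    if not mask.any():
        return 0.0
    score = cross_val_score(SGDClassifier(loss="log_loss"),
                            X[:, mask], y, cv=3).mean()
    return score - 0.01 * mask.sum()  # small penalty favors lighter subsets

def apso_feature_selection(X, y, n_particles=12, n_iter=30,
                           alpha=0.3, beta=0.5, decay=0.95):
    n_features = X.shape[1]
    pos = rng.uniform(-1, 1, size=(n_particles, n_features))
    best_mask, best_fit = None, -np.inf
    for _ in range(n_iter):
        masks = 1 / (1 + np.exp(-pos)) > 0.5       # sigmoid binarization
        for m in masks:
            f = fitness(m, X, y)
            if f > best_fit:
                best_fit, best_mask = f, m.copy()
        g = np.where(best_mask, 1.0, -1.0)         # pull toward global best
        pos = ((1 - beta) * pos + beta * g
               + alpha * rng.standard_normal(pos.shape))
        alpha *= decay                              # shrink the random walk
    return best_mask, best_fit
```

The velocity-free update is what makes APSO comparatively lightweight: each particle carries only its position, and the single global attractor plus a decaying random walk replaces the bookkeeping of classical PSO, which suits the per-iteration budget of a streaming setting.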