Parallel and Distributed Data Mining: The Fastweka Tool
Data mining is the process of extracting useful information and knowledge from a given data set using statistical techniques and machine learning algorithms. Because of the sheer size of the data and the amount of computation involved, it is very difficult for a single computer running current data mining tools to handle large data sets efficiently. In this scenario, parallel computers and distributed systems can be used to speed up the data mining process. This paper presents FastWeka, a tool for speeding up data mining tasks that uses multicore computers and a peer-to-peer system as its computing platform. By exploiting the inherent parallelism of the cross-validation phase (using the k-fold technique), FastWeka can improve the speed of data mining. To evaluate the tool, a forest cover dataset composed of 55 attributes and 581,012 records was used as input to the data mining algorithms. The computing times obtained when using FastWeka reveal a spe...
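The parallelism the abstract refers to comes from the fact that, in k-fold cross-validation, each of the k train/test rounds is independent of the others and can run on a separate worker. The following is a minimal sketch of that idea only, not FastWeka's implementation, which the abstract does not describe. All function names are hypothetical, a toy majority-class model stands in for a real learning algorithm, and a thread pool stands in for the multicore or peer-to-peer workers.

```python
from concurrent.futures import ThreadPoolExecutor
import random


def make_folds(n_records, k, seed=0):
    """Shuffle record indices and split them into k nearly equal folds."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_majority(train_labels):
    """Toy stand-in for a learning algorithm: return the most frequent label."""
    return max(set(train_labels), key=train_labels.count)


def evaluate_fold(fold_id, folds, labels):
    """Train on the other k-1 folds, test on the held-out fold, return accuracy."""
    test_idx = folds[fold_id]
    train_labels = [
        labels[i]
        for j, fold in enumerate(folds)
        if j != fold_id
        for i in fold
    ]
    predicted = train_majority(train_labels)
    correct = sum(1 for i in test_idx if labels[i] == predicted)
    return correct / len(test_idx)


def parallel_cross_validation(labels, k=10, workers=4):
    """Run the k independent folds concurrently and average their accuracies."""
    folds = make_folds(len(labels), k)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(evaluate_fold, f, folds, labels) for f in range(k)]
        accuracies = [future.result() for future in futures]
    return sum(accuracies) / k


# Assumed toy data: 100 records with two classes, 70 "a" and 30 "b".
labels = ["a"] * 70 + ["b"] * 30
mean_accuracy = parallel_cross_validation(labels, k=10, workers=4)
print(f"mean accuracy over 10 folds: {mean_accuracy:.2f}")
```

Threads keep the sketch self-contained. For CPU-bound training in Python, a process pool or remote workers would be needed to see an actual speedup, and the ideal speedup is bounded by k because there are only k independent folds to distribute.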