Topics and Terms Mining in Unstructured Data Stores

2013 IEEE 16th International Conference on Computational Science and Engineering, 2013

Abstract

One of the major challenges of the "Big Data" era is unstructured data mining. The problem arises from the storage of high-dimensional data that has no standard schema. Knowledge discovery in databases (KDD) algorithms were designed for data extraction, but they are best suited to structured data stores. At the storage level, NoSQL databases have been deployed to accommodate unstructured data. However, each NoSQL store exposes its own API, and this reliance on multiple APIs hampers efficient data extraction across different NoSQL stores. Few tools are available that can perform KDD tasks on NoSQL data stores. In this work, we explore the trend in unstructured data mining and detail its future directions and challenges. Then, focusing on topic and term extraction from NoSQL databases, we propose a tool called TouchR2, which relies algorithmically on Bloom filtering and parallelization. Using the CouchDB data store as the test case, the evaluation of TouchR2 shows high accuracy for term extraction and organization, with a much reduced processing time.
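
To illustrate the Bloom filtering idea named in the abstract, here is a minimal Python sketch of testing document words against a set of known terms. It is not the TouchR2 implementation, which the abstract does not describe. The `BloomFilter` class, the `matching_terms` helper, the filter size, the hash scheme, and the sample terms are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter using double hashing over a SHA-256 digest.

    Illustrative only: the size and number of hashes are arbitrary assumptions.
    """

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive two base hashes from one digest, then combine them.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # forced odd, so never zero
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def matching_terms(text, term_filter):
    """Return the words of `text` that the filter reports as possible terms."""
    return [word for word in text.lower().split() if word in term_filter]


# Hypothetical term list and document, for demonstration only.
term_filter = BloomFilter()
for term in ("nosql", "couchdb", "mining"):
    term_filter.add(term)

doc = "CouchDB is a NoSQL document store"
hits = matching_terms(doc, term_filter)
```

A Bloom filter never reports a stored term as absent, but it can occasionally report an unstored word as present, so a real pipeline would typically confirm hits against the actual term list. Because each document can be checked against the same read-only filter independently, this kind of lookup also lends itself to parallel processing across documents.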

Richard Lomotey hasn't uploaded this paper.
