On indexing error-tolerant set containment

2010, Proceedings of the 2010 ACM SIGMOD International Conference on Management of data

Survey of Scalable String Similarity Joins

2015

Similarity join is an important operation in data integration and cleansing, record linkage, data deduplication and pattern matching. It finds pairs of similar strings across two collections of strings. A number of approaches to string similarity joins have been proposed and compared. The era of big data demands scalable algorithms that support large-scale string similarity joins. In this paper we study string similarity joins and their uses. We then examine three techniques for scalable string similarity joins using MapReduce: Parallel set-similarity join, MGJoin and MassJoin. Finally, we compare them on a set of common characteristics.
Keywords: Similarity Join, MapReduce
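To make the join operation concrete, here is a minimal single-machine sketch of a token-based set-similarity join. It measures similarity with Jaccard and prunes candidates with prefix filtering, with the map and reduce phases simulated by grouping records under their prefix tokens. Jaccard as the measure, pre-tokenized records, and the names `similarity_join`, `prefix_length` and `jaccard` are illustrative assumptions. This is not the Parallel set-similarity join, MGJoin or MassJoin algorithm discussed in the survey; it only shows the general filter-and-verify pattern that distributed set-similarity joins can build on.

```python
from collections import defaultdict
from math import ceil


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b| of two token sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def prefix_length(size: int, threshold: float) -> int:
    """Prefix length for a Jaccard threshold t: size - ceil(t * size) + 1.

    The small epsilon guards against float error (e.g. 0.7 * 10 == 7.000000000000001),
    which would otherwise shorten the prefix and drop true matches.
    """
    return size - ceil(threshold * size - 1e-9) + 1


def similarity_join(left, right, threshold):
    """Return sorted (i, j, sim) for pairs with Jaccard(left[i], right[j]) >= threshold.

    Hypothetical helper: the "map" step keys each record by the tokens in its
    prefix, and the "reduce" step verifies candidate pairs that share a key.
    """
    # Global token order: rarest tokens first, so prefixes are selective.
    freq = defaultdict(int)
    for rec in list(left) + list(right):
        for tok in set(rec):
            freq[tok] += 1

    def ordered(rec):
        return sorted(set(rec), key=lambda t: (freq[t], t))

    # "Map": emit each record under every token of its prefix.
    groups = defaultdict(lambda: ([], []))
    for side, records in ((0, left), (1, right)):
        for idx, rec in enumerate(records):
            toks = ordered(rec)
            for tok in toks[:prefix_length(len(toks), threshold)]:
                groups[tok][side].append(idx)

    # "Reduce": verify candidates in each group exactly; skip pairs already seen.
    results = {}
    for lefts, rights in groups.values():
        for i in lefts:
            for j in rights:
                if (i, j) not in results:
                    sim = jaccard(set(left[i]), set(right[j]))
                    if sim >= threshold:
                        results[(i, j)] = sim
    return sorted((i, j, s) for (i, j), s in results.items())


left = [["data", "cleaning", "tools"], ["string", "similarity", "join"]]
right = [["string", "similarity", "joins"], ["data", "cleaning", "tool", "tools"], ["mapreduce"]]
print(similarity_join(left, right, 0.5))  # [(0, 1, 0.75), (1, 0, 0.5)]
```

Prefix filtering is safe here because if two sets reach Jaccard similarity t, their overlap is at least ceil(t * |x|) for each set x, so under a shared global token order their prefixes must share a token. In a real MapReduce job the grouping by prefix token would be done by the shuffle phase rather than by an in-memory dictionary.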
