On indexing error-tolerant set containment
2010, Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Related papers
Survey of Scalable String Similarity Joins
2015
Similarity join is an important operation in data integration and cleansing, record linkage, data deduplication, and pattern matching. It finds similar string pairs from two collections of strings. A number of approaches have been proposed and compared for string similarity joins. The rise of big data demands scalable algorithms that support large-scale string similarity joins. In this paper we study string similarity joins and their uses. We then look at three techniques for scalable string similarity joins using MapReduce: Parallel set-similarity join, MGJoin, and MassJoin. Finally, we compare them on some common characteristics. Keywords: Similarity Join, MapReduce