textreuse: Detect Text Reuse and Document Similarity
Tools for measuring similarity among documents and detecting passages which have been reused. Implements shingled n-gram, skip n-gram, and other tokenizers; similarity/dissimilarity functions; pairwise comparisons; minhash and locality sensitive hashing algorithms; and a version of the Smith-Waterman local alignment algorithm suitable for natural language.
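As a rough illustration of three of the ideas named above (shingled n-grams, a similarity function, and minhash), here is a minimal Python sketch. It is not the package's R implementation or its API. The function names `shingles`, `jaccard`, `minhash_signature`, and `estimate_similarity` are hypothetical, and the seeded SHA-1 hashing is an assumption chosen only to keep the example self-contained.

```python
import hashlib


def shingles(text, n=3):
    """Return the set of overlapping word n-grams (shingles) in text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def _seeded_hash(token, seed):
    """Hash a token with a seed prefix (assumed scheme, for illustration only)."""
    digest = hashlib.sha1(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(tokens, num_hashes=64):
    """Keep the minimum hash value per seed; tokens must be non-empty."""
    return [min(_seeded_hash(t, seed) for t in tokens)
            for seed in range(num_hashes)]


def estimate_similarity(sig_a, sig_b):
    """Fraction of signature positions that agree; approximates Jaccard."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


a = shingles("the quick brown fox jumps over the lazy dog")
b = shingles("the quick brown fox leaps over the lazy dog")

print(jaccard(a, b))  # 4 shared shingles out of 10 distinct: 0.4
print(estimate_similarity(minhash_signature(a), minhash_signature(b)))
```

The minhash estimate only approximates the exact Jaccard score, and it gets closer as `num_hashes` grows. Fixed-length signatures like these are what make locality sensitive hashing practical for finding candidate pairs in a large corpus without comparing every pair of documents.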
| Version: | 0.1.5 |
|---|---|
| Depends: | R (≥ 3.1.1) |
| Imports: | assertthat (≥ 0.1), digest (≥ 0.6.8), dplyr (≥ 0.8.0), NLP (≥ 0.1.8), Rcpp (≥ 0.12.0), RcppProgress (≥ 0.1), stringr (≥ 1.0.0), tibble (≥ 3.0.1), tidyr (≥ 0.3.1) |
| LinkingTo: | BH, Rcpp, RcppProgress |
| Suggests: | testthat (≥ 0.11.0), knitr (≥ 1.11), rmarkdown (≥ 0.8), covr |
| Published: | 2020-05-15 |
| DOI: | 10.32614/CRAN.package.textreuse |
| Author: | Lincoln Mullen |
| Maintainer: | Lincoln Mullen |
| BugReports: | https://github.com/ropensci/textreuse/issues |
| License: | MIT + file LICENSE |
| URL: | https://docs.ropensci.org/textreuse, https://github.com/ropensci/textreuse |
| NeedsCompilation: | yes |
| Materials: | README, NEWS |
| In views: | NaturalLanguageProcessing |
| CRAN checks: | textreuse results |
Linking:
Please use the canonical form https://CRAN.R-project.org/package=textreuse to link to this page.