textreuse: Detect Text Reuse and Document Similarity

Tools for measuring similarity among documents and detecting passages which have been reused. Implements shingled n-gram, skip n-gram, and other tokenizers; similarity/dissimilarity functions; pairwise comparisons; minhash and locality sensitive hashing algorithms; and a version of the Smith-Waterman local alignment algorithm suitable for natural language.
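
As a rough illustration of two of the ideas named above (shingled n-gram tokenization and a similarity function), here is a minimal Python sketch. It is not the package's R API: the function names `shingled_ngrams` and `jaccard_similarity`, the whitespace-based word splitting, and the sample sentences are all assumptions made for this example.

```python
def shingled_ngrams(text, n=3):
    """Return overlapping word n-grams (shingles) from text.

    Assumption: words are split on whitespace after lowercasing;
    a real tokenizer would likely handle punctuation as well.
    """
    words = text.lower().split()
    if len(words) < n:
        return []
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def jaccard_similarity(a, b):
    """Jaccard similarity of two token collections, treated as sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Two empty documents: define similarity as 0.0 to avoid dividing by zero.
        return 0.0
    return len(set_a & set_b) / len(union)


# Hypothetical sample documents differing by one word.
doc_a = "the quick brown fox jumps over the lazy dog"
doc_b = "the quick brown fox leaps over the lazy dog"

tokens_a = shingled_ngrams(doc_a, n=3)
tokens_b = shingled_ngrams(doc_b, n=3)
score = jaccard_similarity(tokens_a, tokens_b)
print(f"{len(tokens_a)} shingles per document, Jaccard similarity = {score:.2f}")
```

Because the shingles overlap, a single changed word affects every n-gram that contains it, so the score drops more sharply than a word-level comparison would suggest.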

Version:	0.1.5
Depends:	R (≥ 3.1.1)
Imports:	assertthat (≥ 0.1), digest (≥ 0.6.8), dplyr (≥ 0.8.0), NLP (≥ 0.1.8), Rcpp (≥ 0.12.0), RcppProgress (≥ 0.1), stringr (≥ 1.0.0), tibble (≥ 3.0.1), tidyr (≥ 0.3.1)
LinkingTo:	BH, Rcpp, RcppProgress
Suggests:	testthat (≥ 0.11.0), knitr (≥ 1.11), rmarkdown (≥ 0.8), covr
Published:	2020-05-15
DOI:	10.32614/CRAN.package.textreuse
Author:	Lincoln Mullen [aut, cre]
Maintainer:	Lincoln Mullen
BugReports:	https://github.com/ropensci/textreuse/issues
License:	MIT + file LICENSE
URL:	https://docs.ropensci.org/textreuse, https://github.com/ropensci/textreuse
NeedsCompilation:	yes
Materials:	README, NEWS
In views:	NaturalLanguageProcessing
CRAN checks:	textreuse results

Linking:

Please use the canonical form https://CRAN.R-project.org/package=textreuse to link to this page.