NUSS: Mixed N-Grams and Unigram Sequence Segmentation
Segments short text sequences, such as hashtags, into sequences of separate words using a dictionary, which may be built from a custom text corpus. A unigram dictionary is used to find the most probable word sequence, and an n-gram approach is used to determine the possible segmentations given the text corpus.
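The unigram step described above can be illustrated with a minimal sketch. It is not the package's implementation and does not use its R API: the function name `segment`, the parameter `max_word_len`, and the toy counts are all hypothetical. It assumes a plain unigram model in which a segmentation's probability is the product of its word frequencies, maximized by dynamic programming over split points.

```python
import math


def segment(text, unigram_counts, max_word_len=20):
    """Return the most probable split of `text` into dictionary words,
    or None if no split into known words exists.

    Hypothetical helper: assumes a unigram model where each word's
    probability is its count divided by the total count.
    """
    total = sum(unigram_counts.values())
    n = len(text)
    # best[i] holds (log-probability, start of last word) for text[:i].
    best = [(0.0, 0)] + [(-math.inf, 0)] * n
    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            count = unigram_counts.get(text[start:end])
            if count is None:
                continue  # candidate word is not in the dictionary
            score = best[start][0] + math.log(count / total)
            if score > best[end][0]:
                best[end] = (score, start)
    if best[n][0] == -math.inf:
        return None
    # Walk the stored split points backwards to recover the words.
    words = []
    end = n
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]


# Toy dictionary with made-up counts, for illustration only.
toy_counts = {"this": 50, "is": 80, "a": 100, "hash": 10,
              "tag": 12, "hashtag": 20, "his": 5, "th": 1}

print(segment("thisisahashtag", toy_counts))
```

With these toy counts, "hashtag" as one word scores higher than "hash" followed by "tag", because every extra word multiplies in another probability below one. A dictionary built from a custom corpus would shift these preferences according to the word frequencies in that corpus.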
Version: | 0.1.0 |
---|---|
Depends: | R (≥ 3.5) |
Imports: | dplyr, magrittr, Rcpp, stringr, text2vec, textclean, utils |
LinkingTo: | BH, Rcpp |
Suggests: | testthat (≥ 3.0.0) |
Published: | 2024-08-19 |
DOI: | 10.32614/CRAN.package.NUSS |
Author: | Oskar Kosch [aut, cre] |
Maintainer: | Oskar Kosch |
BugReports: | https://github.com/theogrost/NUSS/issues |
License: | GPL (≥ 3) |
URL: | https://github.com/theogrost/NUSS |
NeedsCompilation: | yes |
Language: | en |
Materials: | README |
CRAN checks: | NUSS results |
Linking:
Please use the canonical form https://CRAN.R-project.org/package=NUSS to link to this page.