NUSS: Mixed N-Grams and Unigram Sequence Segmentation

Segments short text sequences, such as hashtags, into sequences of separate words using a dictionary, which may be built from a custom corpus of texts. A unigram dictionary is used to find the most probable word sequence, and an n-gram approach is used to determine the possible segmentations given the text corpus.
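To make the unigram part of that description concrete, here is a minimal Python sketch of the general technique. It is an illustration only, not the NUSS API or its implementation (NUSS is an R package with compiled Rcpp code). Word frequencies from a small corpus are turned into log-probabilities, and dynamic programming picks the split of the input with the highest total log-probability. The helper names `build_unigram_model` and `segment` and the `max_word_len` cap are hypothetical, and the n-gram step the package uses to determine possible segmentations is not modeled here.

```python
import math
import re
from collections import Counter


def build_unigram_model(corpus):
    """Hypothetical helper: map each lowercase corpus word to its log-probability."""
    counts = Counter()
    for text in corpus:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    total = sum(counts.values())
    return {word: math.log(n / total) for word, n in counts.items()}


def segment(text, logprob, max_word_len=20):
    """Hypothetical helper: most probable split of `text` into known words, or None."""
    text = text.lower()
    n = len(text)
    # best[i] = (best score for prefix text[:i], start index of its last word)
    best = [(-math.inf, -1)] * (n + 1)
    best[0] = (0.0, 0)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            word = text[start:end]
            if word in logprob and best[start][0] > -math.inf:
                score = best[start][0] + logprob[word]
                if score > best[end][0]:
                    best[end] = (score, start)
    if best[n][0] == -math.inf:
        return None  # no split uses only dictionary words
    # Walk back through the stored start indices to recover the words.
    words, end = [], n
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return list(reversed(words))


if __name__ == "__main__":
    corpus = ["I love data science", "Science is fun", "Data is big"]
    model = build_unigram_model(corpus)
    print(segment("#LoveDataScience".lstrip("#"), model))  # ['love', 'data', 'science']
```

Summing log-probabilities rather than multiplying raw probabilities avoids numeric underflow on long inputs. Because this sketch applies no smoothing, any input containing a substring that cannot be covered by corpus words returns `None`. A real tool would need some policy for unknown words.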

Version:	0.1.0
Depends:	R (≥ 3.5)
Imports:	dplyr, magrittr, Rcpp, stringr, text2vec, textclean, utils
LinkingTo:	BH, Rcpp
Suggests:	testthat (≥ 3.0.0)
Published:	2024-08-19
DOI:	10.32614/CRAN.package.NUSS
Author:	Oskar Kosch [aut, cre]
Maintainer:	Oskar Kosch
BugReports:	https://github.com/theogrost/NUSS/issues
License:	GPL (≥ 3)
URL:	https://github.com/theogrost/NUSS
NeedsCompilation:	yes
Language:	en
Materials:	README
CRAN checks:	NUSS results

Linking:

Please use the canonical form https://CRAN.R-project.org/package=NUSS to link to this page.