morphemepiece: Morpheme Tokenization

Tokenize text into morphemes. The morphemepiece algorithm uses a lookup table to determine the morpheme breakdown of words, and falls back on a modified wordpiece tokenization algorithm for words not found in the lookup table.
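
The lookup-then-fallback flow described above can be sketched in a few lines. This is a conceptual illustration only, not the package's R API: the names `LOOKUP`, `VOCAB`, `wordpiece`, and `tokenize` are hypothetical, the toy tables are invented for the example, and plain greedy longest-match wordpiece (with an assumed `##` continuation marker) stands in for the package's modified variant, whose details are not given here.

```python
from typing import Dict, List, Set

# Hypothetical toy data for illustration; not the package's real tables.
LOOKUP: Dict[str, List[str]] = {"unhappiness": ["un", "happy", "ness"]}
VOCAB: Set[str] = {"un", "happy", "ness", "walk", "##ing", "##ed"}


def wordpiece(word: str, vocab: Set[str], unk: str = "[UNK]") -> List[str]:
    """Greedy longest-match wordpiece: a stand-in for the fallback step."""
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                # Assumed convention: non-initial pieces carry a "##" prefix.
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece fits at this position: the whole word is unknown.
            return [unk]
        pieces.append(match)
        start = end
    return pieces


def tokenize(text: str, lookup: Dict[str, List[str]], vocab: Set[str]) -> List[str]:
    """Use the lookup table when the word is present; otherwise fall back."""
    tokens: List[str] = []
    for word in text.lower().split():
        if word in lookup:
            tokens.extend(lookup[word])
        else:
            tokens.extend(wordpiece(word, vocab))
    return tokens
```

With these toy tables, `tokenize("Unhappiness walking", LOOKUP, VOCAB)` takes "unhappiness" from the lookup table and splits "walking" through the fallback. A word that matches neither, such as "xyz", comes back as the single unknown token.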

Version:	1.2.3
Imports:	dlr (≥ 1.0.0), fastmatch, magrittr, memoise (≥ 2.0.0), morphemepiece.data, piecemaker (≥ 1.0.0), purrr (≥ 0.3.4), readr, rlang, stringr (≥ 1.4.0)
Suggests:	dplyr, fs, ggplot2, here, knitr, remotes, rmarkdown, testthat (≥ 3.0.0), utils
Published:	2022-04-16
DOI:	10.32614/CRAN.package.morphemepiece
Author:	Jonathan Bratt [aut, cre], Jon Harmon [aut], Bedford Freeman & Worth Pub Grp LLC DBA Macmillan Learning [cph]
Maintainer:	Jonathan Bratt <jonathan.bratt at macmillan.com>
BugReports:	https://github.com/macmillancontentscience/morphemepiece/issues
License:	Apache License (≥ 2)
URL:	https://github.com/macmillancontentscience/morphemepiece
NeedsCompilation:	no
Materials:	README, NEWS
CRAN checks:	morphemepiece results

Linking:

Please use the canonical form https://CRAN.R-project.org/package=morphemepiece to link to this page.