stringi: Fast and Portable Character String Processing in R
stringi (pronounced “stringy”, IPA [strinɡi]) is THE R package for fast, portable, correct, consistent, and convenient string/text processing in any locale or character encoding.
Thanks to ICU – International Components for Unicode, stringi fully supports a wide range of Unicode standards (see also this video).
stri_extract_all(regex="\\p{Emoji}", c("歡迎 欢迎! Χαίρετε! Bienvenidos! 😃❤🌍", "spam, spam, 🥓, 🍳, and spam"))
[[1]]
[1] "😃" "❤" "🌍"
[[2]]
[1] "🥓" "🍳"
stri_count_fixed("ACATGAACGGGTACACACTG", "ACA", overlap=TRUE)
[1] 3
stri_sort(c("cudný", "chladný", "hladný", "čudný"), locale="sk_SK")
[1] "cudný" "čudný" "hladný" "chladný"
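The counting and sorting examples above depend on the `overlap` and `locale` arguments. As a minimal sketch, the same calls can be compared against other settings; the variable names here are illustrative only, and the `en_US` locale is chosen purely for contrast:

```r
library(stringi)

dna <- "ACATGAACGGGTACACACTG"

# Without overlap=TRUE, matches are disjoint: the search resumes after each match.
n_disjoint <- stri_count_fixed(dna, "ACA")                # 2
n_overlap  <- stri_count_fixed(dna, "ACA", overlap=TRUE)  # 3

words <- c("cudný", "chladný", "hladný", "čudný")

# Slovak collation treats "ch" as a letter that sorts after "h".
sorted_sk <- stri_sort(words, locale="sk_SK")
# English collation compares "ch" letter by letter, so "chladný" comes first.
sorted_en <- stri_sort(words, locale="en_US")
```

Passing `locale` explicitly, rather than relying on the session default, keeps the results reproducible across machines.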
stringi comes with numerous functions related to data cleansing, information extraction, and natural language processing:
- string concatenation, padding, wrapping, and substring extraction,
- pattern searching (e.g., with ICU Java-like regular expressions),
- collation, sorting, and ranking,
- random string generation,
- string transliteration, case mapping and folding,
- Unicode normalisation,
- date-time formatting and parsing,
and many more.
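A few of the listed features can be sketched as follows. This is a small, non-exhaustive tour and not an official tutorial; the inputs and variable names are made up for illustration:

```r
library(stringi)

# Concatenation, padding, and substring extraction
joined <- stri_join("a", "b", sep="-")            # "a-b"
padded <- stri_pad_left("7", width=3, pad="0")    # "007"
part   <- stri_sub("stringi", from=1, to=6)       # "string"

# Pattern searching with an ICU regular expression
is_num <- stri_detect_regex(c("abc", "123"), "^[0-9]+$")  # FALSE TRUE

# Collation-based ordering (locale given explicitly for reproducibility)
ord <- stri_order(c("b", "a", "c"), locale="en_US")       # 2 1 3

# Random string generation: two strings of five lowercase letters each
rnd <- stri_rand_strings(2, 5, pattern="[a-z]")

# Transliteration and case mapping
ascii <- stri_trans_general("Gągolewski", "Latin-ASCII")  # "Gagolewski"
upper <- stri_trans_toupper("stringi")                    # "STRINGI"

# Unicode normalisation: "a" + combining acute accent composes to "á" under NFC
nfc_ok <- stri_trans_nfc("a\u0301") == "\u00e1"           # TRUE

# Date-time formatting with an ICU pattern
d   <- stri_datetime_create(2022, 7, 1, tz="UTC")
iso <- stri_datetime_format(d, "yyyy-MM-dd", tz="UTC")    # "2022-07-01"
```

Most functions are vectorised over their arguments, so the same calls also work on whole character vectors.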
stringi is among the most frequently downloaded R packages.
You can obtain it from CRAN by calling:
install.packages("stringi")
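In scripts, the installation step is often guarded so that the package is downloaded only when it is missing. A minimal sketch, assuming a CRAN mirror is already configured for the session:

```r
# Install stringi only if it is not already available, then load it.
if (!requireNamespace("stringi", quietly=TRUE)) {
    install.packages("stringi")
}
library(stringi)

# Report the installed package version.
stringi_version <- packageVersion("stringi")
print(stringi_version)
```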
stringi’s source code is hosted on GitHub. It is distributed under the open source BSD-3-clause license.
The package’s API was inspired by that of the early (pre-tidyverse; v0.6.2) version of Hadley Wickham’s stringr package (and since v1.0.0, released in 2015, stringr has been powered by stringi). Moreover, Hadley suggested quite a few new package features. The contributions from Bartłomiej Tartanus and many others are greatly appreciated. Thanks!
See also: stringx – a set of wrappers around stringi with a base R-compatible API.
Note
To learn more about R, check out Marek’s open-access (free!) textbook Deep R Programming [3].
Citation: Gagolewski M., stringi: Fast and portable character string processing in R, Journal of Statistical Software 103(2), 2022, 1–59, doi:10.18637/jss.v103.i02.