Addons :: XRegExp
If you want, you can download XRegExp bundled with all addons as xregexp-all.js. Alternatively, you can download the individual addon scripts from GitHub. XRegExp's npm package uses xregexp-all.js.
Unicode
The Unicode Base script adds base support for Unicode matching via the \p{…} syntax. À la carte token addon packages add support for Unicode categories, scripts, and other properties. All Unicode tokens can be inverted using \P{…} or \p{^…}. Token names are case insensitive, and any spaces, hyphens, and underscores are ignored. You can omit the braces for token names that are a single letter.
Example
// Categories
XRegExp('\\p{Sc}\\pN+'); // Sc = currency symbol, N = number
// Can also use the full names \p{Currency_Symbol} and \p{Number}

// Scripts
XRegExp('\\p{Cyrillic}');
XRegExp('[\\p{Latin}\\p{Common}]');
// Can also use the Script= prefix to match ES2018: \p{Script=Cyrillic}

// Properties
XRegExp('\\p{ASCII}');
XRegExp('\\p{Assigned}');

// In action...
// Backslashes are doubled because the pattern is a JavaScript string literal
const unicodeWord = XRegExp("^\\pL+$"); // L = letter
unicodeWord.test("Русский"); // -> true
unicodeWord.test("日本語"); // -> true
unicodeWord.test("العربية"); // -> true

XRegExp("^\\p{Katakana}+$").test("カタカナ"); // -> true
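The naming and inversion rules described above can be sketched in plain JavaScript. This is only an illustration of the stated rules, not XRegExp's actual implementation: `normalizeTokenName` and `parseUnicodeToken` are hypothetical helpers, and rejecting a token that combines \P with ^ is an assumption, since the text above does not cover that case.

```javascript
// Hypothetical helper (not part of XRegExp's API): reduces a token name to a
// canonical key by dropping spaces, hyphens, and underscores and lowercasing.
function normalizeTokenName(name) {
  return name.replace(/[- _]+/g, '').toLowerCase();
}

// Hypothetical helper: splits a \p{…}, \P{…}, \p{^…}, or single-letter token
// into its normalized name and whether it is inverted.
function parseUnicodeToken(token) {
  const match = /^\\([pP])(?:\{(\^?)([^}]+)\}|([A-Za-z]))$/.exec(token);
  if (!match) {
    throw new SyntaxError(`Not a Unicode token: ${token}`);
  }
  const [, letter, caret, bracedName, singleLetter] = match;
  const upperP = letter === 'P';
  const hasCaret = caret === '^';
  // Assumption: the text does not say what \P{^…} means, so this sketch rejects it.
  if (upperP && hasCaret) {
    throw new SyntaxError(`Double negation is not handled in this sketch: ${token}`);
  }
  return {
    name: normalizeTokenName(bracedName !== undefined ? bracedName : singleLetter),
    inverted: upperP || hasCaret,
  };
}

// Different spellings of the same name reduce to the same key.
console.log(parseUnicodeToken('\\p{Currency_Symbol}')); // { name: 'currencysymbol', inverted: false }
console.log(parseUnicodeToken('\\p{currency symbol}')); // { name: 'currencysymbol', inverted: false }

// Both inversion forms are equivalent; braces are optional for one letter.
console.log(parseUnicodeToken('\\P{L}'));  // { name: 'l', inverted: true }
console.log(parseUnicodeToken('\\p{^L}')); // { name: 'l', inverted: true }
console.log(parseUnicodeToken('\\pL'));    // { name: 'l', inverted: false }
```

Reducing names to one key before lookup is what lets \p{Currency_Symbol}, \p{currency symbol}, and \p{CURRENCY-SYMBOL} refer to the same token.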
By default, \p{…} and \P{…} support the Basic Multilingual Plane (i.e. code points up to U+FFFF). You can opt in to full 21-bit Unicode support (with code points up to U+10FFFF) on a per-regex basis by using flag A. In XRegExp, this is called astral mode. You can automatically add flag A for all new regexes by running XRegExp.install('astral'). When in astral mode, \p{…} and \P{…} always match a full code point rather than a code unit, using surrogate pairs for code points above U+FFFF.
// Using flag A to match astral code points
XRegExp('^\\pS$').test('💩'); // -> false
XRegExp('^\\pS$', 'A').test('💩'); // -> true
// Using surrogate pair U+D83D U+DCA9 to represent U+1F4A9 (pile of poo)
XRegExp('^\\pS$', 'A').test('\uD83D\uDCA9'); // -> true

// Implicit flag A
XRegExp.install('astral');
XRegExp('^\\pS$').test('💩'); // -> true
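The difference between a code unit and a code point can be seen without XRegExp. The sketch below uses only built-in JavaScript. The native flag u stands in for "match a whole code point" and is not the same thing as XRegExp's flag A.

```javascript
const poo = '\u{1F4A9}'; // pile of poo, outside the Basic Multilingual Plane

// The string holds two UTF-16 code units (a surrogate pair)...
const codeUnits = [];
for (let i = 0; i < poo.length; i++) {
  codeUnits.push(poo.charCodeAt(i).toString(16));
}
// ...but iterating by code point yields a single entry.
const codePoints = Array.from(poo, (ch) => ch.codePointAt(0).toString(16));

console.log(poo.length);             // 2
console.log(codeUnits);              // [ 'd83d', 'dca9' ]
console.log(codePoints);             // [ '1f4a9' ]
console.log(poo === '\uD83D\uDCA9'); // true

// A pattern that consumes one code unit cannot span the pair,
// while one that consumes a whole code point can.
console.log(/^.$/.test(poo));  // false
console.log(/^.$/u.test(poo)); // true
```

This is why a token limited to code points up to U+FFFF cannot match '💩' as a single symbol: each half of the pair is a surrogate, not a symbol character.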
Opting in to astral mode disables the use of \p{…} and \P{…} within character classes. In astral mode, use e.g. (\pL|[0-9_])+ instead of [\pL0-9_]+.
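One plausible reason for this restriction is an assumption here, since the text above does not state it: in a regex that works on code units, a character class matches exactly one code unit, so it cannot hold a two-unit surrogate pair. The sketch below shows that effect with built-in regexes only. The pile of poo stands in for any astral code point, and the final pattern mirrors the shape of the (\pL|[0-9_])+ workaround.

```javascript
const poo = '\uD83D\uDCA9'; // U+1F4A9 as a surrogate pair

// Without flag u, a class containing an astral character is really a class
// of its two surrogate halves, each matched separately.
const asClass = new RegExp('^[' + poo + ']$');
// An alternation or group can match the two code units in sequence.
const asGroup = new RegExp('^(?:' + poo + ')$');

console.log(asClass.test(poo));      // false: the class consumes one code unit
console.log(asClass.test('\uD83D')); // true: it matches a lone half instead
console.log(asGroup.test(poo));      // true

// Same shape as the workaround: alternation outside, plain class inside.
const mixed = new RegExp('^(?:' + poo + '|[0-9_])+$');
console.log(mixed.test(poo + '1_' + poo)); // true
console.log(mixed.test('a'));              // false
```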
XRegExp.matchRecursive
See API: XRegExp.matchRecursive.
XRegExp.build
See API: XRegExp.build.