Unicode support - Factor Documentation
The unicode vocabulary and its sub-vocabularies implement support for the Unicode 14.0 character set.
The Unicode character set covers the characters used by most of the world's writing systems. Unicode is intended as a replacement for, and is a superset of, such legacy character sets as ASCII, Latin-1, MacRoman, and so on. Unicode characters are called code points; Factor's Strings are sequences of code points.
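As a small illustration of strings being sequences of code points, the sketch below applies ordinary sequence words to strings. It assumes only the standard sequences and prettyprint vocabularies; the sample strings are made up for this example.

```factor
USING: prettyprint sequences ;

! Each element of a string is one code point, stored as an integer.
"A" first .              ! 65

! U+00E9 (precomposed e with acute accent) is a single code point.
"\u{e9}" length .        ! 1

! "e" followed by U+0301 (combining acute accent) looks the same
! when rendered, but it is a sequence of two code points.
"e\u{301}" length .      ! 2
```

Because length counts code points rather than user-perceived characters, two strings that render identically can have different lengths.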
The Unicode character set is accompanied by several standard algorithms for common operations like encoding text in files, capitalizing a string, finding the boundaries between words, and so on.
The Unicode algorithms implemented by the unicode vocabulary are:
Case mapping
Collation and weak comparison
Unicode category syntax
Word and grapheme breaks
Unicode normalization
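
Two of the algorithms listed above, case mapping and normalization, can be sketched as follows. This assumes a recent Factor release in which the words >upper, nfc and nfd are exported directly by the unicode vocabulary; in older releases they may live in sub-vocabularies instead, so treat the USING: line as an assumption.

```factor
USING: prettyprint sequences unicode ;

! Case mapping: convert a string to upper case.
"Hello, World" >upper .        ! "HELLO, WORLD"

! Normalization: NFC composes "e" + U+0301 into the single
! code point U+00E9.
"e\u{301}" nfc length .        ! 1

! NFD does the reverse and decomposes U+00E9 into two code points.
"\u{e9}" nfd length .          ! 2
```

Normalizing both strings to the same form before comparing them is the usual way to treat such visually identical strings as equal.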
The following are mostly for internal use:
Unicode category syntax
Unicode data tables
See also
ASCII, I/O encodings