Unicode support - Factor Documentation

The unicode vocabulary and its sub-vocabularies implement support for the Unicode 14.0 character set.

The Unicode character set covers the characters of most of the world's writing systems. Unicode is intended as a replacement for, and is a superset of, legacy character sets such as ASCII, Latin-1, and MacRoman. Unicode characters are called code points; Factor's strings are sequences of code points.
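Because strings are sequences of code points, the ordinary sequence words apply to them. The following is a minimal sketch for the listener, not an excerpt from the vocabulary itself; it assumes only the standard `sequences` and `prettyprint` vocabularies and writes the non-ASCII character as a `\u{...}` escape so that the result does not depend on how the source file is encoded.

```factor
USING: prettyprint sequences ;

! "naïve", with U+00EF (ï) written as an escape.
"na\u{ef}ve" length .    ! 5: one element per code point
"na\u{ef}ve" first .     ! 110: the code point of "n"
"na\u{ef}ve" third .     ! 239: the code point U+00EF, as a decimal integer
```

Each element is an integer code point, not a byte, so the length here is 5 even though the UTF-8 encoding of the same text occupies 6 bytes.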

The Unicode character set is accompanied by several standard algorithms for common operations like encoding text in files, capitalizing a string, finding the boundaries between words, and so on.

The Unicode algorithms implemented by the unicode vocabulary are:
Case mapping
Collation and weak comparison
Unicode category syntax
Word and grapheme breaks
Unicode normalization
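A few of these algorithms can be tried directly in the listener. This sketch assumes that the words `>upper`, `nfc`, and `>graphemes` are exported by the `unicode` vocabulary in your Factor release; older releases placed them in sub-vocabularies such as `unicode.case`, `unicode.normalize`, and `unicode.breaks`, so adjust the `USING:` line if needed.

```factor
USING: prettyprint sequences unicode ;

! Case mapping
"hello" >upper .                  ! "HELLO"

! Normalization: "e" followed by U+0301 (combining acute accent)
! is two code points, but composes to one under NFC.
"e\u{301}" length .               ! 2
"e\u{301}" nfc length .           ! 1

! Grapheme breaks: the combining mark stays attached to its base.
"e\u{301}x" >graphemes length .   ! 2
```

The last two examples show why code-point length and user-perceived character count can differ, which is the gap that normalization and grapheme breaking address.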

The following are mostly for internal use:
Unicode category syntax
Unicode data tables