GHC.Unicode

Description

Implementations for the character predicates (isLower, isUpper, etc.) and the conversions (toUpper, toLower). The implementation uses libunicode on Unix systems if that is available.

Documentation

data GeneralCategory

Unicode General Categories (column 2 of the UnicodeData table) in the order they are listed in the Unicode standard (the Unicode Character Database, in particular).

Examples

Basic usage:

>>> :t OtherLetter
OtherLetter :: GeneralCategory

[Eq](Data-Eq.html#t:Eq "Data.Eq") instance:

>>> UppercaseLetter == UppercaseLetter
True
>>> UppercaseLetter == LowercaseLetter
False

[Ord](Data-Ord.html#t:Ord "Data.Ord") instance:

>>> NonSpacingMark <= MathSymbol
True

[Enum](Prelude.html#t:Enum "Prelude") instance:

>>> enumFromTo ModifierLetter SpacingCombiningMark
[ModifierLetter,OtherLetter,NonSpacingMark,SpacingCombiningMark]

[Read](Text-Read.html#v:Read "Text.Read") instance:

>>> read "DashPunctuation" :: GeneralCategory
DashPunctuation
>>> read "17" :: GeneralCategory
*** Exception: Prelude.read: no parse

[Show](Text-Show.html#t:Show "Text.Show") instance:

>>> show EnclosingMark
"EnclosingMark"

[Bounded](Prelude.html#t:Bounded "Prelude") instance:

>>> minBound :: GeneralCategory
UppercaseLetter
>>> maxBound :: GeneralCategory
NotAssigned

[Ix](Data-Ix.html#t:Ix "Data.Ix") instance:

>>> import Data.Ix ( index )
>>> index (OtherLetter,Control) FinalQuote
12
>>> index (OtherLetter,Control) Format
*** Exception: Error in array index

Constructors

UppercaseLetter Lu: Letter, Uppercase
LowercaseLetter Ll: Letter, Lowercase
TitlecaseLetter Lt: Letter, Titlecase
ModifierLetter Lm: Letter, Modifier
OtherLetter Lo: Letter, Other
NonSpacingMark Mn: Mark, Non-Spacing
SpacingCombiningMark Mc: Mark, Spacing Combining
EnclosingMark Me: Mark, Enclosing
DecimalNumber Nd: Number, Decimal
LetterNumber Nl: Number, Letter
OtherNumber No: Number, Other
ConnectorPunctuation Pc: Punctuation, Connector
DashPunctuation Pd: Punctuation, Dash
OpenPunctuation Ps: Punctuation, Open
ClosePunctuation Pe: Punctuation, Close
InitialQuote Pi: Punctuation, Initial quote
FinalQuote Pf: Punctuation, Final quote
OtherPunctuation Po: Punctuation, Other
MathSymbol Sm: Symbol, Math
CurrencySymbol Sc: Symbol, Currency
ModifierSymbol Sk: Symbol, Modifier
OtherSymbol So: Symbol, Other
Space Zs: Separator, Space
LineSeparator Zl: Separator, Line
ParagraphSeparator Zp: Separator, Paragraph
Control Cc: Other, Control
Format Cf: Other, Format
Surrogate Cs: Other, Surrogate
PrivateUse Co: Other, Private Use
NotAssigned Cn: Other, Not Assigned
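
Because the constructors are declared in the order of the Unicode standard, the Bounded and Enum instances are enough to enumerate every category together with its position. The following is a minimal sketch; the names `allCategories` and `categoryIndex` are illustrative and not part of GHC.Unicode.

```haskell
import GHC.Unicode (GeneralCategory (..))

-- Every general category, in declaration order (UppercaseLetter first,
-- NotAssigned last).
allCategories :: [GeneralCategory]
allCategories = [minBound .. maxBound]

-- Each category paired with its zero-based position, which is the same
-- number that fromEnum returns for it.
categoryIndex :: [(Int, GeneralCategory)]
categoryIndex = zip [0 ..] allCategories
```

Looking up a category in `categoryIndex` gives the same number as calling `fromEnum` on it directly.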

generalCategory :: Char -> GeneralCategory

The Unicode general category of the character. This relies on the [Enum](Prelude.html#t:Enum "Prelude") instance of [GeneralCategory](GHC-Unicode.html#t:GeneralCategory "GHC.Unicode"), which must remain in the same order as the categories are presented in the Unicode standard.

Examples

Basic usage:

>>> generalCategory 'a'
LowercaseLetter
>>> generalCategory 'A'
UppercaseLetter
>>> generalCategory '0'
DecimalNumber
>>> generalCategory '%'
OtherPunctuation
>>> generalCategory '♥'
OtherSymbol
>>> generalCategory '\31'
Control
>>> generalCategory ' '
Space
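
As a further illustration, generalCategory can be mapped over a whole string to count how many characters fall into each category. This is only a sketch; `categoryHistogram` is a hypothetical helper, and it uses the Eq and Ord instances shown earlier.

```haskell
import Data.List (group, sort)
import GHC.Unicode (GeneralCategory (..), generalCategory)

-- Count how many characters of a string fall into each general category.
-- Categories that do not occur are omitted; the result is sorted by the
-- Ord instance of GeneralCategory.
categoryHistogram :: String -> [(GeneralCategory, Int)]
categoryHistogram s =
  [ (c, length g)
  | g@(c : _) <- group (sort (map generalCategory s))
  ]
```

For the input `"Ab 1"`, each of the four characters belongs to a different category, so every count is 1.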

isAscii :: Char -> Bool

Selects the first 128 characters of the Unicode character set, corresponding to the ASCII character set.

isLatin1 :: Char -> Bool

Selects the first 256 characters of the Unicode character set, corresponding to the ISO 8859-1 (Latin-1) character set.
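
Both isAscii and isLatin1 amount to a bound on the code point. The sketch below restates them with `ord` for comparison; `asciiByRange` and `latin1ByRange` are illustrative names, and the library's own definitions may be written differently.

```haskell
import Data.Char (ord)
import GHC.Unicode (isAscii, isLatin1)

-- The first 128 code points (0 to 127) form the ASCII range.
asciiByRange :: Char -> Bool
asciiByRange c = ord c < 128

-- The first 256 code points (0 to 255) form the Latin-1 range.
latin1ByRange :: Char -> Bool
latin1ByRange c = ord c < 256

-- True when the range-based versions agree with the library predicates.
agreesWithLibrary :: Char -> Bool
agreesWithLibrary c =
  isAscii c == asciiByRange c && isLatin1 c == latin1ByRange c
```

For example, `'\xe9'` (é) lies outside ASCII but inside Latin-1.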

isControl :: Char -> Bool

Selects control characters, which are the non-printing characters of the Latin-1 subset of Unicode.
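
To see which characters this covers, the Latin-1 range can be filtered with isControl and compared against the Control (Cc) general category. This is a sketch with hypothetical helper names, not the library's definition.

```haskell
import GHC.Unicode (GeneralCategory (..), generalCategory, isControl)

-- The control characters found in the Latin-1 range (code points 0 to 255).
latin1Controls :: [Char]
latin1Controls = filter isControl ['\0' .. '\xff']

-- The same selection phrased through the general category Cc.
controlByCategory :: Char -> Bool
controlByCategory c = generalCategory c == Control
```

The list covers U+0000 to U+001F and U+007F to U+009F.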

isPrint :: Char -> Bool

Selects printable Unicode characters (letters, numbers, marks, punctuation, symbols and spaces).
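
The categories named above (letters, marks, numbers, punctuation, symbols and spaces) are exactly the constructors from UppercaseLetter up to Space in declaration order. Assuming that correspondence, a category-based check can be sketched as follows; `printableByCategory` is an illustrative name only.

```haskell
import GHC.Unicode (GeneralCategory (..), generalCategory, isPrint)

-- A character counts as printable here when its category is Space or any
-- category declared before it; line and paragraph separators, control,
-- format, surrogate, private-use and unassigned characters are excluded.
printableByCategory :: Char -> Bool
printableByCategory c = generalCategory c <= Space
```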

isUpper :: Char -> Bool

Selects upper-case or title-case alphabetic Unicode characters (letters). Title case is used by a small number of letter ligatures like the single-character form of Lj.
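
The title-case behaviour can be checked with U+01C5 (ǅ), a single character whose general category is TitlecaseLetter. The sketch below uses hypothetical names and restates the description above in terms of categories.

```haskell
import GHC.Unicode (GeneralCategory (..), generalCategory, isUpper)

-- U+01C5, LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON,
-- a title-case ligature letter.
titlecaseDz :: Char
titlecaseDz = '\x01C5'

-- Upper-case or title-case, judged by general category.
upperByCategory :: Char -> Bool
upperByCategory c =
  generalCategory c `elem` [UppercaseLetter, TitlecaseLetter]
```

Here `isUpper titlecaseDz` is True even though the character is not an UppercaseLetter.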

isAlpha :: Char -> Bool

Selects alphabetic Unicode characters (lower-case, upper-case and title-case letters, plus letters of caseless scripts and modifier letters). This function is equivalent to [isLetter](Data-Char.html#v:isLetter "Data.Char").

isAlphaNum :: Char -> Bool

Selects alphabetic or numeric Unicode characters.

Note that numeric digits outside the ASCII range, as well as numeric characters which aren't digits, are selected by this function but not by [isDigit](GHC-Unicode.html#v:isDigit "GHC.Unicode"). Such characters may be part of identifiers but are not used by the printer and reader to represent numbers.
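
Two concrete characters illustrate the difference: U+0663 (ARABIC-INDIC DIGIT THREE) is a digit outside ASCII, and U+00BD (½) is numeric without being a digit. The sketch below uses illustrative names that are not part of the library.

```haskell
import GHC.Unicode (isAlphaNum, isDigit)

-- A decimal digit outside the ASCII range.
arabicIndicThree :: Char
arabicIndicThree = '\x0663'

-- A numeric character that is not a digit.
vulgarHalf :: Char
vulgarHalf = '\x00BD'

-- Keep only the characters that isDigit accepts, i.e. ASCII '0'..'9'.
asciiDigitsOnly :: String -> String
asciiDigitsOnly = filter isDigit
```

Both characters satisfy isAlphaNum, while `asciiDigitsOnly` drops them along with any letters.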

isSymbol :: Char -> Bool

Selects Unicode symbol characters, including mathematical and currency symbols.

This function returns [True](Data-Bool.html#v:True "Data.Bool") if its argument has one of the following [GeneralCategory](GHC-Unicode.html#t:GeneralCategory "GHC.Unicode")s, or [False](Data-Bool.html#v:False "Data.Bool") otherwise:

MathSymbol
CurrencySymbol
ModifierSymbol
OtherSymbol

These classes are defined in the Unicode Character Database, part of the Unicode standard. The same document defines what is and is not a "Symbol".

Examples

Basic usage:

>>> isSymbol 'a'
False
>>> isSymbol '6'
False
>>> isSymbol '='
True

The definition of "math symbol" may be a little counter-intuitive depending on one's background:

>>> isSymbol '+'
True
>>> isSymbol '-'
False
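
The difference between '+' and '-' comes from their general categories: '+' is a MathSymbol, whereas '-' (HYPHEN-MINUS) is classified as DashPunctuation. A minimal sketch, assuming the four symbol categories Sm, Sc, Sk and So, with `symbolByCategory` as an illustrative name:

```haskell
import GHC.Unicode (GeneralCategory (..), generalCategory, isSymbol)

-- A character is a symbol when its general category is one of the four
-- symbol categories.
symbolByCategory :: Char -> Bool
symbolByCategory c =
  generalCategory c
    `elem` [MathSymbol, CurrencySymbol, ModifierSymbol, OtherSymbol]
```

For instance, `generalCategory '$'` is CurrencySymbol, so `'$'` is a symbol under this check as well.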

toUpper :: Char -> Char

Convert a letter to the corresponding upper-case letter, if any. Any other character is returned unchanged.

toLower :: Char -> Char

Convert a letter to the corresponding lower-case letter, if any. Any other character is returned unchanged.

toTitle :: Char -> Char

Convert a letter to the corresponding title-case or upper-case letter, if any. (Title case differs from upper case only for a small number of ligature letters.) Any other character is returned unchanged.
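
The three conversions can be compared on the dž ligature family, where upper case (U+01C4), title case (U+01C5) and lower case (U+01C6) are three distinct characters. The sketch below also shows one possible use of toTitle; `capitalise` is a hypothetical helper, not a library function.

```haskell
import GHC.Unicode (toLower, toTitle, toUpper)

-- The three case forms of the dz-with-caron ligature.
upperDz, titleDz, lowerDz :: Char
upperDz = '\x01C4'
titleDz = '\x01C5'
lowerDz = '\x01C6'

-- Title-case the first character of a word and lower-case the rest.
capitalise :: String -> String
capitalise [] = []
capitalise (c : cs) = toTitle c : map toLower cs
```

With these definitions, `toUpper lowerDz` gives `upperDz` while `toTitle lowerDz` gives `titleDz`. For ordinary letters the two conversions coincide, so `capitalise "hELLO"` yields `"Hello"`. Characters without a single-character mapping, such as `'1'` or `'\xDF'` (ß) under toUpper, are returned unchanged.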