Data.Char

Documentation

data Char

The character type [Char](Data-Char.html#t:Char) is an enumeration whose values represent Unicode (or equivalently ISO/IEC 10646) characters (see http://www.unicode.org/ for details). This set extends the ISO 8859-1 (Latin-1) character set (the first 256 characters), which is itself an extension of the ASCII character set (the first 128 characters). A character literal in Haskell has type [Char](Data-Char.html#t:Char).

To convert a [Char](Data-Char.html#t:Char) to or from the corresponding [Int](Data-Int.html#t:Int) value defined by Unicode, use [fromEnum](Prelude.html#t:fromEnum) and [toEnum](Prelude.html#t:toEnum) from the [Enum](Prelude.html#t:Enum) class respectively (or equivalently ord and chr).
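As a minimal sketch of the conversions described above, the following wraps ord and chr in small helpers. The names codePoint, fromCodePoint and nextChar are hypothetical and exist only for illustration.

```haskell
import Data.Char (chr, ord)

-- Char -> Int: ord gives the same result as fromEnum on every Char.
codePoint :: Char -> Int
codePoint = ord

-- Int -> Char: chr gives the same result as toEnum for valid code
-- points (0 .. 0x10FFFF); both raise an error outside that range.
fromCodePoint :: Int -> Char
fromCodePoint = chr

-- Move one code point forward, e.g. 'a' to 'b'.
-- Assumes the argument is not maxBound :: Char.
nextChar :: Char -> Char
nextChar c = chr (ord c + 1)
```

In GHCi, `codePoint 'A'` evaluates to 65 and `fromCodePoint 97` to 'a'.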

Character classification

Unicode characters are divided into letters, numbers, marks, punctuation, symbols, separators (including spaces) and others (including control characters).

isControl :: Char -> Bool

Selects control characters, which are the non-printing characters of the Latin-1 subset of Unicode.

isUpper :: Char -> Bool

Selects upper-case or title-case alphabetic Unicode characters (letters). Title case is used by a small number of letter ligatures like the single-character form of Lj.
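To illustrate the title-case remark, here is a small sketch using the three single-character forms of the Lj digraph. The names ljForms and upperFlags are hypothetical.

```haskell
import Data.Char (isUpper)

-- U+01C7 (upper case), U+01C8 (title case), U+01C9 (lower case):
-- the three single-character forms of the LJ digraph.
ljForms :: [Char]
ljForms = ['\x01C7', '\x01C8', '\x01C9']

-- isUpper accepts both the upper-case and the title-case form.
upperFlags :: [Bool]
upperFlags = map isUpper ljForms   -- [True, True, False]
```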

isAlpha :: Char -> Bool

Selects alphabetic Unicode characters (lower-case, upper-case and title-case letters, plus letters of caseless scripts and modifier letters). This function is equivalent to [isLetter](Data-Char.html#t:isLetter).

isAlphaNum :: Char -> Bool

Selects alphabetic or numeric digit Unicode characters.

Note that numeric digits outside the ASCII range are selected by this function but not by [isDigit](Data-Char.html#v:isDigit). Such digits may be part of identifiers but are not used by the printer and reader to represent numbers.
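The difference between the two predicates can be seen with a digit from another script. This is a sketch only; arabicThree and classify are hypothetical names.

```haskell
import Data.Char (isAlphaNum, isDigit)

-- U+0663 ARABIC-INDIC DIGIT THREE: a decimal digit outside ASCII.
arabicThree :: Char
arabicThree = '\x0663'

-- Pair up the results of isAlphaNum and isDigit for one character.
classify :: Char -> (Bool, Bool)
classify c = (isAlphaNum c, isDigit c)
```

Here `classify '7'` is `(True, True)`, while `classify arabicThree` is `(True, False)`.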

isPrint :: Char -> Bool

Selects printable Unicode characters (letters, numbers, marks, punctuation, symbols and spaces).

isLetter :: Char -> Bool

Selects alphabetic Unicode characters (lower-case, upper-case and title-case letters, plus letters of caseless scripts and modifier letters). This function is equivalent to [isAlpha](Data-Char.html#t:isAlpha).

isMark :: Char -> Bool

Selects Unicode mark characters, e.g. accents and the like, which combine with preceding letters.
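One possible use of isMark is a crude accent stripper. The sketch below assumes its input is already in decomposed form (base letter followed by combining marks); it performs no Unicode normalisation, and the names decomposed and stripMarks are hypothetical.

```haskell
import Data.Char (isMark)

-- "e" followed by U+0301 COMBINING ACUTE ACCENT: two Chars, one glyph.
decomposed :: String
decomposed = "e\x0301"

-- Drop combining marks and keep the base characters.
-- Precomposed characters such as U+00E9 are letters, not marks,
-- so they pass through unchanged.
stripMarks :: String -> String
stripMarks = filter (not . isMark)
```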

isNumber :: Char -> Bool

Selects Unicode numeric characters, including digits from various scripts, Roman numerals, etc.
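A short sketch of characters that isNumber accepts but that are not ASCII digits. The helper numericButNotDigit is hypothetical, and isDigit is the ASCII-only predicate from the same module.

```haskell
import Data.Char (isDigit, isNumber)

-- True for numeric characters that are not one of '0'..'9'.
numericButNotDigit :: Char -> Bool
numericButNotDigit c = isNumber c && not (isDigit c)

-- U+2167 ROMAN NUMERAL EIGHT and U+00BD VULGAR FRACTION ONE HALF.
examples :: [Char]
examples = ['\x2167', '\x00BD']
```

Both characters in `examples` satisfy `numericButNotDigit`, whereas '5' and the Latin letter 'V' do not.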

Subranges

isAscii :: Char -> Bool

Selects the first 128 characters of the Unicode character set, corresponding to the ASCII character set.

isLatin1 :: Char -> Bool

Selects the first 256 characters of the Unicode character set, corresponding to the ISO 8859-1 (Latin-1) character set.
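Because ASCII is a subset of Latin-1, the two predicates partition characters into three nested ranges. The Subrange type and subrange function below are hypothetical names used only to sketch this.

```haskell
import Data.Char (isAscii, isLatin1)

-- Which of the nested subranges a character falls into.
data Subrange = Ascii | Latin1Only | Beyond
  deriving (Eq, Show)

subrange :: Char -> Subrange
subrange c
  | isAscii c  = Ascii       -- code points 0 .. 127
  | isLatin1 c = Latin1Only  -- code points 128 .. 255
  | otherwise  = Beyond      -- everything above 255
```

For example, `subrange 'a'` is `Ascii`, `subrange '\x00E9'` (é) is `Latin1Only`, and `subrange '\x03BB'` (λ) is `Beyond`.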

Unicode general categories

data GeneralCategory

Unicode General Categories (column 2 of the UnicodeData table) in the order they are listed in the Unicode standard.

Constructors

UppercaseLetter Lu: Letter, Uppercase
LowercaseLetter Ll: Letter, Lowercase
TitlecaseLetter Lt: Letter, Titlecase
ModifierLetter Lm: Letter, Modifier
OtherLetter Lo: Letter, Other
NonSpacingMark Mn: Mark, Non-Spacing
SpacingCombiningMark Mc: Mark, Spacing Combining
EnclosingMark Me: Mark, Enclosing
DecimalNumber Nd: Number, Decimal
LetterNumber Nl: Number, Letter
OtherNumber No: Number, Other
ConnectorPunctuation Pc: Punctuation, Connector
DashPunctuation Pd: Punctuation, Dash
OpenPunctuation Ps: Punctuation, Open
ClosePunctuation Pe: Punctuation, Close
InitialQuote Pi: Punctuation, Initial quote
FinalQuote Pf: Punctuation, Final quote
OtherPunctuation Po: Punctuation, Other
MathSymbol Sm: Symbol, Math
CurrencySymbol Sc: Symbol, Currency
ModifierSymbol Sk: Symbol, Modifier
OtherSymbol So: Symbol, Other
Space Zs: Separator, Space
LineSeparator Zl: Separator, Line
ParagraphSeparator Zp: Separator, Paragraph
Control Cc: Other, Control
Format Cf: Other, Format
Surrogate Cs: Other, Surrogate
PrivateUse Co: Other, Private Use
NotAssigned Cn: Other, Not Assigned
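The constructors above can be inspected with generalCategory :: Char -> GeneralCategory, which Data.Char exports although it is not described in this section. The sketch below uses the hypothetical names isLetterCategory and samples.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- The five letter categories (Lu, Ll, Lt, Lm, Lo).
isLetterCategory :: GeneralCategory -> Bool
isLetterCategory cat =
  cat `elem`
    [ UppercaseLetter, LowercaseLetter, TitlecaseLetter
    , ModifierLetter, OtherLetter ]

-- Pair a few ASCII characters with their general category.
samples :: [(Char, GeneralCategory)]
samples = [(c, generalCategory c) | c <- "aA0 _$+\n"]
```

For instance, `generalCategory 'a'` is `LowercaseLetter`, `generalCategory '0'` is `DecimalNumber`, and `generalCategory '\n'` is `Control`.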

Case conversion

toUpper :: Char -> Char

Convert a letter to the corresponding upper-case letter, if any. Any other character is returned unchanged.

toLower :: Char -> Char

Convert a letter to the corresponding lower-case letter, if any. Any other character is returned unchanged.

toTitle :: Char -> Char

Convert a letter to the corresponding title-case or upper-case letter, if any. (Title case differs from upper case only for a small number of ligature letters.) Any other character is returned unchanged.
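As a sketch of how the three conversions might be combined, the hypothetical helpers below capitalise a word and upper-case a string. Since each function maps one Char to one Char, this approach cannot express case mappings that change the length of the text.

```haskell
import Data.Char (toLower, toTitle, toUpper)

-- Title-case the first character and lower-case the rest.
capitalise :: String -> String
capitalise []       = []
capitalise (c : cs) = toTitle c : map toLower cs

-- Upper-case every character; non-letters are left unchanged.
shout :: String -> String
shout = map toUpper
```

`capitalise "hASKELL"` gives `"Haskell"`. For the lj ligature U+01C9, toTitle yields U+01C8 while toUpper yields U+01C7.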

Single digit characters

digitToInt :: Char -> Int

Convert a single digit [Char](Data-Char.html#t:Char) to the corresponding [Int](Data-Int.html#t:Int). This function fails unless its argument satisfies [isHexDigit](Data-Char.html#v:isHexDigit), but recognises both upper and lower-case hexadecimal digits (i.e. '0'..'9', 'a'..'f', 'A'..'F').

intToDigit :: Int -> Char

Convert an [Int](Data-Int.html#t:Int) in the range 0..15 to the corresponding single digit [Char](Data-Char.html#t:Char). This function fails on other inputs, and generates lower-case hexadecimal digits.
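Since both functions fail on out-of-range input, callers usually guard them. The following is a minimal sketch with the hypothetical helpers parseHex and showHexDigits; it ignores Int overflow on very long inputs.

```haskell
import Data.Char (digitToInt, intToDigit, isHexDigit)

-- Parse a hexadecimal string. Returns Nothing for the empty string or
-- if any character fails isHexDigit, so digitToInt is never called on
-- an invalid character.
parseHex :: String -> Maybe Int
parseHex s
  | null s || not (all isHexDigit s) = Nothing
  | otherwise = Just (foldl (\acc c -> acc * 16 + digitToInt c) 0 s)

-- Render a non-negative Int as lower-case hexadecimal digits.
-- intToDigit only ever sees values in the range 0..15.
showHexDigits :: Int -> String
showHexDigits n
  | n < 0     = error "showHexDigits: negative input"
  | n < 16    = [intToDigit n]
  | otherwise = showHexDigits (n `div` 16) ++ [intToDigit (n `mod` 16)]
```

`parseHex "FF"` and `parseHex "ff"` both give `Just 255`, and `showHexDigits 255` gives `"ff"`.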

Numeric representations

String representations

showLitChar :: Char -> ShowS

Convert a character to a string using only printable characters, using Haskell source-language escape conventions. For example:

showLitChar '\n' s = "\\n" ++ s
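Because showLitChar returns a ShowS, it can be folded over a whole string. The helper escape below is a hypothetical name, shown as a sketch rather than a replacement for show, which additionally adds surrounding quotes and escapes double quotes.

```haskell
import Data.Char (showLitChar)

-- Escape every character of a string. foldr passes the already escaped
-- tail to each showLitChar call, which is the shape ShowS expects.
escape :: String -> String
escape s = foldr showLitChar "" s
```

For example, `escape "a\nb"` yields the four-character string `a\nb` (written `"a\\nb"` as a Haskell literal).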

lexLitChar :: ReadS String

Read a string representation of a character, using Haskell source-language escape conventions. For example:

lexLitChar "\\nHello" = [("\\n", "Hello")]

readLitChar :: ReadS Char

Read a string representation of a character, using Haskell source-language escape conventions, and convert it to the character that it encodes. For example:

readLitChar "\\nHello" = [('\n', "Hello")]
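The two readers differ only in what they return for the consumed prefix: lexLitChar keeps the escape text, readLitChar decodes it. As a sketch, the hypothetical helper unescape below applies readLitChar repeatedly; it simply stops at the first position that does not parse as a single character.

```haskell
import Data.Char (lexLitChar, readLitChar)

-- The raw escape text of the first character, if there is one.
firstLexeme :: String -> Maybe String
firstLexeme s = case lexLitChar s of
  [(lexeme, _)] -> Just lexeme
  _             -> Nothing

-- Decode escapes one character at a time until the input is exhausted
-- or no unambiguous parse is available.
unescape :: String -> String
unescape s = case readLitChar s of
  [(c, rest)] -> c : unescape rest
  _           -> []
```

With the input `"a\\nb"` (the characters a, backslash, n, b), `unescape` returns the three-character string containing a real newline.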