Data.Char (original) (raw)

Documentation

data Char [Source](/packages/archive///doc/html/src/GHC-Types.html#Char)

The character type [Char](Data-Char.html#t:Char) is an enumeration whose values represent Unicode (or equivalently ISO/IEC 10646) characters (seehttp://www.unicode.org/ for details). This set extends the ISO 8859-1 (Latin-1) character set (the first 256 characters), which is itself an extension of the ASCII character set (the first 128 characters). A character literal in Haskell has type [Char](Data-Char.html#t:Char).

To convert a [Char](Data-Char.html#t:Char) to or from the corresponding [Int](Data-Int.html#t:Int) value defined by Unicode, use [toEnum](Prelude.html#t:toEnum) and [fromEnum](Prelude.html#t:fromEnum) from the[Enum](Prelude.html#t:Enum) class respectively (or equivalently ord and chr).

Character classification

Unicode characters are divided into letters, numbers, marks, punctuation, symbols, separators (including spaces) and others (including control characters).

isControl :: Char -> Bool Source

Selects control characters, which are the non-printing characters of the Latin-1 subset of Unicode.

isUpper :: Char -> Bool Source

Selects upper-case or title-case alphabetic Unicode characters (letters). Title case is used by a small number of letter ligatures like the single-character form of Lj.

isAlpha :: Char -> Bool Source

Selects alphabetic Unicode characters (lower-case, upper-case and title-case letters, plus letters of caseless scripts and modifiers letters). This function is equivalent to [isLetter](Data-Char.html#t:isLetter).

isAlphaNum :: Char -> Bool Source

Selects alphabetic or numeric digit Unicode characters.

Note that numeric digits outside the ASCII range are selected by this function but not by [isDigit](Data-Char.html#v:isDigit). Such digits may be part of identifiers but are not used by the printer and reader to represent numbers.

isPrint :: Char -> Bool Source

Selects printable Unicode characters (letters, numbers, marks, punctuation, symbols and spaces).

isLetter :: Char -> Bool Source

isMark :: Char -> Bool Source

Selects Unicode mark characters, e.g. accents and the like, which combine with preceding letters.

isNumber :: Char -> Bool Source

Selects Unicode numeric characters, including digits from various scripts, Roman numerals, etc.

Subranges

isAscii :: Char -> Bool Source

Selects the first 128 characters of the Unicode character set, corresponding to the ASCII character set.

isLatin1 :: Char -> Bool Source

Selects the first 256 characters of the Unicode character set, corresponding to the ISO 8859-1 (Latin-1) character set.

Unicode general categories

data GeneralCategory Source

Unicode General Categories (column 2 of the UnicodeData table) in the order they are listed in the Unicode standard.

Constructors

UppercaseLetter	Lu: Letter, Uppercase
LowercaseLetter	Ll: Letter, Lowercase
TitlecaseLetter	Lt: Letter, Titlecase
ModifierLetter	Lm: Letter, Modifier
OtherLetter	Lo: Letter, Other
NonSpacingMark	Mn: Mark, Non-Spacing
SpacingCombiningMark	Mc: Mark, Spacing Combining
EnclosingMark	Me: Mark, Enclosing
DecimalNumber	Nd: Number, Decimal
LetterNumber	Nl: Number, Letter
OtherNumber	No: Number, Other
ConnectorPunctuation	Pc: Punctuation, Connector
DashPunctuation	Pd: Punctuation, Dash
OpenPunctuation	Ps: Punctuation, Open
ClosePunctuation	Pe: Punctuation, Close
InitialQuote	Pi: Punctuation, Initial quote
FinalQuote	Pf: Punctuation, Final quote
OtherPunctuation	Po: Punctuation, Other
MathSymbol	Sm: Symbol, Math
CurrencySymbol	Sc: Symbol, Currency
ModifierSymbol	Sk: Symbol, Modifier
OtherSymbol	So: Symbol, Other
Space	Zs: Separator, Space
LineSeparator	Zl: Separator, Line
ParagraphSeparator	Zp: Separator, Paragraph
Control	Cc: Other, Control
Format	Cf: Other, Format
Surrogate	Cs: Other, Surrogate
PrivateUse	Co: Other, Private Use
NotAssigned	Cn: Other, Not Assigned

Case conversion

toUpper :: Char -> Char Source

Convert a letter to the corresponding upper-case letter, if any. Any other character is returned unchanged.

toLower :: Char -> Char Source

Convert a letter to the corresponding lower-case letter, if any. Any other character is returned unchanged.

toTitle :: Char -> Char Source

Convert a letter to the corresponding title-case or upper-case letter, if any. (Title case differs from upper case only for a small number of ligature letters.) Any other character is returned unchanged.

Single digit characters

digitToInt :: Char -> Int Source

Convert a single digit [Char](Data-Char.html#t:Char) to the corresponding [Int](Data-Int.html#t:Int). This function fails unless its argument satisfies [isHexDigit](Data-Char.html#v:isHexDigit), but recognises both upper and lower-case hexadecimal digits (i.e. '0'..'9', 'a'..'f', 'A'..'F').

intToDigit :: Int -> Char Source

Convert an [Int](Data-Int.html#t:Int) in the range 0..15 to the corresponding single digit [Char](Data-Char.html#t:Char). This function fails on other inputs, and generates lower-case hexadecimal digits.

Numeric representations

String representations

showLitChar :: Char -> ShowS Source

Convert a character to a string using only printable characters, using Haskell source-language escape conventions. For example:

showLitChar '\n' s = "\n" ++ s

lexLitChar :: ReadS String Source

Read a string representation of a character, using Haskell source-language escape conventions. For example:

lexLitChar "\nHello" = [("\n", "Hello")]

readLitChar :: ReadS Char Source

Read a string representation of a character, using Haskell source-language escape conventions, and convert it to the character that it encodes. For example:

readLitChar "\nHello" = [('\n', "Hello")]