Data.Char (original) (raw)
Documentation
data Char [Source](/packages/archive///doc/html/src/GHC-Types.html#Char)
The character type [Char](Data-Char.html#t:Char)
is an enumeration whose values represent Unicode (or equivalently ISO/IEC 10646) characters (seehttp://www.unicode.org/ for details). This set extends the ISO 8859-1 (Latin-1) character set (the first 256 characters), which is itself an extension of the ASCII character set (the first 128 characters). A character literal in Haskell has type [Char](Data-Char.html#t:Char)
.
To convert a [Char](Data-Char.html#t:Char)
to or from the corresponding [Int](Data-Int.html#t:Int)
value defined by Unicode, use [toEnum](Prelude.html#t:toEnum)
and [fromEnum](Prelude.html#t:fromEnum)
from the[Enum](Prelude.html#t:Enum)
class respectively (or equivalently ord
and chr
).
Character classification
Unicode characters are divided into letters, numbers, marks, punctuation, symbols, separators (including spaces) and others (including control characters).
isControl :: Char -> BoolSource
Selects control characters, which are the non-printing characters of the Latin-1 subset of Unicode.
Selects upper-case or title-case alphabetic Unicode characters (letters). Title case is used by a small number of letter ligatures like the single-character form of Lj.
Selects alphabetic Unicode characters (lower-case, upper-case and title-case letters, plus letters of caseless scripts and modifiers letters). This function is equivalent to [isLetter](Data-Char.html#t:isLetter)
.
isAlphaNum :: Char -> BoolSource
Selects alphabetic or numeric digit Unicode characters.
Note that numeric digits outside the ASCII range are selected by this function but not by [isDigit](Data-Char.html#v:isDigit)
. Such digits may be part of identifiers but are not used by the printer and reader to represent numbers.
Selects printable Unicode characters (letters, numbers, marks, punctuation, symbols and spaces).
isLetter :: Char -> BoolSource
Selects alphabetic Unicode characters (lower-case, upper-case and title-case letters, plus letters of caseless scripts and modifiers letters). This function is equivalent to [isAlpha](Data-Char.html#t:isAlpha)
.
Selects Unicode mark characters, e.g. accents and the like, which combine with preceding letters.
isNumber :: Char -> BoolSource
Selects Unicode numeric characters, including digits from various scripts, Roman numerals, etc.
Subranges
Selects the first 128 characters of the Unicode character set, corresponding to the ASCII character set.
isLatin1 :: Char -> BoolSource
Selects the first 256 characters of the Unicode character set, corresponding to the ISO 8859-1 (Latin-1) character set.
Unicode general categories
data GeneralCategory Source
Unicode General Categories (column 2 of the UnicodeData table) in the order they are listed in the Unicode standard.
Constructors
UppercaseLetter | Lu: Letter, Uppercase |
---|---|
LowercaseLetter | Ll: Letter, Lowercase |
TitlecaseLetter | Lt: Letter, Titlecase |
ModifierLetter | Lm: Letter, Modifier |
OtherLetter | Lo: Letter, Other |
NonSpacingMark | Mn: Mark, Non-Spacing |
SpacingCombiningMark | Mc: Mark, Spacing Combining |
EnclosingMark | Me: Mark, Enclosing |
DecimalNumber | Nd: Number, Decimal |
LetterNumber | Nl: Number, Letter |
OtherNumber | No: Number, Other |
ConnectorPunctuation | Pc: Punctuation, Connector |
DashPunctuation | Pd: Punctuation, Dash |
OpenPunctuation | Ps: Punctuation, Open |
ClosePunctuation | Pe: Punctuation, Close |
InitialQuote | Pi: Punctuation, Initial quote |
FinalQuote | Pf: Punctuation, Final quote |
OtherPunctuation | Po: Punctuation, Other |
MathSymbol | Sm: Symbol, Math |
CurrencySymbol | Sc: Symbol, Currency |
ModifierSymbol | Sk: Symbol, Modifier |
OtherSymbol | So: Symbol, Other |
Space | Zs: Separator, Space |
LineSeparator | Zl: Separator, Line |
ParagraphSeparator | Zp: Separator, Paragraph |
Control | Cc: Other, Control |
Format | Cf: Other, Format |
Surrogate | Cs: Other, Surrogate |
PrivateUse | Co: Other, Private Use |
NotAssigned | Cn: Other, Not Assigned |
Case conversion
Convert a letter to the corresponding upper-case letter, if any. Any other character is returned unchanged.
Convert a letter to the corresponding lower-case letter, if any. Any other character is returned unchanged.
Convert a letter to the corresponding title-case or upper-case letter, if any. (Title case differs from upper case only for a small number of ligature letters.) Any other character is returned unchanged.
Single digit characters
digitToInt :: Char -> IntSource
Convert a single digit [Char](Data-Char.html#t:Char)
to the corresponding [Int](Data-Int.html#t:Int)
. This function fails unless its argument satisfies [isHexDigit](Data-Char.html#v:isHexDigit)
, but recognises both upper and lower-case hexadecimal digits (i.e. '0'
..'9'
, 'a'
..'f'
, 'A'
..'F'
).
intToDigit :: Int -> CharSource
Convert an [Int](Data-Int.html#t:Int)
in the range 0
..15
to the corresponding single digit [Char](Data-Char.html#t:Char)
. This function fails on other inputs, and generates lower-case hexadecimal digits.
Numeric representations
String representations
showLitChar :: Char -> ShowSSource
Convert a character to a string using only printable characters, using Haskell source-language escape conventions. For example:
showLitChar '\n' s = "\n" ++ s
lexLitChar :: ReadS StringSource
Read a string representation of a character, using Haskell source-language escape conventions. For example:
lexLitChar "\nHello" = [("\n", "Hello")]
readLitChar :: ReadS CharSource
Read a string representation of a character, using Haskell source-language escape conventions, and convert it to the character that it encodes. For example:
readLitChar "\nHello" = [('\n', "Hello")]