Data.Char
Documentation
The character type [Char](Data-Char.html#t:Char "Data.Char") is an enumeration whose values represent Unicode (or equivalently ISO/IEC 10646) code points (i.e. characters, see http://www.unicode.org/ for details). This set extends the ISO 8859-1 (Latin-1) character set (the first 256 characters), which is itself an extension of the ASCII character set (the first 128 characters). A character literal in Haskell has type [Char](Data-Char.html#t:Char "Data.Char").
To convert a [Char](Data-Char.html#t:Char "Data.Char") to or from the corresponding [Int](Data-Int.html#t:Int "Data.Int") value defined by Unicode, use [toEnum](Prelude.html#v:toEnum "Prelude") and [fromEnum](Prelude.html#v:fromEnum "Prelude") from the [Enum](Prelude.html#v:Enum "Prelude") class respectively (or equivalently [ord](Data-Char.html#v:ord "Data.Char") and [chr](Data-Char.html#v:chr "Data.Char")).
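As an illustration of converting between characters and code points, the following is a minimal sketch. The helper shiftLower is a hypothetical name introduced only for this example and is not part of Data.Char.

```haskell
import Data.Char (chr, ord)

-- Hypothetical helper: shift a lower-case ASCII letter by n positions,
-- wrapping around at 'z'. Any other character is returned unchanged.
shiftLower :: Int -> Char -> Char
shiftLower n c
  | c >= 'a' && c <= 'z' = chr (ord 'a' + (ord c - ord 'a' + n) `mod` 26)
  | otherwise            = c

main :: IO ()
main = do
  print (ord 'A')                       -- 65
  print (chr 955 == 'λ')                -- True (U+03BB)
  print (fromEnum 'A' == ord 'A')       -- True
  putStrLn (map (shiftLower 1) "xyz!")  -- yza!
```

Going through ord and chr keeps the arithmetic on Int, so the wrap-around can be expressed with an ordinary mod.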
Unicode characters are divided into letters, numbers, marks, punctuation, symbols, separators (including spaces) and others (including control characters).
isControl :: Char -> Bool
Selects control characters, which are the non-printing characters of the Latin-1 subset of Unicode.
isUpper :: Char -> Bool
Selects upper-case or title-case alphabetic Unicode characters (letters). Title case is used by a small number of letter ligatures like the single-character form of Lj.
isAlpha :: Char -> Bool
Selects alphabetic Unicode characters (lower-case, upper-case and title-case letters, plus letters of caseless scripts and modifier letters). This function is equivalent to [isLetter](Data-Char.html#v:isLetter "Data.Char").
isAlphaNum :: Char -> Bool
Selects alphabetic or numeric Unicode characters.
Note that numeric digits outside the ASCII range, as well as numeric characters which aren't digits, are selected by this function but not by [isDigit](Data-Char.html#v:isDigit "Data.Char"). Such characters may be part of identifiers but are not used by the printer and reader to represent numbers.
isPrint :: Char -> Bool
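The difference between isDigit and isAlphaNum can be seen on a few sample characters. This is a small sketch; classifyNumeric is a hypothetical helper, and the sample characters (an Arabic-Indic digit and the vulgar fraction one half) were chosen for illustration.

```haskell
import Data.Char (isAlphaNum, isDigit, isNumber)

-- Hypothetical helper: (character, isDigit, isNumber, isAlphaNum)
classifyNumeric :: Char -> (Char, Bool, Bool, Bool)
classifyNumeric c = (c, isDigit c, isNumber c, isAlphaNum c)

main :: IO ()
main = mapM_ (print . classifyNumeric)
  [ '7'      -- ASCII digit
  , '\1635'  -- U+0663 ARABIC-INDIC DIGIT THREE (a digit outside ASCII)
  , '\189'   -- U+00BD VULGAR FRACTION ONE HALF (numeric, but not a digit)
  ]
-- Prints:
-- ('7',True,True,True)
-- ('\1635',False,True,True)
-- ('\189',False,True,True)
```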
Selects printable Unicode characters (letters, numbers, marks, punctuation, symbols and spaces).
isLetter :: Char -> Bool
Selects alphabetic Unicode characters (lower-case, upper-case and title-case letters, plus letters of caseless scripts and modifier letters). This function is equivalent to [isAlpha](Data-Char.html#v:isAlpha "Data.Char").
This function returns [True](Data-Bool.html#v:True "Data.Bool") if its argument has one of the following [GeneralCategory](Data-Char.html#t:GeneralCategory "Data.Char")s, or [False](Data-Bool.html#v:False "Data.Bool") otherwise:
[UppercaseLetter](Data-Char.html#v:UppercaseLetter "Data.Char"), [LowercaseLetter](Data-Char.html#v:LowercaseLetter "Data.Char"), [TitlecaseLetter](Data-Char.html#v:TitlecaseLetter "Data.Char"), [ModifierLetter](Data-Char.html#v:ModifierLetter "Data.Char"), [OtherLetter](Data-Char.html#v:OtherLetter "Data.Char")
These classes are defined in the Unicode Character Database, part of the Unicode standard. The same document defines what is and is not a "Letter".
Examples
Basic usage:
>>> isLetter 'a'
True
>>> isLetter 'A'
True
>>> isLetter 'λ'
True
>>> isLetter '0'
False
>>> isLetter '%'
False
>>> isLetter '♥'
False
>>> isLetter '\31'
False
Ensure that [isLetter](Data-Char.html#v:isLetter "Data.Char") and [isAlpha](Data-Char.html#v:isAlpha "Data.Char") are equivalent.
>>> let chars = [(chr 0)..]
>>> let letters = map isLetter chars
>>> let alphas = map isAlpha chars
>>> letters == alphas
True
isMark :: Char -> Bool
Selects Unicode mark characters, for example accents and the like, which combine with preceding characters.
This function returns [True](Data-Bool.html#v:True "Data.Bool") if its argument has one of the following [GeneralCategory](Data-Char.html#t:GeneralCategory "Data.Char")s, or [False](Data-Bool.html#v:False "Data.Bool") otherwise:
[NonSpacingMark](Data-Char.html#v:NonSpacingMark "Data.Char"), [SpacingCombiningMark](Data-Char.html#v:SpacingCombiningMark "Data.Char"), [EnclosingMark](Data-Char.html#v:EnclosingMark "Data.Char")
These classes are defined in the Unicode Character Database, part of the Unicode standard. The same document defines what is and is not a "Mark".
Examples
Basic usage:
>>> isMark 'a'
False
>>> isMark '0'
False
Combining marks such as accent characters usually need to follow another character before they become printable:
>>> map isMark "ò"
[False,True]
Puns are not necessarily supported:
>>> isMark '✓'
False
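One possible use of isMark is to drop combining marks from text that is already in decomposed form. The sketch below assumes exactly that; stripMarks is a hypothetical helper, and it does not decompose precomposed characters, so those pass through unchanged.

```haskell
import Data.Char (isMark)

-- Hypothetical helper: remove every character in a Mark category.
stripMarks :: String -> String
stripMarks = filter (not . isMark)

main :: IO ()
main = do
  -- 'o' followed by U+0300 COMBINING GRAVE ACCENT (decomposed form)
  let decomposed = "o\768"
  print (map isMark decomposed)  -- [False,True]
  print (stripMarks decomposed)  -- "o"
  -- U+00F2 is the precomposed letter, a single LowercaseLetter
  print (stripMarks "\242")      -- "\242"
```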
isNumber :: Char -> Bool
Selects Unicode numeric characters, including digits from various scripts, Roman numerals, et cetera.
This function returns [True](Data-Bool.html#v:True "Data.Bool") if its argument has one of the following [GeneralCategory](Data-Char.html#t:GeneralCategory "Data.Char")s, or [False](Data-Bool.html#v:False "Data.Bool") otherwise:
[DecimalNumber](Data-Char.html#v:DecimalNumber "Data.Char"), [LetterNumber](Data-Char.html#v:LetterNumber "Data.Char"), [OtherNumber](Data-Char.html#v:OtherNumber "Data.Char")
These classes are defined in the Unicode Character Database, part of the Unicode standard. The same document defines what is and is not a "Number".
Examples
Basic usage:
>>> isNumber 'a'
False
>>> isNumber '%'
False
>>> isNumber '3'
True
ASCII '0' through '9' are all numbers:
>>> and $ map isNumber ['0'..'9']
True
Unicode Roman numerals are "numbers" as well:
>>> isNumber 'Ⅸ'
True
isSymbol :: Char -> Bool
Selects Unicode symbol characters, including mathematical and currency symbols.
This function returns [True](Data-Bool.html#v:True "Data.Bool") if its argument has one of the following [GeneralCategory](Data-Char.html#t:GeneralCategory "Data.Char")s, or [False](Data-Bool.html#v:False "Data.Bool") otherwise:
[MathSymbol](Data-Char.html#v:MathSymbol "Data.Char"), [CurrencySymbol](Data-Char.html#v:CurrencySymbol "Data.Char"), [ModifierSymbol](Data-Char.html#v:ModifierSymbol "Data.Char"), [OtherSymbol](Data-Char.html#v:OtherSymbol "Data.Char")
These classes are defined in the Unicode Character Database, part of the Unicode standard. The same document defines what is and is not a "Symbol".
Examples
Basic usage:
>>> isSymbol 'a'
False
>>> isSymbol '6'
False
>>> isSymbol '='
True
The definition of "math symbol" may be a little counter-intuitive depending on one's background:
>>> isSymbol '+'
True
>>> isSymbol '-'
False
isSeparator :: Char -> Bool
Selects Unicode space and separator characters.
This function returns [True](Data-Bool.html#v:True "Data.Bool") if its argument has one of the following [GeneralCategory](Data-Char.html#t:GeneralCategory "Data.Char")s, or [False](Data-Bool.html#v:False "Data.Bool") otherwise:
[Space](Data-Char.html#v:Space "Data.Char"), [LineSeparator](Data-Char.html#v:LineSeparator "Data.Char"), [ParagraphSeparator](Data-Char.html#v:ParagraphSeparator "Data.Char")
These classes are defined in the Unicode Character Database, part of the Unicode standard. The same document defines what is and is not a "Separator".
Examples
Basic usage:
>>> isSeparator 'a'
False
>>> isSeparator '6'
False
>>> isSeparator ' '
True
Warning: newlines and tab characters are not considered separators.
>>> isSeparator '\n'
False
>>> isSeparator '\t'
False
But some more exotic characters are (like HTML's non-breaking space, &nbsp;):
>>> isSeparator '\160'
True
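For comparison, the sketch below prints isSeparator next to isSpace for the same characters. isSpace is also exported by Data.Char, although it is not described in this part of the page; the report helper is a hypothetical name used only here.

```haskell
import Data.Char (isSeparator, isSpace)

main :: IO ()
main = mapM_ report [' ', '\n', '\t', '\160']
  where
    -- Hypothetical helper: show both predicates for one character.
    report c = putStrLn
      (show c ++ ": isSeparator=" ++ show (isSeparator c)
              ++ ", isSpace=" ++ show (isSpace c))
-- Prints:
-- ' ': isSeparator=True, isSpace=True
-- '\n': isSeparator=False, isSpace=True
-- '\t': isSeparator=False, isSpace=True
-- '\160': isSeparator=True, isSpace=True
```

When the goal is to split text on whitespace in the everyday sense, isSpace is likely the better fit, since isSeparator follows the Unicode general categories strictly.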
isAscii :: Char -> Bool
Selects the first 128 characters of the Unicode character set, corresponding to the ASCII character set.
isLatin1 :: Char -> Bool
Selects the first 256 characters of the Unicode character set, corresponding to the ISO 8859-1 (Latin-1) character set.
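Because ASCII is a subset of Latin-1, the two predicates can be combined to sort characters into three ranges. The following is a small sketch; the Range type and charRange function are hypothetical names introduced for illustration.

```haskell
import Data.Char (isAscii, isLatin1)

-- Hypothetical classification of a character by code point range.
data Range = Ascii | Latin1Only | Beyond
  deriving (Show, Eq)

charRange :: Char -> Range
charRange c
  | isAscii c  = Ascii       -- code points 0..127
  | isLatin1 c = Latin1Only  -- code points 128..255
  | otherwise  = Beyond      -- everything above 255

main :: IO ()
main =
  -- 'a', U+00E9 (e with acute accent), U+03BB (Greek lambda)
  print (map charRange "a\233\955")  -- [Ascii,Latin1Only,Beyond]
```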
data GeneralCategory
Unicode General Categories (column 2 of the UnicodeData table) in the order they are listed in the Unicode standard (the Unicode Character Database, in particular).
Examples
Basic usage:
>>> :t OtherLetter
OtherLetter :: GeneralCategory
[Eq](Data-Eq.html#t:Eq "Data.Eq") instance:
>>> UppercaseLetter == UppercaseLetter
True
>>> UppercaseLetter == LowercaseLetter
False
[Ord](Data-Ord.html#t:Ord "Data.Ord") instance:
>>> NonSpacingMark <= MathSymbol
True
[Enum](Prelude.html#t:Enum "Prelude") instance:
>>> enumFromTo ModifierLetter SpacingCombiningMark
[ModifierLetter,OtherLetter,NonSpacingMark,SpacingCombiningMark]
[Read](Text-Read.html#v:Read "Text.Read") instance:
>>> read "DashPunctuation" :: GeneralCategory
DashPunctuation
>>> read "17" :: GeneralCategory
*** Exception: Prelude.read: no parse
[Show](Text-Show.html#t:Show "Text.Show") instance:
>>> show EnclosingMark
"EnclosingMark"
[Bounded](Prelude.html#t:Bounded "Prelude") instance:
>>> minBound :: GeneralCategory
UppercaseLetter
>>> maxBound :: GeneralCategory
NotAssigned
[Ix](Data-Ix.html#t:Ix "Data.Ix") instance:
>>> import Data.Ix ( index )
>>> index (OtherLetter,Control) FinalQuote
12
>>> index (OtherLetter,Control) Format
*** Exception: Error in array index
Constructors
| Constructor | General category |
|---|---|
| UppercaseLetter | Lu: Letter, Uppercase |
| LowercaseLetter | Ll: Letter, Lowercase |
| TitlecaseLetter | Lt: Letter, Titlecase |
| ModifierLetter | Lm: Letter, Modifier |
| OtherLetter | Lo: Letter, Other |
| NonSpacingMark | Mn: Mark, Non-Spacing |
| SpacingCombiningMark | Mc: Mark, Spacing Combining |
| EnclosingMark | Me: Mark, Enclosing |
| DecimalNumber | Nd: Number, Decimal |
| LetterNumber | Nl: Number, Letter |
| OtherNumber | No: Number, Other |
| ConnectorPunctuation | Pc: Punctuation, Connector |
| DashPunctuation | Pd: Punctuation, Dash |
| OpenPunctuation | Ps: Punctuation, Open |
| ClosePunctuation | Pe: Punctuation, Close |
| InitialQuote | Pi: Punctuation, Initial quote |
| FinalQuote | Pf: Punctuation, Final quote |
| OtherPunctuation | Po: Punctuation, Other |
| MathSymbol | Sm: Symbol, Math |
| CurrencySymbol | Sc: Symbol, Currency |
| ModifierSymbol | Sk: Symbol, Modifier |
| OtherSymbol | So: Symbol, Other |
| Space | Zs: Separator, Space |
| LineSeparator | Zl: Separator, Line |
| ParagraphSeparator | Zp: Separator, Paragraph |
| Control | Cc: Other, Control |
| Format | Cf: Other, Format |
| Surrogate | Cs: Other, Surrogate |
| PrivateUse | Co: Other, Private Use |
| NotAssigned | Cn: Other, Not Assigned |
generalCategory :: Char -> GeneralCategory
The Unicode general category of the character. This relies on the [Enum](Prelude.html#t:Enum "Prelude") instance of [GeneralCategory](Data-Char.html#t:GeneralCategory "Data.Char"), which must remain in the same order as the categories are presented in the Unicode standard.
Examples
Basic usage:
>>> generalCategory 'a'
LowercaseLetter
>>> generalCategory 'A'
UppercaseLetter
>>> generalCategory '0'
DecimalNumber
>>> generalCategory '%'
OtherPunctuation
>>> generalCategory '♥'
OtherSymbol
>>> generalCategory '\31'
Control
>>> generalCategory ' '
Space
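Since the constructors of GeneralCategory are ordered as in the Unicode standard, the Ord instance can be used to collapse the thirty categories into the seven major groups (letters, marks, numbers, punctuation, symbols, separators and others). The sketch below shows one way to do that; MajorClass and majorClass are hypothetical names, not part of Data.Char.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- Hypothetical coarse grouping of the general categories.
data MajorClass
  = Letter | Mark | Number | Punctuation | Symbol | Separator | Other
  deriving (Show, Eq)

-- Each guard names the last constructor of a group, so the comparison
-- relies on the declaration order of GeneralCategory.
majorClass :: Char -> MajorClass
majorClass c
  | cat <= OtherLetter        = Letter
  | cat <= EnclosingMark      = Mark
  | cat <= OtherNumber        = Number
  | cat <= OtherPunctuation   = Punctuation
  | cat <= OtherSymbol        = Symbol
  | cat <= ParagraphSeparator = Separator
  | otherwise                 = Other
  where
    cat = generalCategory c

main :: IO ()
main = print (map majorClass "a0%♥ \31")
-- [Letter,Number,Punctuation,Symbol,Separator,Other]
```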
toUpper :: Char -> Char
Convert a letter to the corresponding upper-case letter, if any. Any other character is returned unchanged.
toLower :: Char -> Char
Convert a letter to the corresponding lower-case letter, if any. Any other character is returned unchanged.
toTitle :: Char -> Char
Convert a letter to the corresponding title-case or upper-case letter, if any. (Title case differs from upper case only for a small number of ligature letters.) Any other character is returned unchanged.
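The three conversions can be compared on a ligature letter, where title case and upper case differ. This is a minimal sketch; capitalize is a hypothetical helper, and the code points used are U+01C4, U+01C5 and U+01C6 (the upper-case, title-case and lower-case forms of the DZ-with-caron digraph).

```haskell
import Data.Char (toLower, toTitle, toUpper)

-- Hypothetical helper: title-case the first character, lower-case the rest.
capitalize :: String -> String
capitalize []       = []
capitalize (c : cs) = toTitle c : map toLower cs

main :: IO ()
main = do
  putStrLn (capitalize "hASKELL")         -- Haskell
  print (toUpper '\454', toTitle '\454')  -- ('\452','\453')
  print (toLower '\452')                  -- '\454'
  print (toUpper '1')                     -- '1' (not a letter, unchanged)
```

Note that these functions map one character to one character, so they are not a full implementation of Unicode string case conversion.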
digitToInt :: Char -> Int
Convert a single digit [Char](Data-Char.html#t:Char "Data.Char") to the corresponding [Int](Data-Int.html#t:Int "Data.Int"). This function fails unless its argument satisfies [isHexDigit](Data-Char.html#v:isHexDigit "Data.Char"), but recognises both upper- and lower-case hexadecimal digits (that is, '0'..'9', 'a'..'f', 'A'..'F').
Examples
Characters '0' through '9' are converted properly to 0..9:
>>> map digitToInt ['0'..'9']
[0,1,2,3,4,5,6,7,8,9]
Both upper- and lower-case 'A' through 'F' are converted as well, to 10..15.
>>> map digitToInt ['a'..'f']
[10,11,12,13,14,15]
>>> map digitToInt ['A'..'F']
[10,11,12,13,14,15]
Anything else throws an exception:
>>> digitToInt 'G'
*** Exception: Char.digitToInt: not a digit 'G'
>>> digitToInt '♥'
*** Exception: Char.digitToInt: not a digit '\9829'
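Because digitToInt throws on invalid input, callers may prefer to guard it with isHexDigit. The sketch below shows one way to do so; safeDigitToInt and parseHex are hypothetical helpers, and parseHex makes no attempt to detect Int overflow on long inputs.

```haskell
import Control.Monad (foldM)
import Data.Char (digitToInt, isHexDigit)

-- Hypothetical total wrapper around digitToInt.
safeDigitToInt :: Char -> Maybe Int
safeDigitToInt c
  | isHexDigit c = Just (digitToInt c)
  | otherwise    = Nothing

-- Hypothetical helper: parse a non-empty hexadecimal string,
-- returning Nothing as soon as a non-hex character is seen.
parseHex :: String -> Maybe Int
parseHex "" = Nothing
parseHex s  = foldM step 0 s
  where
    step acc c = (\d -> acc * 16 + d) <$> safeDigitToInt c

main :: IO ()
main = do
  print (parseHex "ff")   -- Just 255
  print (parseHex "1A")   -- Just 26
  print (parseHex "12G")  -- Nothing
  print (parseHex "")     -- Nothing
```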
intToDigit :: Int -> Char
Convert an [Int](Data-Int.html#t:Int "Data.Int") in the range 0..15 to the corresponding single digit [Char](Data-Char.html#t:Char "Data.Char"). This function fails on other inputs, and generates lower-case hexadecimal digits.
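As an example of staying inside the 0..15 range that intToDigit accepts, the following sketch renders a non-negative Int in hexadecimal one digit at a time. toHex is a hypothetical helper written for illustration; it returns Nothing for negative input rather than guessing at a sign convention.

```haskell
import Data.Char (intToDigit)

-- Hypothetical helper: lower-case hexadecimal rendering of a
-- non-negative Int. Every call to intToDigit receives a value in 0..15.
toHex :: Int -> Maybe String
toHex n
  | n < 0     = Nothing
  | otherwise = Just (go n)
  where
    go k
      | k < 16    = [intToDigit k]
      | otherwise = go (k `div` 16) ++ [intToDigit (k `mod` 16)]

main :: IO ()
main = do
  print (toHex 0)     -- Just "0"
  print (toHex 255)   -- Just "ff"
  print (toHex 4096)  -- Just "1000"
  print (toHex (-1))  -- Nothing
```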
Numeric representations
String representations
showLitChar :: Char -> ShowS
Convert a character to a string using only printable characters, using Haskell source-language escape conventions. For example:
showLitChar '\n' s = "\\n" ++ s
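Since showLitChar has type Char -> ShowS, it can be folded over a whole string to escape every character. The sketch below assumes that use; escape is a hypothetical helper. Unlike show on a String, it adds no surrounding quotes.

```haskell
import Data.Char (showLitChar)

-- Hypothetical helper: escape each character of a string in turn.
escape :: String -> String
escape = foldr showLitChar ""

main :: IO ()
main = do
  putStrLn (showLitChar '\n' "")  -- prints \n (a backslash and an 'n')
  putStrLn (escape "tab\there")   -- prints tab\there
  putStrLn (escape "\955x")       -- prints \955x
```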
lexLitChar :: ReadS String
Read a string representation of a character, using Haskell source-language escape conventions. For example:
lexLitChar "\\nHello" = [("\\n", "Hello")]
readLitChar :: ReadS Char
Read a string representation of a character, using Haskell source-language escape conventions, and convert it to the character that it encodes. For example:
readLitChar "\\nHello" = [('\n', "Hello")]
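The two readers differ only in what they return for the consumed prefix: lexLitChar keeps the escape as written, while readLitChar decodes it. The sketch below prints both and then applies readLitChar repeatedly to decode a whole string; unescape is a hypothetical helper that simply stops at the first position that fails to parse.

```haskell
import Data.Char (lexLitChar, readLitChar)

-- Hypothetical helper: decode escapes one character at a time,
-- stopping at the end of input or at the first parse failure.
unescape :: String -> String
unescape s = case readLitChar s of
  [(c, rest)] -> c : unescape rest
  _           -> []

main :: IO ()
main = do
  print (lexLitChar "\\nHello")   -- [("\\n","Hello")]
  print (readLitChar "\\nHello")  -- [('\n',"Hello")]
  print (unescape "a\\tb\\955")   -- "a\tb\955"
```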