Data.Char
Documentation
data Char
The character type [Char](Data-Char.html#t:Char "Data.Char") is an enumeration whose values represent Unicode (or equivalently ISO/IEC 10646) code points (i.e. characters, see http://www.unicode.org/ for details). This set extends the ISO 8859-1 (Latin-1) character set (the first 256 characters), which is itself an extension of the ASCII character set (the first 128 characters). A character literal in Haskell has type [Char](Data-Char.html#t:Char "Data.Char").
To convert a [Char](Data-Char.html#t:Char "Data.Char") to or from the corresponding [Int](Data-Int.html#t:Int "Data.Int") value defined by Unicode, use [fromEnum](Prelude.html#v:fromEnum "Prelude") and [toEnum](Prelude.html#v:toEnum "Prelude") from the [Enum](Prelude.html#v:Enum "Prelude") class respectively (or equivalently ord and chr).
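As a small illustration of these conversions, the sketch below round-trips a character through its code point and checks that ord and chr agree with the Enum methods. The names codePoint, fromCodePoint and agreesWithEnum are made up for this example and are not part of Data.Char.

```haskell
import Data.Char (chr, ord)

-- Hypothetical helper: the Unicode code point of a character.
codePoint :: Char -> Int
codePoint = ord

-- Hypothetical helper: the character for a code point.
-- Like chr, this fails for values outside the valid Unicode range.
fromCodePoint :: Int -> Char
fromCodePoint = chr

-- ord and chr agree with the Enum methods restricted to Char.
agreesWithEnum :: Char -> Bool
agreesWithEnum c =
  ord c == fromEnum c && chr (ord c) == (toEnum (fromEnum c) :: Char)
```

In GHCi, codePoint 'A' evaluates to 65 and fromCodePoint 65 evaluates to 'A'.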
Unicode characters are divided into letters, numbers, marks, punctuation, symbols, separators (including spaces) and others (including control characters).
isControl :: Char -> Bool
Selects control characters, which are the non-printing characters of the Latin-1 subset of Unicode.
isUpper :: Char -> Bool
Selects upper-case or title-case alphabetic Unicode characters (letters). Title case is used by a small number of letter ligatures like the single-character form of Lj.
isAlpha :: Char -> Bool
Selects alphabetic Unicode characters (lower-case, upper-case and title-case letters, plus letters of caseless scripts and modifier letters). This function is equivalent to [isLetter](Data-Char.html#v:isLetter "Data.Char").
isAlphaNum :: Char -> Bool
Selects alphabetic or numeric Unicode characters.
Note that numeric digits outside the ASCII range, as well as numeric characters which aren't digits, are selected by this function but not by [isDigit](Data-Char.html#v:isDigit "Data.Char"). Such characters may be part of identifiers but are not used by the printer and reader to represent numbers.
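The difference from isDigit can be made concrete with a short sketch. The helper names isNonAsciiNumeric and asciiDigits are hypothetical and exist only for this illustration.

```haskell
import Data.Char (isAlphaNum, isDigit, isNumber)

-- Hypothetical helper: numeric characters that isAlphaNum accepts
-- but isDigit rejects (non-ASCII digits, Roman numerals, and so on).
isNonAsciiNumeric :: Char -> Bool
isNonAsciiNumeric c = isAlphaNum c && isNumber c && not (isDigit c)

-- Hypothetical helper: keep only the ASCII digits of a string.
asciiDigits :: String -> String
asciiDigits = filter isDigit
```

For instance, the Arabic-Indic digit three ('\1635') satisfies isNonAsciiNumeric, while '7' does not.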
isPrint :: Char -> Bool
Selects printable Unicode characters (letters, numbers, marks, punctuation, symbols and spaces).
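One possible use of isPrint is to scrub non-printable characters from text before display. The sketch below assumes that replacing them with '?' is acceptable; the name sanitize is hypothetical.

```haskell
import Data.Char (isPrint)

-- Hypothetical helper: replace every non-printable character with '?'.
-- Spaces count as printable; tabs and newlines are control characters.
sanitize :: String -> String
sanitize = map (\c -> if isPrint c then c else '?')
```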
isLetter :: Char -> Bool
Selects alphabetic Unicode characters (lower-case, upper-case and title-case letters, plus letters of caseless scripts and modifier letters). This function is equivalent to [isAlpha](Data-Char.html#v:isAlpha "Data.Char").
This function returns [True](Data-Bool.html#v:True "Data.Bool") if its argument has one of the following [GeneralCategory](Data-Char.html#t:GeneralCategory "Data.Char")s, or [False](Data-Bool.html#v:False "Data.Bool") otherwise:
[UppercaseLetter](Data-Char.html#v:UppercaseLetter "Data.Char")
[LowercaseLetter](Data-Char.html#v:LowercaseLetter "Data.Char")
[TitlecaseLetter](Data-Char.html#v:TitlecaseLetter "Data.Char")
[ModifierLetter](Data-Char.html#v:ModifierLetter "Data.Char")
[OtherLetter](Data-Char.html#v:OtherLetter "Data.Char")
These classes are defined in the Unicode Character Database, part of the Unicode standard. The same document defines what is and is not a "Letter".
Examples
Basic usage:
>>> isLetter 'a'
True
>>> isLetter 'A'
True
>>> isLetter 'λ'
True
>>> isLetter '0'
False
>>> isLetter '%'
False
>>> isLetter '♥'
False
>>> isLetter '\31'
False
Ensure that [isLetter](Data-Char.html#v:isLetter "Data.Char") and [isAlpha](Data-Char.html#v:isAlpha "Data.Char") are equivalent.
>>> let chars = [(chr 0)..]
>>> let letters = map isLetter chars
>>> let alphas = map isAlpha chars
>>> letters == alphas
True
isMark :: Char -> Bool
Selects Unicode mark characters, for example accents and the like, which combine with preceding characters.
This function returns [True](Data-Bool.html#v:True "Data.Bool") if its argument has one of the following [GeneralCategory](Data-Char.html#t:GeneralCategory "Data.Char")s, or [False](Data-Bool.html#v:False "Data.Bool") otherwise:
[NonSpacingMark](Data-Char.html#v:NonSpacingMark "Data.Char")
[SpacingCombiningMark](Data-Char.html#v:SpacingCombiningMark "Data.Char")
[EnclosingMark](Data-Char.html#v:EnclosingMark "Data.Char")
These classes are defined in the Unicode Character Database, part of the Unicode standard. The same document defines what is and is not a "Mark".
Examples
Basic usage:
>>> isMark 'a'
False
>>> isMark '0'
False
Combining marks such as accent characters usually need to follow another character before they become printable:
>>> map isMark "ò"
[False,True]
Puns are not necessarily supported:
>>> isMark '✓'
False
isNumber :: Char -> Bool
Selects Unicode numeric characters, including digits from various scripts, Roman numerals, et cetera.
This function returns [True](Data-Bool.html#v:True "Data.Bool") if its argument has one of the following [GeneralCategory](Data-Char.html#t:GeneralCategory "Data.Char")s, or [False](Data-Bool.html#v:False "Data.Bool") otherwise:
[DecimalNumber](Data-Char.html#v:DecimalNumber "Data.Char")
[LetterNumber](Data-Char.html#v:LetterNumber "Data.Char")
[OtherNumber](Data-Char.html#v:OtherNumber "Data.Char")
These classes are defined in the Unicode Character Database, part of the Unicode standard. The same document defines what is and is not a "Number".
Examples
Basic usage:
>>> isNumber 'a'
False
>>> isNumber '%'
False
>>> isNumber '3'
True
ASCII '0' through '9' are all numbers:
>>> and $ map isNumber ['0'..'9']
True
Unicode Roman numerals are "numbers" as well:
>>> isNumber 'Ⅸ'
True
isSymbol :: Char -> Bool
Selects Unicode symbol characters, including mathematical and currency symbols.
This function returns [True](Data-Bool.html#v:True "Data.Bool") if its argument has one of the following [GeneralCategory](Data-Char.html#t:GeneralCategory "Data.Char")s, or [False](Data-Bool.html#v:False "Data.Bool") otherwise:
[MathSymbol](Data-Char.html#v:MathSymbol "Data.Char")
[CurrencySymbol](Data-Char.html#v:CurrencySymbol "Data.Char")
[ModifierSymbol](Data-Char.html#v:ModifierSymbol "Data.Char")
[OtherSymbol](Data-Char.html#v:OtherSymbol "Data.Char")
These classes are defined in the Unicode Character Database, part of the Unicode standard. The same document defines what is and is not a "Symbol".
Examples
Basic usage:
>>> isSymbol 'a'
False
>>> isSymbol '6'
False
>>> isSymbol '='
True
The definition of "math symbol" may be a little counter-intuitive depending on one's background:
>>> isSymbol '+'
True
>>> isSymbol '-'
False
isSeparator :: Char -> Bool
Selects Unicode space and separator characters.
This function returns [True](Data-Bool.html#v:True "Data.Bool") if its argument has one of the following [GeneralCategory](Data-Char.html#t:GeneralCategory "Data.Char")s, or [False](Data-Bool.html#v:False "Data.Bool") otherwise:
[Space](Data-Char.html#v:Space "Data.Char")
[LineSeparator](Data-Char.html#v:LineSeparator "Data.Char")
[ParagraphSeparator](Data-Char.html#v:ParagraphSeparator "Data.Char")
These classes are defined in the Unicode Character Database, part of the Unicode standard. The same document defines what is and is not a "Separator".
Examples
Basic usage:
>>> isSeparator 'a'
False
>>> isSeparator '6'
False
>>> isSeparator ' '
True
Warning: newlines and tab characters are not considered separators.
>>> isSeparator '\n'
False
>>> isSeparator '\t'
False
But some more exotic characters are (like HTML's `&nbsp;`):
>>> isSeparator '\160'
True
isAscii :: Char -> Bool
Selects the first 128 characters of the Unicode character set, corresponding to the ASCII character set.
isLatin1 :: Char -> Bool
Selects the first 256 characters of the Unicode character set, corresponding to the ISO 8859-1 (Latin-1) character set.
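Because ASCII is a subset of Latin-1, the two predicates nest. The sketch below sorts a character into the narrowest subrange it belongs to; the Subrange type and the classify function are hypothetical names introduced for this example.

```haskell
import Data.Char (isAscii, isLatin1)

-- Hypothetical type naming the three possible subranges.
data Subrange = Ascii | Latin1Only | Beyond
  deriving (Eq, Show)

-- Hypothetical helper: the narrowest subrange containing a character.
-- isAscii is tested first because every ASCII character is also Latin-1.
classify :: Char -> Subrange
classify c
  | isAscii c  = Ascii
  | isLatin1 c = Latin1Only
  | otherwise  = Beyond
```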
data GeneralCategory
Unicode General Categories (column 2 of the UnicodeData table) in the order they are listed in the Unicode standard (the Unicode Character Database, in particular).
Examples
Basic usage:
>>> :t OtherLetter
OtherLetter :: GeneralCategory
[Eq](Data-Eq.html#t:Eq "Data.Eq") instance:
>>> UppercaseLetter == UppercaseLetter
True
>>> UppercaseLetter == LowercaseLetter
False
[Ord](Data-Ord.html#t:Ord "Data.Ord") instance:
>>> NonSpacingMark <= MathSymbol
True
[Enum](Prelude.html#t:Enum "Prelude") instance:
>>> enumFromTo ModifierLetter SpacingCombiningMark
[ModifierLetter,OtherLetter,NonSpacingMark,SpacingCombiningMark]
Read instance:
>>> read "DashPunctuation" :: GeneralCategory
DashPunctuation
>>> read "17" :: GeneralCategory
*** Exception: Prelude.read: no parse
[Show](Text-Show.html#t:Show "Text.Show") instance:
>>> show EnclosingMark
"EnclosingMark"
[Bounded](Prelude.html#t:Bounded "Prelude") instance:
>>> minBound :: GeneralCategory
UppercaseLetter
>>> maxBound :: GeneralCategory
NotAssigned
[Ix](Data-Ix.html#t:Ix "Data.Ix") instance:
>>> import Data.Ix ( index )
>>> index (OtherLetter,Control) FinalQuote
12
>>> index (OtherLetter,Control) Format
*** Exception: Error in array index
Constructors
Constructor | General category |
---|---|
UppercaseLetter | Lu: Letter, Uppercase |
LowercaseLetter | Ll: Letter, Lowercase |
TitlecaseLetter | Lt: Letter, Titlecase |
ModifierLetter | Lm: Letter, Modifier |
OtherLetter | Lo: Letter, Other |
NonSpacingMark | Mn: Mark, Non-Spacing |
SpacingCombiningMark | Mc: Mark, Spacing Combining |
EnclosingMark | Me: Mark, Enclosing |
DecimalNumber | Nd: Number, Decimal |
LetterNumber | Nl: Number, Letter |
OtherNumber | No: Number, Other |
ConnectorPunctuation | Pc: Punctuation, Connector |
DashPunctuation | Pd: Punctuation, Dash |
OpenPunctuation | Ps: Punctuation, Open |
ClosePunctuation | Pe: Punctuation, Close |
InitialQuote | Pi: Punctuation, Initial quote |
FinalQuote | Pf: Punctuation, Final quote |
OtherPunctuation | Po: Punctuation, Other |
MathSymbol | Sm: Symbol, Math |
CurrencySymbol | Sc: Symbol, Currency |
ModifierSymbol | Sk: Symbol, Modifier |
OtherSymbol | So: Symbol, Other |
Space | Zs: Separator, Space |
LineSeparator | Zl: Separator, Line |
ParagraphSeparator | Zp: Separator, Paragraph |
Control | Cc: Other, Control |
Format | Cf: Other, Format |
Surrogate | Cs: Other, Surrogate |
PrivateUse | Co: Other, Private Use |
NotAssigned | Cn: Other, Not Assigned |
generalCategory :: Char -> GeneralCategory
The Unicode general category of the character. This relies on the [Enum](Prelude.html#t:Enum "Prelude") instance of [GeneralCategory](Data-Char.html#t:GeneralCategory "Data.Char"), which must remain in the same order as the categories are presented in the Unicode standard.
Examples
Basic usage:
>>> generalCategory 'a'
LowercaseLetter
>>> generalCategory 'A'
UppercaseLetter
>>> generalCategory '0'
DecimalNumber
>>> generalCategory '%'
OtherPunctuation
>>> generalCategory '♥'
OtherSymbol
>>> generalCategory '\31'
Control
>>> generalCategory ' '
Space
toUpper :: Char -> Char
Convert a letter to the corresponding upper-case letter, if any. Any other character is returned unchanged.
toLower :: Char -> Char
Convert a letter to the corresponding lower-case letter, if any. Any other character is returned unchanged.
toTitle :: Char -> Char
Convert a letter to the corresponding title-case or upper-case letter, if any. (Title case differs from upper case only for a small number of ligature letters.) Any other character is returned unchanged.
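A minimal sketch of how the three case conversions might be combined follows. The helper titleWord is a hypothetical name, and the approach works character by character, so it does not handle mappings that change the length of a string.

```haskell
import Data.Char (toLower, toTitle)

-- Hypothetical helper: capitalise the first character of a word with
-- toTitle and lower-case the rest with toLower.
titleWord :: String -> String
titleWord []       = []
titleWord (c : cs) = toTitle c : map toLower cs
```

For ordinary letters toTitle and toUpper give the same result. They differ for ligatures such as U+01C6 (the single-character dž), whose title-case form is U+01C5 and whose upper-case form is U+01C4.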
digitToInt :: Char -> Int
Convert a single digit [Char](Data-Char.html#t:Char "Data.Char") to the corresponding [Int](Data-Int.html#t:Int "Data.Int"). This function fails unless its argument satisfies [isHexDigit](Data-Char.html#v:isHexDigit "Data.Char"), but recognises both upper- and lower-case hexadecimal digits (that is, '0'..'9', 'a'..'f', 'A'..'F').
Examples
Characters '0' through '9' are converted properly to 0..9:
>>> map digitToInt ['0'..'9']
[0,1,2,3,4,5,6,7,8,9]
Both upper- and lower-case 'A' through 'F' are converted as well, to 10..15.
>>> map digitToInt ['a'..'f']
[10,11,12,13,14,15]
>>> map digitToInt ['A'..'F']
[10,11,12,13,14,15]
Anything else throws an exception:
>>> digitToInt 'G'
*** Exception: Char.digitToInt: not a digit 'G'
>>> digitToInt '♥'
*** Exception: Char.digitToInt: not a digit '\9829'
intToDigit :: Int -> Char
Convert an [Int](Data-Int.html#t:Int "Data.Int") in the range 0..15 to the corresponding single digit [Char](Data-Char.html#t:Char "Data.Char"). This function fails on other inputs, and generates lower-case hexadecimal digits.
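To show intToDigit and digitToInt working together, here is a sketch of a hexadecimal printer and parser built on them. The names toHex and fromHex are hypothetical, and the code assumes non-negative inputs and well-formed digit strings.

```haskell
import Data.Char (digitToInt, intToDigit)

-- Hypothetical helper: render a non-negative Int in hexadecimal,
-- calling intToDigit once per digit (so the output is lower-case).
toHex :: Int -> String
toHex n
  | n < 0     = error "toHex: negative input"
  | n < 16    = [intToDigit n]
  | otherwise = toHex (n `div` 16) ++ [intToDigit (n `mod` 16)]

-- Hypothetical helper: parse a string of hexadecimal digits.
-- Like digitToInt, it fails on characters that are not hex digits.
fromHex :: String -> Int
fromHex = foldl (\acc d -> acc * 16 + digitToInt d) 0
```

Here toHex 255 gives "ff", and fromHex accepts either case, so fromHex "FF" gives 255.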
Numeric representations
String representations
showLitChar :: Char -> ShowS
Convert a character to a string using only printable characters, using Haskell source-language escape conventions. For example:
showLitChar '\n' s = "\\n" ++ s
lexLitChar :: ReadS String
Read a string representation of a character, using Haskell source-language escape conventions. For example:
lexLitChar "\\nHello" = [("\\n", "Hello")]
readLitChar :: ReadS Char
Read a string representation of a character, using Haskell source-language escape conventions, and convert it to the character that it encodes. For example:
readLitChar "\\nHello" = [('\n', "Hello")]
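The following sketch shows one way showLitChar and readLitChar might be lifted from single characters to whole strings. The names escape and unescape are hypothetical, and unescape simply stops at the first position where readLitChar finds no parse rather than reporting an error.

```haskell
import Data.Char (readLitChar, showLitChar)

-- Hypothetical helper: escape each character of a string with
-- showLitChar. Since showLitChar c is a ShowS, foldr chains the calls.
escape :: String -> String
escape = foldr showLitChar ""

-- Hypothetical helper: decode escaped characters one at a time,
-- stopping at the first position where readLitChar finds no parse.
unescape :: String -> String
unescape s = case readLitChar s of
  [(c, rest)] -> c : unescape rest
  _           -> []
```

For example, escape applied to a three-character string holding 'a', a newline and 'b' yields the four characters a, backslash, n, b, and unescape reverses that.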