Data.Char

Documentation

data Char

The character type [Char](Data-Char.html#t:Char "Data.Char") is an enumeration whose values represent Unicode (or equivalently ISO/IEC 10646) code points (i.e. characters, see http://www.unicode.org/ for details). This set extends the ISO 8859-1 (Latin-1) character set (the first 256 characters), which is itself an extension of the ASCII character set (the first 128 characters). A character literal in Haskell has type [Char](Data-Char.html#t:Char "Data.Char").

To convert a [Char](Data-Char.html#t:Char "Data.Char") to or from the corresponding [Int](Data-Int.html#t:Int "Data.Int") value defined by Unicode, use [fromEnum](Prelude.html#v:fromEnum "Prelude") and [toEnum](Prelude.html#v:toEnum "Prelude") from the [Enum](Prelude.html#v:Enum "Prelude") class respectively (or equivalently ord and chr).
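
The conversions above can be sketched as follows. The helper names `codePoint` and `fromCodePoint` are hypothetical and exist only for illustration. The range guard is there because `chr` raises an error for values outside the valid code point range.

```haskell
import Data.Char (chr, ord)

-- | Code point of a character; gives the same result as 'fromEnum'.
-- (Hypothetical helper name, not part of Data.Char.)
codePoint :: Char -> Int
codePoint = ord

-- | Character for a code point, or Nothing when the value is out of range.
-- (Hypothetical helper name, not part of Data.Char.)
fromCodePoint :: Int -> Maybe Char
fromCodePoint n
  | n >= 0 && n <= ord maxBound = Just (chr n)
  | otherwise                   = Nothing
```

Returning `Maybe` keeps the out-of-range case explicit instead of relying on the runtime error from `chr` or `toEnum`.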

Character classification

Unicode characters are divided into letters, numbers, marks, punctuation, symbols, separators (including spaces) and others (including control characters).

isControl :: Char -> Bool

Selects control characters, which are the non-printing characters of the Latin-1 subset of Unicode.

isUpper :: Char -> Bool

Selects upper-case or title-case alphabetic Unicode characters (letters). Title case is used by a small number of letter ligatures like the single-character form of Lj.

isAlpha :: Char -> Bool

Selects alphabetic Unicode characters (lower-case, upper-case and title-case letters, plus letters of caseless scripts and modifier letters). This function is equivalent to [isLetter](Data-Char.html#v:isLetter "Data.Char").

isAlphaNum :: Char -> Bool

Selects alphabetic or numeric Unicode characters.

Note that numeric digits outside the ASCII range, as well as numeric characters which aren't digits, are selected by this function but not by [isDigit](Data-Char.html#v:isDigit "Data.Char"). Such characters may be part of identifiers but are not used by the printer and reader to represent numbers.
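
A small sketch of the distinction described in the note. The names `classify` and `samples` are hypothetical. The sample string holds an ASCII digit, ARABIC-INDIC DIGIT FOUR (U+0664), ROMAN NUMERAL NINE (U+2168) and an ASCII letter.

```haskell
import Data.Char (isAlphaNum, isDigit)

-- | Pair each character with its 'isAlphaNum' and 'isDigit' results
-- so the difference between the two predicates is visible.
-- (Hypothetical helper name.)
classify :: String -> [(Char, Bool, Bool)]
classify = map (\c -> (c, isAlphaNum c, isDigit c))

-- | '7', U+0664, U+2168 and 'a', written with decimal escapes.
samples :: String
samples = "7\1636\8552a"
```

Only `'7'` satisfies both predicates; the two non-ASCII numeric characters satisfy `isAlphaNum` alone.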

isPrint :: Char -> Bool

Selects printable Unicode characters (letters, numbers, marks, punctuation, symbols and spaces).
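
One possible use, as a sketch: replacing anything that is not printable before text is written to a log or terminal. The name `sanitize` and the choice of `'?'` as the placeholder are assumptions made for this example.

```haskell
import Data.Char (isPrint)

-- | Replace every non-printable character with '?'.
-- (Hypothetical helper; the placeholder character is arbitrary.)
sanitize :: String -> String
sanitize = map (\c -> if isPrint c then c else '?')
```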

isLetter :: Char -> Bool

Selects alphabetic Unicode characters (lower-case, upper-case and title-case letters, plus letters of caseless scripts and modifier letters). This function is equivalent to [isAlpha](Data-Char.html#v:isAlpha "Data.Char").

This function returns [True](Data-Bool.html#v:True "Data.Bool") if its argument has one of the following [GeneralCategory](Data-Char.html#t:GeneralCategory "Data.Char")s, or [False](Data-Bool.html#v:False "Data.Bool") otherwise: UppercaseLetter, LowercaseLetter, TitlecaseLetter, ModifierLetter, OtherLetter.

These classes are defined in the Unicode Character Database, part of the Unicode standard. The same document defines what is and is not a "Letter".

Examples

Basic usage:

    >>> isLetter 'a'
    True
    >>> isLetter 'A'
    True
    >>> isLetter 'λ'
    True
    >>> isLetter '0'
    False
    >>> isLetter '%'
    False
    >>> isLetter '♥'
    False
    >>> isLetter '\31'
    False

Ensure that [isLetter](Data-Char.html#v:isLetter "Data.Char") and [isAlpha](Data-Char.html#v:isAlpha "Data.Char") are equivalent.

    >>> let chars = [(chr 0)..]
    >>> let letters = map isLetter chars
    >>> let alphas = map isAlpha chars
    >>> letters == alphas
    True

isMark :: Char -> Bool

Selects Unicode mark characters, for example accents and the like, which combine with preceding characters.

This function returns [True](Data-Bool.html#v:True "Data.Bool") if its argument has one of the following [GeneralCategory](Data-Char.html#t:GeneralCategory "Data.Char")s, or [False](Data-Bool.html#v:False "Data.Bool") otherwise: NonSpacingMark, SpacingCombiningMark, EnclosingMark.

These classes are defined in the Unicode Character Database, part of the Unicode standard. The same document defines what is and is not a "Mark".

Examples

Basic usage:

    >>> isMark 'a'
    False
    >>> isMark '0'
    False

Combining marks such as accent characters usually need to follow another character before they become printable:

    >>> map isMark "ò"
    [False,True]

Puns are not necessarily supported:

    >>> isMark '✓'
    False
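
As an illustration, `isMark` can be used to drop combining marks from text. This is a sketch under one assumption: the input is already in decomposed form, since a precomposed character such as U+00F2 is a single letter and contains no separate mark. The name `stripMarks` is hypothetical.

```haskell
import Data.Char (isMark)

-- | Drop every combining mark, keeping the base characters.
-- Precomposed characters are single letters and pass through unchanged.
-- (Hypothetical helper name.)
stripMarks :: String -> String
stripMarks = filter (not . isMark)
```

For example, `stripMarks "o\768"` yields `"o"`, while `stripMarks "\242"` returns its input unchanged.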

isNumber :: Char -> Bool

Selects Unicode numeric characters, including digits from various scripts, Roman numerals, et cetera.

This function returns [True](Data-Bool.html#v:True "Data.Bool") if its argument has one of the following [GeneralCategory](Data-Char.html#t:GeneralCategory "Data.Char")s, or [False](Data-Bool.html#v:False "Data.Bool") otherwise: DecimalNumber, LetterNumber, OtherNumber.

These classes are defined in the Unicode Character Database, part of the Unicode standard. The same document defines what is and is not a "Number".

Examples

Basic usage:

    >>> isNumber 'a'
    False
    >>> isNumber '%'
    False
    >>> isNumber '3'
    True

ASCII '0' through '9' are all numbers:

    >>> and $ map isNumber ['0'..'9']
    True

Unicode Roman numerals are "numbers" as well:

    >>> isNumber 'Ⅸ'
    True

isSymbol :: Char -> Bool

Selects Unicode symbol characters, including mathematical and currency symbols.

This function returns [True](Data-Bool.html#v:True "Data.Bool") if its argument has one of the following [GeneralCategory](Data-Char.html#t:GeneralCategory "Data.Char")s, or [False](Data-Bool.html#v:False "Data.Bool") otherwise: MathSymbol, CurrencySymbol, ModifierSymbol, OtherSymbol.

These classes are defined in the Unicode Character Database, part of the Unicode standard. The same document defines what is and is not a "Symbol".

Examples

Basic usage:

    >>> isSymbol 'a'
    False
    >>> isSymbol '6'
    False
    >>> isSymbol '='
    True

The definition of "math symbol" may be a little counter-intuitive depending on one's background:

    >>> isSymbol '+'
    True
    >>> isSymbol '-'
    False

isSeparator :: Char -> Bool

Selects Unicode space and separator characters.

This function returns [True](Data-Bool.html#v:True "Data.Bool") if its argument has one of the following [GeneralCategory](Data-Char.html#t:GeneralCategory "Data.Char")s, or [False](Data-Bool.html#v:False "Data.Bool") otherwise: Space, LineSeparator, ParagraphSeparator.

These classes are defined in the Unicode Character Database, part of the Unicode standard. The same document defines what is and is not a "Separator".

Examples

Basic usage:

    >>> isSeparator 'a'
    False
    >>> isSeparator '6'
    False
    >>> isSeparator ' '
    True

Warning: newlines and tab characters are not considered separators.

    >>> isSeparator '\n'
    False
    >>> isSeparator '\t'
    False

But some more exotic characters are (like HTML's `&nbsp;`):

    >>> isSeparator '\160'
    True
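
Because newlines and tabs are control characters rather than separators, code that wants "any kind of blank" usually needs more than `isSeparator`. The sketch below combines it with `isSpace`, which is also exported by Data.Char. The name `isBlank` is hypothetical.

```haskell
import Data.Char (isSeparator, isSpace)

-- | True for Unicode separators and for the whitespace control
-- characters (tab, newline and so on) accepted by 'isSpace'.
-- (Hypothetical helper name.)
isBlank :: Char -> Bool
isBlank c = isSeparator c || isSpace c
```

With this predicate both `'\n'` and LINE SEPARATOR (`'\8232'`, U+2028) count as blank.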

Subranges

isAscii :: Char -> Bool

Selects the first 128 characters of the Unicode character set, corresponding to the ASCII character set.

isLatin1 :: Char -> Bool

Selects the first 256 characters of the Unicode character set, corresponding to the ISO 8859-1 (Latin-1) character set.
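
A sketch of how the two subrange predicates nest, for instance when deciding which legacy encoding could hold a string. The `Subrange` type and the `subrange` function are hypothetical and defined only for this example.

```haskell
import Data.Char (isAscii, isLatin1)

-- | Smallest of the subranges that contains a whole string.
-- (Hypothetical type for illustration.)
data Subrange = Ascii | Latin1 | BeyondLatin1
  deriving (Eq, Show)

-- | Classify a string by the narrowest subrange covering every character.
-- ASCII is checked first because every ASCII character is also Latin-1.
subrange :: String -> Subrange
subrange s
  | all isAscii s  = Ascii
  | all isLatin1 s = Latin1
  | otherwise      = BeyondLatin1
```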

Unicode general categories

data GeneralCategory

Unicode General Categories (column 2 of the UnicodeData table) in the order they are listed in the Unicode standard (the Unicode Character Database, in particular).

Examples

Basic usage:

    >>> :t OtherLetter
    OtherLetter :: GeneralCategory

[Eq](Data-Eq.html#t:Eq "Data.Eq") instance:

    >>> UppercaseLetter == UppercaseLetter
    True
    >>> UppercaseLetter == LowercaseLetter
    False

[Ord](Data-Ord.html#t:Ord "Data.Ord") instance:

    >>> NonSpacingMark <= MathSymbol
    True

[Enum](Prelude.html#t:Enum "Prelude") instance:

    >>> enumFromTo ModifierLetter SpacingCombiningMark
    [ModifierLetter,OtherLetter,NonSpacingMark,SpacingCombiningMark]

Read instance:

    >>> read "DashPunctuation" :: GeneralCategory
    DashPunctuation
    >>> read "17" :: GeneralCategory
    *** Exception: Prelude.read: no parse

[Show](Text-Show.html#t:Show "Text.Show") instance:

    >>> show EnclosingMark
    "EnclosingMark"

[Bounded](Prelude.html#t:Bounded "Prelude") instance:

    >>> minBound :: GeneralCategory
    UppercaseLetter
    >>> maxBound :: GeneralCategory
    NotAssigned

[Ix](Data-Ix.html#t:Ix "Data.Ix") instance:

    >>> import Data.Ix ( index )
    >>> index (OtherLetter,Control) FinalQuote
    12
    >>> index (OtherLetter,Control) Format
    *** Exception: Error in array index

Constructors

UppercaseLetter Lu: Letter, Uppercase
LowercaseLetter Ll: Letter, Lowercase
TitlecaseLetter Lt: Letter, Titlecase
ModifierLetter Lm: Letter, Modifier
OtherLetter Lo: Letter, Other
NonSpacingMark Mn: Mark, Non-Spacing
SpacingCombiningMark Mc: Mark, Spacing Combining
EnclosingMark Me: Mark, Enclosing
DecimalNumber Nd: Number, Decimal
LetterNumber Nl: Number, Letter
OtherNumber No: Number, Other
ConnectorPunctuation Pc: Punctuation, Connector
DashPunctuation Pd: Punctuation, Dash
OpenPunctuation Ps: Punctuation, Open
ClosePunctuation Pe: Punctuation, Close
InitialQuote Pi: Punctuation, Initial quote
FinalQuote Pf: Punctuation, Final quote
OtherPunctuation Po: Punctuation, Other
MathSymbol Sm: Symbol, Math
CurrencySymbol Sc: Symbol, Currency
ModifierSymbol Sk: Symbol, Modifier
OtherSymbol So: Symbol, Other
Space Zs: Separator, Space
LineSeparator Zl: Separator, Line
ParagraphSeparator Zp: Separator, Paragraph
Control Cc: Other, Control
Format Cf: Other, Format
Surrogate Cs: Other, Surrogate
PrivateUse Co: Other, Private Use
NotAssigned Cn: Other, Not Assigned
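
Since the constructors are listed in a fixed order and the type has Enum and Bounded instances, ranges of categories can be written directly. A minimal sketch follows; the names `allCategories` and `letterCategories` are hypothetical.

```haskell
import Data.Char (GeneralCategory (..))

-- | Every general category, in constructor order.
-- (Hypothetical helper name.)
allCategories :: [GeneralCategory]
allCategories = [minBound .. maxBound]

-- | The five "Letter" categories, taken as a contiguous Enum range.
-- (Hypothetical helper name.)
letterCategories :: [GeneralCategory]
letterCategories = [UppercaseLetter .. OtherLetter]
```

`allCategories` has one entry per constructor listed above, thirty in total.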

generalCategory :: Char -> GeneralCategory

The Unicode general category of the character. This relies on the [Enum](Prelude.html#t:Enum "Prelude") instance of [GeneralCategory](Data-Char.html#t:GeneralCategory "Data.Char"), which must remain in the same order as the categories are presented in the Unicode standard.

Examples

Basic usage:

    >>> generalCategory 'a'
    LowercaseLetter
    >>> generalCategory 'A'
    UppercaseLetter
    >>> generalCategory '0'
    DecimalNumber
    >>> generalCategory '%'
    OtherPunctuation
    >>> generalCategory '♥'
    OtherSymbol
    >>> generalCategory '\31'
    Control
    >>> generalCategory ' '
    Space
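
One way to put `generalCategory` to work is a per-category count over a string. The sketch below uses only `base`; the name `categoryCounts` is hypothetical, and the result is ordered by the Ord instance of GeneralCategory.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)
import Data.List (group, sort)

-- | Count how many characters of a string fall into each general category.
-- Categories that do not occur are omitted from the result.
-- (Hypothetical helper name.)
categoryCounts :: String -> [(GeneralCategory, Int)]
categoryCounts s =
  [ (cat, length run)
  | run@(cat : _) <- group (sort (map generalCategory s))
  ]
```

For the input `"Hi, 42!"` this gives one upper-case letter, one lower-case letter, two decimal numbers, two OtherPunctuation characters and one space.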

Case conversion

toUpper :: Char -> Char

Convert a letter to the corresponding upper-case letter, if any. Any other character is returned unchanged.

toLower :: Char -> Char

Convert a letter to the corresponding lower-case letter, if any. Any other character is returned unchanged.

toTitle :: Char -> Char

Convert a letter to the corresponding title-case or upper-case letter, if any. (Title case differs from upper case only for a small number of ligature letters.) Any other character is returned unchanged.
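
A short sketch of capitalising a single word with these functions. The name `capitalize` is hypothetical. `toTitle` is used for the first character so that ligature letters such as U+01C6 map to their title-case form rather than their upper-case form.

```haskell
import Data.Char (toLower, toTitle)

-- | Title-case the first character and lower-case the rest.
-- Works character by character, so it does not handle case mappings
-- that change the number of characters.
-- (Hypothetical helper name.)
capitalize :: String -> String
capitalize []       = []
capitalize (c : cs) = toTitle c : map toLower cs
```

For example, `capitalize "hASKELL"` yields `"Haskell"`.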

Single digit characters

digitToInt :: Char -> Int

Convert a single digit [Char](Data-Char.html#t:Char "Data.Char") to the corresponding [Int](Data-Int.html#t:Int "Data.Int"). This function fails unless its argument satisfies [isHexDigit](Data-Char.html#v:isHexDigit "Data.Char"), but recognises both upper- and lower-case hexadecimal digits (that is, '0'..'9', 'a'..'f', 'A'..'F').

Examples

Characters '0' through '9' are converted properly to 0..9:

    >>> map digitToInt ['0'..'9']
    [0,1,2,3,4,5,6,7,8,9]

Both upper- and lower-case 'A' through 'F' are converted as well, to 10..15.

    >>> map digitToInt ['a'..'f']
    [10,11,12,13,14,15]
    >>> map digitToInt ['A'..'F']
    [10,11,12,13,14,15]

Anything else throws an exception:

    >>> digitToInt 'G'
    *** Exception: Char.digitToInt: not a digit 'G'
    >>> digitToInt '♥'
    *** Exception: Char.digitToInt: not a digit '\9829'
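
Because `digitToInt` throws on invalid input, callers often guard it with `isHexDigit`. The following is a sketch of that pattern; `safeDigitToInt` and `parseHex` are hypothetical names, and `parseHex` makes no attempt to detect `Int` overflow.

```haskell
import Control.Monad (foldM)
import Data.Char (digitToInt, isHexDigit)

-- | Total variant of 'digitToInt': Nothing instead of an exception.
-- (Hypothetical helper name.)
safeDigitToInt :: Char -> Maybe Int
safeDigitToInt c
  | isHexDigit c = Just (digitToInt c)
  | otherwise    = Nothing

-- | Parse a non-empty string of hexadecimal digits.
-- Fails on the empty string and on any non-hex character.
-- (Hypothetical helper name; overflow is not checked.)
parseHex :: String -> Maybe Int
parseHex "" = Nothing
parseHex s  = foldM step 0 s
  where
    step acc c = fmap (\d -> acc * 16 + d) (safeDigitToInt c)
```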

intToDigit :: Int -> Char

Convert an [Int](Data-Int.html#t:Int "Data.Int") in the range 0..15 to the corresponding single digit [Char](Data-Char.html#t:Char "Data.Char"). This function fails on other inputs, and generates lower-case hexadecimal digits.
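
As an illustration of `intToDigit`, the sketch below renders a non-negative `Int` in hexadecimal, one digit at a time. The name `toHex` is hypothetical; negative inputs are rejected rather than formatted.

```haskell
import Data.Char (intToDigit)

-- | Lower-case hexadecimal rendering of a non-negative number.
-- Returns Nothing for negative input.
-- (Hypothetical helper name.)
toHex :: Int -> Maybe String
toHex n
  | n < 0     = Nothing
  | otherwise = Just (go n "")
  where
    -- Peel off the least significant digit and prepend it to the accumulator.
    go k acc
      | k < 16    = intToDigit k : acc
      | otherwise = go (k `div` 16) (intToDigit (k `mod` 16) : acc)
```

Every value passed to `intToDigit` here lies in 0..15, so the failing case described above cannot occur.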

Numeric representations

String representations

showLitChar :: Char -> ShowS

Convert a character to a string using only printable characters, using Haskell source-language escape conventions. For example:

    showLitChar '\n' s = "\\n" ++ s

lexLitChar :: ReadS String

Read a string representation of a character, using Haskell source-language escape conventions. For example:

    lexLitChar "\\nHello" = [("\\n", "Hello")]

readLitChar :: ReadS Char

Read a string representation of a character, using Haskell source-language escape conventions, and convert it to the character that it encodes. For example:

    readLitChar "\\nHello" = [('\n', "Hello")]
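
The two directions can be lifted from single characters to whole strings. This is a sketch only: `escape` and `unescape` are hypothetical names, and the pair may not round-trip every string (for instance where `showLitChar` emits the `\&` separator).

```haskell
import Data.Char (readLitChar, showLitChar)

-- | Escape every character of a string with 'showLitChar'.
-- The right fold lets each character see the already-escaped remainder.
-- (Hypothetical helper name.)
escape :: String -> String
escape = foldr showLitChar ""

-- | Decode escapes with 'readLitChar', one character at a time.
-- Returns Nothing as soon as a character cannot be parsed.
-- (Hypothetical helper name.)
unescape :: String -> Maybe String
unescape "" = Just ""
unescape s  = case readLitChar s of
  ((c, rest) : _) -> fmap (c :) (unescape rest)
  []              -> Nothing
```

For example, `escape "a\nb"` produces the four-character string `a\nb` with a literal backslash, and `unescape` turns that back into three characters.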