[Python-ideas] adding a casefold() method to str
Steven D'Aprano steve at pearwood.info
Sun Jan 8 17:58:17 CET 2012
Benjamin Peterson wrote:
Hi, Casefolding (Unicode Standard 3.13) is a more aggressive version of lowercasing. Its purpose is to assist in the implementation of caseless matching. For example, under lowercasing "ß" -> "ß", but under casefolding "ß" -> "ss". I propose we add a casefold() method. So, case-insensitive matching should really be "one.casefold() == two.casefold()" rather than "one.lower() == two.lower()".
+1 in principle, but in practice case folding is more complicated than a single method might imply. The most obvious complication is treatment of dotted and dotless I.
See, for example:
http://unicode.org/Public/UNIDATA/CaseFolding.txt
http://www.w3.org/International/wiki/Case_folding
http://en.wikipedia.org/wiki/Letter_case#Unicode_case_folding_and_script_identification
So while having proper Unicode case-folding is desirable, I don't know how simple it is to implement.
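The dotted and dotless I complication shows up even with the existing locale-independent str methods. A minimal sketch, assuming a Python 3 interpreter whose str.lower() and str.upper() apply the default Unicode mappings; round_trip is a hypothetical helper used only for illustration:

```python
# Turkish and Azeri use four distinct letters: I/ı (dotless) and İ/i (dotted).
DOTTED_CAPITAL = "\u0130"   # İ LATIN CAPITAL LETTER I WITH DOT ABOVE
DOTLESS_SMALL = "\u0131"    # ı LATIN SMALL LETTER DOTLESS I


def round_trip(text):
    """Upper-case then lower-case text using the default (non-Turkic) rules."""
    return text.upper().lower()


# The default rules pair I with i, so the dotless letter does not survive.
print(round_trip(DOTLESS_SMALL) == DOTLESS_SMALL)   # False: ı -> I -> i

# Lower-casing İ by default yields "i" followed by U+0307 COMBINING DOT ABOVE.
print([hex(ord(c)) for c in DOTTED_CAPITAL.lower()])
```

Under Turkic rules the expected pairs are I/ı and İ/i, which is why a single locale-independent mapping cannot serve every language.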
Would it be appropriate for casefold() to take an optional argument as to which mappings to use? E.g. something like:
    str.casefold()                               # defaults to simple folding
    str.casefold(string.SIMPLE | string.TURKIC)  # flags combined with |
    str.casefold(string.FULL)
or should str.casefold() only apply simple folding, with the other combinations relegated to a function in a module somewhere?
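One way such an optional argument could be modelled is with enum.Flag. This is only a sketch: the Folding class, its member names, and describe() are hypothetical and are not part of the string module or of any existing API.

```python
import enum


class Folding(enum.Flag):
    """Hypothetical option flags; not part of the string module."""
    SIMPLE = enum.auto()
    FULL = enum.auto()
    TURKIC = enum.auto()


def describe(mode=Folding.SIMPLE):
    """Return a readable description of a flag combination."""
    kind = "full" if Folding.FULL in mode else "simple"
    turkic = "with" if Folding.TURKIC in mode else "without"
    return f"{kind} casefolding, {turkic} Turkic I"


print(describe())
print(describe(Folding.SIMPLE | Folding.TURKIC))
print(describe(Folding.FULL))
```

Flags of this kind combine with |; using & would intersect them and produce an empty value.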
I count 4 possible functions:
simple casefolding, without Turkic I
full casefolding, without Turkic I
simple casefolding, with Turkic I
full casefolding, with Turkic I
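The four variants can be sketched as one function with two switches. The table below is a hand-copied excerpt of a few CaseFolding.txt entries (status C = common, S = simple, F = full, T = Turkic); fold() and fold_char() are hypothetical names, and falling back to str.lower() for characters outside the excerpt is an approximation, not real case folding.

```python
# (character, status) -> folded string, excerpted from CaseFolding.txt
_FOLDS = {
    ("I", "C"): "i",
    ("I", "T"): "\u0131",            # I -> ı (Turkic only)
    ("\u0130", "F"): "i\u0307",      # İ -> i + combining dot above
    ("\u0130", "T"): "i",            # İ -> i (Turkic only)
    ("\u00df", "F"): "ss",           # ß -> ss
    ("\u1e9e", "F"): "ss",           # ẞ -> ss
    ("\u1e9e", "S"): "\u00df",       # ẞ -> ß
}
_LISTED = {ch for ch, _ in _FOLDS}


def fold_char(ch, full, turkic):
    """Fold one character; Turkic mappings take priority when enabled."""
    statuses = (["T"] if turkic else []) + ["C", "F" if full else "S"]
    for status in statuses:
        if (ch, status) in _FOLDS:
            return _FOLDS[(ch, status)]
    if ch in _LISTED:
        return ch          # listed, but no mapping applies in this mode
    return ch.lower()      # assumption: stand-in for the rest of the table


def fold(text, full=False, turkic=False):
    """Apply simple or full folding, with or without the Turkic mappings."""
    return "".join(fold_char(ch, full, turkic) for ch in text)


print(fold("Stra\u00dfe"))               # simple: ß is kept
print(fold("Stra\u00dfe", full=True))    # full: ß becomes ss
print(fold("DIYARBAKIR", turkic=True))   # Turkic: I becomes ı
```

For comparison, the str.casefold() method available in Python 3.3 and later performs full folding without the Turkic mappings.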
-- Steven