[Python-3000] string module trimming
Guido van Rossum guido at python.org
Tue Apr 17 23:28:00 CEST 2007
On 4/17/07, Christian Heimes <lists at cheimes.de> wrote:
> Neal Norwitz wrote:
> > I don't have any plans, just considering options. Move them
> > somewhere? Perhaps, trim the ones that are unused. In a unicode
> > world, I'm not sure how much some of these make sense. letters stands
> > out more than others. I don't know enough about unicode to know if
> > digits or whitespace can be diff.
>
> What do you think about replacing the definitions with information from
> the Unicode character properties database? The information is available
> somewhere in Python:
>
> http://docs.python.org/lib/re-syntax.html
>
> \w ... With LOCALE, it will match the set [0-9] plus whatever
> characters are defined as alphanumeric for the current locale. If
> UNICODE is set, this will match the characters [0-9] plus whatever is
> classified as alphanumeric in the Unicode character properties
> database.
Yes, unicode.islower() and friends have this information.
It would be silly to set e.g. letters to a string of all Unicode letters -- that would be a string of 46618 characters! Similarly, there are 304 Unicode digits. (And this is in a narrow Unicode build, which only supports the basic Unicode plane, 0--2**16!)
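
A rough sketch of how counts like these could be reproduced, assuming a current Python 3 where the unicode type is simply str. count_matching is a hypothetical helper, not anything in the string module, and the totals depend on the Unicode database version shipped with the interpreter, so they will not necessarily match the figures above.

```python
def count_matching(predicate, limit=0x10000):
    """Count code points below `limit` for which predicate(char) is true.

    The default limit of 0x10000 covers only the basic plane, which is
    what a narrow Unicode build could represent.
    """
    return sum(1 for cp in range(limit) if predicate(chr(cp)))


# The str predicates consult the Unicode character properties database.
bmp_letters = count_matching(str.isalpha)
bmp_digits = count_matching(str.isdigit)

print("letters in the basic plane:", bmp_letters)
print("digits in the basic plane:", bmp_digits)
```

The point stands either way: asking a character via isalpha() or isdigit() is cheap, while materializing every match as one string constant is not.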
--
--Guido van Rossum (home page: http://www.python.org/~guido/)