Issue 635595: Misleading description of \w in regexs
In the Regular Expression Syntax doc page
(http://www.python.org/dev/doc/devel/lib/re-syntax.html), the
description for \w is misleading (the same goes for \W).
The description indicates that, with the locale flag in effect,
\w includes "characters defined as letters" for the current
locale. In reading that, I took "letters" to mean characters
for which isalpha returns true, but, in fact, all characters
defined as alphanumerics for the current locale are
included (so \w works pretty much the same way with the locale
flag as with the unicode flag). For example, using '\xb2'
(the superscript two), which is alphanumeric but not a letter:
Python 2.2.2 (#37, Oct 14 2002, 17:02:34) [MSC 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.setlocale(locale.LC_ALL, '')
'English_United States.1252'
>>> import re
>>> re.match(r'\w', '\xb2', re.L).group()
'\xb2'
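The session above depends on Python 2.2 and a Windows-1252 locale. As a rough, locale-independent illustration of the same distinction, here is a minimal sketch for Python 3, where re.LOCALE applies only to bytes patterns and str patterns follow Unicode rules by default. It is an analogy to the reported behaviour, not a reproduction of it, and the variable names are illustrative only.

```python
import re

# Assumption: Python 3 with a str pattern, so \w follows Unicode rules.
# This stands in for the locale-flag case described in the report.
SUPERSCRIPT_TWO = "\xb2"

# The character is not a letter...
is_letter = SUPERSCRIPT_TWO.isalpha()

# ...but it does count as alphanumeric.
is_alnum = SUPERSCRIPT_TWO.isalnum()

# \w follows the alphanumeric test rather than the letter test.
matches_word = re.match(r"\w", SUPERSCRIPT_TWO) is not None

# \W is the complement of \w, so it does not match here.
matches_non_word = re.match(r"\W", SUPERSCRIPT_TWO) is not None

# With re.ASCII, \w is limited to [a-zA-Z0-9_] and no longer matches.
matches_word_ascii = re.match(r"\w", SUPERSCRIPT_TWO, re.ASCII) is not None

print(is_letter, is_alnum, matches_word, matches_non_word, matches_word_ascii)
```

If the documentation said "alphanumeric" rather than "letters", the result for \w here would be the expected one.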