Issue 635595: Misleading description of \w in regexs

In the Regular Expression Syntax doc page (http://www.python.org/dev/doc/devel/lib/re-syntax.html), the description for \w is misleading (the same goes for \W).
The description indicates that, with the locale flag in effect, \w includes "characters defined as letters" for the current locale. Reading that, I took "letters" to mean characters for which isalpha returns true. In fact, all characters defined as alphanumeric for the current locale are included, so \w works pretty much the same way with the locale flag as with the unicode flag. For example (using '\xb2', the superscript two):

Python 2.2.2 (#37, Oct 14 2002, 17:02:34) [MSC 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.setlocale(locale.LC_ALL, '')
'English_United States.1252'
>>> import re
>>> re.match(r'\w', '\xb2', re.L).group()
'\xb2'
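
The letters-versus-alphanumerics distinction can also be checked without depending on the machine's locale. The following is only a sketch, and it assumes a modern Python 3 interpreter rather than the 2.2.2 session above: there, str patterns use Unicode matching by default, which the report says behaves much like the locale flag. The helper name classify is made up for this illustration.

```python
import re

# U+00B2 SUPERSCRIPT TWO, the character used in the report.
SUPERSCRIPT_TWO = "\xb2"


def classify(ch):
    """Return (is a letter, is alphanumeric, matches the word class) for one character.

    Hypothetical helper for illustration only. Assumes Python 3, where a str
    pattern is matched with Unicode rules by default.
    """
    return (ch.isalpha(), ch.isalnum(), re.match(r"\w", ch) is not None)


# Not a letter, but alphanumeric, and matched by \w.
print(classify(SUPERSCRIPT_TWO))
# An ordinary letter passes all three checks.
print(classify("a"))
# A hyphen passes none of them.
print(classify("-"))
```

If \w covered only letters, the superscript two would not match. Note that in Python 3 the re.LOCALE flag is accepted only with bytes patterns, so this sketch does not reproduce the locale-dependent session itself.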