Issue 690974: re.LOCALE, umlaut and \w

I submit this problem although I am not sure it is a real bug. It could be that I don't know how this locale stuff works.

Anyway, I have spent quite some time searching the net for good examples of code demonstrating how to use regexps in Python to match åäö with \w, but I have not found any complete examples.

If the code below behaves correctly, I suggest that the regexp documentation be improved by adding a complete example that shows how to use re.LOCALE. (The code behaves the same way with Python 2.2.2.)

#----------------------------------------
import locale
locale.setlocale(locale.LC_ALL,'swedish')
import re
reguml=re.compile(r"[a-zä]", re.LOCALE)
# I expect reguml and regw to give the same result.
regw=re.compile(r"\w", re.LOCALE)
reguml2=re.compile(r"[a-zä]+", re.LOCALE)
# I expect reguml2 and regw2 to give the same result.
regw2=re.compile(r"[\w]+", re.LOCALE)
str="abcä d\344e ä f ";

print reguml.findall(str) # Behaves as I expect.
print regw.findall(str) # Here I expect same result as above, but I don't get it.
print reguml2.findall(str) # Behaves as I expect.
print regw2.findall(str) # Behaves as I expect.
#----------------------------------------

>>> import locale
>>> locale.setlocale(locale.LC_ALL,'swedish')
'swedish'
>>> import re
>>> reguml=re.compile(r"[a-zä]", re.LOCALE)
>>> # I expect reguml and regw to give the same result.
>>> regw=re.compile(r"\w", re.LOCALE)
>>> reguml2=re.compile(r"[a-zä]+", re.LOCALE)
>>> # I expect reguml2 and regw2 to give the same result.
>>> regw2=re.compile(r"[\w]+", re.LOCALE)
>>> str="abcä d\344e ä f ";
>>>
>>> print reguml.findall(str) # Behaves as I expect.
['a', 'b', 'c', '\xe4', 'd', '\xe4', 'e', '\xe4', 'f']
>>> print regw.findall(str) # Here I expect same result as above, but I don't get it.
['a', 'b', 'c', 'd', 'e', 'f']
>>> print reguml2.findall(str) # Behaves as I expect.
['abc\xe4', 'd\xe4e', '\xe4', 'f']
>>> print regw2.findall(str) # Behaves as I expect.
['abc\xe4', 'd\xe4e', '\xe4', 'f']
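
For comparison, here is a minimal sketch of the same test under Python 3, which is an assumption on my part since it is much newer than the versions reported here. There, \w in a str pattern matches Unicode word characters by default, so no setlocale call or re.LOCALE flag is involved. It does not explain the 2.3a2 result above; the names A_UMLAUT, text, single_chars and runs are only illustrative.

```python
import re

# Assumption: Python 3, where str patterns are Unicode-aware by default.
A_UMLAUT = "\u00e4"  # the character written as \344 / \xe4 in the report

# Same test string as in the report, built from escapes so that the
# result does not depend on the encoding of the source file.
text = "abc" + A_UMLAUT + " d" + A_UMLAUT + "e " + A_UMLAUT + " f "

# A bare \w matches one word character at a time, umlauts included.
single_chars = re.findall(r"\w", text)

# \w+ matches runs of word characters.
runs = re.findall(r"\w+", text)

print(single_chars)
print(runs)
```

With this sketch both patterns treat ä as a word character, which is the result I expected from regw in the session above.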


peternl:Python-2.3a2>> /work1/pkg/dev-tools/python/2.3a2/bin/python -V
Python 2.3a2
peternl:Python-2.3a2>>uname -a
Linux peternl.computervision.se 2.4.18-6mdk-petern #2 Thu May 23 06:40:30 CEST 2002 i686 unknown