Issue 14045: In regex pattern long unicode character isn't recognized by repetition characters +, * and {}

Issue14045

Created on 2012-02-18 03:13 by py.user, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (3)
msg153629 - Author: py.user (py.user) Date: 2012-02-18 03:13
>>> import re
>>> '\U00000061'
'a'
>>> '\U00100061'
'\U00100061'
>>> re.search('\U00100061', '\U00100061' * 10).group()
'\U00100061'
>>> re.search('\U00100061+', '\U00100061' * 10).group()
'\U00100061'
>>> re.search('(\U00100061)+', '\U00100061' * 10).group()
'\U00100061\U00100061\U00100061\U00100061\U00100061\U00100061\U00100061\U00100061\U00100061\U00100061'
>>>
>>>
>>> re.search('\U00100061{3}', '\U00100061' * 10)
>>> re.search('(\U00100061){3}', '\U00100061' * 10).group()
'\U00100061\U00100061\U00100061'
>>>
msg153630 - Author: STINNER Victor (vstinner) (Python committer) Date: 2012-02-18 03:26
The re module doesn't support non-BMP characters in Python 3.2 when it is compiled in narrow mode (sys.maxunicode == 65535). This issue is already fixed in Python 3.3, which no longer has narrow or wide modes thanks to PEP 393.
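A rough sketch of why a narrow build behaves this way, runnable on Python 3.3 or later: in a narrow build a non-BMP character is stored as two UTF-16 code units (a surrogate pair), so a quantifier written after it applies only to the second unit. The helper `utf16_units` is a hypothetical name introduced here for illustration; it rebuilds the two-unit string that a narrow build would hold, so the reported results can be reproduced without an actual narrow build.

```python
import re
import sys


def utf16_units(text):
    """Return text as a str whose characters are its UTF-16 code units.

    Hypothetical helper for illustration: it imitates how a narrow build
    stores a non-BMP character as a surrogate pair.
    """
    data = text.encode('utf-16-be', 'surrogatepass')
    return ''.join(
        chr(int.from_bytes(data[i:i + 2], 'big'))
        for i in range(0, len(data), 2)
    )


ch = '\U00100061'          # one code point on Python 3.3+
narrow = utf16_units(ch)   # two code units: '\udbc0\udc61'

# Python 3.3+: the quantifier applies to the whole character.
wide_match = re.search(ch + '+', ch * 10).group()

# Simulated narrow build: '+' and '{3}' bind only to the low surrogate.
narrow_plus = re.search(narrow + '+', narrow * 10).group()
narrow_counted = re.search(narrow + '{3}', narrow * 10)

# Grouping the pair makes the quantifier cover both units, which is why
# the parenthesised patterns in the report behaved as expected.
narrow_grouped = re.search('(?:' + narrow + '){3}', narrow * 10).group()

print(sys.maxunicode == 0x10FFFF)        # no narrow/wide split on 3.3+
print(len(ch), len(narrow))
print(ascii(narrow_plus), narrow_counted)
```

The surrogate values are printed with `ascii()` because lone surrogates cannot be encoded to most terminal encodings.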
msg153631 - Author: Martin v. Löwis (loewis) (Python committer) Date: 2012-02-18 03:53
As Victor says, this issue is fixed in Python 3.3.
History
Date User Action Args
2022-04-11 14:57:26 admin set github: 58253
2012-02-18 03:53:35 loewis set status: open -> closed; resolution: fixed; messages: +
2012-02-18 03:26:19 vstinner set nosy: + loewis, vstinner; messages: +
2012-02-18 03:13:37 py.user create