Issue 5239: Change time.strptime() to make it work with Unicode chars

Created on 2009-02-13 01:46 by ezio.melotti, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description
remove_ascii_flag.patch	ocean-city, 2009-02-13 16:30

Messages (13)
msg81847	Author: Ezio Melotti (ezio.melotti) *	Date: 2009-02-13 01:46
On Py3, strptime("２００９", "%Y") fails:

    >>> strptime("２００９", "%Y")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python3.0/_strptime.py", line 454, in _strptime_time
        return _strptime(data_string, format)[0]
      File "/usr/local/lib/python3.0/_strptime.py", line 325, in _strptime
        (data_string, format))
    ValueError: time data '２００９' does not match format '%Y'

but non-ASCII digits are supported elsewhere:

    >>> int("２００９")
    2009
    >>> re.match("^\d{4}$", "２００９").group()
    '２００９'

The problem seems to be at line 265 of _strptime.py:

    return re_compile(self.pattern(format), IGNORECASE | ASCII)

The ASCII flag prevents the regex from matching '２００９':

    >>> re.match("^\d{4}$", "２００９", re.ASCII)
    >>>

I tried removing the ASCII flag and it worked fine.

On Py2.x the problem is the same:

    >>> strptime(u"２００９", "%Y")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python2.5/_strptime.py", line 330, in strptime
        (data_string, format))
    ValueError
    >>>
    >>> int(u"２００９")
    2009
    >>> re.match("^\d{4}$", u"２００９")

Here the re.UNICODE flag probably needs to be added at line 265 (untested):

    return re_compile(self.pattern(format), IGNORECASE | UNICODE)

in order to make it work:

    >>> re.match("^\d{4}$", u"２００９", re.U).group()
    u'\uff12\uff10\uff10\uff19'
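The flag behavior described above can be checked with a short Python 3 sketch. It uses only `re` and `int()`; the fullwidth string is written with escapes so the example does not depend on the terminal's encoding, and the variable names are illustrative.

```python
import re

# "２００９" spelled with escapes: FULLWIDTH DIGIT TWO, ZERO, ZERO, NINE.
FULLWIDTH_2009 = "\uff12\uff10\uff10\uff19"

# int() accepts any Unicode decimal digits, not just ASCII 0-9.
year = int(FULLWIDTH_2009)

# Python 3 str patterns are Unicode-aware by default, so \d matches these digits.
unicode_match = re.match(r"^\d{4}$", FULLWIDTH_2009)

# With re.ASCII, \d means [0-9] only, so the same pattern fails to match.
ascii_match = re.match(r"^\d{4}$", FULLWIDTH_2009, re.ASCII)

print(year)                       # 2009
print(unicode_match is not None)  # True
print(ascii_match)                # None
```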
msg81928	Author: Hirokazu Yamamoto (ocean-city) *	Date: 2009-02-13 13:52
This patch comes from . I think a test case is needed; I'll try to write one if I can.
msg81932	Author: Hirokazu Yamamoto (ocean-city) *	Date: 2009-02-13 14:13
Hmm, this fails on Python 2 too. Maybe re.ASCII was added for backward compatibility? Again, I'm not familiar with Unicode, so I won't call remove_ascii_flag.patch a fix.
msg81934	Author: Antoine Pitrou (pitrou) *	Date: 2009-02-13 14:24
> Hmm, this fails on python2 too. Maybe re.ASCII is added for backward
> compatibility? Again, I'm not familiar with unicode, so I won't call
> remove_ascii_flag.patch as fix.

re.ASCII was added to many stdlib modules because I wanted to minimize the potential for breakage when I converted the re library to use unicode matching by default. If it is desirable for strptime() and friends to match unicode digits as well as pure-ASCII digits (which sounds like a reasonable request to me), then re.ASCII can probably be dropped without any regret. (py3k doesn't have to be 100% compatible with python2 :-))
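The breakage that re.ASCII guards against is broader than digits: in Python 3 the flag also narrows `\w` and `\s`. A small sketch compares each class with and without the flag; the sample characters are chosen for illustration and are not taken from _strptime.

```python
import re

# One non-ASCII sample per character class (chosen for illustration).
SAMPLES = {
    r"\d": "\uff19",  # FULLWIDTH DIGIT NINE
    r"\w": "\u00e9",  # LATIN SMALL LETTER E WITH ACUTE
    r"\s": "\u3000",  # IDEOGRAPHIC SPACE
}


def matches(pattern, text, flags=0):
    """Return True if the whole of text matches pattern under the given flags."""
    return re.fullmatch(pattern, text, flags) is not None


# Map each class to (matches by default, matches with re.ASCII).
results = {
    pattern: (matches(pattern, text), matches(pattern, text, re.ASCII))
    for pattern, text in SAMPLES.items()
}

for pattern, (default, ascii_only) in results.items():
    print(f"{pattern}: default={default} ascii={ascii_only}")
```

Each line prints `default=True ascii=False`, so dropping the flag from a module widens every one of these classes at once, not only `\d`.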
msg81938	Author: Ezio Melotti (ezio.melotti) *	Date: 2009-02-13 14:44
I think Py3 with re.ASCII is the same as Py2 without re.UNICODE (and Py3 without re.ASCII is the same as Py2 with re.UNICODE). It's probably a good idea to have a coherent behavior between Py2 and Py3, so if we remove re.ASCII from Py3 we should add re.UNICODE to Py2.
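On the Python 3 side of that comparison, the two matching modes for str patterns are the default (Unicode) and re.ASCII. re.UNICODE is still accepted but is redundant, and asking for both modes at once is rejected. The sketch below only exercises Python 3; the Python 2 half of the equivalence is not tested here.

```python
import re

FULLWIDTH_NINE = "\uff19"


def digit_matches(flags=0):
    """True if \\d matches FULLWIDTH DIGIT NINE under the given flags."""
    return re.match(r"\d", FULLWIDTH_NINE, flags) is not None


default_mode = digit_matches()
explicit_unicode = digit_matches(re.UNICODE)  # redundant for str patterns in Python 3
ascii_mode = digit_matches(re.ASCII)

# Combining the two modes raises ValueError.
try:
    re.compile(r"\d", re.ASCII | re.UNICODE)
    both_accepted = True
except ValueError:
    both_accepted = False

print(default_mode, explicit_unicode, ascii_mode, both_accepted)  # True True False False
```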
msg81939	Author: Antoine Pitrou (pitrou) *	Date: 2009-02-13 14:50
On Friday 13 February 2009 at 14:44 +0000, Ezio Melotti wrote:
> It's probably a good idea to have a coherent behavior between Py2 and
> Py3, so if we remove re.ASCII from Py3 we should add re.UNICODE to Py2.

Removing re.ASCII in py3k is a no-brainer, because unicode is how strings work by default. On the other hand, strings in 2.x are 8-bit, so it would probably be better to keep strptime as is. As I said, py3k doesn't have to be compatible with 2.x; that's actually the whole point of it.
msg81940	Author: Ezio Melotti (ezio.melotti) *	Date: 2009-02-13 15:27
> Removing re.ASCII in py3k is a no-brainer, because unicode is how
> strings work by default.

I meant from line 265 of _strptime.py, not from Python :P
msg81941	Author: Antoine Pitrou (pitrou) *	Date: 2009-02-13 15:30
> > Removing re.ASCII in py3k is a no-brainer, because unicode is how
> > strings work by default.
>
> I meant from the line 265 of _strptime.py, not from Python :P

That's what I understood.
msg81948	Author: Ezio Melotti (ezio.melotti) *	Date: 2009-02-13 16:26
Sorry, I misunderstood the meaning of "no-brainer". If we add re.UNICODE on Py2, strptime should work fine with unicode strings, but it could fail somehow with normal strings. Is it more important to provide a way to use Unicode chars that works only with unicode strings, or to have coherent behavior between str and unicode? I don't think that adding re.UNICODE will break any existing code, but it may cause problems if someone tries to use encoded str instead of unicode (though that probably doesn't work even now). Also note that encoded strings should be a problem only when they have to match a strptime directive (e.g. %Y); the other chars are compared as they are, so it should work with str and unicode as long as they are not mixed (I think whitespace is treated differently, though). I'll try adding re.UNICODE and see what happens.
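The distinction between directive and literal characters can be illustrated with a Python 3 analogy, using bytes in place of Python 2's encoded str. This is an assumption for illustration, not the Python 2 code path: bytes patterns are always ASCII-only, so a %Y-style `\d{4}` cannot see UTF-8-encoded fullwidth digits, while an encoded literal still matches encoded data byte for byte.

```python
import re

text = "\uff12\uff10\uff10\uff19\u5e74"  # "２００９年": fullwidth year followed by the kanji for "year"
encoded = text.encode("utf-8")

# A directive such as %Y ends up as a \d-based pattern.
str_directive = re.match(r"\d{4}", text)        # Unicode-aware str pattern
bytes_directive = re.match(rb"\d{4}", encoded)  # bytes patterns only know ASCII digits

# Literal characters are compared as-is, so encoding both sides the same way still works.
literal = "\u5e74".encode("utf-8")
bytes_literal = re.search(re.escape(literal), encoded)

print(str_directive is not None, bytes_directive is None, bytes_literal is not None)
# True True True
```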
msg81949	Author: Hirokazu Yamamoto (ocean-city) *	Date: 2009-02-13 16:30
I added a test, but it requires a fix in order to pass on Windows. (I used "\u3000" instead of "\xa0" because "\xa0" cannot be decoded with the Windows mbcs codec.)
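Whitespace is another place where the flag matters for such a test. Assuming, as in CPython's _strptime, that a run of whitespace in the format string is turned into `\s+` before compiling, a separator such as "\u3000" only matches when re.ASCII is absent. `YEAR_MONTH` and `parse_year_month` are hypothetical, a hand-written stand-in for what "%Y %m" compiles to rather than the module's exact regex.

```python
import re

# Hypothetical stand-in for the regex built from "%Y %m": the space becomes \s+.
YEAR_MONTH = r"(?P<Y>\d\d\d\d)\s+(?P<m>1[0-2]|0[1-9]|[1-9])"


def parse_year_month(text, flags=0):
    """Return (year, month) if text matches YEAR_MONTH exactly, else None."""
    found = re.fullmatch(YEAR_MONTH, text, flags)
    if found is None:
        return None
    return int(found["Y"]), int(found["m"])


ideographic = "2009\u30002"  # IDEOGRAPHIC SPACE, the separator used in the test
no_break = "2009\xa02"       # NO-BREAK SPACE, the character the test avoided

print(parse_year_month(ideographic))            # (2009, 2)
print(parse_year_month(no_break))               # (2009, 2)
print(parse_year_month(ideographic, re.ASCII))  # None
```

Both separators are whitespace to a Unicode-aware `\s`, so the choice of "\u3000" over "\xa0" affects only the Windows mbcs round-trip, not the matching.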
msg81952	Author: Antoine Pitrou (pitrou) *	Date: 2009-02-13 16:57
> If we add re.UNICODE on Py2, strptime should work fine with unicode
> strings, but it could fail somehow with normal strings. Is it more
> important to provide a way to use Unicode chars that works only with
> unicode strings or to have a coherent behavior between str and unicode?

I'd say the latter, since str and unicode are often interchangeable in 2.x.
msg84665	Author: Hirokazu Yamamoto (ocean-city) *	Date: 2009-03-30 21:52
This issue seems to be fixed on py3k by r70755.
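With re.ASCII gone from the compiled pattern, the call from the original report would be expected to succeed on Python 3 releases that include the fix. A quick check, assuming a current CPython where `%Y` is still matched by a Unicode-aware `\d` pattern (the 3.0 behavior is not reproduced here):

```python
import time
from datetime import datetime

FULLWIDTH_2009 = "\uff12\uff10\uff10\uff19"  # "２００９"

# Assumes %Y compiles to a \d-based pattern without re.ASCII; the digits go through int().
parsed = time.strptime(FULLWIDTH_2009, "%Y")
print(parsed.tm_year)  # 2009

# datetime.strptime is also implemented on top of the _strptime module.
print(datetime.strptime(FULLWIDTH_2009, "%Y").year)  # 2009
```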
msg84669	Author: Brett Cannon (brett.cannon) *	Date: 2009-03-30 21:54
As Hirokazu pointed out, this was fixed.

History
Date	User	Action	Args
2022-04-11 14:56:45	admin	set	github: 49489
2009-03-30 21:54:24	brett.cannon	set	status: open -> closed; resolution: fixed; messages: +
2009-03-30 21:52:24	ocean-city	set	messages: +
2009-02-13 19:06:30	brett.cannon	set	nosy: + brett.cannon
2009-02-13 16:57:15	pitrou	set	messages: +
2009-02-13 16:30:50	ocean-city	set	files: - remove_ascii_flag.patch
2009-02-13 16:30:39	ocean-city	set	files: + remove_ascii_flag.patch; dependencies: + Fix strftime on windows.; messages: +
2009-02-13 16:26:07	ezio.melotti	set	messages: +
2009-02-13 15:30:08	pitrou	set	messages: +
2009-02-13 15:27:52	ezio.melotti	set	messages: +
2009-02-13 14:50:33	pitrou	set	messages: +
2009-02-13 14:44:18	ezio.melotti	set	messages: +
2009-02-13 14:24:37	pitrou	set	messages: +
2009-02-13 14:13:06	ocean-city	set	nosy: + pitrou; messages: +
2009-02-13 14:01:30	ezio.melotti	set	title: time.strptime("２００９", "%Y") raises a value error -> Change time.strptime() to make it work with Unicode chars
2009-02-13 13:52:31	ocean-city	set	files: + remove_ascii_flag.patch; keywords: + patch; dependencies: + re.IGNORECASE not Unicode-ready; messages: +; nosy: + ocean-city
2009-02-13 13:43:17	ocean-city	link	issue5240 superseder
2009-02-13 01:47:01	ezio.melotti	set	type: behavior; components: + Library (Lib), Unicode; versions: + Python 2.6, Python 2.5, Python 2.4, Python 3.0, Python 3.1, Python 2.7
2009-02-13 01:46:34	ezio.melotti	create