Issue 5239: Change time.strptime() to make it work with Unicode chars

Created on 2009-02-13 01:46 by ezio.melotti, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name: remove_ascii_flag.patch
Uploaded: ocean-city, 2009-02-13 16:30
Messages (13)
msg81847 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2009-02-13 01:46
On Py3, strptime("２００９", "%Y") fails:
>>> strptime("２００９", "%Y")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.0/_strptime.py", line 454, in _strptime_time
    return _strptime(data_string, format)[0]
  File "/usr/local/lib/python3.0/_strptime.py", line 325, in _strptime
    (data_string, format))
ValueError: time data '２００９' does not match format '%Y'
but non-ASCII digits are supported elsewhere:
>>> int("２００９")
2009
>>> re.match("^\d{4}$", "２００９").group()
'２００９'
The problem seems to be at line 265 of _strptime.py:
return re_compile(self.pattern(format), IGNORECASE | ASCII)
The ASCII flag prevents the regex from working properly with '２００９':
>>> re.match("^\d{4}$", "２００９", re.ASCII)
>>>
I tried removing the ASCII flag and it worked fine.
On Py2.x the problem is the same:
>>> strptime(u"２００９", "%Y")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/_strptime.py", line 330, in strptime
    (data_string, format))
ValueError>>>
>>> int(u"２００９")
2009
>>> re.match("^\d{4}$", u"２００９")
Here the re.UNICODE flag probably needs to be added at line 265 (untested):
return re_compile(self.pattern(format), IGNORECASE | UNICODE)
in order to make it work:
>>> re.match("^\d{4}$", u"２００９", re.U).group()
u'\uff12\uff10\uff10\uff19'
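The behaviour reported above can be reproduced on Python 3 without touching _strptime.py. The following is a minimal sketch, not code from the patch: the helper name matches_year and the constant FULLWIDTH_YEAR are made up for illustration, and only the re flags and the "^\d{4}$" pattern come from the report.

```python
import re

# The fullwidth digits from the report (U+FF12 U+FF10 U+FF10 U+FF19).
FULLWIDTH_YEAR = "\uff12\uff10\uff10\uff19"


def matches_year(text, flags=0):
    """Return True if text consists of exactly four digits under the given flags."""
    return re.match(r"^\d{4}$", text, flags) is not None


# Default str matching on Python 3 is Unicode-aware, so \d accepts fullwidth digits.
print(matches_year(FULLWIDTH_YEAR))
# With re.ASCII, \d is restricted to [0-9] and the match fails.
print(matches_year(FULLWIDTH_YEAR, re.ASCII))
# int() accepts any Unicode decimal digits.
print(int(FULLWIDTH_YEAR))
```

The second call is the case that made strptime() raise ValueError, because the compiled format pattern carried the ASCII flag.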
msg81928 - Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2009-02-13 13:52
This patch comes from . I think a test case is needed. I'll try to write one if I can.
msg81932 - Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2009-02-13 14:13
Hmm, this fails on python2 too. Maybe re.ASCII is added for backward compatibility? Again, I'm not familiar with unicode, so I won't call remove_ascii_flag.patch as *fix*.
msg81934 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-02-13 14:24
> Hmm, this fails on python2 too. Maybe re.ASCII is added for backward
> compatibility? Again, I'm not familiar with unicode, so I won't call
> remove_ascii_flag.patch as *fix*.
re.ASCII was added to many stdlib modules because I wanted to minimize the potential for breakage when I converted the re library to use unicode matching by default.
If it is desirable for strptime() and friends to match unicode digits as well as pure-ASCII digits (which sounds like a reasonable request to me), then re.ASCII can probably be dropped without any regret.
(py3k doesn't have to be 100% compatible with python2 :-))
msg81938 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2009-02-13 14:44
I think Py3 with re.ASCII is the same as Py2 without re.UNICODE (and Py3 without re.ASCII is the same as Py2 with re.UNICODE). It's probably a good idea to have a coherent behavior between Py2 and Py3, so if we remove re.ASCII from Py3 we should add re.UNICODE to Py2.
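The Py3 half of this comparison can be checked by inspecting compiled pattern flags. This is a small sketch for Python 3 only; the helper names has_flag and flags_conflict are invented for illustration, and the Py2 half of the claim is not exercised here.

```python
import re

# A str pattern compiled with no flags, and one compiled with re.ASCII.
default_pat = re.compile(r"\d")
ascii_pat = re.compile(r"\d", re.ASCII)


def has_flag(pattern, flag):
    """Return True if the compiled pattern carries the given flag."""
    return bool(pattern.flags & flag)


def flags_conflict():
    """Return True if re refuses to combine re.ASCII with re.UNICODE."""
    try:
        re.compile(r"\d", re.ASCII | re.UNICODE)
    except ValueError:
        return True
    return False


print(has_flag(default_pat, re.UNICODE))  # Unicode matching is implied for str patterns
print(has_flag(ascii_pat, re.UNICODE))    # re.ASCII switches it off
print(flags_conflict())                   # the two modes are mutually exclusive
```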
msg81939 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-02-13 14:50
On Friday 13 February 2009 at 14:44 +0000, Ezio Melotti wrote:
> It's probably a good idea to have a coherent behavior between Py2 and
> Py3, so if we remove re.ASCII from Py3 we should add re.UNICODE to Py2.
Removing re.ASCII in py3k is a no-brainer, because unicode is how strings work by default. On the other hand, strings in 2.x are 8-bit, so it would probably be better to keep strptime as is.
As I said, py3k doesn't have to be compatible with 2.x, that's even the whole point of it.
msg81940 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2009-02-13 15:27
> Removing re.ASCII in py3k is a no-brainer, because unicode is how
> strings work by default.
I meant from the line 265 of _strptime.py, not from Python :P
msg81941 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-02-13 15:30
> > Removing re.ASCII in py3k is a no-brainer, because unicode is how
> > strings work by default.
>
> I meant from the line 265 of _strptime.py, not from Python :P
That's what I understood.
msg81948 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2009-02-13 16:26
Sorry, I misunderstood the meaning of "no-brainer".
If we add re.UNICODE on Py2, strptime should work fine with unicode strings, but it could fail somehow with normal strings. Is it more important to provide a way to use Unicode chars that works only with unicode strings or to have a coherent behavior between str and unicode?
I don't think that adding re.UNICODE will break any existing code, but it may cause problems if someone tries to use an encoded str instead of unicode (but that shouldn't work already). Also note that encoded strings should be a problem only if they have to match a strptime directive (e.g. %Y); the other chars should be compared as they are, so it should work with str and unicode as long as they are not mixed (I think that whitespace is treated differently, though).
I'll try to add re.UNICODE and see what happens.
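The distinction drawn above between directives and the other characters can be sketched with a toy format-to-regex converter. This is a hypothetical illustration, not the _strptime.py implementation: DIRECTIVE_PATTERNS and format_to_regex are invented names, only %Y is supported, and whitespace is treated as an ordinary literal. The example runs on Python 3, where str is always Unicode.

```python
import re

# Hypothetical directive table for illustration; only %Y is defined.
DIRECTIVE_PATTERNS = {"Y": r"(?P<Y>\d{4})"}


def format_to_regex(fmt, flags=0):
    """Turn a tiny strptime-like format into a compiled regex.

    Directives become digit patterns (affected by re.ASCII);
    every other character is escaped and compared as it is.
    """
    parts = []
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        if ch == "%" and i + 1 < len(fmt):
            parts.append(DIRECTIVE_PATTERNS[fmt[i + 1]])
            i += 2
        else:
            parts.append(re.escape(ch))
            i += 1
    return re.compile("".join(parts), flags)


FULLWIDTH_YEAR = "\uff12\uff10\uff10\uff19"
YEAR_SIGN = "\u5e74"  # a non-ASCII literal character in the format

unicode_re = format_to_regex("%Y" + YEAR_SIGN)
ascii_re = format_to_regex("%Y" + YEAR_SIGN, re.ASCII)

# The literal matches in both modes; only the %Y directive is flag-sensitive.
print(unicode_re.fullmatch(FULLWIDTH_YEAR + YEAR_SIGN) is not None)
print(ascii_re.fullmatch(FULLWIDTH_YEAR + YEAR_SIGN) is not None)
print(ascii_re.fullmatch("2009" + YEAR_SIGN) is not None)
```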
msg81949 - Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2009-02-13 16:30
I added a test, but it requires the fix in order to pass on Windows. (I used "\u3000" instead of "\xa0" because "\xa0" cannot be decoded with mbcs on Windows.)
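Both characters mentioned for the test count as whitespace under Unicode matching but not under re.ASCII. The sketch below checks this on Python 3 with the re module alone; the helper is_regex_space and the constant names are invented for illustration and are not taken from the attached test.

```python
import re

IDEOGRAPHIC_SPACE = "\u3000"  # the character used in the test
NO_BREAK_SPACE = "\xa0"       # the character that was avoided


def is_regex_space(ch, flags=0):
    """Return True if \\s matches the single character ch under the given flags."""
    return re.fullmatch(r"\s", ch, flags) is not None


for ch in (IDEOGRAPHIC_SPACE, NO_BREAK_SPACE, " "):
    # Prints the code point, then the Unicode-mode and ASCII-mode results.
    print(hex(ord(ch)), is_regex_space(ch), is_regex_space(ch, re.ASCII))
```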
msg81952 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-02-13 16:57
> If we add re.UNICODE on Py2, strptime should work fine with unicode
> strings, but it could fail somehow with normal strings. Is it more
> important to provide a way to use Unicode chars that works only with
> unicode strings or to have a coherent behavior between str and unicode?
I'd say the latter, since str and unicode are often interchangeable in 2.x.
msg84665 - Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2009-03-30 21:52
This issue seems to be fixed on py3k by r70755. ()
msg84669 - Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2009-03-30 21:54
As Hirokazu pointed out, this was fixed.
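One way to check how a given interpreter behaves is to try the original reproducer and report the outcome. This is a sketch for Python 3: the helper parse_year is invented for illustration, and the result for fullwidth digits depends on the Python version in use, so no particular output is assumed here.

```python
import time

FULLWIDTH_YEAR = "\uff12\uff10\uff10\uff19"


def parse_year(text):
    """Return the parsed year, or None if strptime() rejects the input."""
    try:
        return time.strptime(text, "%Y").tm_year
    except ValueError:
        return None


print(parse_year("2009"))          # plain ASCII digits
print(parse_year(FULLWIDTH_YEAR))  # the input from the original report
```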
History
Date User Action Args
2022-04-11 14:56:45 admin set github: 49489
2009-03-30 21:54:24 brett.cannon set status: open -> closed; resolution: fixed; messages: +
2009-03-30 21:52:24 ocean-city set messages: +
2009-02-13 19:06:30 brett.cannon set nosy: + brett.cannon
2009-02-13 16:57:15 pitrou set messages: +
2009-02-13 16:30:50 ocean-city set files: - remove_ascii_flag.patch
2009-02-13 16:30:39 ocean-city set files: + remove_ascii_flag.patch; dependencies: + Fix strftime on windows.; messages: +
2009-02-13 16:26:07 ezio.melotti set messages: +
2009-02-13 15:30:08 pitrou set messages: +
2009-02-13 15:27:52 ezio.melotti set messages: +
2009-02-13 14:50:33 pitrou set messages: +
2009-02-13 14:44:18 ezio.melotti set messages: +
2009-02-13 14:24:37 pitrou set messages: +
2009-02-13 14:13:06 ocean-city set nosy: + pitrou; messages: +
2009-02-13 14:01:30 ezio.melotti set title: time.strptime("2009", "%Y") raises a value error -> Change time.strptime() to make it work with Unicode chars
2009-02-13 13:52:31 ocean-city set files: + remove_ascii_flag.patch; keywords: + patch; dependencies: + re.IGNORECASE not Unicode-ready; messages: +; nosy: + ocean-city
2009-02-13 13:43:17 ocean-city link issue5240 superseder
2009-02-13 01:47:01 ezio.melotti set type: behavior; components: + Library (Lib), Unicode; versions: + Python 2.6, Python 2.5, Python 2.4, Python 3.0, Python 3.1, Python 2.7
2009-02-13 01:46:34 ezio.melotti create