Issue 22840: strpdate('20141110', '%Y%m%d%H%S') returns wrong date

Created on 2014-11-10 22:01 by dgorley, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (11)
msg230977 - (view) Author: Doug Gorley (dgorley) Date: 2014-11-10 22:01
strptime() returns the wrong date if I parse today's date (2014-11-10) as a string with no separators and ask strptime() to look for nonexistent hour and minute fields.

>>> datetime.datetime.strptime('20141110', '%Y%m%d').isoformat()
'2014-11-10T00:00:00'
>>> datetime.datetime.strptime('20141110', '%Y%m%d%H%M').isoformat()
'2014-01-01T01:00:00'
msg230979 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2014-11-10 22:47
What result did you expect?
msg230980 - (view) Author: Doug Gorley (dgorley) Date: 2014-11-10 22:53
I expected the second call to strptime() to raise an exception, because %Y consumed '2014', %m consumed '11', and %d consumed '10', leaving nothing for %H and %M to match. That would be consistent with the first call.
msg230983 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2014-11-10 23:29
The documentation certainly appears to say that %m, for example, will consume two digits, but it could just as easily be only for output (i.e. strftime). I suspect this is simply a documentation issue as opposed to a bug, but let's see what the others think.
msg230986 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2014-11-10 23:50
I have recently closed a similar issue (#5979) as "won't fix". The winning argument there was that Python behavior was consistent with C. How does C strptime behave in this case?
msg230988 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2014-11-11 00:00
With the following C code:

#include <time.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(){
    char buf[255];
    struct tm tm;

    memset(&tm, 0, sizeof(tm));
    strptime("20141110", "%Y%m%d%H%M", &tm);
    strftime(buf, sizeof(buf), "%Y-%m-%d %H:%M", &tm);
    printf("%s\n", buf);
    return 0;
}

I get

$ ./a.out
2014-11-10 00:00

So I think Python behavior is wrong.
msg230989 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2014-11-11 00:03
Here is the case that I think illustrates the current logic better:

>>> datetime.strptime("20141234", "%Y%m%d%H%M")
datetime.datetime(2014, 1, 2, 3, 4)
msg230991 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2014-11-11 00:07
Looking at the POSIX standard

http://pubs.opengroup.org/onlinepubs/009695399/functions/strptime.html

It appears that Python may be compliant:

%H The hour (24-hour clock) [00,23]; leading zeros are permitted but not required.
%m The month number [01,12]; leading zeros are permitted but not required.
%M The minute [00,59]; leading zeros are permitted but not required.
msg230993 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2014-11-11 00:13
Here is another interesting bit from the standard: "The application shall ensure that there is white-space or other non-alphanumeric characters between any two conversion specifications." This is how they get away with not specifying whether the parser of variable-width fields should be greedy or not.
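To illustrate why the standard sidesteps the greedy question, here is a small sketch that enumerates every way a run of digits can be split across adjacent variable-width fields. The helper candidate_splits and the RANGES table are hypothetical names made up for this illustration; the ranges for %m, %H and %M are the POSIX ones quoted above, and the [01,31] range for %d is assumed in the same spirit.

```python
from itertools import product

# Value ranges per directive; each field may be written with 1 or 2 digits
# because leading zeros are permitted but not required.
RANGES = {"m": (1, 12), "d": (1, 31), "H": (0, 23), "M": (0, 59)}


def candidate_splits(digits, fields):
    """Return every in-range way to split `digits` across `fields`."""
    results = []
    for widths in product((1, 2), repeat=len(fields)):
        if sum(widths) != len(digits):
            continue  # this combination of widths does not use all digits
        pos, values = 0, []
        for name, width in zip(fields, widths):
            value = int(digits[pos:pos + width])
            low, high = RANGES[name]
            if not low <= value <= high:
                break  # out of range, so this split is invalid
            values.append(value)
            pos += width
        else:
            results.append(tuple(values))
    return results


# "111" after a year could mean January 11 or November 1.
print(candidate_splits("111", "md"))
# "1110" spread over four fields leaves only one-digit values.
print(candidate_splits("1110", "mdHM"))
```

With two valid readings of "111", a parser has to pick one, and the standard does not say which; requiring a separator between conversion specifications removes the choice.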
msg231028 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2014-11-11 15:34
strptime very much follows the POSIX standard, as I implemented strptime by reading that doc. If you want to see how the behaviour is implemented you can look at https://hg.python.org/cpython/file/ac0334665459/Lib/_strptime.py#l178 . But the key thing here is that the OP has unused formatters. Since strptime uses regexes under the hood, the re module does its best to match the entire format. Since POSIX says that e.g. the leading 0 for %m is optional, the regex goes with the single-digit version to let the %H format match _something_ (same goes for %d and %M). So without rewriting strptime to not use regexes to support unused formatters and to stop being so POSIX-compliant, I don't see how to change the behaviour. Plus it would be backwards-incompatible, as this is how strptime has worked since 2002. It's Alexander's call, but I vote to close this as "not a bug".
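The backtracking described above can be reproduced with a few lines of re. This is only a sketch: the patterns in DIRECTIVE_PATTERNS are simplified stand-ins written for this example (not copied from _strptime.py), and match_fields is a hypothetical helper.

```python
import re

# Simplified per-directive patterns (assumed for illustration): each one
# prefers two digits but also accepts a single digit.
DIRECTIVE_PATTERNS = {
    "Y": r"(?P<Y>\d\d\d\d)",
    "m": r"(?P<m>1[0-2]|0[1-9]|[1-9])",
    "d": r"(?P<d>3[01]|[12]\d|0[1-9]|[1-9])",
    "H": r"(?P<H>2[0-3]|[01]\d|\d)",
    "M": r"(?P<M>[0-5]\d|\d)",
}


def match_fields(text, directives):
    """Match `text` against the concatenated patterns for `directives`."""
    pattern = "".join(DIRECTIVE_PATTERNS[d] for d in directives)
    found = re.fullmatch(pattern, text)
    if found is None:
        return None
    return {name: int(value) for name, value in found.groupdict().items()}


# With only %Y%m%d, the two-digit alternatives win.
print(match_fields("20141110", "Ymd"))
# With %H%M appended, the engine backtracks to one-digit month and day
# so that the hour and minute patterns have something to match.
print(match_fields("20141110", "YmdHM"))
```

The second call yields month 1, day 1, hour 1, minute 0, which lines up with the '2014-01-01T01:00:00' result in the original report.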
msg231029 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2014-11-11 15:57
After reading the standard a few more times, I agree with Brett and Ethan that this is at most a call for better documentation. I'll leave this open in case someone comes up with a succinct description of what exactly datetime.strptime does. (Maybe we should just document the format-to-regexp translation implemented in _strptime.py.) We may also include POSIX's directive "The application shall ensure that there is white-space or other non-alphanumeric characters between any two conversion specifications" as a recommendation.
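As a sketch of what following that recommendation looks like in practice, the example below puts a non-alphanumeric separator between every pair of conversion specifications. The helper name parse_with_separators and its default format are assumptions chosen for this illustration.

```python
from datetime import datetime


def parse_with_separators(text, fmt="%Y-%m-%d %H:%M"):
    """Parse `text` with a format whose fields are all separated."""
    return datetime.strptime(text, fmt)


# A complete timestamp parses as expected.
parsed = parse_with_separators("2014-11-10 00:00")
print(parsed.isoformat())

# A date with no time part cannot be stretched over the hour and minute
# fields, because the separators pin down where each field ends.
try:
    parse_with_separators("2014-11-10")
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

With separators in place, the missing hour and minute produce a ValueError, which is the behaviour the original reporter expected.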
History
Date User Action Args
2022-04-11 14:58:10 admin set github: 67029
2015-03-01 19:58:23 belopolsky set status: open -> closed; resolution: not a bug; stage: resolved
2014-11-11 15:57:28 belopolsky set versions: + Python 3.5, - Python 3.4; messages: +; assignee: belopolsky; components: + Documentation, - Library (Lib); type: behavior -> enhancement
2014-11-11 15:34:18 brett.cannon set nosy: + brett.cannon; messages: +
2014-11-11 00:13:30 belopolsky set messages: +
2014-11-11 00:07:44 belopolsky set messages: +
2014-11-11 00:03:51 belopolsky set messages: +
2014-11-11 00:00:02 belopolsky set messages: +
2014-11-10 23:50:28 belopolsky set messages: +
2014-11-10 23:29:29 ethan.furman set nosy: + lemburg, belopolsky; messages: +
2014-11-10 22:53:18 dgorley set messages: +
2014-11-10 22:47:43 ethan.furman set nosy: + ethan.furman; messages: +
2014-11-10 22:01:31 dgorley create