Issue 5979: strptime() gives inconsistent exceptions
e.g.
>>> from datetime import datetime
>>> datetime.strptime("19951001", "%Y%m%d")
datetime.datetime(1995, 10, 1, 0, 0)
>>> datetime.strptime("19951000", "%Y%m%d")  # day = 0, month < 11
...
ValueError: time data '19951000' does not match format '%Y%m%d'
>>> datetime.strptime("19951100", "%Y%m%d")  # day = 0, month >= 11
...
ValueError: unconverted data remains: 0
The exception messages are not really a serious issue, but note that the latter one can be confusing for users.
However, there seem to be some underlying issues with the choice to recognize single-digit months together with double-digit days, which can make date strings ambiguous:
Consider "19951100" from above with the last '0' removed.
>>> datetime.strptime("1995110", "%Y%m%d")
datetime.datetime(1995, 1, 10, 0, 0)
In this case, strptime has treated the middle '1' as the month, resulting in 1995-01-10. This hints at why the second exception from above gives a strange message: with the extra '0', the day portion of "19951100" (00) is invalid, so strptime falls back on parsing the first 7 characters as above, and then fails because of the remaining '0'.
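The fallback described above can be reproduced with an ordinary backtracking regular expression. The following is only a rough sketch, not CPython's actual code: the pattern `YMD` and the helper `sketch_strptime_ymd` are hypothetical names, and the sub-patterns for `%m` and `%d` are simplified assumptions that allow an optional leading zero.

```python
import re

# Simplified stand-ins for the field patterns (an assumption, not the real ones):
# %Y = four digits, %m = 1..12, %d = 1..31, zero padding optional for %m and %d.
YMD = re.compile(
    r"(?P<Y>\d{4})"
    r"(?P<m>1[0-2]|0[1-9]|[1-9])"
    r"(?P<d>3[01]|[12]\d|0[1-9]|[1-9])"
)

def sketch_strptime_ymd(text):
    """Return (year, month, day), or raise ValueError like the reports above."""
    found = YMD.match(text)
    if found is None:
        # No split of the digits yields a valid month and day at all.
        raise ValueError(f"time data {text!r} does not match format '%Y%m%d'")
    if found.end() != len(text):
        # A shorter prefix matched after backtracking; the rest is left over.
        raise ValueError(f"unconverted data remains: {text[found.end():]}")
    return (int(found.group("Y")), int(found.group("m")), int(found.group("d")))

print(sketch_strptime_ymd("19951001"))  # (1995, 10, 1)
print(sketch_strptime_ymd("1995110"))   # (1995, 1, 10)
```

With "19951100" the month first matches "11", the day fails on "00", and the regex engine retries with month "1" and day "10", leaving the final "0" unconsumed. With "19951000" neither month "10" nor month "1" leaves a valid day, so nothing matches.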
This seems a little dangerous. For instance:
>>> timestamp = "19951101-23:20:18"
>>> datestamp = timestamp[:7]  # Oops, needed to use :8 to get up to index 7.
>>> reallyImportantButWrongDate = datetime.strptime(datestamp, "%Y%m%d")
Note that in this case strptime() from glibc would instead result in an error, which IMHO is the right thing to do.
I do realize, though, that changing the behavior of strptime() could devastate some existing code.
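Until the behavior changes (if it ever does), a caller can guard against the off-by-one slice above without touching strptime() itself. A minimal sketch, assuming the input is expected to be exactly eight ASCII digits; `parse_yyyymmdd` is a hypothetical helper name, not part of the datetime module:

```python
import re
from datetime import datetime

# Exactly eight ASCII digits: YYYYMMDD with mandatory zero padding.
_EIGHT_DIGITS = re.compile(r"[0-9]{8}")

def parse_yyyymmdd(text):
    """Parse a fixed-width YYYYMMDD string, rejecting shorter or longer input."""
    if not _EIGHT_DIGITS.fullmatch(text):
        raise ValueError(f"expected exactly 8 digits (YYYYMMDD), got {text!r}")
    # The width is already checked, so strptime cannot shift the field boundaries.
    return datetime.strptime(text, "%Y%m%d")

timestamp = "19951101-23:20:18"
print(parse_yyyymmdd(timestamp[:8]))  # 1995-11-01 00:00:00
# parse_yyyymmdd(timestamp[:7]) raises ValueError instead of returning 1995-01-10.
```

Checking the width first keeps the calendar validation (month and day ranges) in strptime() while turning the silent misparse into an error.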
Looks like a bug to me:
>>> datetime.strptime("1", "%d")
datetime.datetime(1900, 1, 1, 0, 0)
>>> datetime.strptime('1', '%m')
datetime.datetime(1900, 1, 1, 0, 0)
Both %m and %d accept single digits, but they should not.
>>> datetime.strptime('123', '%m%d')
datetime.datetime(1900, 12, 3, 0, 0)
>>> import this
...
In the face of ambiguity, refuse the temptation to guess.
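To make the ambiguity concrete, here is a small illustration of how the same unpadded digits can be read two ways, and how zero padding or a separator removes the guesswork. A year is included in each format purely as an assumption of this sketch, so that the examples do not depend on the default year.

```python
from datetime import datetime

# Unpadded: the parser decides where the month ends and the day begins.
unpadded = datetime.strptime("1995123", "%Y%m%d")

# Zero-padded: both fields are full width, so only one reading exists.
padded = datetime.strptime("19950123", "%Y%m%d")

# Separator: field boundaries are explicit even without padding.
separated = datetime.strptime("1995-1-23", "%Y-%m-%d")

print(unpadded.date())   # 1995-12-03
print(padded.date())     # 1995-01-23
print(separated.date())  # 1995-01-23
```

A writer who meant January 23 and dropped the leading zero gets December 3 back without any error.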
But this is exactly how strptime behaves in C. Consider this:
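#define _XOPEN_SOURCE 700  /* needed on glibc so that <time.h> declares strptime() */
#include <time.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(){
    char buf[255];
    struct tm tm;
    /* Zero every field so the unparsed ones (year, time) have defined values. */
    memset(&tm, 0, sizeof(tm));
    /* Parse "123" as month followed by day, with no separator. */
    strptime("123", "%m%d", &tm);
    strftime(buf, sizeof(buf), "%d %b %Y %H:%M", &tm);
    printf("%s\n", buf);
    return 0;
}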
This produces output:
03 Dec 1900 00:00
Shouldn't it stay consistent with how the C function works?