Issue 33941: datetime.strptime not able to recognize invalid date formats
datetime.strptime does not reject invalid date values for %Y%m%d, %y%m%d, %Y%m%d %H:%M and a few more formats. In Java there is a setLenient option, which validates the input against the pattern and converts only strings that match it exactly.
Example: datetime.strptime('181223', '%Y%m%d')
For this input I get 1812-02-03 00:00:00, but I expected an error: ValueError: time data '181223' does not match format '%Y%m%d'
I tested the four modules listed below. All of them give the same output:
datetime.strptime
timestring.Date
parser.parse from dateutil
dateparser.parse
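The report can be reproduced with the standard library alone. This is a minimal sketch covering only datetime.strptime; the three third-party modules are not exercised here.

```python
from datetime import datetime

# The input from the report: six digits against a format
# that looks as if it should need eight.
result = datetime.strptime("181223", "%Y%m%d")

print(result)  # 1812-02-03 00:00:00, the output quoted in the report
```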
I looked a bit at _strptime.py and the corresponding tests and thought I would share my notes.
The regular expressions clearly allow non-zero padded values for both %d and %m matches. There is one test where the following is run: time.strptime("Mar 1", "%b %d"). So it seems intentional that %d and %m allow non-zero padded values.
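A quick way to see this leniency without reading the regular expressions is to parse the same date with and without leading zeros. The sketch below uses a format with separators and a year so that nothing about the input is ambiguous; the sample date is arbitrary.

```python
from datetime import datetime

# Zero-padded and non-zero-padded inputs are both accepted for %m and %d.
padded = datetime.strptime("2018-03-01", "%Y-%m-%d")
unpadded = datetime.strptime("2018-3-1", "%Y-%m-%d")

print(padded == unpadded)  # True
```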
It also just occurred to me that the example '181223' isn't ambiguous as %Y requires 4 digits and months cannot be more than 12. So it seems to me this could only be Y=1812,M=2,D=3.
There are cases that are truly ambiguous with non-zero-padded values. For instance, 2018111 could be 2018-Nov-1 or 2018-Jan-11. Python deterministically takes as many digits as it can for each field in turn, so the month consumes '11' and the result is November 1, 2018. I can't see any reason why that reading should be assumed over the other, though.
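The way the digits of 2018111 get divided up can be illustrated with a small regular expression. The pattern below is a simplified stand-in written for this example, not a copy of what _strptime.py builds; it only assumes that the two-digit alternatives are tried before the one-digit ones.

```python
import re
from datetime import datetime

# Simplified stand-in for the %Y%m%d pattern (an assumption for
# illustration; the real patterns live in _strptime.py).
pattern = re.compile(
    r"(?P<Y>\d{4})"
    r"(?P<m>1[0-2]|0[1-9]|[1-9])"
    r"(?P<d>3[01]|[12]\d|0[1-9]|[1-9])"
)

# The month alternative '1[0-2]' is tried first and consumes '11',
# leaving a single '1' for the day.
parts = pattern.fullmatch("2018111").groupdict()
print(parts)  # {'Y': '2018', 'm': '11', 'd': '1'}

# datetime.strptime resolves the same input the same way.
parsed = datetime.strptime("2018111", "%Y%m%d")
print(parsed.date())  # 2018-11-01
```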
The edits required to stop allowing non-zero-padded values were pretty straightforward, and only one unit test (one that verifies 'Mar 1' comes after 'Feb 29') had to be altered. That may say more about a need for additional tests than about whether anyone relies on single-digit day or month values.
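Without touching _strptime.py, calling code can approximate strict parsing by formatting the parsed value back with the same format and comparing it to the input. The helper name strict_strptime is hypothetical and not part of any library. This is only a sketch: it assumes a numeric, zero-padded format such as %Y%m%d, and it will also reject inputs that are valid but not in strftime's canonical form (for example differently cased month names, or years below 1000 on platforms where %Y is not padded).

```python
from datetime import datetime


def strict_strptime(value: str, fmt: str) -> datetime:
    """Hypothetical helper: parse, then reject input that does not round-trip."""
    parsed = datetime.strptime(value, fmt)
    # A non-zero-padded input such as '181223' formats back as
    # '18120203', which differs from what was passed in.
    if parsed.strftime(fmt) != value:
        raise ValueError(
            f"time data {value!r} does not match format {fmt!r} exactly"
        )
    return parsed


print(strict_strptime("20181223", "%Y%m%d"))  # 2018-12-23 00:00:00

try:
    strict_strptime("181223", "%Y%m%d")
except ValueError as exc:
    print("rejected:", exc)
```

The round trip costs an extra strftime call per value, but it keeps the standard parser in charge of range checks such as month 13 or day 32.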