Infer datetime format by danbirken · Pull Request #6021 · pandas-dev/pandas

  1. parse_dates already supports a wide variety of input formats, which makes squeezing in something else more complicated.
  2. Since this is theoretically going from disabled to enabled by default, having it be a separate field is really nice: we can flip the one boolean value and that is that. If it were in parse_dates, then to flip it we would probably have to add something like parse_dates='not_infer' (and keep supporting parse_dates='infer') for the people who explicitly want to opt out for whatever reason, which would be really confusing.

However, the situation isn't perfect. It will still mess up cases a human wouldn't:

In [4]: tools._guess_datetime_format('01:01 2011/01/01')
Out[4]: '%m:%d %Y/%H/%M'  # wrong!

In [6]: tools._guess_datetime_format('00:00 2011/01/01')
Out[6]: '%H:%M %Y/%m/%d'  # right!

But sentinel values don't actually improve this case; this is just a limitation of the current guessing method. It is a pretty rare edge case, though, since pretty much every standard datetime format puts the Y-m-d information first, which is what the guesser expects.
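
To make the failure mode concrete, here is a minimal sketch using only the standard library's datetime.strptime. It is not the guesser's code and does not call _guess_datetime_format; the helper name parses and the constant names are made up for illustration. It shows that the wrong format from the example above happens to parse the sample string to the same datetime as the right one, and only breaks on a value such as '00:00 2011/01/01', where 00 is not a valid month.

```python
from datetime import datetime

# The two candidate formats from the transcript above.
WRONG_GUESS = '%m:%d %Y/%H/%M'
RIGHT_GUESS = '%H:%M %Y/%m/%d'


def parses(value, fmt):
    """Return True if datetime.strptime accepts value under fmt (hypothetical helper)."""
    try:
        datetime.strptime(value, fmt)
    except ValueError:
        return False
    return True


# Every field in this string is 01 (apart from the year), so both formats
# accept it and yield the same datetime. The string alone cannot tell them apart.
ambiguous = '01:01 2011/01/01'
same_result = (datetime.strptime(ambiguous, WRONG_GUESS)
               == datetime.strptime(ambiguous, RIGHT_GUESS))

# A midnight timestamp exposes the wrong guess: '00' is not a valid month.
wrong_handles_midnight = parses('00:00 2011/01/01', WRONG_GUESS)
right_handles_midnight = parses('00:00 2011/01/01', RIGHT_GUESS)

print(same_result, wrong_handles_midnight, right_handles_midnight)
```

So a format guessed from a single ambiguous value can look fine on that value and still fail on later rows in the same column.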

So in conclusion, I think the sentinel values of 0 are actually perfectly good and I can't think of any case where they cause the guesser to do the wrong thing.


New questions:

Assuming everybody is content with adding the infer_datetime_format keyword to read_csv, should I also add this to Series.from_csv and DataFrame.from_csv?