BUG: fix read_csv to parse timezone correctly by swyoon · Pull Request #22380 · pandas-dev/pandas
So this is a case I didn't account for when I recently fixed how to_datetime parses offsets.
In [1]: data = ['2018-01-04 09:01:00+09:00', '2018-01-04 09:02:00+09:00']
# This output should probably behave like In[5]
In [3]: pd.to_datetime(data, box=False)
Out[3]:
array(['2018-01-04T00:01:00.000000000', '2018-01-04T00:02:00.000000000'],
      dtype='datetime64[ns]')
In [4]: data = ['2018-01-04 09:01:00+09:00', '2018-01-04 09:02:00+08:00']
In [5]: pd.to_datetime(data, box=False)
Out[5]:
array([datetime.datetime(2018, 1, 4, 9, 1, tzinfo=tzoffset(None, 32400)),
       datetime.datetime(2018, 1, 4, 9, 2, tzinfo=tzoffset(None, 28800))],
      dtype=object)
I don't have the patch off the top of my head, but what would happen if the to_datetime call instead returned an array of datetime objects? If that gives the expected result, the fix would probably belong in to_datetime rather than in the parsing code.
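To make the two possible outputs concrete, here is a rough sketch using only the standard library. It is not the pandas implementation: `parse_keep_offset` and `to_utc_naive` are hypothetical helpers, and `datetime.fromisoformat` stands in for whatever parser to_datetime uses internally. It shows that the same fixed-offset strings can be returned either as offset-aware datetime objects (the shape of Out[5]) or as naive UTC values (the shape of Out[3]).

```python
from datetime import datetime, timedelta, timezone

data = ['2018-01-04 09:01:00+09:00', '2018-01-04 09:02:00+09:00']


def parse_keep_offset(values):
    """Hypothetical helper: parse ISO 8601 strings, keeping each UTC offset."""
    return [datetime.fromisoformat(v) for v in values]


def to_utc_naive(values):
    """Hypothetical helper: convert aware datetimes to UTC and drop tzinfo."""
    return [v.astimezone(timezone.utc).replace(tzinfo=None) for v in values]


aware = parse_keep_offset(data)   # offsets preserved, like Out[5]
naive_utc = to_utc_naive(aware)   # offsets folded into UTC, like Out[3]

print(aware[0].utcoffset())       # 9:00:00
print(naive_utc[0].isoformat())   # 2018-01-04T00:01:00
```

If the caller received the offset-aware objects, it could decide for itself whether to keep the +09:00 offset or normalize to UTC, which is why the conversion step looks like the more natural place for the fix.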