BUG: fix read_csv to parse timezone correctly by swyoon · Pull Request #22380 · pandas-dev/pandas

So this is a case I didn't account for when I recently fixed how to_datetime parses offsets.

In [1]: data = ['2018-01-04 09:01:00+09:00', '2018-01-04 09:02:00+09:00']

# This output should probably behave like In[5]
In [3]: pd.to_datetime(data, box=False)
Out[3]:
array(['2018-01-04T00:01:00.000000000', '2018-01-04T00:02:00.000000000'],
      dtype='datetime64[ns]')

In [4]: data = ['2018-01-04 09:01:00+09:00', '2018-01-04 09:02:00+08:00']

In [5]: pd.to_datetime(data, box=False)
Out[5]:
array([datetime.datetime(2018, 1, 4, 9, 1, tzinfo=tzoffset(None, 32400)),
       datetime.datetime(2018, 1, 4, 9, 2, tzinfo=tzoffset(None, 28800))],
      dtype=object)

I don't have the patch in front of me, but what would happen if the to_datetime call instead returned an array of datetime objects? If that gives the expected result, the fix probably belongs in to_datetime rather than in the parsing code.
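
To make the difference between the two outputs concrete, here is a minimal sketch using only the standard library. It is not the pandas code path, and the names `aware` and `utc_naive` are just illustrative. It shows that keeping the parsed offset gives 09:01 at +09:00 (as in Out[5]), while normalizing to UTC and dropping the tzinfo gives the 00:01 values seen in Out[3].

```python
from datetime import datetime, timezone

data = ['2018-01-04 09:01:00+09:00', '2018-01-04 09:02:00+09:00']

# Parse each string into an offset-aware datetime; the +09:00 offset is kept.
aware = [datetime.fromisoformat(s) for s in data]

# Convert to UTC and strip tzinfo; this mirrors the naive values in Out[3].
utc_naive = [d.astimezone(timezone.utc).replace(tzinfo=None) for d in aware]

print(aware[0].hour, aware[0].utcoffset().total_seconds())  # 9 32400.0
print(utc_naive[0])  # 2018-01-04 00:01:00
```

If read_csv gets the offset-aware objects back, it can decide for itself whether to keep the offset or convert, which is why the fix may sit better in to_datetime.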