Indexing into Series of tz-aware datetime64s fails using __getitem__

I'm a huge fan of Pandas. Thanks for all the hard work!

I believe I have stumbled across a small bug in Pandas 0.17.1 which was not present in 0.16.2. Indexing into Series of timezone-aware datetime64s fails using __getitem__ but indexing succeeds if the datetime64s are timezone-naive. Here is a minimal code example and the exception produced by Pandas 0.17.1:

In [37]: dates_with_tz = pd.date_range("2011-01-01", periods=3, tz="US/Eastern")

In [46]: dates_with_tz
Out[46]:
DatetimeIndex(['2011-01-01 00:00:00-05:00', '2011-01-02 00:00:00-05:00',
               '2011-01-03 00:00:00-05:00'],
              dtype='datetime64[ns, US/Eastern]', freq='D')

In [38]: s_with_tz = pd.Series(dates_with_tz, index=['a', 'b', 'c'])

In [39]: s_with_tz
Out[39]:
a   2011-01-01 00:00:00-05:00
b   2011-01-02 00:00:00-05:00
c   2011-01-03 00:00:00-05:00
dtype: datetime64[ns, US/Eastern]

In [40]: s_with_tz['a']

IndexError                                Traceback (most recent call last)
in ()
----> 1 s_with_tz['a']

/usr/local/lib/python2.7/dist-packages/pandas/core/series.pyc in __getitem__(self, key)
    555     def __getitem__(self, key):
    556         try:
--> 557             result = self.index.get_value(self, key)
    558
    559             if not np.isscalar(result):

/usr/local/lib/python2.7/dist-packages/pandas/core/index.pyc in get_value(self, series, key)
   1778         s = getattr(series,'_values',None)
   1779         if isinstance(s, Index) and lib.isscalar(key):
-> 1780             return s[key]
   1781
   1782         s = _values_from_object(series)

/usr/local/lib/python2.7/dist-packages/pandas/tseries/base.pyc in __getitem__(self, key)
     98         getitem = self._data.__getitem__
     99         if np.isscalar(key):
--> 100             val = getitem(key)
    101             return self._box_func(val)
    102         else:

IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices

If the dates are timezone-aware then we can access them using loc but, as far as I'm aware, we should be able to use __getitem__ in this situation too:

In [41]: s_with_tz.loc['a']
Out[41]: Timestamp('2011-01-01 00:00:00-0500', tz='US/Eastern')
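Since `.loc` returns the expected value, label lookups can be routed through it as a stopgap. This is only a minimal sketch of that workaround; `get_by_label` is a hypothetical helper name for illustration, not part of pandas:

```python
import pandas as pd


def get_by_label(series, label):
    """Look up a single label via .loc instead of series[label]."""
    return series.loc[label]


dates_with_tz = pd.date_range("2011-01-01", periods=3, tz="US/Eastern")
s_with_tz = pd.Series(dates_with_tz, index=['a', 'b', 'c'])

value = get_by_label(s_with_tz, 'a')
print(value)         # a tz-aware Timestamp
print(value.tzinfo)  # the timezone survives the lookup
```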

However, if the dates are timezone-naive then indexing using __getitem__ works as expected:

In [32]: dates_naive = pd.date_range("2011-01-01", periods=3)

In [33]: dates_naive
Out[33]: DatetimeIndex(['2011-01-01', '2011-01-02', '2011-01-03'], dtype='datetime64[ns]', freq='D')

In [34]: s = pd.Series(dates_naive, index=['a', 'b', 'c'])

In [35]: s
Out[35]:
a   2011-01-01
b   2011-01-02
c   2011-01-03
dtype: datetime64[ns]

In [36]: s['a']
Out[36]: Timestamp('2011-01-01 00:00:00')

So indexing into a Series using __getitem__ works if the data is a list of timezone-naive datetime64s but indexing fails if the datetime64s are timezone-aware.
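The two cases can be folded into one standalone check that might serve as a starting point for a regression test. This is a sketch only: `check_getitem_matches_loc` is a hypothetical helper name, and it assumes the expected behaviour is that `__getitem__` and `.loc` return equal values for every label:

```python
import pandas as pd


def check_getitem_matches_loc(series):
    """Assert that series[label] and series.loc[label] agree for every label."""
    for label in series.index:
        assert series[label] == series.loc[label], label


labels = ['a', 'b', 'c']
s_naive = pd.Series(pd.date_range("2011-01-01", periods=3), index=labels)
s_with_tz = pd.Series(
    pd.date_range("2011-01-01", periods=3, tz="US/Eastern"), index=labels
)

# Timezone-naive data: __getitem__ works on 0.17.1, as in the session above.
check_getitem_matches_loc(s_naive)

# Timezone-aware data: on 0.17.1 this raises the IndexError shown in the
# traceback above; it should pass on a version without the bug.
check_getitem_matches_loc(s_with_tz)

print("pandas", pd.__version__, "- __getitem__ matches .loc")
```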

In [47]: pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Linux
OS-release: 4.2.0-23-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8

pandas: 0.17.1
nose: 1.3.7
pip: 1.5.6
setuptools: 15.2
Cython: 0.23.1
numpy: 1.10.1
scipy: 0.16.1
statsmodels: 0.6.1
IPython: 4.0.0
sphinx: 1.2.3
patsy: 0.3.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.4.6
matplotlib: 1.4.3
openpyxl: None
xlrd: 0.9.2
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.3.2
html5lib: 0.999
httplib2: 0.9
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: 2.5.3 (dt dec pq3 ext)
Jinja2: None