Datetime as string in indexing a MultiIndex not always working · Issue #4758 · pandas-dev/pandas

See http://stackoverflow.com/questions/18643030/groupby-and-multi-indexing/18644920#18644920

Basically, indexing a DatetimeIndex with a string works for a single-level index, but not always when the DatetimeIndex is one level of a MultiIndex:

In [7]: df= pd.DataFrame({'ACCOUNT':["ACCT1", "ACCT1", "ACCT1", "ACCT2"],
   ...:                   'TICKER':["ABC", "MNP", "XYZ", "XYZ"],
   ...:                   'val':[1,2,3,4]},
   ...:                  index=pd.date_range("2013-06-19 09:30:00", periods=4, freq='5T'))
In [8]: df_multi = df.set_index(['ACCOUNT', 'TICKER'], append=True)
In [9]: df
Out[9]:
                    ACCOUNT TICKER  val
2013-06-19 09:30:00   ACCT1    ABC    1
2013-06-19 09:35:00   ACCT1    MNP    2
2013-06-19 09:40:00   ACCT1    XYZ    3
2013-06-19 09:45:00   ACCT2    XYZ    4

In [10]: df_multi
Out[10]:
                                    val
                    ACCOUNT TICKER
2013-06-19 09:30:00 ACCT1   ABC       1
2013-06-19 09:35:00 ACCT1   MNP       2
2013-06-19 09:40:00 ACCT1   XYZ       3
2013-06-19 09:45:00 ACCT2   XYZ       4

Accessing a row by a string label works with the single-level index:

In [11]: df.loc['2013-06-19 09:30:00']
Out[11]:
ACCOUNT    ACCT1
TICKER       ABC
val            1
Name: 2013-06-19 09:30:00, dtype: object

With the MultiIndex, a key that spells out all three levels and uses the string for the datetime level fails:

In [12]: df_multi.loc[('2013-06-19 09:30:00', 'ACCT1', 'ABC')]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-12-cbb2ba302ffa> in <module>()
----> 1 df_multi.loc[('2013-06-19 09:30:00', 'ACCT1', 'ABC')]

c:\users\vdbosscj\scipy\pandas-joris\pandas\core\indexing.pyc in __getitem__(self, key)
    722     def __getitem__(self, key):
    723         if type(key) is tuple:
--> 724             return self._getitem_tuple(key)
    725         else:
    726             return self._getitem_axis(key, axis=0)

c:\users\vdbosscj\scipy\pandas-joris\pandas\core\indexing.pyc in _getitem_tuple(self, tup)
    285
    286         # no multi-index, so validate all of the indexers
--> 287         self._has_valid_tuple(tup)
    288
    289         # ugly hack for GH #836

c:\users\vdbosscj\scipy\pandas-joris\pandas\core\indexing.pyc in _has_valid_tuple(self, key)
    717             if i >= self.obj.ndim:
    718                 raise ValueError('Too many indexers')
--> 719             if not self._has_valid_type(k,i):
    720                 raise ValueError("Location based indexing can only have [%s] types" % self._valid_types)
    721

c:\users\vdbosscj\scipy\pandas-joris\pandas\core\indexing.pyc in _has_valid_type(self, key, axis)
    792
    793             if not key in ax:
--> 794                 raise KeyError("the label [%s] is not in the [%s]" % (key,self.obj._get_axis_name(axis)))
    795
    796         return True

KeyError: 'the label [ACCT1] is not in the [columns]'

However, a key that covers only the first two of the three MultiIndex levels does work:

In [13]: df_multi.loc[('2013-06-19 09:30:00', 'ACCT1')]
Out[13]:
        val
TICKER
ABC       1

Using a Timestamp instead of a string also works, as expected:

In [14]: df_multi.loc[(pd.Timestamp('2013-06-19 09:30:00', tz=None), 'ACCT1', 'ABC')]
Out[14]:
val    1
Name: (2013-06-19 09:30:00, ACCT1, ABC), dtype: int64
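
Until the string case is handled, a possible workaround is to convert the string to a Timestamp before indexing, as in the last example. The sketch below is only an illustration, not a fix for the underlying bug. The helper name `loc_with_timestamp` is hypothetical. It uses `freq="5min"` rather than `'5T'` because newer pandas versions deprecate the `T` alias. It also passes an explicit `:` for the columns so the tuple is clearly a row key; the `[columns]` in the KeyError above suggests the tuple elements were being read as per-axis indexers.

```python
import pandas as pd

# Rebuild the frame from the report ("5min" is used here instead of "5T").
df = pd.DataFrame(
    {
        "ACCOUNT": ["ACCT1", "ACCT1", "ACCT1", "ACCT2"],
        "TICKER": ["ABC", "MNP", "XYZ", "XYZ"],
        "val": [1, 2, 3, 4],
    },
    index=pd.date_range("2013-06-19 09:30:00", periods=4, freq="5min"),
)
df_multi = df.set_index(["ACCOUNT", "TICKER"], append=True)


def loc_with_timestamp(frame, key):
    """Hypothetical helper: look up a row whose first index level is a datetime.

    The first element of `key` is converted to a pd.Timestamp, and the
    column axis is given explicitly so the tuple is treated as a row key.
    """
    first, *rest = key
    return frame.loc[(pd.Timestamp(first), *rest), :]


row = loc_with_timestamp(df_multi, ("2013-06-19 09:30:00", "ACCT1", "ABC"))
print(row)
```

This only sidesteps the problem for exact-match lookups. It does not reproduce the partial-string matching that a plain DatetimeIndex supports.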