Datetime as string in indexing a MultiIndex not always working · Issue #4758 · pandas-dev/pandas
See http://stackoverflow.com/questions/18643030/groupby-and-multi-indexing/18644920#18644920
Basically, indexing a DatetimeIndex with a string works when it is the only index, but not always when it is one level of a MultiIndex:
In [7]: df = pd.DataFrame({'ACCOUNT':["ACCT1", "ACCT1", "ACCT1", "ACCT2"],
...: 'TICKER':["ABC", "MNP", "XYZ", "XYZ"],
...: 'val':[1,2,3,4]},
...: index=pd.date_range("2013-06-19 09:30:00", periods=4, freq='5T'))
In [8]: df_multi = df.set_index(['ACCOUNT', 'TICKER'], append=True)
In [9]: df
Out[9]:
ACCOUNT TICKER val
2013-06-19 09:30:00 ACCT1 ABC 1
2013-06-19 09:35:00 ACCT1 MNP 2
2013-06-19 09:40:00 ACCT1 XYZ 3
2013-06-19 09:45:00 ACCT2 XYZ 4
In [10]: df_multi
Out[10]:
val
ACCOUNT TICKER
2013-06-19 09:30:00 ACCT1 ABC 1
2013-06-19 09:35:00 ACCT1 MNP 2
2013-06-19 09:40:00 ACCT1 XYZ 3
2013-06-19 09:45:00 ACCT2 XYZ 4
Accessing a row with a string works on the single-level index:
In [11]: df.loc['2013-06-19 09:30:00']
Out[11]:
ACCOUNT ACCT1
TICKER ABC
val 1
Name: 2013-06-19 09:30:00, dtype: object
With the MultiIndex, the same string in a key covering all three levels raises a KeyError:
In [12]: df_multi.loc[('2013-06-19 09:30:00', 'ACCT1', 'ABC')]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-12-cbb2ba302ffa> in <module>()
----> 1 df_multi.loc[('2013-06-19 09:30:00', 'ACCT1', 'ABC')]
c:\users\vdbosscj\scipy\pandas-joris\pandas\core\indexing.pyc in __getitem__(self, key)
722 def __getitem__(self, key):
723 if type(key) is tuple:
--> 724 return self._getitem_tuple(key)
725 else:
726 return self._getitem_axis(key, axis=0)
c:\users\vdbosscj\scipy\pandas-joris\pandas\core\indexing.pyc in _getitem_tuple(self, tup)
285
286 # no multi-index, so validate all of the indexers
--> 287 self._has_valid_tuple(tup)
288
289 # ugly hack for GH #836
c:\users\vdbosscj\scipy\pandas-joris\pandas\core\indexing.pyc in _has_valid_tuple(self, key)
717 if i >= self.obj.ndim:
718 raise ValueError('Too many indexers')
--> 719 if not self._has_valid_type(k,i):
720 raise ValueError("Location based indexing can only have [%s] types" % self._valid_types)
721
c:\users\vdbosscj\scipy\pandas-joris\pandas\core\indexing.pyc in _has_valid_type(self, key, axis)
792
793 if not key in ax:
--> 794 raise KeyError("the label [%s] is not in the [%s]" % (key,self.obj._get_axis_name(axis)))
795
796 return True
KeyError: 'the label [ACCT1] is not in the [columns]'
But a key covering only the first two of the three MultiIndex levels does work:
In [13]: df_multi.loc[('2013-06-19 09:30:00', 'ACCT1')]
Out[13]:
val
TICKER
ABC 1
And the full three-level key also works when a Timestamp is used instead of a string (as it should):
In [14]: df_multi.loc[(pd.Timestamp('2013-06-19 09:30:00', tz=None), 'ACCT1', 'ABC')]
Out[14]:
val 1
Name: (2013-06-19 09:30:00, ACCT1, ABC), dtype: int64
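Until string keys behave consistently here, one possible workaround is to parse the string into a Timestamp before building the key, which is the form that works above. The following is only a sketch: `with_timestamp` is a hypothetical helper, not part of pandas, and it assumes the datetime level is the first level of the MultiIndex. It uses `freq="5min"` instead of `'5T'` for the same 5-minute spacing, because the `'T'` alias is deprecated in recent pandas releases.

```python
import pandas as pd

# Rebuild the frame from the report (freq="5min" is the same spacing as '5T').
df = pd.DataFrame(
    {
        "ACCOUNT": ["ACCT1", "ACCT1", "ACCT1", "ACCT2"],
        "TICKER": ["ABC", "MNP", "XYZ", "XYZ"],
        "val": [1, 2, 3, 4],
    },
    index=pd.date_range("2013-06-19 09:30:00", periods=4, freq="5min"),
)
df_multi = df.set_index(["ACCOUNT", "TICKER"], append=True)


def with_timestamp(key):
    """Hypothetical helper: return the key tuple with its first element
    parsed into a pd.Timestamp, leaving the remaining levels unchanged."""
    first, *rest = key
    return (pd.Timestamp(first), *rest)


# Full three-level lookup, with the datetime level given as a Timestamp.
row = df_multi.loc[with_timestamp(("2013-06-19 09:30:00", "ACCT1", "ABC"))]
print(row)
```

This sidesteps the string handling in `.loc` entirely, so it does not depend on whether the partial-string matching is applied to the MultiIndex level.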