ENH: fastpath indexer API proposal (draft) · Issue #6328 · pandas-dev/pandas

The discussion in #6134 has inspired an idea that I'm writing down for
discussion. The idea is pretty obvious so it should've been considered before,
but I still think pandas as it is right now can benefit from it.

My main complaint about pandas when using it in a non-interactive way is that
lookups are significantly slower than with ndarray containers. I do realize
that this happens because of the many ways indexing can be done, but at some
point I really started thinking about ditching pandas in some
performance-critical paths of my project and replacing them with the dreadful
dict/ndarray combo. Not only does writing arr = df.values[df.index.get_loc(key)]
get old pretty fast, but it's also slower when the frame contains different
dtypes, and then you need to go deeper to fix that.
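
To make the workaround concrete, here is a minimal sketch on a made-up frame (the column names, index labels and the raw_row helper are assumptions for illustration; Index.get_loc and DataFrame.values are existing pandas API). It also shows why mixed dtypes hurt: .values has to upcast the whole frame to a common dtype before the row can be taken.

```python
import pandas as pd

# Made-up frame with mixed dtypes, for illustration only
df = pd.DataFrame(
    {"price": [1.5, 2.5, 3.5], "qty": [10, 20, 30], "tag": ["a", "b", "c"]},
    index=["x", "y", "z"],
)


def raw_row(frame, key):
    # Index.get_loc maps a label to its integer position (unique index assumed)
    pos = frame.index.get_loc(key)
    # .values builds one 2-D ndarray for the whole frame; with mixed dtypes
    # it is upcast to a common dtype (object here) before the row is taken
    return frame.values[pos]


arr = raw_row(df, "y")
```

On a frame like this the result is an object-dtype array, so the per-column dtypes are lost unless each column is looked up separately.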

Now I thought: what if this slowdown could be reduced by creating fastpath
indexers that look like the IndexSlice from #6134 and convey a message to the
pandas indexing facilities, like "trust me, I've done all the preprocessing,
just look it up already"? I'm talking about something like this (the names are
arbitrary and chosen for illustrative purposes only):

masked_rows = df.fastloc[pd.bool_slice[bool_array]]

or

masked_rows = df.fastloc[pd.bool_series_slice[bool_series]]

or

rows_3_and_10 = df.fastloc[pd.pos_slice[3, 10]]

or

rows_3_through_10 = df.fastloc[pd.range_slice[3:10]]

or

rows_for_two_days = df.fastloc[pd.tpos_slice['2014-01-01', '2014-01-08']]

Given the actual slice objects will have a common base class, the
implementation could be as easy as:

class FastLocAttribute(object):
    def __init__(self, container):
        self._container = container

    def __getitem__(self, smth):
        if not isinstance(smth, FastpathIndexer):
            raise TypeError("Indexing object is not a FastpathIndexer")

        # open to custom FastpathIndexer implementations
        return smth.getitem(self._container)
        # or, better encapsulated but not so open:
        # return self._container._index_method[type(smth)](smth)
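
The dispatch idea can be tried out without touching pandas itself. The following is only a sketch under stated assumptions: FastpathIndexer, PosIndexer and pos_slice are hypothetical names that do not exist in pandas, the accessor is constructed by hand instead of being exposed as df.fastloc, and DataFrame.take stands in for whatever real fast path an implementation would use. It demonstrates the shape of the API, not an actual speedup.

```python
import numpy as np
import pandas as pd


class FastpathIndexer(object):
    """Hypothetical common base class for pre-validated indexers."""

    def getitem(self, container):
        raise NotImplementedError


class PosIndexer(FastpathIndexer):
    """Hypothetical indexer that holds integer row positions."""

    def __init__(self, positions):
        self.positions = np.asarray(positions, dtype=np.intp)

    def getitem(self, container):
        # Stand-in for a real fast path: plain positional take on the rows
        return container.take(self.positions)


class _PosSliceMaker(object):
    """Builds a PosIndexer from subscript syntax, e.g. pos_slice[3, 10]."""

    def __getitem__(self, key):
        if not isinstance(key, tuple):
            key = (key,)
        return PosIndexer(key)


pos_slice = _PosSliceMaker()


class FastLocAttribute(object):
    def __init__(self, container):
        self._container = container

    def __getitem__(self, smth):
        if not isinstance(smth, FastpathIndexer):
            raise TypeError("Indexing object is not a FastpathIndexer")
        # The indexer, not the container, decides how the lookup is done
        return smth.getitem(self._container)


df = pd.DataFrame({"a": range(12)})
fastloc = FastLocAttribute(df)  # would be df.fastloc in the proposal
rows_3_and_10 = fastloc[pos_slice[3, 10]]
```

Anything that is not a FastpathIndexer, such as a plain list, is rejected with a TypeError up front, which is what lets the accessor skip the usual type sniffing.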

Cons:

Pros: