REF/PERF: MultiIndex.get_locs to use boolean arrays internally by lukemanley · Pull Request #46330 · pandas-dev/pandas
- Tests added and passed if fixing a bug or adding a new feature
- All code checks passed.
- Added an entry in the latest `doc/source/whatsnew/vX.X.X.rst` file if fixing a bug or adding a new feature.
Use boolean arrays internally within `MultiIndex.get_locs` rather than int64 indexes. Logical operations show performance improvements over intersecting int64 indexes. The output remains an integer positional indexer.
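To illustrate the idea, here is a minimal NumPy sketch, not the code from this PR. It compares intersecting integer position arrays with combining boolean masks and converting to positions only at the end. The masks and both helper functions are hypothetical and exist only for this example.

```python
import numpy as np

# Hypothetical per-level matches over 8 rows: one boolean mask per level.
level_masks = [
    np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=bool),
    np.array([0, 1, 1, 0, 0, 1, 1, 0], dtype=bool),
    np.array([1, 1, 0, 1, 1, 1, 0, 1], dtype=bool),
]


def locs_via_int_intersection(masks):
    """Intersect integer position arrays level by level."""
    result = np.arange(len(masks[0]), dtype=np.intp)
    for mask in masks:
        result = np.intersect1d(result, np.flatnonzero(mask))
    return result


def locs_via_bool_masks(masks):
    """Combine boolean masks with a logical AND, then convert once."""
    indexer = np.ones(len(masks[0]), dtype=bool)
    for mask in masks:
        indexer &= mask
    # The final result is still an integer positional indexer.
    return np.flatnonzero(indexer)


int_locs = locs_via_int_intersection(level_masks)
bool_locs = locs_via_bool_masks(level_masks)
print(int_locs, bool_locs)
```

Both helpers return the same positions. The boolean version avoids the repeated sorted-set intersections, which is roughly where the benchmarks below show the largest gains.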
       before          after       ratio
     [17dda440]      [94121581]
     <main>          <multiindex-get-locs-bool-arrays>
-      563±10μs        519±8μs      0.92  indexing.MultiIndexing.time_loc_all_scalars(True)
-    33.9±0.6ms     30.4±0.4ms      0.89  indexing.MultiIndexing.time_loc_all_null_slices(True)
-      38.7±1ms       34.5±1ms      0.89  indexing.MultiIndexing.time_loc_all_null_slices(False)
-   1.62±0.02ms    1.43±0.01ms      0.88  indexing.MultiIndexing.time_loc_all_slices(True)
-   6.40±0.06ms     5.53±0.1ms      0.86  indexing.MultiIndexing.time_loc_all_bool_indexers(True)
-       107±1ms     41.6±0.6ms      0.39  indexing.MultiIndexing.time_loc_all_lists(True)
-    34.6±0.8ms     8.24±0.4ms      0.24  indexing.MultiIndexing.time_loc_all_slices(False)
-       236±4ms     23.6±0.2ms      0.10  indexing.MultiIndexing.time_loc_all_lists(False)
-    97.3±0.7ms     9.16±0.4ms      0.09  indexing.MultiIndexing.time_loc_null_slice_plus_slice(False)
-    36.5±0.5ms    1.36±0.03ms      0.04  indexing.MultiIndexing.time_loc_null_slice_plus_slice(True)
# if we have a provided indexer, then this need not consider
# the entire labels set
if step is not None and step < 0:
    # Switch elements for negative step size
    start, stop = stop - 1, start - 1
r = np.arange(start, stop, step)
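The snippet under review can be checked in isolation. The sketch below wraps it in a hypothetical helper, `slice_positions` (a name invented for this example, not taken from pandas). It assumes `start < stop` are positions bounding a half-open range and compares the result with slicing an explicit position array.

```python
import numpy as np


def slice_positions(start, stop, step=None):
    """Positions in [start, stop) visited with the given step.

    Hypothetical helper mirroring the snippet above; assumes start < stop.
    """
    if step is not None and step < 0:
        # Switch elements for negative step size: begin at the last
        # position in the range and walk down towards `start`.
        start, stop = stop - 1, start - 1
    return np.arange(start, stop, step)


forward = slice_positions(2, 7)
strided = slice_positions(2, 7, 2)
backward = slice_positions(2, 7, -2)

# Reference result: slice an explicit array of the same positions.
reference = np.arange(2, 7)[::-2]
print(forward, strided, backward, reference)
```

With a negative step, the swap makes `np.arange` start at `stop - 1` and stop before `start - 1`, which visits the same positions as reversing the forward range.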
can you add an explanation (similar to the below) say around L3160, e.g. for a future reader to understand what this algorithm is doing.
yep needs a rebase :->
rebased this one
@@ -310,7 +310,7 @@ Performance improvements
 - Performance improvement in :meth:`.GroupBy.diff` (:issue:`16706`)
 - Performance improvement in :meth:`.GroupBy.transform` when broadcasting values for user-defined functions (:issue:`45708`)
 - Performance improvement in :meth:`.GroupBy.transform` for user-defined functions when only a single group exists (:issue:`44977`)
-- Performance improvement in :meth:`MultiIndex.get_locs` (:issue:`45681`, :issue:`46040`)
+- Performance improvement in :meth:`MultiIndex.get_locs` (:issue:`45681`, :issue:`46040`, :issue:`46330`)
most users don't use get_locs directly; is there a more user-facing description?
How about this:
Performance improvement in :meth:`DataFrame.loc` and :meth:`Series.loc` for tuple-based indexing of a :class:`MultiIndex`
updated, thanks
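For readers unfamiliar with the user-facing path, here is a small sketch of tuple-based `.loc` indexing on a `MultiIndex`, the operation the reworded whatsnew entry describes. The frame, its level names, and the key are made up for illustration.

```python
import pandas as pd

# Hypothetical two-level index: 2 letters x 3 numbers, lexsorted.
mi = pd.MultiIndex.from_product(
    [["a", "b"], [1, 2, 3]], names=["letter", "number"]
)
df = pd.DataFrame({"value": range(6)}, index=mi)

# One tuple entry per level: every letter, numbers 1 and 3 only.
key = (slice(None), [1, 3])
subset = df.loc[key, :]

# The same key passed to MultiIndex.get_locs yields integer positions.
locs = mi.get_locs(key)
print(subset)
print(locs)
```

Here `get_locs` returns the integer positions of the rows that `.loc` selects for the same key.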
can you merge master once again
@jreback - merged main and greenish. I don't think the error is related as I see it showing up in other PRs as well
)
indexer &= lvl_indexer
if not np.any(indexer) and np.any(lvl_indexer):
    raise KeyError(seq)
this hit by tests?
Yes, covered by test_loc.py > test_missing_key_combination
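The check discussed above can be sketched outside pandas. In this simplified NumPy version, the function name `combine_level_masks` and the toy level arrays are assumptions for illustration. A `KeyError` is raised when a level's key matches some rows on its own but the combination with the earlier levels matches none.

```python
import numpy as np


def combine_level_masks(level_masks, seq):
    """AND per-level boolean masks; raise if the combination is missing."""
    indexer = np.ones(len(level_masks[0]), dtype=bool)
    for lvl_indexer in level_masks:
        indexer &= lvl_indexer
        # Each key exists on its own level, but not together.
        if not np.any(indexer) and np.any(lvl_indexer):
            raise KeyError(seq)
    return np.flatnonzero(indexer)


# Toy level values for four rows: ("a", 1), ("a", 2), ("b", 3), ("b", 4).
level0 = np.array(["a", "a", "b", "b"])
level1 = np.array([1, 2, 3, 4])

found = combine_level_masks([level0 == "a", level1 == 2], ("a", 2))

try:
    combine_level_masks([level0 == "a", level1 == 3], ("a", 3))
    missing_raised = False
except KeyError:
    missing_raised = True

print(found, missing_raised)
```

The key `("a", 3)` raises because `"a"` and `3` each appear in the index but never on the same row.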
lukemanley deleted the multiindex-get-locs-bool-arrays branch
yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this pull request