API: Implement new indexing behavior for intervals by jschendel · Pull Request #27100 · pandas-dev/pandas (original) (raw)
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Conversation69 Commits13 Checks0 Files changed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters
[ Show hidden characters]({{ revealButtonHref }})
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks really good, some comments.
if you have other issues you are closing add them as closes in the top of the PR
[1, -1])]) |
---|
def test_get_indexer_with_interval(self, query, expected): |
tuples = [(0, 2.5), (1, 3), (2, 4)] |
tuples = [(0, 2), (2, 4), (5, 7)] |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change here was to have a non-overlapping index?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, since overlapping indexes are invalid for get_indexer
and need to use get_indexer_non_unique
instead. For intervals, overlapping is the analogue for uniqueness, as uniqueness isn't a strong enough condition since a query on a unique IntervalIndex
can still have multiple values returned if they come from the overlap of 2+ intervals.
It would also be good to add some documentation on indexing in the advanced.rst interval section (maybe copy some example from the whatsnew you added)
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updates look good!
ghost mentioned this pull request
@@ -965,6 +964,32 @@ If you select a label *contained* within an interval, this will also select the |
---|
df.loc[2.5] |
df.loc[[2.5, 3.5]] |
Selecting using an ``Interval`` will only return exact matches. |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a versionchanged 0.25 here
Indexing an ``IntervalIndex`` with ``Interval`` objects |
---|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
Indexing methods for :class:`IntervalIndex` have been modified to require exact matches only for :class:`Interval` queries. |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a link the docs you added above in indexing
elif lhs == -1: |
# check that target IntervalIndex is compatible |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe worth it to make a helper method for indexing with an II (as this is the same code as in get_indexer for unique)
merge master (as the other PR was merged)
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 small changes that remove code done in the precursor and i think good to go
Glad to see this finally resolved. Looking forward to trying out the new behavior.
@zfrenchee it would be very welcome to test and give feedback !
(there should also be a release candidate in a few days, if that is easier to test than master)
@jorisvandenbossche I noticed today
NotImplementedError: contains not implemented for two intervals
Both in 25.3 and master. Is there still room to contribute to intervalIndex?
@alexlenail I think it is best to open a new issue for this (to ensure it doesn't get lost)
I need to look back whether this was done on purpose or just because it was less straightforward / lack of time (the error type seems to suggest the second). But let's use a new issue to discuss that. Contributions are certainly welcome!