API: Implement new indexing behavior for intervals by jschendel · Pull Request #27100 · pandas-dev/pandas (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation69 Commits13 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks really good, some comments.

if you have other issues you are closing add them as closes in the top of the PR

[1, -1])])
def test_get_indexer_with_interval(self, query, expected):

tuples = [(0, 2.5), (1, 3), (2, 4)]
tuples = [(0, 2), (2, 4), (5, 7)]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change here was to have a non-overlapping index?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, since overlapping indexes are invalid for get_indexer and need to use get_indexer_non_unique instead. For intervals, overlapping is the analogue for uniqueness, as uniqueness isn't a strong enough condition since a query on a unique IntervalIndex can still have multiple values returned if they come from the overlap of 2+ intervals.

It would also be good to add some documentation on indexing in the advanced.rst interval section (maybe copy some example from the whatsnew you added)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updates look good!

@ghost ghost mentioned this pull request

Jul 1, 2019

@@ -965,6 +964,32 @@ If you select a label contained within an interval, this will also select the
df.loc[2.5]
df.loc[[2.5, 3.5]]

Selecting using an ``Interval`` will only return exact matches.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a versionchanged 0.25 here

Indexing an ``IntervalIndex`` with ``Interval`` objects
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Indexing methods for :class:`IntervalIndex` have been modified to require exact matches only for :class:`Interval` queries.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a link the docs you added above in indexing


elif lhs == -1:
# check that target IntervalIndex is compatible

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe worth it to make a helper method for indexing with an II (as this is the same code as in get_indexer for unique)

merge master (as the other PR was merged)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 small changes that remove code done in the precursor and i think good to go

Glad to see this finally resolved. Looking forward to trying out the new behavior.

@zfrenchee it would be very welcome to test and give feedback !
(there should also be a release candidate in a few days, if that is easier to test than master)

@jorisvandenbossche I noticed today

NotImplementedError: contains not implemented for two intervals

Both in 25.3 and master. Is there still room to contribute to intervalIndex?

@alexlenail I think it is best to open a new issue for this (to ensure it doesn't get lost)
I need to look back whether this was done on purpose or just because it was less straightforward / lack of time (the error type seems to suggest the second). But let's use a new issue to discuss that. Contributions are certainly welcome!