API: allow step!=1 slice with IntervalIndex by jbrockmendel · Pull Request #31658 · pandas-dev/pandas (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation23 Commits30 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

jbrockmendel

@jreback jreback added Indexing

Related to indexing on series/frames, not to indexes themselves

Interval

Interval data type

labels

Feb 5, 2020

@jreback

I suppose should add a whatsnew bug fix entry.

@jbrockmendel

jorisvandenbossche

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a test that ensures that slicing with an interval as step still raises an error?

@@ -145,7 +145,7 @@ Strings
Interval
^^^^^^^^
- Bug in :class:`IntervalIndex` slicing that prevented slicing with ``step > 1`` (:issue:`31658`)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would put this in the "other enhancements" section

jorisvandenbossche

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel what would you expect for this:

s = Series(np.arange(5), IntervalIndex.from_breaks(np.arange(2, 14, 2))) 
s[1.5:9.5:2]

I think that is less clear, which might be the reason it was disallowed initially.

"these indexers .* of <class 'pandas._libs.interval.Interval'>"
)
with pytest.raises(TypeError, match=msg):
s[0 : 4 : Interval(0, 1)]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would test it where start and stop are also intervals.
And can you move it to the test_loc_with_slices a bit above that is testing slicing with Intervals ?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

start and stop are also intervals

out of scope

And can you move it to the test_loc_with_slices a bit above that is testing slicing with Intervals ?

sure

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

start and stop are also intervals

out of scope

I think that is not out of scope, as that hits this path, and right now actually ensured there is a proper error message.

msg = (
"cannot do slice indexing on "
"<class 'pandas.core.indexes.interval.IntervalIndex'> with "
"these indexers .* of <class 'pandas._libs.interval.Interval'>"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current error message I get is more comprehensible I think (with s from the tests here):

In [41]: s[Interval(3, 4):Interval(4, 5):Interval(1, 2)]                                                                                                                                                           
...
ValueError: cannot support not-default step in a slice

In [42]: s[1:2:Interval(1, 2)]                                                                                                                                                           
...
ValueError: cannot support not-default step in a slice

@jbrockmendel

what would you expect for [...] s[1.5:9.5:2] [...] I think that is less clear, which might be the reason it was disallowed initially.

I agree that is less obvious, as that is not positional.

@jorisvandenbossche

I agree that is less obvious, as that is not positional.

So let's not allow that?

@jbrockmendel

@jbrockmendel

So let's not allow that?

OK I see, I thought that the float-slice would have been disallowed somewhere else, but evidently not. So yes, this needs to be updated to restore that.

@jbrockmendel

@jbrockmendel

@jbrockmendel

@jorisvandenbossche double-checking: suppose we get integer sllce that is valid for either loc or iloc:

import pandas as pd
import numpy as np

ser = pd.Series(np.arange(5), pd.IntervalIndex.from_breaks(np.arange(2, 14, 2))) 
key = slice(3, 6)

ser[key]
ser.loc[key]
ser.iloc[key]

key2 = 3
ser[key2]
ser.loc[key2]
ser.iloc[key2]

For the slice we treat the ambiguous __getitem__ key as positional, while for the int we treat it as a label. That's on purpose, right?

@jbrockmendel

Closing to clear the queue (in particular #31786 should happen before this), will revisit

@jbrockmendel

@jreback

@jbrockmendel

@jbrockmendel

@jreback

@jbrockmendel

jschendel

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thanks!

@@ -20,6 +21,34 @@
# Indexer Identification
def is_valid_positional_slice(slc: slice) -> bool:
"""
Check if a slice object can be interpretd as a positional indexer.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: interpretd --> interpreted

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

@jbrockmendel

@jbrockmendel

@jreback

thanks 2 followups:

  1. @jorisvandenbossche comment about having start/stop as intervals, pls create an issue
  2. does this need an example in the docs?

SeeminSyed pushed a commit to CSCD01-team01/pandas that referenced this pull request

Mar 22, 2020

@jbrockmendel @SeeminSyed

Labels

Indexing

Related to indexing on series/frames, not to indexes themselves

Interval

Interval data type