API: allow step!=1 slice with IntervalIndex by jbrockmendel · Pull Request #31658 · pandas-dev/pandas (original) (raw)
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Conversation23 Commits30 Checks0 Files changed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters
[ Show hidden characters]({{ revealButtonHref }})
Related to indexing on series/frames, not to indexes themselves
Interval data type
labels
I suppose should add a whatsnew bug fix entry.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a test that ensures that slicing with an interval as step still raises an error?
@@ -145,7 +145,7 @@ Strings |
---|
Interval |
^^^^^^^^ |
- Bug in :class:`IntervalIndex` slicing that prevented slicing with ``step > 1`` (:issue:`31658`) |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would put this in the "other enhancements" section
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbrockmendel what would you expect for this:
s = Series(np.arange(5), IntervalIndex.from_breaks(np.arange(2, 14, 2)))
s[1.5:9.5:2]
I think that is less clear, which might be the reason it was disallowed initially.
"these indexers .* of <class 'pandas._libs.interval.Interval'>" |
---|
) |
with pytest.raises(TypeError, match=msg): |
s[0 : 4 : Interval(0, 1)] |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would test it where start and stop are also intervals.
And can you move it to the test_loc_with_slices
a bit above that is testing slicing with Intervals ?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
start and stop are also intervals
out of scope
And can you move it to the test_loc_with_slices a bit above that is testing slicing with Intervals ?
sure
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
start and stop are also intervals
out of scope
I think that is not out of scope, as that hits this path, and right now actually ensured there is a proper error message.
msg = ( |
---|
"cannot do slice indexing on " |
"<class 'pandas.core.indexes.interval.IntervalIndex'> with " |
"these indexers .* of <class 'pandas._libs.interval.Interval'>" |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current error message I get is more comprehensible I think (with s
from the tests here):
In [41]: s[Interval(3, 4):Interval(4, 5):Interval(1, 2)]
...
ValueError: cannot support not-default step in a slice
In [42]: s[1:2:Interval(1, 2)]
...
ValueError: cannot support not-default step in a slice
what would you expect for [...]
s[1.5:9.5:2]
[...] I think that is less clear, which might be the reason it was disallowed initially.
I agree that is less obvious, as that is not positional.
I agree that is less obvious, as that is not positional.
So let's not allow that?
So let's not allow that?
OK I see, I thought that the float-slice would have been disallowed somewhere else, but evidently not. So yes, this needs to be updated to restore that.
@jorisvandenbossche double-checking: suppose we get integer sllce that is valid for either loc or iloc:
import pandas as pd
import numpy as np
ser = pd.Series(np.arange(5), pd.IntervalIndex.from_breaks(np.arange(2, 14, 2)))
key = slice(3, 6)
ser[key]
ser.loc[key]
ser.iloc[key]
key2 = 3
ser[key2]
ser.loc[key2]
ser.iloc[key2]
For the slice we treat the ambiguous __getitem__
key as positional, while for the int we treat it as a label. That's on purpose, right?
Closing to clear the queue (in particular #31786 should happen before this), will revisit
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, thanks!
@@ -20,6 +21,34 @@ |
---|
# Indexer Identification |
def is_valid_positional_slice(slc: slice) -> bool: |
""" |
Check if a slice object can be interpretd as a positional indexer. |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: interpretd --> interpreted
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
thanks 2 followups:
- @jorisvandenbossche comment about having start/stop as intervals, pls create an issue
- does this need an example in the docs?
SeeminSyed pushed a commit to CSCD01-team01/pandas that referenced this pull request
Allow step!=1 in slicing Series with IntervalIndex
Whatsnew
doc suggestion
test for interval step
remove commented-out
reword whatsnew
disallow label-based with step!=1
black fixup
update error message
typo fixup
Labels
Related to indexing on series/frames, not to indexes themselves
Interval data type