PERF: custom ops for RangeIndex.[all|any|contains] by topper-123 · Pull Request #26617 · pandas-dev/pandas (original) (raw)
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Conversation16 Commits5 Checks0 Files changed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters
[ Show hidden characters]({{ revealButtonHref }})
- xref PERF: don't call RangeIndex._data unnecessarily #26565
- tests added / passed
- passes
git diff upstream/master -u -- "*.py" | flake8 --diff
- whatsnew entry
Follow-up to #26565. Make RangeIndex._data
be created in fewer cases.
Performance examples (but the larger gain is from memory savings by not creating the _data
array):
rng = pd.RangeIndex(-5_000_000, 0, 6) %timeit rng.all() 789 µs ± 5.24 µs per loop # master 216 ns ± 0.957 ns per loop # this PR %timeit rng.any() 763 µs ± 6.08 µs per loop # master 124 ns ± 9.15 ns per loop # this PR %timeit (-2 in rng) 380 ns ± 1.85 ns per loop # master 689 ns ± 44.7 ns per loop # this PR, slightly slower
def __contains__(self, key): |
---|
hash(key) |
try: |
key = com.ensure_python_int(key) |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A Python int is needed, else Python iterates over the range, making the operation very slow.
@@ -490,3 +490,13 @@ def f(x): |
---|
f = mapper |
return f |
def ensure_python_int(value: Union[int, Any]) -> int: |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put in dtypes.cast but i don’t think u need this as we already have ensure_platform_int
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ensure_platform_int
returns a numpy int(32|64), so won't work here:
rng = range(1, 1_000_000) %timeit 999_999 in rng 133 ns ± 2.46 ns per loop %timeit np.int64(999_999) in rng 306 ms ± 66.3 ms per loop
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok in any event move it to collocate with the ensure _ routines
@@ -1333,3 +1334,13 @@ def maybe_cast_to_integer_array(arr, dtype, copy=False): |
---|
if is_integer_dtype(dtype) and (is_float_dtype(arr) or |
is_object_dtype(arr)): |
raise ValueError("Trying to coerce float values to integers") |
def ensure_python_int(value: Union[int, Any]) -> int: |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I gave the wrong direction, can you put in core/dtypes/common (right below the other ensure_*)
and can you add a doc-string (I know its trivial)
the typing seems odd here, shouldn't this be Union[int, np.integer] ?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not quite sure. In principle a float or a np.floating and possibly others could for some subvalues pass the test (e.g. if value is 1.0). But ok, passing a float would be a code smell, so just as well make the programmer see passing those as typing errors.
def __contains__(self, key): |
---|
hash(key) |
try: |
key = cast.ensure_python_int(key) |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
import directly
def all(self): |
---|
return 0 not in self._range |
def any(self): |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you type
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will need to wait for this until #26581 is merged, as mypy doesn't understand/see attributes defined in __new__
or _simple_new
. If that one is merged, I can go through this again wrt. typing.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small comments
Comments addressed. The typing issue must wait until #26581 is merged for mypy to see _range.
can you add the issue number onto the one you already have in Performance section; merge master as well. ping on green.
Labels
Related to the Index class or subclasses
Memory or execution speed performance