BUG: IntegerArray/FloatingArray constructors mismatched NAs by jbrockmendel · Pull Request #44514 · pandas-dev/pandas (original) (raw)
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Conversation27 Commits17 Checks0 Files changed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters
[ Show hidden characters]({{ revealButtonHref }})
- closes #xxxx
- tests added / passed
- Ensure all linting tests pass, see here for how to run them
- whatsnew entry
Same yak-shaving as #44495 (which turned out to be a dead end for this particular yak, but still a perf bump)
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The setitem
change / bug fix is unrelated to the constructor fix? Or it's because you are testing that through setitem as well?
mask2 = isna(values) |
---|
if not (mask == mask2).all(): |
# e.g. if we have a timedelta64("NaT") |
raise TypeError(f"{values.dtype} cannot be converted to a FloatingDtype") |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, could it be libmissing.is_numeric_na
that already raises on encountering a "non-numeric NA"? (is there a use case for is_numeric_na to not be strict about this, i.e. to get a "partial" mask?)
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Were you planning to address this one?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
planning to address in a follow-up
The setitem change / bug fix is unrelated to the constructor fix? Or it's because you are testing that through setitem as well?
The setitem bug was identified first and the cause tracked back to the constructor.
@@ -357,6 +364,31 @@ def test_setitem_series(self, data, full_indexer): |
---|
) |
self.assert_series_equal(result, expected) |
def test_setitem_frame_2d_values(self, data, using_array_manager, request): |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move this test out of the extension
base tests, or remove the need to use using_array_manager
? (this is not defined by external users of those tests, and would be a bit annoying to replicate)
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
gentle ping; this is a blocker for fixing a bug in Series.where, which in turn should allow us to share some more Block methods.
df = pd.DataFrame({"A": data}) |
---|
# Avoiding using_array_manager fixture |
# https://github.com/pandas-dev/pandas/pull/44514#discussion\_r754002410 |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for changing to not use the fixture. But generally, is this actually needed to have as base extension test? (since the fix was inside the Blocks code, it's not really testing a specific behaviour that the EA needs to have?)
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this actually needed to have as base extension test?
This seems like the best way to systematically test it for all EA dtypes.
it's not really testing a specific behaviour that the EA needs to have?
That seems like it applies to most tests that aren't directly testing EA methods.
Two questions, but feel free to merge as well
updated to raise inside is_numeric_na
i think comments have all been addressed here? bugfix follow-up is ready.
This was referenced
Nov 30, 2021
Yah, looks like a lot of time is being taken in is_numeric_na.
from asv_bench.benchmarks.groupby import *
self = Cumulative()
self.setup("Float64", "cumsum")
%timeit self.time_frame_transform("Float64", "cumsum")
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 1.791 1.791 {built-in method builtins.exec}
1 0.000 0.000 1.791 1.791 <string>:1(<module>)
10 0.014 0.001 1.791 0.179 groupby.py:569(time_frame_transform)
10 0.000 0.000 1.773 0.177 generic.py:1179(transform)
10 0.000 0.000 1.773 0.177 groupby.py:1608(_transform)
10 0.000 0.000 1.773 0.177 groupby.py:3117(cumsum)
10 0.000 0.000 1.773 0.177 generic.py:1103(_cython_transform)
10 0.000 0.000 1.762 0.176 managers.py:1266(grouped_reduce)
50 0.000 0.000 1.762 0.035 blocks.py:381(apply)
50 0.000 0.000 1.758 0.035 generic.py:1118(arr_func)
50 0.000 0.000 1.758 0.035 ops.py:919(_cython_operation)
50 0.000 0.000 1.699 0.034 ops.py:587(cython_operation)
50 0.001 0.000 1.697 0.034 ops.py:319(_ea_wrap_cython_operation)
50 0.000 0.000 1.559 0.031 ops.py:376(_reconstruct_ea_result)
50 0.000 0.000 1.558 0.031 floating.py:263(_from_sequence)
50 0.001 0.000 1.557 0.031 floating.py:85(coerce_to_array)
50 1.551 0.031 1.551 0.031 {pandas._libs.missing.is_numeric_na}
50 0.000 0.000 0.134 0.003 ops.py:434(_cython_op_ndim_compat)
Looks like we are calling is_numeric_na in cases where we have float64 dtype, so can just use np.isnan. Should be an easy patch.