API: Uses pd.NA in IntegerArray by TomAugspurger · Pull Request #29964 · pandas-dev/pandas (original) (raw)

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

@TomAugspurger

@TomAugspurger

@TomAugspurger

@TomAugspurger

TomAugspurger

return type(self)(self._data[item], self._mask[item])
def _coerce_to_ndarray(self):
def _coerce_to_ndarray(self, dtype=None, na_value=libmissing.NA):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'll want to make a to_array that's basically this method.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we already have to_numeric which is the canonical form of to_array (rather do conversions there)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jeff, the discussion about to_numpy (the to_array was a typo I think) moved to #30038 in the mean time. Can you move your comment there if relevant?

Note that to_numeric is a function that converts any thing to a numeric type. While this function here is to convert a numeric type (this IntegerArray) to any other numpy dtype.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to_array should have been to_numpy.

jorisvandenbossche

@TomAugspurger

Still some failing tests. Will have to put this on hold for a day or two now.

@TomAugspurger

@TomAugspurger

pandas/tests/test_register_accessor.py:39: error: Module has no attribute "register_series_accessor"
pandas/tests/test_register_accessor.py:40: error: Module has no attribute "register_dataframe_accessor"
pandas/tests/test_register_accessor.py:41: error: Module has no attribute "register_index_accessor"

I'm not able to reproduce these locally.

@jorisvandenbossche

What's the status of this? This is ready to go? (with the special cases for pow etc already merged in other PRs)

jorisvandenbossche

raise ValueError("Lengths must match to compare")
# numpy will show a DeprecationWarning on invalid elementwise
# comparisons, this will raise in the future

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See previous question about this. Is this comment no longer relevant or correct? Or why was it removed?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure, do you know how this is actually hit? If NumPy is going to raise in the future, shouldn't they be seeing that warning?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is about the warning you get with comparisons with objects / non-broadcastable arrays. Eg:

In [29]: np.array([1, 2]) == "b"   
/home/joris/miniconda3/envs/dev/bin/ipython:1: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  #!/home/joris/miniconda3/envs/dev/bin/python
Out[29]: False

In [30]: pd.array([1, 2]) == "b" 
Out[30]: array([False, False])

(it seems IntegerArray already handles this fine, not sure there is a explicit test for that)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems IntegerArray already handles this fine,

Gotch. It's silencing the same warning from NumPy, and falling back to invalid_comparison, which returns the expected result. I'll restore the comment.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually... the comment is incorrect. NumPy will perform elementwise comparison in the future, not raise. If they were to raise on that in the future the implementation would be incorrect.

Though I'm still a bit confused, as the NumPy op is returning NotImplemented since we're calling it directly. Will that continue to return NotImplemented? Or will the elementwise result be different?

@TomAugspurger

@TomAugspurger

@TomAugspurger

@jorisvandenbossche

I am planning to merge this in a bit. I am not sure if @jreback's review was a full one, but the comments have been addressed/answered. But it would be good to have this in master for IntegerArray as well, and this is blocking some follow-ups (completing to_numpy in #30322, creating the base class). There can be more review in those PRs I think.

@jreback

let me have a look a n hour or 2

@TomAugspurger

@TomAugspurger

@jreback did you still want to / have time to take a look?

@jreback

jreback

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. i think need a big whatsnew sub-section that explains show the implications of this change.

  1. getitem is now NA
  2. comparisions now return BooleanArray
pd.array([1, 2, np.nan], dtype="Int64")
All NA-like values are replaced with :attr:`pandas.NA`.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may want to add a versionchanged tag here (and below)

@TomAugspurger

Thanks, added whatsnew for those changes.

@TomAugspurger

@TomAugspurger

jreback

@jreback

Labels