DEPR: deprecate default of skipna=False in infer_dtype by h-vetinari · Pull Request #24050 · pandas-dev/pandas (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation48 Commits19 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

h-vetinari

I know that v.0.22 is generally not counted towards the "3 major releases between warning and deprecation" policy, but since this is a private method, I'm thinking it should be OK?

Would make life easier in a couple of places (especially in cleaning up some of the casting code).

@h-vetinari

@pep8speaks

Hello @h-vetinari! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on January 04, 2019 at 11:14 Hours UTC

h-vetinari

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments on specific changes

@codecov

@codecov

@gfyoung

Hmm...you do have to try a little just to get to the function via Python, so I'm leaning towards ok, but not 100% certain though.

cc @jreback

@jreback

I don't think this was ever deprecated. So not really possible to actually switch it now. If you want to add a deprecation warning if its not passed (e.g. make the default None) that would be ok.

@h-vetinari

bashtage

h-vetinari

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bashtage

What happens when you have

df = pd.DataFrame([[None],[None]],dtype='object')

IIRC None is an N/A type.

TomAugspurger

@h-vetinari

@bashtage

What happens when you have
df = pd.DataFrame([[None],[None]],dtype='object')
IIRC None is an N/A type.

This also doesn't change between the two variants:

>>> import pandas as pd
>>> import pandas._libs.lib as lib
>>> df = pd.DataFrame([[None], [None]], dtype=object)
>>>
>>> lib.infer_dtype(df, skipna=True)
'integer'
>>> lib.infer_dtype(df.dropna(), skipna=False)
'integer'

That said, the result is a bit unexpected, and seems to be due the 2d case being handled differently. In any case, both variants (what you commented on, resp. what I'm changing it to) coincide also for 1d here:

>>> lib.infer_dtype(df.values.ravel(), skipna=True)
'empty'
>>> lib.infer_dtype(df.dropna().values.ravel(), skipna=False)
'empty'
>>>
>>> # and for comparison
>>> lib.infer_dtype(df.values.ravel(), skipna=False)
'mixed'

@jorisvandenbossche jorisvandenbossche changed the titleDEPR: switch default of skipna-kwarg in infer_dtype to True DEPR: deprecate default of skipna=False in infer_dtype

Dec 6, 2018

@h-vetinari

@TomAugspurger

I implemented the DeprecationWarning like you requested - this should be ready.

jreback

@h-vetinari

h-vetinari

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review

@h-vetinari

TomAugspurger

@h-vetinari

@h-vetinari

jreback

h-vetinari

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback
I think there's a misunderstanding here. Please have a look at my response and TAL.

jreback

h-vetinari

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback
I have very limited capability over the holidays. Will most likely not be able to make a precursor before the cutoff. I could switch everything to skipna=False, and then things would definitely be the same as before.

@h-vetinari

@h-vetinari

@h-vetinari

@jreback @TomAugspurger

Merged in #24560 and restored the state that was approved by @TomAugspurger (in 1ea21fe), which is all green now. The philosophy of that was to use skipna=True in as many places as possible, not least since the original approach of this PR was to set the default to True right away.

To avoid getting bogged down in discussing the implications before the cut-off, I've added the last commit, which maintains the current behaviour (skipna=False), and only adds the deprecation. The usages within tests are an exception to this, because they are ipso facto correct with a passing CI.

If changing the defaults in the codebase is desired, I can make a follow-up with 1ea21fe.

jreback

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok lgtm. very minor change. ping on green.

assert result == 'integer'
def test_warn(self):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename to test_deprecation & add the issue number as a comment

@h-vetinari

@h-vetinari

@h-vetinari

@jreback
Thanks. Any comment on whether you'd like me to transition the calls from skipna=False to skipna=True where possible (#24050 (comment))?

jreback

@jreback

thanks @h-vetinari

I think we should only change to True in non-trivial cases, IOW the trivial ones can stay False; I think should be very explicit and deliberate for True. (so a lot of the string like ones this makes sense)

@jorisvandenbossche

A bit late to the party here :-), but:

  1. What is actually the rationale for this change?

I went quickly through the PR here but didn't directly find any reasoning apart from "Would make life easier in a couple of places (especially in cleaning up some of the casting code)." in the top post. Was this discussed in another issue?

Also to quote Jeff from his last comment just above: "IOW the trivial ones can stay False; I think should be very explicit and deliberate for True." (my emphasis)
If you want to be explicit in doing True, why then making this the default?

For the string case I think it certainly makes sense to have this as default, but for the integer case you now get a inconsistency between what infer_dtype tells you and what Series does: infer_dtype([1, 2, np.nan] == 'integer vs pd.Series([1, 2, np.nan]) == float64 (of course, if we ever have NA support by default for ints, this argument is no longer valid).

  1. If we keep the warning (not really a strong opinion here, mainly wondering the rationale in the above comment): what do you all think of making this a DeprecationWarning instead of FutureWarning first, given that it is more a functions used by developers / other libraries and less directly by users?

@h-vetinari

@jorisvandenbossche

  1. What is actually the rationale for this change?

Actually, my main reason for deprecating was that it has been "announced" in the docstring since 0.21 (see diff of this PR):

The default of ``False`` will be deprecated in a later version of pandas.

That being said, I think the inference generally makes more sense when nans are ignored (just that historically, skipna was not available until #17066).

  1. If we keep the warning (not really a strong opinion here, mainly wondering the rationale in the above comment): what do you all think of making this a DeprecationWarning instead of FutureWarning first, given that it is more a functions used by developers / other libraries and less directly by users?

I agree that it's not very user-facing. I have no strong preference for the type of warning.

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request

Feb 28, 2019

@h-vetinari @Pingviinituutti

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request

Feb 28, 2019

@h-vetinari @Pingviinituutti