API: Use object dtype for empty Series by SaturnFromTitan · Pull Request #29405 · pandas-dev/pandas (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation96 Commits5 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

SaturnFromTitan

@SaturnFromTitan

Meant to open a draft PR - please don't review this for now.

I assume quite a few more warnings and broken tests will need to be fixed

@TomAugspurger

Mmm sorry. We need to have a discussion about whether this is worth the change.

At this point, I'm not sure...

This can be made backwards compatible, right? When the data is empty and no dtype is specified, we can emit a warning saying "pass dtype=float to keep the old behavior, or dtype=object for the new behavior."

@SaturnFromTitan

We need to have a discussion about whether this is worth the change.

Do you mean if we want to change it at all? No strong opinion about it, but it seems reasonable to want consistent behaviour across Series, DataFrame and Index for this. Any particular reason why we wouldn't want that?

This can be made backwards compatible, right?

Why would we want to stay backwards compatible here? A big release like 1.0.0 seems like a pretty good opportunity to break this behaviour. We could obviously add this warning, but I wonder if it's really meaningful. Or do you mean we should add this warning before implementing the actual change, i.e. going through a full deprecation cycle?

@jreback

@SaturnFromTitan we generally don’t like to non backward compatible changes if we can help it; or if not provide a deprecation period

this change would likely break lots of code that is out there so better to provide some notice before we actually change it

@SaturnFromTitan

@jreback
Gotcha. So should I indeed change the PR to just raise the warning instead then? Or is it up for debate whether we want to go in this direction at all?

@jreback

yes only a warning for now

@TomAugspurger

Agreed that a warning is the best path forward.

@SaturnFromTitan

Phew... I changed the dtype change to a warning, but now realise this triggers an immense amount of warnings that clutter all test output. Silencing all of them feels like a huge undertaking... Is there an easier way around this @jreback @TomAugspurger?

@TomAugspurger

They'll all need to be updated. We don't allow unhandled warnings in our test suite. We'll also need a new test ensuring that not specifying the dtype or data results in a warning.

@jreback

Phew... I changed the dtype change to a warning, but now realise this triggers an immense amount of warnings that clutter all test output. Silencing all of them feels like a huge undertaking... Is there an easier way around this @jreback @TomAugspurger?

as tom indicates the best way here is to correct the expected in the tests
so there is no warning

@SaturnFromTitan

@jreback @TomAugspurger I understand how I can resolve the warnings, the problem I see is the amount of work. This code change now raises >4000 warnings. Changing all those tests would not only require a lot of work but would also result in a gigantic PR :D

Now the question: Do you think there is a good way to work around this or should we rather close this PR as "Won't fix" because of the value it brings doesn't justify the effort it demands?

@jreback

there shouldn’t be nearly this many warnings. the change is only for a very small subset of things, eg empty Series with no dtype

your patch is catching other cases; haven’t looked in depth but likely
you should look at a test which shouldn’t be changed and debug

jreback

jreback

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see my comments, you should have very very few catch warnings at all. if you have a bare Series() then just Series(dtype=object) and will silence and be correct.

@SaturnFromTitan

@jreback Thanks for taking a look at the PR already! I think I still need to rework some of the changes and silence more warnings, but it's already very helpful.

jreback

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

merge master as well. i have to fully review.

jorisvandenbossche

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates!

@SaturnFromTitan

@jorisvandenbossche

Looks all good to me now!
CI failures are indeed unrelated.

@SaturnFromTitan for future reference: if you could push only new commits instead of squashing with previous commits (and thus force pushing), that would be preferred. It makes it easier to see what has changes since a previous review with the github interface.

jorisvandenbossche

@jorisvandenbossche

@SaturnFromTitan

…sts and docs so that no unnecessary warnings are thrown

@SaturnFromTitan

@SaturnFromTitan

jreback

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. some minor comments. ping on green.

@SaturnFromTitan

@SaturnFromTitan

I'm not too sure about the types I assigned for create_series_with_explicit_dtype since Series itself isn't typed @jreback @jorisvandenbossche

@SaturnFromTitan

@SaturnFromTitan

@SaturnFromTitan

@jreback merge conflicts resolved and CI is green ✅

jreback

@jreback

thanks @SaturnFromTitan

great patch! this was a biggie.

do we have an issue to turn this into a FutureWarning?

happy to accept patches for things like typing (or anything else)!

@SaturnFromTitan

Yes, here's the issue for turning it into a FutureWarning: #30017

proost pushed a commit to proost/pandas that referenced this pull request

Dec 19, 2019

@SaturnFromTitan @proost

proost pushed a commit to proost/pandas that referenced this pull request

Dec 19, 2019

@SaturnFromTitan @proost

This was referenced

Jun 17, 2021