API: Use object dtype for empty Series by SaturnFromTitan · Pull Request #29405 · pandas-dev/pandas (original) (raw)
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Conversation96 Commits5 Checks0 Files changed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters
[ Show hidden characters]({{ revealButtonHref }})
- closes API: Use object dtype for empty Series #17261
- tests added / passed
- passes
black pandas
- passes
git diff upstream/master -u -- "*.py" | flake8 --diff
- whatsnew entry
Meant to open a draft PR - please don't review this for now.
I assume quite a few more warnings and broken tests will need to be fixed
Mmm sorry. We need to have a discussion about whether this is worth the change.
At this point, I'm not sure...
This can be made backwards compatible, right? When the data is empty and no dtype is specified, we can emit a warning saying "pass dtype=float to keep the old behavior, or dtype=object for the new behavior."
We need to have a discussion about whether this is worth the change.
Do you mean if we want to change it at all? No strong opinion about it, but it seems reasonable to want consistent behaviour across Series, DataFrame and Index for this. Any particular reason why we wouldn't want that?
This can be made backwards compatible, right?
Why would we want to stay backwards compatible here? A big release like 1.0.0 seems like a pretty good opportunity to break this behaviour. We could obviously add this warning, but I wonder if it's really meaningful. Or do you mean we should add this warning before implementing the actual change, i.e. going through a full deprecation cycle?
@SaturnFromTitan we generally don’t like to non backward compatible changes if we can help it; or if not provide a deprecation period
this change would likely break lots of code that is out there so better to provide some notice before we actually change it
@jreback
Gotcha. So should I indeed change the PR to just raise the warning instead then? Or is it up for debate whether we want to go in this direction at all?
yes only a warning for now
Agreed that a warning is the best path forward.
Phew... I changed the dtype change to a warning, but now realise this triggers an immense amount of warnings that clutter all test output. Silencing all of them feels like a huge undertaking... Is there an easier way around this @jreback @TomAugspurger?
They'll all need to be updated. We don't allow unhandled warnings in our test suite. We'll also need a new test ensuring that not specifying the dtype or data results in a warning.
Phew... I changed the dtype change to a warning, but now realise this triggers an immense amount of warnings that clutter all test output. Silencing all of them feels like a huge undertaking... Is there an easier way around this @jreback @TomAugspurger?
as tom indicates the best way here is to correct the expected in the tests
so there is no warning
@jreback @TomAugspurger I understand how I can resolve the warnings, the problem I see is the amount of work. This code change now raises >4000 warnings. Changing all those tests would not only require a lot of work but would also result in a gigantic PR :D
Now the question: Do you think there is a good way to work around this or should we rather close this PR as "Won't fix" because of the value it brings doesn't justify the effort it demands?
there shouldn’t be nearly this many warnings. the change is only for a very small subset of things, eg empty Series with no dtype
your patch is catching other cases; haven’t looked in depth but likely
you should look at a test which shouldn’t be changed and debug
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see my comments, you should have very very few catch warnings at all. if you have a bare Series()
then just Series(dtype=object)
and will silence and be correct.
@jreback Thanks for taking a look at the PR already! I think I still need to rework some of the changes and silence more warnings, but it's already very helpful.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
merge master as well. i have to fully review.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates!
Looks all good to me now!
CI failures are indeed unrelated.
@SaturnFromTitan for future reference: if you could push only new commits instead of squashing with previous commits (and thus force pushing), that would be preferred. It makes it easier to see what has changes since a previous review with the github interface.
…sts and docs so that no unnecessary warnings are thrown
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. some minor comments. ping on green.
I'm not too sure about the types I assigned for create_series_with_explicit_dtype
since Series
itself isn't typed @jreback @jorisvandenbossche
@jreback merge conflicts resolved and CI is green ✅
thanks @SaturnFromTitan
great patch! this was a biggie.
do we have an issue to turn this into a FutureWarning?
happy to accept patches for things like typing (or anything else)!
Yes, here's the issue for turning it into a FutureWarning: #30017
proost pushed a commit to proost/pandas that referenced this pull request
proost pushed a commit to proost/pandas that referenced this pull request
This was referenced
Jun 17, 2021