BUG: fix concat of Sparse with non-sparse dtypes by jorisvandenbossche · Pull Request #34338 · pandas-dev/pandas (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation8 Commits7 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

Closes #34336

This adds tests with the behaviour as it was on 0.25.3 / 1.0.3, and some changes to get back to that behaviour.
(but whether this behaviour is fully "sane", I am not sure ..)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

I'm not totally sure about the expected behavior of concat([Sparse, Categorical]). I suspect it was never properly discussed / intentionally implemented. Happy to just match 1.0.3 behavior for now htough.

Co-authored-by: Tom Augspurger TomAugspurger@users.noreply.github.com

OK, will go forward with this PR as is (matching the previous behaviour) to unblock the other concat PRs, but will open a set of follow-up issues.

@@ -1063,7 +1063,8 @@ def astype(self, dtype=None, copy=True):
"""
dtype = self.dtype.update_dtype(dtype)
subtype = dtype._subtype_with_str
sp_values = astype_nansafe(self.sp_values, subtype, copy=copy)
# TODO copy=False is broken for astype_nansafe with int -> float
sp_values = astype_nansafe(self.sp_values, subtype, copy=True)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened #34456 for this (and added link in the comment)

if is_sparse(arr.dtype) and not is_sparse(dtype):
# problem case: SparseArray.astype(dtype) doesn't follow the specified
# dtype exactly, but converts this to Sparse[dtype] -> first manually
# convert to dense array

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And opened #34459 for the general "what should concat(sparse, categorical) do?" question

@@ -20,7 +21,7 @@
)
from pandas.core.dtypes.generic import ABCCategoricalIndex, ABCRangeIndex, ABCSeries

from pandas.core.arrays import ExtensionArray
from pandas.core.arrays import ExtensionArray, SparseArray

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I already pushed a fix.
I don't really understand why, though, as SparseArray is included in the pandas.core.arrays init, just as ExtensionArray.

2 participants