BUG: fix concat of Sparse with non-sparse dtypes by jorisvandenbossche · Pull Request #34338 · pandas-dev/pandas (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation8 Commits7 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

jorisvandenbossche

Closes #34336

This adds tests with the behaviour as it was on 0.25.3 / 1.0.3, and some changes to get back to that behaviour.
(but whether this behaviour is fully "sane", I am not sure ..)

@jorisvandenbossche

@jorisvandenbossche

TomAugspurger

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

I'm not totally sure about the expected behavior of concat([Sparse, Categorical]). I suspect it was never properly discussed / intentionally implemented. Happy to just match 1.0.3 behavior for now htough.

@jorisvandenbossche @TomAugspurger

Co-authored-by: Tom Augspurger TomAugspurger@users.noreply.github.com

@jorisvandenbossche

OK, will go forward with this PR as is (matching the previous behaviour) to unblock the other concat PRs, but will open a set of follow-up issues.

jorisvandenbossche

@@ -1063,7 +1063,8 @@ def astype(self, dtype=None, copy=True):
"""
dtype = self.dtype.update_dtype(dtype)
subtype = dtype._subtype_with_str
sp_values = astype_nansafe(self.sp_values, subtype, copy=copy)
# TODO copy=False is broken for astype_nansafe with int -> float
sp_values = astype_nansafe(self.sp_values, subtype, copy=True)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened #34456 for this (and added link in the comment)

jorisvandenbossche

if is_sparse(arr.dtype) and not is_sparse(dtype):
# problem case: SparseArray.astype(dtype) doesn't follow the specified
# dtype exactly, but converts this to Sparse[dtype] -> first manually
# convert to dense array

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche

@jorisvandenbossche

@jorisvandenbossche

And opened #34459 for the general "what should concat(sparse, categorical) do?" question

@jorisvandenbossche

TomAugspurger

@@ -20,7 +21,7 @@
)
from pandas.core.dtypes.generic import ABCCategoricalIndex, ABCRangeIndex, ABCSeries
from pandas.core.arrays import ExtensionArray
from pandas.core.arrays import ExtensionArray, SparseArray

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I already pushed a fix.
I don't really understand why, though, as SparseArray is included in the pandas.core.arrays init, just as ExtensionArray.

@jorisvandenbossche

2 participants

@jorisvandenbossche @TomAugspurger