BUG: allow describe() for DataFrames with only boolean columns by agraboso · Pull Request #13898 · pandas-dev/pandas

agraboso

@sinhrks

Thanks for the PR! It helps to include the tests from the start so we can see the details.

sinhrks

@@ -5138,9 +5139,11 @@ def describe_1d(data):
         if self.ndim == 1:
             return describe_1d(self)
         elif (include is None) and (exclude is None):
-            if len(self._get_numeric_data()._info_axis) > 0:
+            if len(self._get_numeric_data(exclude_bool=True)._info_axis) > 0:
                 # when some numerics are found, keep only numerics
                 data = self.select_dtypes(include=[np.number])

Is there any reason to change _get_numeric_data rather than changing this line?

Updated PR with a more minimal approach.

@codecov-io

Current coverage is 85.30% (diff: 100%)

Merging #13898 into master will decrease coverage by <.01%

@@             master   #13898   diff @@
  Files           139      139
  Lines         50143    50143
  Methods           0        0
  Messages          0        0
  Branches          0        0

Powered by Codecov. Last update 7e15923...26201aa

sinhrks

-            if len(self._get_numeric_data()._info_axis) > 0:
+            ncols_numeric = len(self._get_numeric_data()._info_axis)
+            ncols_bool = len(self._get_bool_data()._info_axis)
+            if ncols_numeric > ncols_bool:

This doesn't look correct. Don't we want to describe bool columns only when there are no numeric columns?

But it is correct! BoolBlock inherits from NumericBlock, so ncols_numeric is always greater than or equal to ncols_bool. If they differ, we do have non-boolean numeric columns, which we select with np.number. If they coincide and ncols_bool > 0, the only numeric columns are boolean and we get them with np.bool. The remaining case has no numeric columns — not even boolean.
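
The case analysis above can be sketched in plain Python. This is only a model of the counting argument, not pandas code: `pick_dtype_filter` and the dtype-name strings are hypothetical, and treating `bool` as numeric is an assumption that mirrors the BoolBlock/NumericBlock relationship described in the comment.

```python
# Hypothetical model: a frame is represented only by its per-column dtype names.
# Assumption: bool counts as numeric, as the comment above describes.
NUMERIC_KINDS = {"int64", "float64", "bool"}


def pick_dtype_filter(column_dtypes):
    """Return the dtype family the counting logic would select, or None."""
    ncols_numeric = sum(1 for d in column_dtypes if d in NUMERIC_KINDS)
    ncols_bool = sum(1 for d in column_dtypes if d == "bool")
    if ncols_numeric > ncols_bool:
        # At least one numeric column is not boolean.
        return "np.number"
    if ncols_bool > 0:
        # The counts coincide and are non-zero: only boolean columns are numeric.
        return "np.bool"
    # No numeric columns at all, not even boolean ones.
    return None


print(pick_dtype_filter(["int64", "bool", "object"]))
print(pick_dtype_filter(["bool", "bool"]))
print(pick_dtype_filter(["object"]))
```

Because every boolean column is also counted as numeric, `ncols_numeric` can never be smaller than `ncols_bool`, so the three branches cover every possible frame.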

So you don't have to change this line: when self.select_dtypes(include=[np.number]) is empty, use self.

I agree, use .select_dtypes here.

Ah, now I understand. Indeed, much cleaner.
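
The simpler approach suggested in this thread (keep the non-boolean numeric columns, and fall back to the whole frame when there are none) can be modelled roughly as follows. This is a sketch under stated assumptions, not the merged pandas code: `columns_to_describe` is a hypothetical helper, and the frame is modelled as a dict of column name to dtype name.

```python
# Hypothetical model: a frame is a dict mapping column name -> dtype name.
# Assumption: these are the dtype names that an np.number filter would match.
NON_BOOL_NUMERIC = {"int64", "float64"}


def columns_to_describe(frame):
    """Keep non-boolean numeric columns; if none exist, fall back to all columns."""
    numeric = {name: d for name, d in frame.items() if d in NON_BOOL_NUMERIC}
    # An empty selection means there is nothing numeric to keep, so use the frame itself.
    return numeric if numeric else frame


print(columns_to_describe({"a": "int64", "b": "bool"}))
print(columns_to_describe({"a": "bool", "b": "bool"}))
```

With this shape there is no need to count boolean columns separately: a frame with only boolean columns simply falls through to the fallback branch.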

@agraboso

@jreback

@agraboso

@jreback

When all of the checks have passed it will turn green. Ping me then.

@agraboso

@agraboso

@jreback