BUG: allow describe() for DataFrames with only boolean columns by agraboso · Pull Request #13898 · pandas-dev/pandas (original) (raw)
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Conversation18 Commits1 Checks0 Files changed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters
[ Show hidden characters]({{ revealButtonHref }})
Thx for the PR! We appreciate including the tests from first to see the detail.
@@ -5138,9 +5139,11 @@ def describe_1d(data): |
---|
if self.ndim == 1: |
return describe_1d(self) |
elif (include is None) and (exclude is None): |
if len(self._get_numeric_data()._info_axis) > 0: |
if len(self._get_numeric_data(exclude_bool=True)._info_axis) > 0: |
# when some numerics are found, keep only numerics |
data = self.select_dtypes(include=[np.number]) |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there any reason to change _get_numeric_data
rather than changing this line?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated PR with a more minimal approach.
Current coverage is 85.30% (diff: 100%)
@@ master #13898 diff @@
Files 139 139
Lines 50143 50143
Methods 0 0
Messages 0 0
Branches 0 0
- Hits 42777 42776 -1
- Misses 7366 7367 +1
Partials 0 0
Powered by Codecov. Last update 7e15923...26201aa
if len(self._get_numeric_data()._info_axis) > 0: |
---|
ncols_numeric = len(self._get_numeric_data()._info_axis) |
ncols_bool = len(self._get_bool_data()._info_axis) |
if ncols_numeric > ncols_bool: |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it doesn't look correct. we want bool
describe only when there is no number column?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it is correct! BoolBlock
inherits from NumericBlock
, so ncols_numeric
is always greater than or equal to ncols_bool
. If they differ, we do have non-boolean numeric columns, which we select with np.number
. If they coincide and ncols_bool > 0
, the only numeric columns are boolean and we get them with np.bool
. The remaining case has no numeric columns — not even boolean.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thus u don't have to change the line, when self.select_dtypes(include=[np.number])
is empty use self
.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree use .select_dtype
here
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, now I understand. Indeed, much cleaner.
when all of the checks have passed it will turn grin. ping then.