DEPR: dropping nuisance columns in DataFrame reductions by jbrockmendel · Pull Request #41480 · pandas-dev/pandas

jbrockmendel

Discussed on this week's call

jreback

looks good. can you add a sub-section in deprecations as this is fairly user visible. ping on green.

@@ -9800,6 +9800,21 @@ def _get_data() -> DataFrame:
# Even if we are object dtype, follow numpy and return
# float64, see test_apply_funcs_over_empty
out = out.astype(np.float64)
if numeric_only is None and out.shape[0] != df.shape[1]:
# columns have been dropped

can you add the issue number here and below (this PR number is fine),
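
The check in the hunk above compares the length of the reduced result with the number of columns in the frame; a mismatch means at least one column was silently dropped. A minimal sketch of that idea, outside pandas internals, might look like the following. The helper names `column_mean` and `reduce_dropping_nuisance` and the warning text are hypothetical, chosen for illustration; this is not the code in the PR.

```python
import warnings

import pandas as pd


def column_mean(col):
    # Plain-Python mean (hypothetical helper); raises TypeError for
    # non-numeric values such as strings, because 0 + "x" is invalid.
    return sum(col) / len(col)


def reduce_dropping_nuisance(df, func):
    # Hypothetical helper: apply `func` column by column, skipping any
    # column on which it raises TypeError (a "nuisance" column).
    results = {}
    for name in df.columns:
        try:
            results[name] = func(df[name])
        except TypeError:
            continue
    out = pd.Series(results, dtype="float64")

    # Same shape comparison as in the diff: fewer results than columns
    # means something was dropped, so warn instead of staying silent.
    if out.shape[0] != df.shape[1]:
        warnings.warn(
            "Dropping nuisance columns is deprecated (illustrative message)",
            FutureWarning,
            stacklevel=2,
        )
    return out


df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    out = reduce_dropping_nuisance(df, column_mean)
```

Here `out` holds a value only for column `a`, and `caught` records the `FutureWarning` triggered by the dropped column `b`.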

This was referenced

May 17, 2021

@jbrockmendel

added a whatsnew subsection. this is actually just the first half of the note im about to push for #41475.

the smart money says ive made a mess of the rst conventions regarding code-block:: ipython vs ipython:: python vs [...]

jreback

Deprecated Dropping Nuisance Columns in DataFrame Reductions and DataFrameGroupBy Operations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When calling a reduction (.min, .max, .sum, ...) on a :class:`DataFrame` with

When calling a reduction (.min, .max, .sum, ...) on a :class:`DataFrame` with
The default of calling a reduction (.min, .max, .sum, ...) on a :class:`DataFrame` with
``numeric_only=None`` will silently ignore and drop from the result nuisance columns, e.g. a string column in a .mean() reduction.

jreback

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When calling a reduction (.min, .max, .sum, ...) on a :class:`DataFrame` with
``numeric_only=None`` (the default), columns on which the reduction raises ``TypeError``
are silently ignored and dropped from the result. This behavior is deprecated.

Start a new paragraph with 'This behavior is deprecated'
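
For readers of the note, one way to avoid depending on the deprecated behavior is to choose the columns explicitly before reducing. A short sketch, assuming a frame with one numeric and one string column (the frame and variable names are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# Option 1: ask the reduction to consider numeric columns only.
by_flag = df.mean(numeric_only=True)

# Option 2: select the valid columns yourself, then reduce.
by_selection = df[["a"]].mean()
```

Both results contain only column `a`, so neither relies on pandas silently discarding column `b`.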

jreback

lgtm. can you rebase once more and ping on green.

TLouf pushed a commit to TLouf/pandas that referenced this pull request

Jun 1, 2021

@jbrockmendel @TLouf

JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request

Jul 3, 2021

@jbrockmendel @JulianWgs

zhengruifeng pushed a commit to apache/spark that referenced this pull request

Aug 21, 2023

@itholic @zhengruifeng

…0 and enabling tests

What changes were proposed in this pull request?

This PR proposes to match the behavior with pandas 2.0.0 and above for stat functions, such as sum, quantile, prod, etc. See pandas-dev/pandas#41480 and pandas-dev/pandas#47500 for more detail.

Why are the changes needed?

To match the behavior to latest pandas.

Does this PR introduce any user-facing change?

Yes, the behaviors for stat funcs are now matched with pandas 2.0.0 and above.

How was this patch tested?

Enabling & updating the existing UTs.

Closes #42526 from itholic/pandas_stat.

Authored-by: itholic haejoon.lee@databricks.com Signed-off-by: Ruifeng Zheng ruifengz@apache.org

valentinp17 pushed a commit to valentinp17/spark that referenced this pull request

Aug 24, 2023

@itholic

ragnarok56 pushed a commit to ragnarok56/spark that referenced this pull request

Mar 2, 2024

@itholic @ragnarok56

2 participants

@jbrockmendel @jreback