Issues with numeric_only for DataFrame.std() · Issue #9201 · pandas-dev/pandas

@mortada

The docstring shows a numeric_only option for DataFrame.std() but it does not seem to actually be implemented. I'm happy to take a crack at fixing it but I'm not sure whether it's the doc or the implementation that needs fixing.

To see this, consider a mixed-type DataFrame in which one entry is set to the str '100' while all other entries are float. For std() it makes no difference whether numeric_only is True or False: the 'foo' column is included in both results. For max() the option clearly matters, since numeric_only=True drops 'foo'.

In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: df = pd.DataFrame(np.random.randn(5, 2), columns=['foo', 'bar'])
In [4]: df.ix[0, 'foo'] = '100'

In [5]: df
Out[5]:
         foo       bar
0        100 -1.958036
1   0.221049  0.309971
2   1.200093 -0.103244
3  -2.475388 -2.279483
4  0.1623936 -1.185682

In [6]: df.std(numeric_only=True)
Out[6]:
foo    44.841828
bar     1.129182
dtype: float64

In [7]: df.std(numeric_only=False)
Out[7]:
foo    44.841828
bar     1.129182
dtype: float64

In [8]: df.max(numeric_only=False)
Out[8]:
foo    100.000000
bar      0.309971
dtype: float64

In [9]: df.max(numeric_only=True)
Out[9]:
bar    0.309971
dtype: float64
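
Until the question of doc versus implementation is settled, one possible workaround is to make the column selection explicit instead of relying on numeric_only. The sketch below is only an illustration, not a proposed fix: it rebuilds a frame with the values shown in the output above (the variable names `numeric_std` and `coerced_std` are made up here) and assumes that `select_dtypes` and `pd.to_numeric` are available in the installed pandas version.

```python
import numpy as np
import pandas as pd

# Values copied from the session above. 'foo' is built as an object column
# because it mixes one str with floats; 'bar' is plain float64.
df = pd.DataFrame({
    "foo": pd.Series(["100", 0.221049, 1.200093, -2.475388, 0.1623936], dtype=object),
    "bar": [-1.958036, 0.309971, -0.103244, -2.279483, -1.185682],
})

# Option 1: keep only columns whose dtype is numeric, then reduce.
# The object-dtype 'foo' column is dropped, which is what
# numeric_only=True is documented to do.
numeric_std = df.select_dtypes(include=[np.number]).std()

# Option 2: if 'foo' should be included, convert it to numbers explicitly
# first, so the result does not depend on how std() treats object columns.
coerced_std = df.assign(foo=pd.to_numeric(df["foo"])).std()

print(numeric_std)
print(coerced_std)
```

With option 1 only 'bar' appears in the result; with option 2 both columns appear, and the inclusion of 'foo' is a deliberate choice rather than a side effect.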