Issues with numeric_only for DataFrame.std() · Issue #9201 · pandas-dev/pandas
Description
The docstring shows a numeric_only option for DataFrame.std(), but it does not seem to actually be implemented. I'm happy to take a crack at fixing it, but I'm not sure whether it's the doc or the implementation that needs fixing.
To see this, consider a mixed-type DataFrame where I'm setting one entry to be a str of '100' while all other entries are float. For std() it does not matter whether numeric_only is True or False, but for max() it clearly makes a difference.
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: df = pd.DataFrame(np.random.randn(5, 2), columns=['foo', 'bar'])
In [4]: df.ix[0, 'foo'] = '100'
In [5]: df
Out[5]:
         foo       bar
0        100 -1.958036
1   0.221049  0.309971
2   1.200093 -0.103244
3  -2.475388 -2.279483
4  0.1623936 -1.185682
In [6]: df.std(numeric_only=True)
Out[6]:
foo 44.841828
bar 1.129182
dtype: float64
In [7]: df.std(numeric_only=False)
Out[7]:
foo 44.841828
bar 1.129182
dtype: float64
In [8]: df.max(numeric_only=False)
Out[8]:
foo 100.000000
bar 0.309971
dtype: float64
In [9]: df.max(numeric_only=True)
Out[9]:
bar 0.309971
dtype: float64
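A minimal sketch for checking the session above, with two assumptions: the frame is rebuilt from the rounded values printed in Out[5] instead of from random data, and the foo column is built with the str in place from the start, so no later assignment into a float column is needed. It first checks that the reported std for foo equals the sample standard deviation with '100' coerced to 100.0, then shows one possible workaround (not something proposed in this issue) that selects the numeric columns explicitly before calling std().

```python
import pandas as pd

# Values copied from the Out[5] display above (rounded as printed).
df = pd.DataFrame({
    "foo": ["100", 0.221049, 1.200093, -2.475388, 0.1623936],
    "bar": [-1.958036, 0.309971, -0.103244, -2.279483, -1.185682],
})

# foo contains a str, so the column is object dtype rather than float64.
assert df["foo"].dtype == object

# Coerce '100' to 100.0 and take the sample std of foo.
# This comes out at about 44.84, the value reported in Out[6] and Out[7].
coerced_std = pd.to_numeric(df["foo"]).std()
print(coerced_std)

# Possible workaround: drop non-numeric columns explicitly,
# so the result does not depend on how numeric_only is handled.
numeric_std = df.select_dtypes(include="number").std()
print(numeric_std)
```

With this construction only bar survives the select_dtypes() call, which mirrors what max(numeric_only=True) returns in Out[9].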