describe() returns RuntimeWarning: Invalid value encountered in median RuntimeWarning · Issue #13146 · pandas-dev/pandas (original) (raw)
I just upgraded to 18.1 w/ conda. I started noticing this problem in some notebooks I created before the upgrade but recently revisited for further analysis.
Code Sample, a copy-pastable example if possible
import numpy as np import pandas as pd df = pd.DataFrame({'task_complete':['success','success','fail','fail','success','fail','success'], 'value':[np.nan,4.5,5.7,3.0,np.nan,6.7,3.78]})
df.value.describe()
returns a RuntimeWarning from numpy, which then gives this unexpected result for the quantiles:
In [5]: df.value.describe() /Users/adrianpalacios/anaconda/lib/python3.4/site-packages/numpy/lib/function_base.py:3403: RuntimeWarning: Invalid value encountered in median RuntimeWarning) Out[5]: count 5.000000 mean 4.736000 std 1.480703 min 3.000000 25% NaN 50% NaN 75% NaN max 6.700000 Name: value, dtype: float64
Expected Output
I got this using a different conda environment that has not been upgraded to latest pandas version:
In [5]: df.value.describe() Out[5]: count 5.000000 mean 4.736000 std 1.480703 min 3.000000 25% 3.780000 50% 4.500000 75% 5.700000 max 6.700000 Name: value, dtype: float64
Dropping the NaN's works in pandas 18.1:
In [9]: df.value.dropna().describe() Out[9]: count 5.000000 mean 4.736000 std 1.480703 min 3.000000 25% 3.780000 50% 4.500000 75% 5.700000 max 6.700000 Name: value, dtype: float64
However, this work-around is not a great option when multiple columns w/ NaNs are present:
df2 = pd.DataFrame({'task_complete':['success','success','fail','fail','success','fail','success'], 'value':[np.nan,4.5,5.7,3.0,np.nan,6.7,3.78], 'more_values':[8.2,np.nan,np.nan,np.nan,9.4,np.nan,np.nan] })
In [17]: df2[['value','more_values']].dropna().describe() Out[17]: value more_values count 0.0 0.0 mean NaN NaN std NaN NaN min NaN NaN 25% NaN NaN 50% NaN NaN 75% NaN NaN max NaN NaN
output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.4.4.final.0
python-bits: 64
OS: Darwin
OS-release: 15.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
pandas: 0.18.1
nose: 1.3.7
pip: 8.1.1
setuptools: 20.2.2
Cython: 0.22.1
numpy: 1.10.4
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 4.2.0
sphinx: 1.3.1
patsy: 0.3.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.0
numexpr: 2.5
matplotlib: 1.5.1
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 1.0.0
xlsxwriter: 0.7.3
lxml: 3.4.4
bs4: 4.3.2
html5lib: None
httplib2: 0.9.1
apiclient: None
sqlalchemy: 1.0.5
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.38.0
pandas_datareader: None