pd.to_numeric - float64, object or error? · Issue #17007 · pandas-dev/pandas
Code Sample, a copy-pastable example if possible
import pandas as pd
print(pd.__version__)

# different result example
d = pd.DataFrame({'a': [200, 300, '', 'NaN', 10000000000000000000]})

# returns dtype object
a = pd.to_numeric(d['a'], errors='coerce')
print(a.dtype)

# returns dtype float64
b = d['a'].apply(pd.to_numeric, errors='coerce')
print(b.dtype)

# why not float64?
d = pd.DataFrame({'a': [200, 300, '', 'NaN', 30000000000000000000]})

# returns dtype object
a = pd.to_numeric(d['a'], errors='coerce')
print(a.dtype)

# raises OverflowError
b = d['a'].apply(pd.to_numeric, errors='coerce')
print(b.dtype)
Problem description
Hi guys, I realized that the result of to_numeric changes depending on how you pass a Series to it. Please see the example above. When I call to_numeric with the Series passed as a parameter, it returns dtype "object", but when I apply to_numeric to that Series, it returns "float64". Moreover, I'm a bit confused about what the correct behavior of to_numeric is: why doesn't it convert a very long int-like number to float64? With apply, it throws an OverflowError from which I can't even deduce which number (position, index) caused it.
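One thing that may be relevant to the difference between the two frames is where each large integer sits relative to the 64-bit integer limits. The sketch below only checks that arithmetic. `smallest_64bit_fit` is a hypothetical helper, not a pandas function, and whether these limits are what actually drives the dtype result or the OverflowError in to_numeric is an assumption, not something confirmed here.

```python
# Limits of the standard 64-bit signed and unsigned integer types.
INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1
UINT64_MAX = 2 ** 64 - 1


def smallest_64bit_fit(value):
    """Return which 64-bit integer type, if any, can hold `value`.

    Hypothetical helper for illustration only; not part of pandas.
    """
    if INT64_MIN <= value <= INT64_MAX:
        return "int64"
    if 0 <= value <= UINT64_MAX:
        return "uint64"
    return "none"


# The three integers of interest from the code sample above.
for n in (200, 10000000000000000000, 30000000000000000000):
    print(n, smallest_64bit_fit(n))
```

Under these limits, 10000000000000000000 is too large for int64 but still fits in uint64, while 30000000000000000000 fits in neither.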
I'm pretty sure my issue is being discussed somewhere already. I tried to search for the proper issue but only found bits and pieces about to_numeric and conversions in general. Please feel free to move my issue to a more appropriate thread.
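Since the exception does not say which element caused it, one possible workaround is to convert the values one at a time and record the positions that fail. This is a minimal sketch: `find_failures` is a hypothetical helper, and it defaults to the built-in `float` as the converter so that it runs without pandas.

```python
def find_failures(values, convert=float):
    """Return (position, value, exception name) for each value `convert` rejects.

    Hypothetical helper for illustration only; not part of pandas.
    """
    failures = []
    for position, value in enumerate(values):
        try:
            convert(value)
        except (ValueError, TypeError, OverflowError) as exc:
            failures.append((position, value, type(exc).__name__))
    return failures


# With the built-in float, '' and 'abc' are rejected; 'NaN' parses as nan.
print(find_failures([200, 300, '', 'NaN', 'abc']))
```

With pandas installed, the same helper could be called as `find_failures(d['a'].tolist(), convert=pd.to_numeric)`. Which elements it reports then depends on the pandas version, and the positions are list positions, not index labels.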
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 61 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.20.3
pytest: None
pip: 9.0.1
setuptools: 27.2.0
Cython: None
numpy: 1.13.1
scipy: None
xarray: None
IPython: 5.4.1
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None