pd.to_numeric produces misleading results on DataFrame

When pd.to_numeric is called on a DataFrame with errors='coerce', it doesn't raise; it just returns the original DataFrame unchanged.

This may be related to the discussion in #11221, since this function currently doesn't support input with more than one dimension.
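Because the function handles 1-d input, one workaround is to convert each column separately with `DataFrame.apply`, which forwards the `errors` keyword to `pd.to_numeric`. This is a minimal sketch of a user-side workaround, not a fix for the inconsistency described here. It reuses the example frame from the session below.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 'foo'], 'b': [2.3, -1, 'bar']})

# Each column is a 1-d Series, so pd.to_numeric handles it normally;
# extra keyword arguments to apply() are passed through to the function.
converted = df.apply(pd.to_numeric, errors='coerce')
print(converted)  # 'foo' and 'bar' become NaN, both columns become float64
```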

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'a': [1, 2, 'foo'], 'b': [2.3, -1, 'bar']})

In [3]: df
Out[3]:
     a    b
0    1  2.3
1    2   -1
2  foo  bar

In [4]: pd.to_numeric(df)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-9febd95a7c0a> in <module>()
----> 1 pd.to_numeric(df)

/Users/mortada_mehyar/code/github/pandas/pandas/tools/util.py in to_numeric(arg, errors)
     94         conv = lib.maybe_convert_numeric(arg,
     95                                          set(),
---> 96                                          coerce_numeric=coerce_numeric)
     97     except:
     98         if errors == 'raise':

/Users/mortada_mehyar/code/github/pandas/pandas/src/inference.pyx in pandas.lib.maybe_convert_numeric (pandas/lib.c:52369)()
    518 cdef int64_t iINT64_MIN = <int64_t> INT64_MIN
    519
--> 520 def maybe_convert_numeric(object[:] values, set na_values,
    521                           bint convert_empty=True, bint coerce_numeric=False):
    522     '''

ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

In [5]: pd.to_numeric(df, errors='coerce')
Out[5]:
     a    b
0    1  2.3
1    2   -1
2  foo  bar

Note that the call with errors='coerce' silently returns its input, while the default call raises.
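The traceback suggests why: the dimension error from `lib.maybe_convert_numeric` is raised inside a bare `except:` block that only re-raises when `errors == 'raise'`. The sketch below is a hypothetical, simplified model of that control flow, not the actual pandas code. `fake_maybe_convert_numeric` and `to_numeric_model` are invented names, and it assumes (consistent with `Out[5]`) that other `errors` values fall through and return the input unchanged.

```python
import numpy as np

def fake_maybe_convert_numeric(values, coerce_numeric=False):
    """Hypothetical stand-in for lib.maybe_convert_numeric (1-d input only)."""
    arr = np.asarray(values, dtype=object)
    if arr.ndim != 1:
        # Mirrors the Cython buffer error: raised regardless of coerce_numeric.
        raise ValueError(
            f"Buffer has wrong number of dimensions (expected 1, got {arr.ndim})"
        )
    out = []
    for v in arr:
        try:
            out.append(float(v))
        except (TypeError, ValueError):
            if not coerce_numeric:
                raise
            out.append(np.nan)  # unparseable values become NaN when coercing
    return np.array(out)

def to_numeric_model(arg, errors="raise"):
    """Hypothetical, simplified model of the try/except shown in the traceback."""
    try:
        return fake_maybe_convert_numeric(arg, coerce_numeric=(errors == "coerce"))
    except Exception:  # stands in for the bare `except:`
        if errors == "raise":
            raise
    # Assumption: any other errors value falls through and hands back the
    # input untouched, so the 2-d dimension error is silently swallowed.
    return arg

grid = [[1, 2.3], [2, -1], ["foo", "bar"]]
print(to_numeric_model([1, 2, "foo"], errors="coerce"))  # 1-d: coerced to floats/NaN
print(to_numeric_model(grid, errors="coerce") is grid)   # 2-d: input returned as-is
```

In this model the per-value coercion and the structural (dimension) check share one exception handler, so `errors='coerce'` ends up masking an error that has nothing to do with parsing values.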

It seems like we should either:

- make pd.to_numeric work with DataFrame (or NDFrame in general), or
- simply raise here too if a DataFrame or anything more than 1-d is passed
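The second option amounts to checking dimensionality before any conversion, so that the `errors` setting can no longer hide the problem. Below is a minimal sketch as a user-side wrapper. The name `to_numeric_1d_only` and the choice of `TypeError` with this message are assumptions for illustration, not pandas API. Only objects exposing `.ndim` are checked, so nested lists are not caught.

```python
import pandas as pd

def to_numeric_1d_only(arg, errors="raise"):
    """Hypothetical wrapper: reject >1-d input before calling pd.to_numeric."""
    # DataFrames and 2-d arrays expose .ndim; plain lists/tuples default to 1.
    if getattr(arg, "ndim", 1) > 1:
        raise TypeError("arg must be a list, tuple, 1-d array, or Series")
    return pd.to_numeric(arg, errors=errors)

df = pd.DataFrame({'a': [1, 2, 'foo'], 'b': [2.3, -1, 'bar']})

try:
    to_numeric_1d_only(df, errors='coerce')  # now raises instead of returning df
except TypeError as exc:
    print(exc)

print(to_numeric_1d_only(df['a'], errors='coerce'))  # 1-d Series still works
```

With the check placed before the conversion, both `pd.to_numeric(df)` and `pd.to_numeric(df, errors='coerce')` would fail the same way, which removes the inconsistency shown in `In [4]` and `In [5]`.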