PERF: to_numeric for numeric dtypes by sinhrks · Pull Request #12777 · pandas-dev/pandas

@sinhrks

Skip object conversion if input is numeric already.

-  146.41ms    26.45μs      0.00  miscellaneous.to_numeric.time_from_float
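The idea behind the speedup can be sketched roughly as follows. This is only an illustration, not the code in this PR: `to_numeric_fast_path` is a hypothetical helper name, and the fallback simply defers to `pd.to_numeric`.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype


def to_numeric_fast_path(values):
    """Hypothetical helper illustrating the fast path; not the pandas implementation."""
    arr = np.asarray(values)
    if is_numeric_dtype(arr.dtype):
        # Already numeric: skip the element-wise object conversion entirely.
        return arr
    # Otherwise fall back to the general (slower) object conversion.
    return pd.to_numeric(arr.astype(object))
```

For a float array the input comes back untouched, which is presumably where the time_from_float benchmark gains its speed; strings still go through the full conversion.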

@jreback

In [1]: pd.date_range('20130101',periods=3,tz='US/Eastern')
Out[1]: DatetimeIndex(['2013-01-01 00:00:00-05:00', '2013-01-02 00:00:00-05:00', '2013-01-03 00:00:00-05:00'], dtype='datetime64[ns, US/Eastern]', freq='D')

In [2]: pd.to_numeric(pd.date_range('20130101',periods=3,tz='US/Eastern'))
TypeError: 'Index' does not have the buffer interface

In [3]: pd.to_numeric(pd.date_range('20130101',periods=3,tz='US/Eastern').to_series())
Out[3]: 
2013-01-01 00:00:00-05:00    1357016400000000000
2013-01-02 00:00:00-05:00    1357102800000000000
2013-01-03 00:00:00-05:00    1357189200000000000
Freq: D, dtype: int64

@jreback

@sinhrks this looked fine. Can you just rebase? (I think this was sent when codecov was misbehaving.)

@jreback

We can push this as well, let me know.

@codecov-io

Current coverage is 83.91%

Merging #12777 into master will increase coverage by +<.01%

@@            master   #12777   diff @@
Files            136      136
Lines          49918    49931    +13
Methods            0        0
Messages           0        0
Branches           0        0

  1. File ...das/tseries/index.py (not in diff) was modified.
    • Misses -1
    • Partials 0
    • Hits +1

Powered by Codecov. Last updated by 2d2b45a

@sinhrks

@jreback Current master has the following issues. I've updated the PR to include the fixes.

1. to_numeric(Index) returns np.ndarray, not Index

To be compatible with to_datetime, it must return an Index.

pd.to_numeric(pd.Index([1, 2, 3]))
# array([1, 2, 3])

2. datetime-likes are not actually supported

Based on your examples, the result should be the asi8 (int64) representation.

pd.to_numeric(pd.date_range('2011-01-01', freq='M', periods=3))
# TypeError: 'Index' does not have the buffer interface

pd.to_numeric(pd.Series(pd.date_range('2011-01-01', freq='M', periods=3)))
# TypeError: Invalid object type
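The behavior requested in both points can be sketched as below. This is a minimal illustration under stated assumptions, not the fix in this PR: `index_to_numeric` is a hypothetical helper, and it only special-cases `DatetimeIndex` via its `asi8` attribute.

```python
import numpy as np
import pandas as pd


def index_to_numeric(idx):
    """Hypothetical helper: convert an Index and return an Index, not an ndarray."""
    if isinstance(idx, pd.DatetimeIndex):
        # Datetime-likes: use the underlying int64 representation.
        values = idx.asi8
    else:
        # Everything else goes through the regular conversion on an object array.
        values = pd.to_numeric(np.asarray(idx, dtype=object))
    # Re-wrap so the container type matches the input, as to_datetime does.
    return pd.Index(values, name=idx.name)
```

Re-wrapping at the end keeps the container type symmetric with the input, which is the compatibility point raised in item 1.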

sinhrks

Avoided using is_period_array for now. We can have a faster implementation once a period dtype is added.

jreback

I think this will fail with a scalar (though that might be another issue).

Currently it fails on master. Will include the fix.

pd.to_numeric(1)
# ValueError: Buffer has wrong number of dimensions (expected 1, got 0)
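One way scalar input could be handled is to wrap it in a one-element array and unwrap the result. The sketch below is an assumption about the general approach, not the code added in this PR; `scalar_to_numeric` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def scalar_to_numeric(value):
    """Hypothetical helper; not the scalar implementation added in this PR."""
    # Wrap the scalar in a 1-d object array so the array code path can handle it,
    arr = np.array([value], dtype=object)
    # then unwrap the single converted element.
    return pd.to_numeric(arr)[0]
```

This sidesteps the 0-dimensional buffer that triggers the error above, at the cost of allocating a small temporary array.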

Great! I think there might be an issue about this somewhere.

@sinhrks

Added scalar impl and tests (test_scalar). Now green except for codecov/changes.
