PERF: to_numeric for numeric dtypes by sinhrks · Pull Request #12777 · pandas-dev/pandas

sinhrks

Skip object conversion if input is numeric already.

-  146.41ms    26.45μs      0.00  miscellaneous.to_numeric.time_from_float
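For illustration only, the idea of skipping conversion for already-numeric input can be sketched as below. `to_numeric_fast_path` is a hypothetical helper name, not the code in this PR; it only assumes the public `pandas.api.types.is_numeric_dtype` check and falls back to `pd.to_numeric` otherwise.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype


def to_numeric_fast_path(values):
    """Hypothetical helper: return numeric input untouched, else parse it."""
    arr = np.asarray(values)
    if is_numeric_dtype(arr.dtype):
        # Already numeric: no object conversion or parsing is needed.
        return arr
    # Non-numeric input (e.g. strings) goes through the regular parser.
    return pd.to_numeric(arr)


floats = np.array([1.5, 2.5, 3.5])
result = to_numeric_fast_path(floats)
parsed = to_numeric_fast_path(np.array(["1", "2"], dtype=object))
```

The speedup in the benchmark above presumably comes from this kind of early return, since a float array no longer has to be cast to object and re-parsed.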

@jreback

In [1]: pd.date_range('20130101',periods=3,tz='US/Eastern')
Out[1]: DatetimeIndex(['2013-01-01 00:00:00-05:00', '2013-01-02 00:00:00-05:00', '2013-01-03 00:00:00-05:00'], dtype='datetime64[ns, US/Eastern]', freq='D')

In [2]: pd.to_numeric(pd.date_range('20130101',periods=3,tz='US/Eastern'))
TypeError: 'Index' does not have the buffer interface

In [3]: pd.to_numeric(pd.date_range('20130101',periods=3,tz='US/Eastern').to_series())
Out[3]: 
2013-01-01 00:00:00-05:00    1357016400000000000
2013-01-02 00:00:00-05:00    1357102800000000000
2013-01-03 00:00:00-05:00    1357189200000000000
Freq: D, dtype: int64

@jreback

@sinhrks this looked fine. can you just rebase (I think this was sent when codecov was misbehaving).

@jreback

we can push this as well, lmk.

@codecov-io

Current coverage is 83.91%

Merging #12777 into master will increase coverage by +<.01%

@@            master   #12777   diff @@

  Files          136      136
  Lines        49918    49931    +13
  Methods          0        0
  Messages         0        0
  Branches         0        0

  1. File ...das/tseries/index.py (not in diff) was modified.
    • Misses -1
    • Partials 0
    • Hits +1

Powered by Codecov. Last updated by 2d2b45a

@sinhrks

@jreback Current master has the following issues. I've updated the PR to include the fixes.

1. to_numeric(Index) returns np.ndarray, not Index

To be compatible with to_datetime, it must return an Index.

pd.to_numeric(pd.Index([1, 2, 3]))
# array([1, 2, 3])

2. datetime-likes are not actually supported

Based on your examples, the result should be the asi8 (int64) representation.

pd.to_numeric(pd.date_range('2011-01-01', freq='M', periods=3))
# TypeError: 'Index' does not have the buffer interface

pd.to_numeric(pd.Series(pd.date_range('2011-01-01', freq='M', periods=3)))
# TypeError: Invalid object type
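A short sketch of the behavior both items aim for, assuming a pandas release that includes these fixes. It uses freq='D' instead of 'M' only to avoid alias differences across pandas versions, and compares against `asi8` rather than hard-coded integers.

```python
import pandas as pd

# 1. An Index goes in, an Index comes out (not a bare np.ndarray).
idx = pd.Index([1, 2, 3], name="xxx")
res = pd.to_numeric(idx)

# 2. Datetime-likes are converted to their int64 (asi8) representation.
dti = pd.date_range("2011-01-01", freq="D", periods=3, name="xxx")
res_dti = pd.to_numeric(dti)
res_ser = pd.to_numeric(pd.Series(dti, name="xxx"))
```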

sinhrks

tm.assert_index_equal(res, pd.Index(idx.asi8, name='xxx'))
# ToDo: enable when we can support native PeriodDtype
# res = pd.to_numeric(pd.Series(idx, name='xxx'))

Avoided using is_period_array for now. We can have a faster implementation once a period dtype is added.

jreback

elif getattr(arg, 'ndim', 1) > 1:
    raise TypeError('arg must be a list, tuple, 1-d array, or Series')
else:
    values = arg

I think this will fail with a scalar (though that might be another issue).

Currently it fails on master. Will include the fix.

pd.to_numeric(1)
# ValueError: Buffer has wrong number of dimensions (expected 1, got 0)
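As a rough sketch of the intended behavior once scalars are handled, assuming a pandas release that includes the fix: a scalar should come back as a scalar, while input with more than one dimension should still hit the TypeError branch shown in the reviewed snippet.

```python
import numpy as np
import pandas as pd

# Scalars are returned as scalars instead of failing on the 0-d buffer.
scalar_int = pd.to_numeric(1)
scalar_str = pd.to_numeric("1.5")

# Input with ndim > 1 is still rejected.
try:
    pd.to_numeric(np.array([[1, 2], [3, 4]]))
    raised = False
except TypeError:
    raised = True
```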

Great! I think there might be an issue about this somewhere.

@sinhrks

Added scalar impl and tests (test_scalar). Now green except for codecov/changes.
