PERF: to_numeric for numeric dtypes by sinhrks · Pull Request #12777 · pandas-dev/pandas (original) (raw)
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Conversation13 Commits1 Checks0 Files changed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters
[ Show hidden characters]({{ revealButtonHref }})
- tests added / passed
- passes
git diff upstream/master | flake8 --diff
- whatsnew entry
Skip object
conversion if input is numeric already.
- 146.41ms 26.45μs 0.00 miscellaneous.to_numeric.time_from_float
In [1]: pd.date_range('20130101',periods=3,tz='US/Eastern')
Out[1]: DatetimeIndex(['2013-01-01 00:00:00-05:00', '2013-01-02 00:00:00-05:00', '2013-01-03 00:00:00-05:00'], dtype='datetime64[ns, US/Eastern]', freq='D')
In [2]: pd.to_numeric(pd.date_range('20130101',periods=3,tz='US/Eastern'))
TypeError: 'Index' does not have the buffer interface
In [3]: pd.to_numeric(pd.date_range('20130101',periods=3,tz='US/Eastern').to_series())
Out[3]:
2013-01-01 00:00:00-05:00 1357016400000000000
2013-01-02 00:00:00-05:00 1357102800000000000
2013-01-03 00:00:00-05:00 1357189200000000000
Freq: D, dtype: int64
@sinhrks this looked fine. can you just rebase (I think this was sent when codecov was misbehaving).
we can push this as well, lmk.
Current coverage is 83.91%
@@ master #12777 diff @@
Files 136 136
Lines 49918 49931 +13
Methods 0 0
Messages 0 0
Branches 0 0
- Hits 41885 41899 +14
- Misses 8033 8032 -1
Partials 0 0
- File
...das/tseries/index.py
(not in diff) was modified. more- Misses
-1
- Partials
0
- Hits
+1
- Misses
Powered by Codecov. Last updated by 2d2b45a
@jreback Current master has following issues. I've updated the PR to included the fixes.
1. to_numeric(Index)
returns np.ndarray
, not Index
To be compat with to_datetime
, it must be Index
.
pd.to_numeric(pd.Index([1, 2, 3]))
# array([1, 2, 3])
2. datetime-likes are not actually supported
Based on your examples, it should be asi8
repr.
pd.to_numeric(pd.date_range('2011-01-01', freq='M', periods=3))
# TypeError: 'Index' does not have the buffer interface
pd.to_numeric(pd.Series(pd.date_range('2011-01-01', freq='M', periods=3)))
# TypeError: Invalid object type
tm.assert_index_equal(res, pd.Index(idx.asi8, name='xxx')) |
---|
# ToDo: enable when we can support native PeriodDtype |
# res = pd.to_numeric(pd.Series(idx, name='xxx')) |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoided to use is_period_array
ATM. we can have faster impl when period dtype is added.
elif getattr(arg, 'ndim', 1) > 1: |
---|
raise TypeError('arg must be a list, tuple, 1-d array, or Series') |
else: |
values = arg |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will fail with a scalar (though that might be another issue).
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently it fails on master. Will include the fix.
pd.to_numeric(1)
# ValueError: Buffer has wrong number of dimensions (expected 1, got 0)
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gr8!. I think there might be an issue about this somewhere.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added scalar impl and tests (test_scalar
). Now green except for codecov/changes
.