BUG/PERF: offsets.apply doesnt preserve nanosecond by sinhrks · Pull Request #7697 · pandas-dev/pandas (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation27 Commits1 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

sinhrks

Main Fix is to preserve nanosecond info which can lost during offset.apply, but it also includes:

Before

-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------
timeseries_year_incr                         |   0.0164 |   0.0103 |   1.5846 |
timeseries_year_apply                        |   0.0153 |   0.0094 |   1.6356 |
timeseries_day_incr                          |   0.0187 |   0.0053 |   3.5075 |
timeseries_day_apply                         |   0.0164 |   0.0033 |   4.9048 |

Target [d0076db] : PERF: Improve index.min and max perf
Base   [da0f7ae] : RLS: 0.14.0 final

After the fix

-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------
timeseries_year_incr                         |   0.0150 |   0.0087 |   1.7339 |
timeseries_year_apply                        |   0.0126 |   0.0073 |   1.7283 |
timeseries_day_incr                          |   0.0130 |   0.0053 |   2.4478 |
timeseries_day_apply                         |   0.0107 |   0.0033 |   3.2143 |

Target [64dd021] : BUG: offsets.apply doesnt preserve nanosecond
Base   [da0f7ae] : RLS: 0.14.0 final

jreback

@@ -26,15 +27,17 @@
# convert to/from datetime/timestamp to allow invalid Timestamp ranges to pass thru
def as_timestamp(obj):
if isinstance(obj, Timestamp):
return obj
if type(obj) == date:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be isinstance(obj, date)? not sure if you need to include (np.datetime64,datetime,date)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because `datetimeis a subclass ofdate``. Modified to use``isinstance``.

>>> dt = datetime.datetime(2011, 1, 1)
>>> isinstance(dt, datetime.date)
True

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, Timestamp can accept datetime.date. Thus this check seems not to be required at all.

@jreback

@sinhrks

@jreback Rebased. Is any other thing required?

@jreback

@sinhrks

Affected offset (reset nanosecond)

If the aply logic includes datetime conversion, nanosecond will be lost.

Affected offset (dateutil support)

tz.localize raises AttributeError: 'tzfile' object has no attribute 'localize' if tz is dateutil timezone.

@sinhrks

@jreback

not sure what you mean (about Tick being calced more than once). The 2nd time is simply int_64 addition. AFAICT. delta_to_nanoseconds is necessary. What do you think this should change to?

@sinhrks

@jreback

I think that delta and nanos could be cache_readonly. Once you create an offset they are not changed (I don't think). You can try that (but separate PR). I am not sure how to test that, maybe just trace the code and see.

@jreback

@sinhrks otherwise this looks ok. It just changes a lot of code so trying to review.

@sinhrks

OK, modified to normal property.

@jreback

seems, apply_wraps is on every apply, except for in DateOffset. maybe add a note there why this is (or is it right?)

jreback

value = result.value
result = Timestamp(value + nano)
if tz is not None and result.tzinfo is None:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be result= tslib._localize_pydatetime(result, tz) as well here?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this is a flow for Timestamp, no need to care for datetime here.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, maybe add a note (or you can simply use the other routine). I found it confusing?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Modified to use tslib._localize_pydatetime(result, tz) to avoid any confusion.

@sinhrks

DateOffset needs apply_wraps. I missed because of misunderstanding that DateOffset cannot be used by itself (#7375). Fixed and added tests.

@jreback

@sinhrks

@sinhrks

jreback added a commit that referenced this pull request

Jul 25, 2014

@jreback

BUG/PERF: offsets.apply doesnt preserve nanosecond

@jreback

@sinhrks thanks for this...cleans up a large amount of code.....

2 participants

@sinhrks @jreback