ENH/BUG: Offset.apply dont preserve time by sinhrks · Pull Request #7375 · pandas-dev/pandas (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation27 Commits1 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

sinhrks

hayd

@@ -1208,7 +1395,7 @@ def test_offset(self):
def test_normalize(self):
dt = datetime(2007, 1, 1, 3)
result = dt + BMonthEnd()
result = dt + BMonthEnd(normalize=True)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this an API change?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depends on definition. Whether to regard BMonthEnd and MonthEnd inconsistencies are spec or bug. From users point of view, I think it looks a bug as most of other offsets behave differently.

@sinhrks sinhrks changed the titleENH/BUG: Offset.aplly dont preserve time ENH/BUG: Offset.apply dont preserve time

Jun 7, 2014

jreback

@@ -115,8 +115,19 @@ Enhancements
- Implemented ``sem`` (standard error of the mean) operation for ``Series``,
``DataFrame``, ``Panel``, and ``Groupby`` (:issue:`6897`)
- All ``offsets`` suppports ``normalize`` keyword to specify whether ``offsets.apply``, ``rollforward`` and ``rollback`` resets time (hour, minute, etc) or not (default ``False``, preserves time) (:issue:`7156`)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why don't you move these to API section (they technically are a missing 'feature', but just to make it obvious)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks fine, pls move to API section, then good 2 go

@sinhrks

@sinhrks

@jreback Moved the description to API, and rebased.

jreback added a commit that referenced this pull request

Jun 11, 2014

@jreback

ENH/BUG: Offset.apply dont preserve time

@jorisvandenbossche

@jreback While trying to clean up the whatsnew page for 0.14.1, I wanted to rewrite the api change on the offsets a bit (as it is now not clear what did change), but I stumbled on this:

In [26]: d
Out[26]: Timestamp('2014-01-01 09:00:00')

In [27]: d + offsets.MonthEnd()
Out[27]: Timestamp('2014-01-31 09:00:00')

In [28]: d + offsets.MonthEnd(normalize=True)
Out[28]: Timestamp('2014-01-31 00:00:00')

In [29]: d + offsets.Day()
Out[29]: Timestamp('2014-01-02 09:00:00')

In [30]: d + offsets.Day(normalize=True)
Out[30]: Timestamp('2014-01-02 09:00:00')

In [31]: d + offsets.MonthOffset()
Out[31]: Timestamp('2014-01-02 09:00:00')

So for MonthEnd it seems to work as expected, but with Day normalize has no effect, and MonthOffset gives the same as Day. Are these bugs, or just due to my misunderstanding (I have to say that I almost never use offsets directly)

@jreback

@jorisvandenbossche

@jreback

yes it looks like a bug with Day which is handled in Tick

@sinhrks

Yes, it is a bug. Currently normalize is effective only if addition or subtraction internally calls apply. Fixed in #7697.

@jorisvandenbossche

And the MonthOffset issue?

And yet something else:

In [55]: offsets.YearOffset()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-55-b53e59b753d1> in <module>()
----> 1 offsets.YearOffset()

c:\users\vdbosscj\scipy\pandas-joris\pandas\tseries\offsets.pyc in __init__(self
, n, normalize, **kwds)
   1331
   1332     def __init__(self, n=1, normalize=False, **kwds):
-> 1333         self.month = kwds.get('month', self._default_month)
   1334
   1335         if self.month < 1 or self.month > 12:

AttributeError: 'YearOffset' object has no attribute '_default_month'

@jreback

can u separate out the fix for this into a new pr?

@sinhrks

@jreback Sure, will do shortly.

@jorisvandenbossche YearOffset looks an wrapper. It doesn't define effective apply and other methods. Maybe instanciation should be prohibited?

@jreback

@sinhrks In theory YearOffset should work like MonthOffset. open a new bug report for that pls.

@sinhrks

OK. Is it correct DateOffset, MonthOffset and YearOffset can be used in itselves (even though not included in pd.offsets.__all__). The result should be a timestamp increased one unit of the day, month and year?

@jreback

@sinhrks that's correct (I personally find that pretty confusing), as ts + 1 does this.

so its a semi-private method (and one should really use MonthEnd et all)

This was referenced

Jul 9, 2014

@jorisvandenbossche

I created seperate issues for the YearOffset and MonthOffset issues I reported here. But do I understand correctly that they are not really 'meant to be used' like this?

As I expect MonthOffset to do the same as DateOffset(months=1) (and this is something different as MonthEnd):

In [76]: d + offsets.DateOffset(months=1)
Out[76]: Timestamp('2014-02-01 09:00:00')

In [77]: d + offsets.MonthOffset()
Out[77]: Timestamp('2014-01-02 09:00:00')

In [78]: d + offsets.MonthEnd()
Out[78]: Timestamp('2014-01-31 00:00:00')

@jreback

@jorisvandenbossche its iffy. I don't see any explicty usage of them in docs/otherwise

I think the idea was that once could have it programatically select the class, e.g.

DateOffset(months=2) -> MonthEnd(2)

which does seem useful. Maybe need to figure out use/doc it (or clearly mark it as semi-private).

@jorisvandenbossche

I was just updating my comment above: DateOffset(months=2) is not the same as MonthEnd(2)! And I would expect that MonthOffset would be.

@jreback

sorry, I mean Month(2), e.g. just advance me 2 months.

@jorisvandenbossche

but Month does not exist? You only have MonthBegin, MonthEnd, MonthOffset in pd.tseries.offsets

@jreback

Month is de-facto MonthOffset(months=n)

The problem it is that

Timestamp('20130105') + pd.offsets.MonthOffset(months=1) works! but pd.offsets.MonthOffset(1) is de-factor a 1-day increase as the default arg is day

@jorisvandenbossche

aha, so the __init__ shouldbe overwritten in MonthOffset?
(and where is Month? I get AttributeError: 'module' object has no attribute 'Month')

@jreback

I think the contructor is ok, it needs an apply (kind of like MonthEnd has , but w/o the month end stuff).

The question remains are DateOffset/MonthOffset/YearOffset public, or abc methods?

@jreback

I think this is related as well: #4804

@jorisvandenbossche

If we decide they are not public, they should get a _ name.
If it is public, I certainly think the constructor is not OK, the default arg should not be day for MonthOffset.

For the rest I can't really say if they should be public or not, I find all this offset stuff quite confusing, and the docs are rather scarce.

@jreback

ok, you have highlited the issues, I don't think its hard to 'fix' them to be correct. Let's do that, then maybe can expand the docs a bit / improve doc-strings.