ENH/BUG: Offset.apply dont preserve time by sinhrks · Pull Request #7375 · pandas-dev/pandas (original) (raw)
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters
[ Show hidden characters]({{ revealButtonHref }})
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this an API change?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depends on definition. Whether to regard BMonthEnd and MonthEnd inconsistencies are spec or bug. From users point of view, I think it looks a bug as most of other offsets behave differently.
sinhrks changed the title
ENH/BUG: Offset.aplly dont preserve time ENH/BUG: Offset.apply dont preserve time
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why don't you move these to API section (they technically are a missing 'feature', but just to make it obvious)
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks fine, pls move to API section, then good 2 go
@jreback Moved the description to API, and rebased.
jreback added a commit that referenced this pull request
ENH/BUG: Offset.apply dont preserve time
@jreback While trying to clean up the whatsnew page for 0.14.1, I wanted to rewrite the api change on the offsets a bit (as it is now not clear what did change), but I stumbled on this:
In [26]: d
Out[26]: Timestamp('2014-01-01 09:00:00')
In [27]: d + offsets.MonthEnd()
Out[27]: Timestamp('2014-01-31 09:00:00')
In [28]: d + offsets.MonthEnd(normalize=True)
Out[28]: Timestamp('2014-01-31 00:00:00')
In [29]: d + offsets.Day()
Out[29]: Timestamp('2014-01-02 09:00:00')
In [30]: d + offsets.Day(normalize=True)
Out[30]: Timestamp('2014-01-02 09:00:00')
In [31]: d + offsets.MonthOffset()
Out[31]: Timestamp('2014-01-02 09:00:00')
So for MonthEnd it seems to work as expected, but with Day normalize has no effect, and MonthOffset gives the same as Day. Are these bugs, or just due to my misunderstanding (I have to say that I almost never use offsets directly)
yes it looks like a bug with Day which is handled in Tick
Yes, it is a bug. Currently normalize is effective only if addition or subtraction internally calls apply. Fixed in #7697.
And the MonthOffset issue?
And yet something else:
In [55]: offsets.YearOffset()
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-55-b53e59b753d1> in <module>()
----> 1 offsets.YearOffset()
c:\users\vdbosscj\scipy\pandas-joris\pandas\tseries\offsets.pyc in __init__(self
, n, normalize, **kwds)
1331
1332 def __init__(self, n=1, normalize=False, **kwds):
-> 1333 self.month = kwds.get('month', self._default_month)
1334
1335 if self.month < 1 or self.month > 12:
AttributeError: 'YearOffset' object has no attribute '_default_month'
can u separate out the fix for this into a new pr?
@jreback Sure, will do shortly.
@jorisvandenbossche YearOffset looks an wrapper. It doesn't define effective apply and other methods. Maybe instanciation should be prohibited?
@sinhrks In theory YearOffset should work like MonthOffset. open a new bug report for that pls.
OK. Is it correct DateOffset, MonthOffset and YearOffset can be used in itselves (even though not included in pd.offsets.__all__). The result should be a timestamp increased one unit of the day, month and year?
@sinhrks that's correct (I personally find that pretty confusing), as ts + 1 does this.
so its a semi-private method (and one should really use MonthEnd et all)
This was referenced
Jul 9, 2014
I created seperate issues for the YearOffset and MonthOffset issues I reported here. But do I understand correctly that they are not really 'meant to be used' like this?
As I expect MonthOffset to do the same as DateOffset(months=1) (and this is something different as MonthEnd):
In [76]: d + offsets.DateOffset(months=1)
Out[76]: Timestamp('2014-02-01 09:00:00')
In [77]: d + offsets.MonthOffset()
Out[77]: Timestamp('2014-01-02 09:00:00')
In [78]: d + offsets.MonthEnd()
Out[78]: Timestamp('2014-01-31 00:00:00')
@jorisvandenbossche its iffy. I don't see any explicty usage of them in docs/otherwise
I think the idea was that once could have it programatically select the class, e.g.
DateOffset(months=2) -> MonthEnd(2)
which does seem useful. Maybe need to figure out use/doc it (or clearly mark it as semi-private).
I was just updating my comment above: DateOffset(months=2) is not the same as MonthEnd(2)! And I would expect that MonthOffset would be.
sorry, I mean Month(2), e.g. just advance me 2 months.
but Month does not exist? You only have MonthBegin, MonthEnd, MonthOffset in pd.tseries.offsets
Month is de-facto MonthOffset(months=n)
The problem it is that
Timestamp('20130105') + pd.offsets.MonthOffset(months=1) works! but pd.offsets.MonthOffset(1) is de-factor a 1-day increase as the default arg is day
aha, so the __init__ shouldbe overwritten in MonthOffset?
(and where is Month? I get AttributeError: 'module' object has no attribute 'Month')
I think the contructor is ok, it needs an apply (kind of like MonthEnd has , but w/o the month end stuff).
The question remains are DateOffset/MonthOffset/YearOffset public, or abc methods?
I think this is related as well: #4804
If we decide they are not public, they should get a _ name.
If it is public, I certainly think the constructor is not OK, the default arg should not be day for MonthOffset.
For the rest I can't really say if they should be public or not, I find all this offset stuff quite confusing, and the docs are rather scarce.
ok, you have highlited the issues, I don't think its hard to 'fix' them to be correct. Let's do that, then maybe can expand the docs a bit / improve doc-strings.