BUG: PeriodIndex and Period subtraction error by sinhrks · Pull Request #13071 · pandas-dev/pandas
- tests added / passed
- passes git diff upstream/master | flake8 --diff
- whatsnew entry
Similar to #5202. PeriodIndex subtraction raises AttributeError when either side is a Period (scalar).
pd.PeriodIndex(['2011-01', '2011-02'], freq='M') - pd.Period('2011-01', freq='M')
# AttributeError: 'PeriodIndex' object has no attribute 'ordinal'
Expected
If Period(NaT) is included on either side, the result is a Float64Index so that it can hold nan.
pd.PeriodIndex(['2011-01', '2011-02'], freq='M') - pd.Period('2011-01', freq='M')
# Int64Index([0, 1], dtype='int64')
pd.PeriodIndex(['2011-01', 'NaT'], freq='M') - pd.Period('2011-01', freq='M')
# Float64Index([0.0, nan], dtype='float64')
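To illustrate why a missing value forces a float result, here is a minimal pure-Python sketch. It is not the pandas implementation: `month_ordinal` and `subtract_months` are hypothetical helpers invented for this example, and `None` stands in for NaT.

```python
import math


def month_ordinal(label):
    """Map a 'YYYY-MM' label to a month count; None (stand-in for NaT) stays None."""
    if label is None:
        return None
    year, month = label.split("-")
    return int(year) * 12 + int(month) - 1


def subtract_months(labels, other):
    """Subtract a scalar month from each month in labels, element-wise."""
    base = month_ordinal(other)
    ordinals = [month_ordinal(label) for label in labels]
    if base is None or any(o is None for o in ordinals):
        # Integers cannot represent "missing", so the whole result becomes float.
        return [
            math.nan if (o is None or base is None) else float(o - base)
            for o in ordinals
        ]
    return [o - base for o in ordinals]


print(subtract_months(["2011-01", "2011-02"], "2011-01"))  # [0, 1]
print(subtract_months(["2011-01", None], "2011-01"))       # [0.0, nan]
```

The second call mirrors the Float64Index case above: one missing element is enough to change the type of every element in the result.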
hmm, shouldn't subtraction yield a TimedeltaIndex?
Yes, returning a TimedeltaIndex may be more natural. The result here is based on the current Period behavior:
pd.Period('2011-03', freq='M') - pd.Period('2011-01', freq='M')
# 2
I also found that DatetimeIndex - Period raises the errors below on current master. Should these be supported, or should they raise more understandable errors?
pd.DatetimeIndex(['2011-01-01', '2011-02-01']) - pd.Period('2011-01-01', freq='D')
# ValueError: Cannot do arithmetic with non-conforming periods
pd.DatetimeIndex(['2011-01-01', '2011-01-02'], freq='D') - pd.Period('2011-01-01', freq='D')
# AttributeError: 'DatetimeIndex' object has no attribute 'ordinal'
hmm, actually I think
pd.Period('2011-03', freq='M') - pd.Period('2011-01', freq='M')
should be a 2M Span, which we don't really have the concept of. Timedelta is point-in-time.
cc @MaximilianR
I think your first example is legit (you can't do arithmetic between spans and points) but the 2nd should work
If we think about a Period as an interval / span of time, I think it's reasonable that Periods with equal freq can do arithmetic.
This can be extended to intervals generally: [5->7] - [2->4] == 3.
While Period arithmetic currently returns ints, Timedeltas would be ideal (although ints are 'ok', it's pretty clear what they mean).
So I think this should probably be a 2M Timedelta:
pd.Period('2011-03', freq='M') - pd.Period('2011-01', freq='M')
@jreback do you think differently?
Timedelta doesn't really support '2 months', as its units are only days, h, m, s, etc.
we need a Perioddelta
> Timedelta doesn't really support '2 months', as its units are only days, h, m, s, etc.
I see.
But for me it's the same concept, just with different units. So [5->7] - [2->4] == 3, whether with days, months, or ounces.
so u end up losing freq info here
and 2 months is itself ambiguous - we could return an offset
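As a rough sketch of the "Perioddelta" idea, the difference between two same-freq spans could carry its freq along instead of collapsing to a bare int. Both `Span` and `SpanDelta` are hypothetical classes written for this illustration; neither exists in pandas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in for a Period: an ordinal position at a given freq."""
    ordinal: int
    freq: str

    def __sub__(self, other):
        if not isinstance(other, Span):
            return NotImplemented
        if self.freq != other.freq:
            raise ValueError("Cannot subtract spans with different freq")
        # Keep the freq on the result so the unit is not lost.
        return SpanDelta(self.ordinal - other.ordinal, self.freq)


@dataclass(frozen=True)
class SpanDelta:
    """Hypothetical result type: a count of freq units, e.g. 2 months."""
    n: int
    freq: str

    def __str__(self):
        return f"{self.n}{self.freq}"


march = Span(ordinal=2011 * 12 + 2, freq="M")
january = Span(ordinal=2011 * 12 + 0, freq="M")
delta = march - january
print(delta)  # 2M
```

Returning a plain int would give `2` here as well, but a caller could no longer tell whether that means two months or two days.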
> we could return an offset
Yes, I think this is absolutely right.
> and 2 months is itself ambiguous
Why?
@MaximilianR 2M doesn't have a fixed conversion because it's a relative offset, whereas a Timedelta of 60 days is absolute.
So you leave something on the table.
PeriodDelta is the right soln, but maybe we can just tag this onto Timedelta, IOW have it support 2 Months
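The ambiguity can be seen with the standard library alone. This is only an illustration of the point about relative offsets; `add_months` is a hypothetical helper that assumes a first-of-month date.

```python
from datetime import date


def add_months(d, months):
    """Shift a date by whole months (hypothetical helper; assumes d.day is valid in the target month)."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, d.day)


for start in (date(2011, 1, 1), date(2011, 7, 1)):
    end = add_months(start, 2)
    # The same "2 months" covers a different number of days depending on the start.
    print(start, "->", end, (end - start).days, "days")
```

Starting from January 2011 the two months cover 59 days, while starting from July 2011 they cover 62, so no single absolute Timedelta corresponds to "2M".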
I don't know the intricacies of DateOffset / relativedelta, but relativedelta seems to do this well, at the conceptual level?
In [15]: from dateutil import relativedelta
In [18]: relativedelta.relativedelta(months=2)
Out[18]: relativedelta(months=+2)
In [22]: relativedelta.relativedelta(months=2) + datetime.datetime(2005,5,3)
Out[22]: datetime.datetime(2005, 7, 3, 0, 0)
relativedelta is equiv to an offset (though more fully featured, the offsets I mean)
ok on fixing what is broken here. Yes, let's open a new issue to discuss 1)
This was referenced
May 4, 2016
is this always a Period? see my comment above
Always Period. Defined in tseries/base.py.
sinhrks deleted the period_period_sub branch