BUG: PeriodIndex and Period subtraction error by sinhrks · Pull Request #13071 · pandas-dev/pandas

@sinhrks

Similar to #5202. PeriodIndex subtraction raises AttributeError when either side is a Period (scalar).

pd.PeriodIndex(['2011-01', '2011-02'], freq='M') - pd.Period('2011-01', freq='M') 
# AttributeError: 'PeriodIndex' object has no attribute 'ordinal'

Expected

If Period(NaT) is included on either side, the result is a Float64Index so that it can hold nan.

pd.PeriodIndex(['2011-01', '2011-02'], freq='M') - pd.Period('2011-01', freq='M') 
# Int64Index([0, 1], dtype='int64')

pd.PeriodIndex(['2011-01', 'NaT'], freq='M') - pd.Period('2011-01', freq='M') 
# Float64Index([0.0, nan], dtype='float64')
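
The expected results above can be read as plain arithmetic on period ordinals. The following is a minimal sketch in plain Python, not the pandas implementation: `month_ordinal` and `sub_period` are hypothetical helpers, `None` stands in for NaT, and lists stand in for the index types.

```python
import math

def month_ordinal(label):
    """Return months elapsed since 1970-01 for a 'YYYY-MM' label (None = NaT)."""
    if label is None:
        return None
    year, month = (int(part) for part in label.split("-"))
    return (year - 1970) * 12 + (month - 1)

def sub_period(labels, scalar):
    """Subtract a scalar monthly period from each label, element-wise."""
    base = month_ordinal(scalar)
    ordinals = [month_ordinal(label) for label in labels]
    has_nat = base is None or any(o is None for o in ordinals)
    if not has_nat:
        # No missing values: integer differences are enough.
        return [o - base for o in ordinals]
    # A missing value forces floats, because nan has no integer form.
    return [math.nan if o is None or base is None else float(o - base)
            for o in ordinals]

print(sub_period(["2011-01", "2011-02"], "2011-01"))  # [0, 1]
print(sub_period(["2011-01", None], "2011-01"))       # [0.0, nan]
```

The switch from ints to floats in the second call mirrors why the proposal returns a Float64Index whenever NaT is present.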

@jreback

hmm, shouldn't subtraction yield a TimedeltaIndex?

@sinhrks

Yes, returning a TimedeltaIndex may be more natural. The proposed result is based on the current Period behavior.

pd.Period('2011-03', freq='M') - pd.Period('2011-01', freq='M') 
# 2

@sinhrks

I also found that DatetimeIndex - Period raises the errors below on current master. Should these be supported, or should they raise more understandable errors?

pd.DatetimeIndex(['2011-01-01', '2011-02-01']) - pd.Period('2011-01-01', freq='D') 
# ValueError: Cannot do arithmetic with non-conforming periods

pd.DatetimeIndex(['2011-01-01', '2011-01-02'], freq='D') - pd.Period('2011-01-01', freq='D') 
# AttributeError: 'DatetimeIndex' object has no attribute 'ordinal'

@jreback

hmm, actually I think

pd.Period('2011-03', freq='M') - pd.Period('2011-01', freq='M')

should be a 2M Span, which we don't really have the concept of. Timedelta is point-in-time.

cc @MaximilianR

@jreback

I think your first example is legit (you can't do arithmetic between spans and points) but the 2nd should work

@max-sixty

If we think about a Period as an interval / span of time, I think it's reasonable that equal-freq Periods can do arithmetic.
This can be extended to intervals generally: [5->7] - [2->4] == 3.

While Period arithmetic currently returns ints, Timedeltas would be ideal (although ints are 'ok'; it's pretty clear what they mean).

So I think this should probably be a 2M Timedelta:
pd.Period('2011-03', freq='M') - pd.Period('2011-01', freq='M')

@jreback do you think differently?

@jreback

@MaximilianR

Timedelta don't really support '2 months' as these are only days,h,m,s, etc.

we need a Perioddelta

@max-sixty

> Timedelta don't really support '2 months' as these are only days,h,m,s, etc.

I see.

But for me it's the same concept, just with different units. So [5->7] - [2->4] == 3, whether with days, months, or ounces.
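
The interval arithmetic described in this comment can be sketched as follows. This is an illustration only: `Span` is a hypothetical class, not a pandas type, and the equal-width check is an assumed stand-in for the "equal freq" condition.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Span:
    """A half-open interval [start, start + width) in arbitrary units."""
    start: int
    width: int

    def __sub__(self, other):
        if not isinstance(other, Span):
            return NotImplemented
        if self.width != other.width:
            # Mirrors the idea that only equal-freq periods can be subtracted.
            raise ValueError("spans must have the same width")
        # The result is a unit count, independent of what the unit is.
        return self.start - other.start

print(Span(5, 2) - Span(2, 2))  # 3
```

The result carries no unit of its own, which is the point raised next in the thread about losing freq information.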

@jreback

so u end up losing freq info here
and 2 months is itself ambiguous - we could return an offset

@max-sixty

> we could return an offset

Yes, I think this is absolutely right.

> and 2 months is itself ambiguous

Why?

@jreback

@MaximilianR 2M doesn't have a fixed conversion because it's a relative offset, whereas a Timedelta of 60 days is absolute.
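
A small standard-library sketch of why "2 months" has no fixed length in days. `add_months` and `days_in_two_months` are hypothetical helpers written for this illustration; they are not pandas or dateutil functions.

```python
import calendar
from datetime import date

def add_months(day, months):
    """Shift a date by whole calendar months, clamping to the month's last day."""
    total = day.year * 12 + (day.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))

def days_in_two_months(start):
    """Number of days covered by a 2-month offset applied at `start`."""
    return (add_months(start, 2) - start).days

print(days_in_two_months(date(2011, 1, 1)))  # 59
print(days_in_two_months(date(2011, 7, 1)))  # 62
```

The same relative offset spans 59 days from January and 62 days from July, so it cannot be converted to an absolute duration without knowing the starting point.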

So you leave something on the table.

PeriodDelta is the right soln, but maybe we can just tag this onto Timedelta, IOW have it support 2 Months

@max-sixty

I don't know the intricacies of DateOffset / relativedelta, but relativedelta seems to do this well, at the conceptual level?

In [15]: from dateutil import relativedelta
In [18]: relativedelta.relativedelta(months=2)
Out[18]: relativedelta(months=+2)
In [22]: relativedelta.relativedelta(months=2) + datetime.datetime(2005,5,3)
Out[22]: datetime.datetime(2005, 7, 3, 0, 0)

@jreback

relativedelta is equiv to an offset (though more fully featured, the offsets I mean)

@sinhrks

@jreback

ok on fixing what is broken here. Yes, let's open a new issue to discuss 1)

This was referenced

May 4, 2016

jreback

is this always a Period? see my comment above

Always Period. Defined in tseries/base.py.

@sinhrks deleted the period_period_sub branch

May 6, 2016 12:34
