Possible Bug - math on like-indexed datetime series doesn't work as expected · Issue #7500 · pandas-dev/pandas
I have two datetime series that share the same index. Simple arithmetic on them doesn't give the results I'd expect: subtracting one series from the other doesn't always subtract across the aligned index labels. Converting each series to a DataFrame with a dummy column gets closer, but the resulting dtype handling isn't correct.
print firstOrderNotEval.loc[site]
print firstEvalOrder.loc[site]
print type(firstOrderNotEval.loc[site])
print type(firstEvalOrder.loc[site])
output:
#2008-08-21 00:00:00
#2013-09-10 00:00:00
<class 'pandas.tslib.Timestamp'>
<class 'pandas.tslib.Timestamp'>
timeToFirstNonEvalPurchase_doesntWork = ((firstOrderNotEval - firstEvalOrder)/np.timedelta64(1,'D'))
timeToFirstNonEvalPurchase = ((firstOrderNotEval.to_frame('a') - firstEvalOrder.to_frame('a'))/np.timedelta64(1,'D'))['a']
print timeToFirstNonEvalPurchase_doesntWork.loc[2898717]
print timeToFirstNonEvalPurchase.loc[2898717]
output:
nan
-1846 nanoseconds # note: should be -1846 days
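For comparison, here is a minimal sketch of the result I'd expect, written for Python 3 and a recent pandas. The data is a hypothetical stand-in (the real series are in the attached pickles): only the two dates for site 2898717 come from the output above, and the second row and the snake_case names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; only the dates for `site` come from the report.
site = 2898717
other_site = 1234567  # made-up second label
first_order_not_eval = pd.Series(
    [pd.Timestamp("2008-08-21"), pd.Timestamp("2010-01-01")],
    index=[site, other_site],
)
first_eval_order = pd.Series(
    [pd.Timestamp("2013-09-10"), pd.Timestamp("2009-06-15")],
    index=[site, other_site],
)

# Subtracting two datetime Series with matching labels gives a timedelta Series.
delta = first_order_not_eval - first_eval_order

# Dividing by one day converts the timedeltas to a float number of days.
days = delta / np.timedelta64(1, "D")

print(delta.loc[site])
print(days.loc[site])
```

With matching labels on both sides, the value for `site` should come out as a float number of days rather than `nan` or a nanosecond-scale timedelta.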
Subtracting the individual elements gives the correct result, but as a datetime.timedelta. Subtracting the series directly gives NaT:
site = 2898717
print (firstOrderNotEval.loc[site] - firstEvalOrder.loc[site])
print type(firstOrderNotEval.loc[site] - firstEvalOrder.loc[site])
print (firstOrderNotEval - firstEvalOrder).loc[site]
output:
-1846 days, 0:00:00
<type 'datetime.timedelta'>
NaT
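One thing that may be worth ruling out before blaming the arithmetic itself: Series subtraction aligns on index labels, so any label present in only one operand produces NaT. The sketch below uses made-up data (labels `1`, `2`, `3` and both series are hypothetical) to show that behavior and two quick index checks. It is not a claim that this is the cause here.

```python
import pandas as pd

# Hypothetical data: label 1 exists only in `a`, label 3 only in `b`.
a = pd.Series(
    [pd.Timestamp("2008-08-21"), pd.Timestamp("2010-01-01")], index=[1, 2]
)
b = pd.Series(
    [pd.Timestamp("2009-06-15"), pd.Timestamp("2013-09-10")], index=[2, 3]
)

# Quick sanity checks on the two indexes.
same_labels = a.index.equals(b.index)                # identical labels, same order?
both_unique = a.index.is_unique and b.index.is_unique

# Subtraction aligns on the union of the labels; unmatched labels become NaT.
diff = a - b

print(same_labels, both_unique)
print(diff)
```

Running the same two checks on `firstOrderNotEval.index` and `firstEvalOrder.index` would show whether the real series are aligned the way the element-wise subtraction assumes.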
Perhaps this has to do with the Timestamp type itself, given the following example:
print (firstOrderNotEval.to_frame('a') - firstEvalOrder.to_frame('a')).loc[site]/np.timedelta64(1,'D')
print ((firstOrderNotEval.to_frame('a') - firstEvalOrder.to_frame('a'))/np.timedelta64(1,'D')).loc[site]
output:
a -1846
Name: 2898717.0, dtype: float64
a -00:00:00.000002
Name: 2898717.0, dtype: timedelta64[ns]
Note that the following give different results depending on how the division by timedelta64 is performed:
tmp = ((firstOrderNotEval.to_frame('a') - firstEvalOrder.to_frame('a')))
print (tmp/np.timedelta64(1,'D')).loc[site]
print tmp.apply(lambda x: x/np.timedelta64(1,'D')).loc[site]
output:
a -00:00:00.000002
Name: 2898717.0, dtype: timedelta64[ns]
what we'd expect:
a -1846
Name: 2898717.0, dtype: float64
The attached pickle files (saved with a .jpg extension) contain the series used in this example:
firstOrderNotEval.to_pickle('./firstOrderNotEval.jpg')
firstEvalOrder.to_pickle('./firstEvalOrder.jpg')