BUG: reindex() and reindex_like() fill behavior is different in pandas 12.0 and 13.1? · Issue #6418 · pandas-dev/pandas
I just came across an issue that has caused me serious trouble since upgrading from pandas 0.12.0 to 0.13.1. It happens when using a fill method with reindex() or reindex_like(). Moreover, those methods are not giving consistent results anymore! I have not tested whether this issue originates from changes to .ffill() and similar methods, but I see that it propagates to resample(). I could not find any recent mention of this strange behavior, and there are no hints in the docs or the What's New section.
This is the problem I encounter when comparing pandas 0.12.0 (with numpy 1.7.1, on both 32-bit Python 2.7.5 from Python(x,y) and 64-bit WinPython-64bit-2.7.4.1; Windows 7) with pandas 0.13.1 (D:\PortableApps\WinPython-64bit-2.7.6.2, numpy 1.8.0). The pandas 0.12.0 behavior is the same for the 32-bit and 64-bit versions, so bitness cannot explain the problem.
Code:
import numpy as np
import pandas as pd
# Make low frequency timeseries:
i30 = pd.date_range('2002-02-02', periods=4, freq='30T')
s = pd.Series(np.arange(4.), index=i30)
s[2] = np.NaN  # NaN that is already present in the original data
# Upsample by factor 3 with reindex() and resample() methods:
i10 = pd.date_range(i30[0], i30[-1], freq='10T')
s10 = s.reindex(index=i10, method='bfill')
s10_2 = s.reindex(index=i10, method='bfill', limit=2)
r10 = s.resample('10Min', fill_method='bfill')
r10_2 = s.resample('10Min', fill_method='bfill', limit=2)
In pandas 0.12.0, s10, s10_2, r10 and r10_2 are all equal:
s10
Out[60]:
2002-02-02 00:00:00 0
2002-02-02 00:10:00 1
2002-02-02 00:20:00 1
2002-02-02 00:30:00 1
2002-02-02 00:40:00 NaN
2002-02-02 00:50:00 NaN
2002-02-02 01:00:00 NaN
2002-02-02 01:10:00 3
2002-02-02 01:20:00 3
2002-02-02 01:30:00 3
Freq: 10T, dtype: float64
In pandas 0.13.1, s10 does not equal s10_2; s10 has all NaNs filled:
s10
Out[120]:
2002-02-02 00:00:00 0
2002-02-02 00:10:00 1
2002-02-02 00:20:00 1
2002-02-02 00:30:00 1
2002-02-02 00:40:00 3
2002-02-02 00:50:00 3
2002-02-02 01:00:00 3
2002-02-02 01:10:00 3
2002-02-02 01:20:00 3
2002-02-02 01:30:00 3
Freq: 10T, dtype: float64
The same holds for the resampled series r10.
Conclusion: in pandas 0.13.1, every NaN is filled when limit=None, which breaks with the pandas 0.12.0 behavior. I think the 0.12.0 behavior is more sensible: only fill the gaps created by upsampling.
This is even more important for the reindex_like() method, because there the limit keyword cannot restrict which gaps are filled in pandas 0.13.1:
s.reindex_like(s10, method='bfill', limit=2)
Out[121]:
2002-02-02 00:00:00 0
2002-02-02 00:10:00 1
2002-02-02 00:20:00 1
2002-02-02 00:30:00 1
2002-02-02 00:40:00 3
2002-02-02 00:50:00 3
2002-02-02 01:00:00 3
2002-02-02 01:10:00 3
2002-02-02 01:20:00 3
2002-02-02 01:30:00 3
Freq: 10T, dtype: float64
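One possible workaround, sketched here under assumptions, is to do the backfill purely on index labels so that NaNs already present in the data are never touched. The helper name bfill_by_label is hypothetical and not part of pandas; it relies on Index.get_indexer(), and it uses current pandas spellings ('30min', .iloc, Series.to_numpy) rather than the ones from the versions discussed above.

```python
import numpy as np
import pandas as pd


def bfill_by_label(s, new_index, limit=None):
    """Backfill by comparing index labels only (hypothetical helper).

    NaNs already present in `s` are copied through, never filled.
    """
    # For each new label: position in s.index of the next original label
    # at or after it, or -1 if there is none (or the limit is exceeded).
    pos = s.index.get_indexer(new_index, method='bfill', limit=limit)
    # Fancy indexing returns a copy; -1 picks the last element here,
    # so those positions are masked on the next line.
    values = s.to_numpy(dtype=float)[pos]
    values[pos == -1] = np.nan
    return pd.Series(values, index=new_index)


# Same data as in the report, written with current pandas spellings.
i30 = pd.date_range('2002-02-02', periods=4, freq='30min')
s = pd.Series(np.arange(4.), index=i30)
s.iloc[2] = np.nan
i10 = pd.date_range(i30[0], i30[-1], freq='10min')

filled = bfill_by_label(s, i10)
print(filled)
```

Because the lookup never inspects the values, the NaN at 01:00 is propagated to 00:40 and 00:50 instead of being replaced by 3, which is the behavior described above for 0.12.0.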
I hope this is clear and can be reproduced. I hope it can be fixed soon. But of course, if you can reproduce this behavior and it has indeed changed from 0.12.0 to 0.13.1, this should be in the docs.
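To make reproduction easier, here is a small self-contained check that reports which of the two behaviors an installed pandas shows. It is only a sketch: it uses current spellings ('30min', .iloc, .bfill()) instead of the ones from the original snippet, and the variable names during, after and original_nan_kept are my own.

```python
import numpy as np
import pandas as pd

i30 = pd.date_range('2002-02-02', periods=4, freq='30min')
s = pd.Series(np.arange(4.), index=i30)
s.iloc[2] = np.nan  # NaN in the original data, at 01:00
i10 = pd.date_range(i30[0], i30[-1], freq='10min')

# Fill while reindexing: the behavior under discussion.
during = s.reindex(i10, method='bfill')
# Reindex first, then fill every NaN by value.
after = s.reindex(i10).bfill()

# True means the NaN from the original series survived the reindex.
original_nan_kept = bool(np.isnan(during.loc[i30[2]]))

print('pandas', pd.__version__)
print('original NaN kept by reindex(method="bfill"):', original_nan_kept)
print('NaNs left after reindex().bfill():', int(after.isna().sum()))
```

If the first flag prints True, reindex() only filled the gaps created by upsampling (the behavior I expected). If it prints False, the result matches the fill-everything output shown for 0.13.1.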