BUG: reindex() and reindex_like() fill behavior is different in pandas 12.0 and 13.1? · Issue #6418 · pandas-dev/pandas

I just came across an issue that has caused me serious trouble since upgrading from pandas 0.12.0 to 0.13.1. It occurs when a fill method is used with reindex() or reindex_like(). Moreover, these methods no longer give consistent results! I have not tested whether the issue originates from a change in .ffill() and similar methods, but I can see that it propagates to resample(). I could not find any recent mention of this behavior, and there is no hint of it in the docs or the What's New section.

This is the problem I encounter with pandas 0.12.0 (numpy 1.7.1, tested on both 32-bit Python 2.7.5 from Python(x,y) and 64-bit WinPython-64bit-2.7.4.1, Windows 7) and with pandas 0.13.1 (D:\PortableApps\WinPython-64bit-2.7.6.2, numpy 1.8.0). The pandas 0.12.0 behavior is identical on the 32-bit and 64-bit installs, so bitness cannot explain the difference.
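When comparing behavior across several installs like this, it helps to record the relevant versions programmatically. Below is a minimal sketch using only the standard library and the `__version__` attributes; the helper name `environment_summary` is hypothetical. Recent pandas releases also provide `pd.show_versions()` for a fuller report.

```python
import platform
import struct
import sys

import numpy as np
import pandas as pd


def environment_summary() -> dict:
    """Hypothetical helper: collect the version details relevant to this report."""
    return {
        "python": sys.version.split()[0],
        "bits": struct.calcsize("P") * 8,  # pointer size in bits: 32 or 64
        "os": platform.platform(),
        "numpy": np.__version__,
        "pandas": pd.__version__,
    }


for key, value in environment_summary().items():
    print(f"{key}: {value}")
```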

Code:

```python
import numpy as np
import pandas as pd

# Make a low-frequency (30-minute) time series with one missing value:
i30 = pd.date_range('2002-02-02', periods=4, freq='30T')
s = pd.Series(np.arange(4.), index=i30)
s.iloc[2] = np.nan

# Upsample by a factor of 3 with the reindex() and resample() methods
# (resample(..., fill_method=...) is the pandas 0.12/0.13 API):
i10 = pd.date_range(i30[0], i30[-1], freq='10T')
s10 = s.reindex(index=i10, method='bfill')
s10_2 = s.reindex(index=i10, method='bfill', limit=2)
r10 = s.resample('10Min', fill_method='bfill')
r10_2 = s.resample('10Min', fill_method='bfill', limit=2)
```
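For readers on a current pandas release, a rough equivalent of the reproduction is sketched below. It assumes the modern API, where the fill is chosen on the resampler (`.bfill(limit=...)`) instead of via `fill_method=`, and it uses the `'min'` frequency alias instead of `'T'`. The current `reindex` documentation states that method-based filling does not fill pre-existing NaN values, so all four results are expected to match the 0.12.0 output shown below; treat this as a check to run, not as a statement about 0.13.1.

```python
import numpy as np
import pandas as pd

# Same data as the report: four half-hourly points, the third one missing.
i30 = pd.date_range("2002-02-02", periods=4, freq="30min")
s = pd.Series(np.arange(4.0), index=i30)
s.iloc[2] = np.nan

# Upsample by a factor of 3.
i10 = pd.date_range(i30[0], i30[-1], freq="10min")
s10 = s.reindex(index=i10, method="bfill")
s10_2 = s.reindex(index=i10, method="bfill", limit=2)
# Modern resample API: choose the fill on the Resampler instead of fill_method=.
r10 = s.resample("10min").bfill()
r10_2 = s.resample("10min").bfill(limit=2)

# NaN-aware comparison (Series.equals treats NaNs in the same place as equal).
print([s10.equals(other) for other in (s10_2, r10, r10_2)])
```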

In pandas 0.12.0, s10, s10_2, r10 and r10_2 are all equal:

```
s10
Out[60]:
2002-02-02 00:00:00     0
2002-02-02 00:10:00     1
2002-02-02 00:20:00     1
2002-02-02 00:30:00     1
2002-02-02 00:40:00   NaN
2002-02-02 00:50:00   NaN
2002-02-02 01:00:00   NaN
2002-02-02 01:10:00     3
2002-02-02 01:20:00     3
2002-02-02 01:30:00     3
Freq: 10T, dtype: float64
```

In pandas 0.13.1, s10 does not equal s10_2, and s10 has all NaNs filled, including the one that was already present in s:

```
s10
Out[120]:
2002-02-02 00:00:00    0
2002-02-02 00:10:00    1
2002-02-02 00:20:00    1
2002-02-02 00:30:00    1
2002-02-02 00:40:00    3
2002-02-02 00:50:00    3
2002-02-02 01:00:00    3
2002-02-02 01:10:00    3
2002-02-02 01:20:00    3
2002-02-02 01:30:00    3
Freq: 10T, dtype: float64
```

The same holds for the resampled series r10.
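To make the difference concrete: the 0.13.1 output for s10 is exactly what a value-based fill produces, i.e. reindexing without a method and then back-filling every NaN, so the original NaN at 01:00 is treated like a gap created by upsampling. The sketch below uses the current pandas API and only illustrates the two semantics; it is not a description of how 0.13.1 was implemented. It also shows that `limit=2` happens to reproduce the 0.12.0 output here, because only the two NaNs closest to 01:30 in the 00:40 to 01:20 run get filled.

```python
import numpy as np
import pandas as pd

i30 = pd.date_range("2002-02-02", periods=4, freq="30min")
s = pd.Series(np.arange(4.0), index=i30)
s.iloc[2] = np.nan
i10 = pd.date_range(i30[0], i30[-1], freq="10min")

# Value-based fill: new labels become NaN, then every NaN is back-filled,
# including the one that was already in s.
filled_all = s.reindex(i10).bfill()
# With limit=2, at most two consecutive NaNs per gap are filled.
filled_limited = s.reindex(i10).bfill(limit=2)

print(filled_all.tolist())
print(filled_limited.tolist())
```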
Conclusion: in pandas 0.13.1, every NaN is filled when limit=None, which breaks with the pandas 0.12.0 behavior. I think the 0.12.0 behavior is more sensible: only the gaps created by upsampling should be filled.
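The behavior argued for here (fill only the labels introduced by upsampling) is label-based: each new label is mapped to the next label that exists in the original index and takes that label's value, even if the value is NaN. Below is a minimal sketch built on `Index.get_indexer(..., method="bfill")`; the helper `bfill_new_labels_only` is hypothetical and assumes a monotonically increasing index.

```python
import numpy as np
import pandas as pd


def bfill_new_labels_only(s: pd.Series, new_index: pd.Index, limit=None) -> pd.Series:
    """Back-fill onto new_index without overwriting pre-existing NaNs.

    Hypothetical helper (not a pandas API). Each label in new_index takes the
    value of the matching or next label in s.index, even if that value is NaN.
    Assumes s.index is monotonically increasing.
    """
    # Position of the matching-or-next original label; -1 where none exists.
    indexer = s.index.get_indexer(new_index, method="bfill", limit=limit)
    values = s.to_numpy(dtype=float)[indexer]
    values[indexer == -1] = np.nan  # labels with no source label stay missing
    return pd.Series(values, index=new_index, name=s.name)


# The report's data: 30-minute series with the third value missing.
i30 = pd.date_range("2002-02-02", periods=4, freq="30min")
s = pd.Series(np.arange(4.0), index=i30)
s.iloc[2] = np.nan
i10 = pd.date_range(i30[0], i30[-1], freq="10min")

out = bfill_new_labels_only(s, i10)
print(out)
```

Because the lookup is by label position rather than by value, the NaN at 01:00 is carried over as-is and 00:40 and 00:50, which map to 01:00, stay NaN as well.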
The problem is even more serious for reindex_like(), because there the limit keyword cannot restrict which gaps are filled in pandas 0.13.1:

```
s.reindex_like(s10, method='bfill', limit=2)
Out[121]:
2002-02-02 00:00:00    0
2002-02-02 00:10:00    1
2002-02-02 00:20:00    1
2002-02-02 00:30:00    1
2002-02-02 00:40:00    3
2002-02-02 00:50:00    3
2002-02-02 01:00:00    3
2002-02-02 01:10:00    3
2002-02-02 01:20:00    3
2002-02-02 01:30:00    3
Freq: 10T, dtype: float64
```
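For a Series, `reindex_like(other, ...)` takes its target index from `other`, so it is expected to behave like `reindex(other.index, ...)` with the same `method` and `limit`; that is the consistency this report asks for. The sketch below is a small check to run against a current pandas release using the report's data; it makes no claim about 0.13.1.

```python
import numpy as np
import pandas as pd

i30 = pd.date_range("2002-02-02", periods=4, freq="30min")
s = pd.Series(np.arange(4.0), index=i30)
s.iloc[2] = np.nan
i10 = pd.date_range(i30[0], i30[-1], freq="10min")
s10 = s.reindex(index=i10, method="bfill")  # used only as the target object

# reindex_like reads the target index from s10.
via_like = s.reindex_like(s10, method="bfill", limit=2)
# Equivalent explicit call with the same index and options.
via_index = s.reindex(index=s10.index, method="bfill", limit=2)

print(via_like.equals(via_index))
```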

I hope this is clear, can be reproduced, and can be fixed soon. In any case, if you can reproduce this behavior and it has indeed changed from 0.12.0 to 0.13.1, it should be documented.