Allow interpolate() to fill backwards as well as forwards by lmjohns3 · Pull Request #10691 · pandas-dev/pandas
I just checked in a few changes to address the comments above. It looks to me like the algorithm does work for gaps of any length. In the example you suggested here is the behavior:
>>> s = Series([1,3,np.nan,np.nan,np.nan,7,9,np.nan,np.nan,12,np.nan])
>>> np.array(s.interpolate(limit=2, limit_direction='both'))
array([ 1., 3., 4., 5., 6., 7., 9., 10., 11., 12., 12.])
>>> np.array(s.interpolate(limit=1, limit_direction='both'))
array([ 1., 3., 4., nan, 6., 7., 9., 10., 11., 12., 12.])
In the first example, all values are filled, because none of the NaNs is further than 2 from a non-NaN value. In the second, the "5" value remains a NaN because that slot is further than 1 from all non-NaN values. I added these two examples as tests.
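The distance rule described above can be sketched in plain Python. This is only an illustration of the rule as stated in this comment, not the code in the PR; the helper name `fillable_mask` is made up for the example.

```python
import math

def fillable_mask(values, limit):
    """Mark each NaN slot that lies within `limit` positions of a non-NaN value."""
    valid = [i for i, v in enumerate(values) if not math.isnan(v)]
    mask = []
    for i, v in enumerate(values):
        if not math.isnan(v):
            mask.append(False)  # already observed, nothing to fill
        else:
            mask.append(any(abs(i - j) <= limit for j in valid))
    return mask

nan = float("nan")
s = [1, 3, nan, nan, nan, 7, 9, nan, nan, 12, nan]

def unfilled(values, limit):
    """Indexes of NaN slots that would stay NaN under the distance rule."""
    mask = fillable_mask(values, limit)
    return [i for i, v in enumerate(values) if math.isnan(v) and not mask[i]]

unfilled_limit_1 = unfilled(s, 1)  # only index 3, the slot that would hold "5"
unfilled_limit_2 = unfilled(s, 2)  # empty: every NaN is within 2 of a value
```

With `limit=1` only index 3 is left over, which matches the NaN in the second output above.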
Also, here's an example of the "filling" behavior of np.interp that I was talking about for extrapolations:
>>> np.interp(x=[0, 1, 2, 3, 4], xp=[1, 3], fp=[3, 5])
array([ 3., 3., 4., 5., 5.])
To me this says, "I've observed value (fp) 3 at index (xp) 1 and value 5 at index 3; now I want to fill in values at indexes (x) 0 through 4." Because 0 < 1, interp returns 3 for this index, and because 4 > 3, interp returns 5 for this index.
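To make the edge handling explicit, here is a rough pure-Python sketch of linear interpolation that holds the first and last observed values outside the observed range. It mimics the behavior shown above for this input; it is not NumPy's implementation, and `interp_clamped` is a hypothetical name.

```python
from bisect import bisect_right

def interp_clamped(x, xp, fp):
    """Linear interpolation that repeats the edge values outside [xp[0], xp[-1]]."""
    out = []
    for xi in x:
        if xi <= xp[0]:
            out.append(float(fp[0]))   # left of the data: hold the first value
        elif xi >= xp[-1]:
            out.append(float(fp[-1]))  # right of the data: hold the last value
        else:
            hi = bisect_right(xp, xi)  # first observed index greater than xi
            lo = hi - 1
            t = (xi - xp[lo]) / (xp[hi] - xp[lo])
            out.append(fp[lo] + t * (fp[hi] - fp[lo]))
    return out

clamped = interp_clamped([0, 1, 2, 3, 4], xp=[1, 3], fp=[3, 5])
# clamped == [3.0, 3.0, 4.0, 5.0, 5.0]
```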
But if you think these values are actually sampled from some continuous underlying function, and you want to reconstruct this function even outside the observed index range [1, 3], you have to use a spline interpolation:
>>> scipy.interpolate.InterpolatedUnivariateSpline([1, 3], [3, 5], k=1)(list(range(5)))
array([ 2., 3., 4., 5., 6.])
This works pretty well in pandas:
>>> s = pd.Series([np.nan, 3, np.nan, 5, np.nan])
>>> np.array(s.interpolate(method='linear', limit=10, limit_direction='both'))
array([ 3., 3., 4., 5., 5.])
>>> np.array(s.interpolate(method='spline', limit=10, limit_direction='both', order=1))
array([ 2., 3., 4., 5., 6.])
You have to specify method='spline' and order=k to get a spline that will extrapolate, though!
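For comparison, the extrapolating result for an order-1 spline can be reproduced by hand by extending the first and last line segments past the observed indexes. This is a sketch that assumes plain piecewise-linear behavior; it is not how scipy or pandas compute the spline, and `interp_extrapolate` is a made-up name.

```python
from bisect import bisect_right

def interp_extrapolate(x, xp, fp):
    """Piecewise-linear interpolation that extends the end segments outward."""
    out = []
    for xi in x:
        # Pick the segment containing xi, falling back to the first or last
        # segment when xi lies outside the observed range.
        hi = min(max(bisect_right(xp, xi), 1), len(xp) - 1)
        lo = hi - 1
        slope = (fp[hi] - fp[lo]) / (xp[hi] - xp[lo])
        out.append(fp[lo] + slope * (xi - xp[lo]))
    return out

extended = interp_extrapolate([0, 1, 2, 3, 4], xp=[1, 3], fp=[3, 5])
# extended == [2.0, 3.0, 4.0, 5.0, 6.0]
```

The values at indexes 0 and 4 continue the slope of the line through (1, 3) and (3, 5) rather than repeating the edge values.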