FIX: fix interpolate with kwarg limit area and limit direction using pad or bfill by cchwala · Pull Request #31048 · pandas-dev/pandas (original) (raw)
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters
[ Show hidden characters]({{ revealButtonHref }})
This test currently only test limit_area.
For limit_direction the implementation should later raise an error,
because pad and bfill both already define a direction. But let's
now first do the implementation of the limit_area for pad
and bfill.
Since methods pad and bfill in blocks.interpolate end up
using missing.interpolate_2d which can not (easily) be extended
to support limit_area, I introduce the new function
missing.interpolate_1d_fill. It is a modified copy of interpolate_2d
but only works for 1d data and uses newly introduced function
_derive_indices_of_nans_to_preserve, which is now also used in
missing.interpolate_1d. It works the same way as the
1D-interpolation functions which are based on scipy-interpolation which
are applied via np.apply_along_axis.
Hello @cchwala! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻
Comment last updated at 2020-09-15 15:40:12 UTC
This was referenced
Jan 15, 2020
@cchwala so IIUC you are doing a precursor to solve the issue with pad and limit_area and then planning to add max_gap after that gets merged in the original PR right?
@WillAyd Yes. That is my plan.
I am reusing the code from the PR with max_gap that actually solves #26796. The max_gap PR will be a lot smaller when the changes from this PR are merge.
…valuesalso was changed via appliyingfunc`
…e used
Test for all forbidden combos of pad and backfill is included
cchwala changed the title
FIX: fix interpolate with kward limit area and limit direction usung pad or bfill FIX: fix interpolate with kwarg limit area and limit direction using pad or bfill
- black formatting
- typo
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good. If you can annotate any new functions would be helpful. Added what I think should be
| def interpolate_1d_fill( |
| values, |
| method="pad", |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| method="pad", |
|---|
| method: str = "pad", |
| def interpolate_1d_fill( |
|---|
| values, |
| method="pad", |
| axis=0, |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(import from pandas._typing)
| values, |
|---|
| method="pad", |
| axis=0, |
| limit=None, |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| limit=None, |
|---|
| limit: Optional[int] = None, |
| method="pad", |
|---|
| axis=0, |
| limit=None, |
| limit_area=None, |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| limit_area=None, |
|---|
| limit_area: Optional[str] = None, |
| limit=None, |
|---|
| limit_area=None, |
| fill_value=None, |
| dtype=None, |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| dtype=None, |
|---|
| dtype: Optional[Dtype] = None, |
(import from pandas._typing)
| axis=0, |
|---|
| limit=None, |
| limit_area=None, |
| fill_value=None, |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| fill_value=None, |
|---|
| fill_value: Optional[Hashable] = None, |
Should be ready now. I am offline for some hours of sleep, though.
@WillAyd Any further requests for changes?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks pretty good
| "column to a numeric dtype." |
|---|
| ) |
| # Set `limit_direction` depending on `method` |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reason this logic does not simply belong in interpolate_2d?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or call something like clean_fill_method
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
limit_direction has no effect in interpolate_2d.
My idea was to detect conflicting user input, e.g. pad with limit_direction='backward' as early as possible in the chain of calls to the many functions that are involved.
But I can move the logic to a function like clean_fill_method, e.g. called set_limit_direction. Should this function belong to generic.py or missing.py?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interpolate_2d is also called from fillna. only NDFrame.interpolate allows limit_direction so does make sense to validate here.
clean_fill_method checks method. probably want to avoid passing more parameters around.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls make a method similr to clean_fill_method then, e.g this logic should living pandas/core/missing.py
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure what to do here. after merge with master, there is similar code at this location, but it is not mine...
| return [P(x, nu) for nu in der] |
|---|
| def interpolate_1d_fill( |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you move this under interpolate_1d
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interpolate_1d_fill is a special case needed by blocks._interpolate_with_fill() to support the limit kwargs with method pad. As you requested in the comment above interpolate_1d_fill is now called from within interpolate_2d which is what blocks._interpolate_with_fill() calls.
You request, moving it to missing.interpolate_1d will, in my opinion, not work, because missing.interpolate_1d is not reached in the call sequence for interpolating with method pad since blocks.Block.interpolate() splits up into blocks.Block._interpolate (for scipy wrappers) and blocks.Block._interpolate_with_fill (for pad methods). missing.interpolate_1d is only called in the former case, i.e. for methods using the scipy wrappers.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cchwala interpolate_2d will work on a 1d array. did you investigate applying it along an axis (with masking logic) rather than creating a 1d version.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see #34749 for alternative implementation calling interpolate_2d instead. interpolate_2d already has the limit logic so no need to use the preserve_nans set based logic.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used the nice solution from #34749 as suggested above
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as a consequence there is no interpolate_1d_fill anymore
…ments
Test on my local machine are not affected by removing the unncessery
arguments valid and invalid, which are now derived within the
function.
not really sure of the status here and @simonjayhawkins had some cleanups as well, can we clarify.
I did break some bits off this PR. so does need a merge of master to see what's left (I did try this #31048 (comment))
Okay. I will do the merge with master today to see what is left. I guess, at least the tests will still be of value.
…imit_area_and_limit_direction_with_pad
I removed most of the code that I have added in this branch and will add it back in step by step in the next commits
I am now looking into the suggestion from @simonjayhawkins to use interpolate_2d instead of introducing interpolate_1d_fill to add less clutter to the already cluttered code.
@jreback can you please have another look? As expected there is not much left from this PR because parts of it have already been merged elsewhere. Most of your code review comments do not apply anymore, only this one.
@jreback Sorry to ping you again, but it would be great if I could finish this PR with (hopefully) final adjustments tomorrow (Friday). Otherwise things would have to wait again another two weeks because I will be offline.
will have to wait as i am away
| dtype=self.dtype, |
|---|
| ) |
| if limit_area is None: |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you put this in a function in missing and just call here?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. But if I do this I could also directly integrate it into missing.interpolate_2d, e.g. like in this branch. Advantage, no need to introduce yet another function that does some interpolation. Disadvantage, obfuscates missing.interpolate_2d. For this, adding some comments would help for sure.
If you prefer the separate function in missing, what function name would you choose?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. Merged and CI is running.
I have also pushed a branch with an example implementation using a separate function in missing. The name of the new function is not good, but the implementation would probably look something like this.
For both solutions, tests for interpolation pass, but I did not run the full test suite. Will see what changes are required when CI is done.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test are green.
If you find the recursive solution appropriate, I would add a comment to the function that briefly explains the reasoning. Anything else that needs improvement?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cchwala can you resolve merge conflicts and can take another look
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks ok, some comments, can you merge master and we can get this in.
| # and `limit_area` logic along a certain axis. |
|---|
| if limit_area is not None: |
| def func(values): |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you make this a module level function and just call it here
moving off 1.2 need merge master and update to comments.
Labels
np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
Concat, Merge/Join, Stack/Unstack, Explode