DOC: update the pandas.Series/DataFrame.interpolate docstring by math-and-data · Pull Request #20270 · pandas-dev/pandas
- PR title is "DOC: update the docstring"
- The validation script passes:
scripts/validate_docstrings.py <your-function-or-method>
- The PEP8 style check passes:
git diff upstream/master -u -- "*.py" | flake8 --diff
- The html version looks good:
python doc/make.py --single <your-function-or-method>
- It has been proofread on language by another sprint participant
Errors in the validation script:
- Formatting of the 'method' param is not right (not sure how to break option list properly into multiple lines)
- kwargs (to ignore)
- Formatting complaints are due to the extra line for "New in version ..."
################### Docstring (pandas.DataFrame.interpolate) ###################
################################################################################
Interpolate values according to different methods.
Please note that only ``method='linear'`` is supported for
DataFrames/Series with a MultiIndex.
Parameters
----------
method : {'linear', 'time', 'index', 'values', 'nearest', 'zero',
'slinear', 'quadratic', 'cubic', 'barycentric', 'krogh',
'polynomial', 'spline', 'piecewise_polynomial', 'pad',
'from_derivatives', 'pchip', 'akima'}, default 'linear'
Interpolation technique to use.
* 'linear': Ignore the index and treat the values as equally
spaced. This is the only method supported on MultiIndexes.
Default.
* 'time': Interpolation works on daily and higher resolution
data to interpolate given length of interval.
* 'index', 'values': use the actual numerical values of the index.
* 'pad': Fill in NaNs using existing values.
* 'nearest', 'zero', 'slinear', 'quadratic', 'cubic',
'barycentric', 'polynomial': Passed to
``scipy.interpolate.interp1d``. Both 'polynomial' and 'spline'
require that you also specify an `order` (int),
e.g. df.interpolate(method='polynomial', order=4).
These use the actual numerical values of the index.
* 'krogh', 'piecewise_polynomial', 'spline', 'pchip', 'akima':
Wrappers around the scipy interpolation methods of
similar names. These use the actual numerical values of the
index. For more information on their behavior, see the
`scipy documentation
<http://docs.scipy.org/doc/scipy/reference/interpolate.html#univariate-interpolation>`__
and `tutorial documentation
<http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html>`__.
* 'from_derivatives': Refers to
``scipy.interpolate.BPoly.from_derivatives`` which
replaces 'piecewise_polynomial' interpolation method in
scipy 0.18.
.. versionadded:: 0.18.1
Added support for the 'akima' method
Added interpolate method 'from_derivatives' which replaces
'piecewise_polynomial' in scipy 0.18; backwards-compatible with
scipy < 0.18
axis : {0, 1}, default 0
Axis to interpolate along.
* 0: Fill column-by-column.
* 1: Fill row-by-row.
limit : int, default None
Maximum number of consecutive NaNs to fill. Must be greater than 0.
inplace : bool, default False
Update the data in place if possible.
limit_direction : {'forward', 'backward', 'both'}, default 'forward'
If limit is specified, consecutive NaNs will be filled in this
direction.
limit_area : {'inside', 'outside'}, default None
If limit is specified, consecutive NaNs will be filled with this
restriction.
* None: No fill restriction (default).
* 'inside': Only fill NaNs surrounded by valid values
(interpolate).
* 'outside': Only fill NaNs outside valid values (extrapolate).
.. versionadded:: 0.21.0
downcast : optional, 'infer' or None, defaults to None
Downcast dtypes if possible.
kwargs
Keyword arguments to pass on to the interpolating function.
Returns
-------
Series or DataFrame
Same-shape object interpolated at the NaN values
See Also
--------
replace : replace a value
fillna : fill missing values
Examples
--------
Filling in NaNs in a Series via linear interpolation.
>>> ser = pd.Series([0, 1, np.nan, 3])
>>> ser.interpolate()
0 0.0
1 1.0
2 2.0
3 3.0
dtype: float64
Filling in NaNs in a Series by padding, but filling at most two
consecutive NaN at a time.
>>> ser = pd.Series([np.nan, "single_one", np.nan,
... "fill_two_more", np.nan, np.nan, np.nan,
... 4.71, np.nan])
>>> ser
0 NaN
1 single_one
2 NaN
3 fill_two_more
4 NaN
5 NaN
6 NaN
7 4.71
8 NaN
dtype: object
>>> ser.interpolate(method='pad', limit=2)
0 NaN
1 single_one
2 single_one
3 fill_two_more
4 fill_two_more
5 fill_two_more
6 NaN
7 4.71
8 4.71
dtype: object
Create a DataFrame with missing values.
>>> df = pd.DataFrame([[0,1,2,0,4],[1,2,3,-1,8],
... [2,3,4,-2,12],[3,4,5,-3,16]],
... columns=['a', 'b', 'c', 'd', 'e'])
>>> df
a b c d e
0 0 1 2 0 4
1 1 2 3 -1 8
2 2 3 4 -2 12
3 3 4 5 -3 16
>>> df.loc[3,'a'] = np.nan
>>> df.loc[0,'b'] = np.nan
>>> df.loc[1,'d'] = np.nan
>>> df.loc[2,'d'] = np.nan
>>> df.loc[1,'e'] = np.nan
>>> df
a b c d e
0 0.0 NaN 2 0.0 4.0
1 1.0 2.0 3 NaN NaN
2 2.0 3.0 4 NaN 12.0
3 NaN 4.0 5 -3.0 16.0
Fill the DataFrame forward (that is, going down) along each column.
Note how the last entry in column `a` is interpolated differently
(because there is no entry after it to use for interpolation).
Note how the first entry in column `b` remains NA (because there
is no entry before it to use for interpolation).
>>> df.interpolate(method='linear', limit_direction='forward', axis=0)
a b c d e
0 0.0 NaN 2 0.0 4.0
1 1.0 2.0 3 -1.0 8.0
2 2.0 3.0 4 -2.0 12.0
3 2.0 4.0 5 -3.0 16.0
################################################################################
################################## Validation ##################################
################################################################################
Errors found:
Errors in parameters section
Parameter "method" description should start with capital letter
Parameter "method" description should finish with "."
Parameter "limit_area" description should finish with "."
Parameter "kwargs" has no type
* 'linear': ignore the index and treat the values as equally |
* 'linear': Ignore the index and treat the values as equally |
Generally shouldn't need periods at the end of bullet points
* 'time': interpolation works on daily and higher resolution |
---|
data to interpolate given length of interval |
* 'index', 'values': use the actual numerical values of the index |
Default. |
Don't need this
'polynomial', 'spline', 'piecewise_polynomial', |
---|
'from_derivatives', 'pchip', 'akima'} |
'polynomial', 'spline', 'piecewise_polynomial', 'pad', |
'from_derivatives', 'pchip', 'akima'}, default 'linear' |
Shouldn't need the default designation at the end (implied by 'linear' being the first value)
data to interpolate given length of interval |
---|
* 'index', 'values': use the actual numerical values of the index |
Default. |
* 'time': Interpolation works on daily and higher resolution |
"Interpolation works...to interpolate" seems unnecessarily verbose. Perhaps just "Works on daily and higher resolution data"?
* 'nearest', 'zero', 'slinear', 'quadratic', 'cubic', |
---|
'barycentric', 'polynomial' is passed to |
'barycentric', 'polynomial': Passed to |
``scipy.interpolate.interp1d``. Both 'polynomial' and 'spline' |
require that you also specify an `order` (int), |
e.g. df.interpolate(method='polynomial', order=4). |
Seems better served as a dedicated example than crammed into this
Examples |
-------- |
Filling in NaNs |
Filling in NaNs in a Series via linear interpolation. |
:class:`~pandas.Series`
1 1 |
---|
2 2 |
3 3 |
>>> ser = pd.Series([0, 1, np.nan, 3]) |
Convention here is `s =` instead of `ser =`
>>> ser = pd.Series([np.nan, "single_one", np.nan, |
---|
... "fill_two_more", np.nan, np.nan, np.nan, |
... 4.71, np.nan]) |
>>> ser |
To save space you don't need to print the Series here - should be straightforward based off the constructor directly above it
I did not include the print for the very small series example (it was straightforward to see), but I'd like to keep this longer one if that's alright - it was encouraged so the differences can be spotted easier.
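For reference, the effect of `limit` discussed in this thread can also be seen on a small numeric Series with the default linear method. This is only an illustrative sketch and not part of the PR's docstring; the variable names are made up here.

```python
import numpy as np
import pandas as pd

# Three consecutive NaNs between two valid values.
s = pd.Series([1.0, np.nan, np.nan, np.nan, 5.0])

# With limit=2 only the first two NaNs of the run are filled
# (default limit_direction is 'forward'); the third stays NaN.
limited = s.interpolate(method='linear', limit=2)
print(limited)
```

Printing the input next to the result, as the author prefers for the longer example, makes it easy to spot which positions were left untouched.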
Create a DataFrame with missing values. |
>>> df = pd.DataFrame([[0,1,2,0,4],[1,2,3,-1,8], |
Why not just construct with the missing values?
Mainly so people can see the "expected" interpolation (I tried to have a pattern column-wise) and compare it with what actually happens, e.g. with linear interpolation (especially if the last entry is an NA)
Fill the DataFrame forward (that is, going down) along each column. |
---|
Note how the last entry in column `a` is interpolated differently |
(because there is no entry after it to use for interpolation). |
Note how the first entry in column `b` remains NA (because there |
NaN
Lots of comments but just wanted to say nice job! This is one of the tougher docstrings
I hope to finish these great suggestions some time today.
Thank you for the thorough review, @WillAyd
I made some changes.
I believe the errors below can be ignored, because they relate to known issues (**kwargs, .. versionadded::, etc.)
################################################################################
################### Docstring (pandas.DataFrame.interpolate) ###################
################################################################################
Interpolate values according to different methods.
Please note that only ``method='linear'`` is supported for
DataFrames/Series with a MultiIndex.
Parameters
----------
method : {'linear', 'time', 'index', 'values', 'nearest', 'zero',
'slinear', 'quadratic', 'cubic', 'barycentric', 'krogh',
'polynomial', 'spline', 'piecewise_polynomial', 'pad',
'from_derivatives', 'pchip', 'akima'}
Interpolation technique to use.
* 'linear': Ignore the index and treat the values as equally
spaced. This is the only method supported on MultiIndexes.
* 'time': Works on daily and higher resolution
data to interpolate given length of interval.
* 'index', 'values': use the actual numerical values of the index.
* 'pad': Fill in NaNs using existing values.
* 'nearest', 'zero', 'slinear', 'quadratic', 'cubic', 'spline',
'barycentric', 'polynomial': Passed to
``scipy.interpolate.interp1d``. Both 'polynomial' and 'spline'
require that you also specify an `order` (int),
e.g. df.interpolate(method='polynomial', order=4).
These use the actual numerical values of the index.
* 'krogh', 'piecewise_polynomial', 'spline', 'pchip', 'akima':
Wrappers around the scipy interpolation methods of similar
names. See `Notes`.
* 'from_derivatives': Refers to
``scipy.interpolate.BPoly.from_derivatives`` which
replaces 'piecewise_polynomial' interpolation method in
scipy 0.18.
.. versionadded:: 0.18.1
Added support for the 'akima' method
Added interpolate method 'from_derivatives' which replaces
'piecewise_polynomial' in scipy 0.18; backwards-compatible with
scipy < 0.18
axis : {0 or 'index', 1 or 'columns', None}, default None
Axis to interpolate along.
limit : int, optional
Maximum number of consecutive NaNs to fill. Must be greater than
0.
inplace : bool, default False
Update the data in place if possible.
limit_direction : {'forward', 'backward', 'both'}, default 'forward'
If limit is specified, consecutive NaNs will be filled in this
direction.
limit_area : {`None`, 'inside', 'outside'}
If limit is specified, consecutive NaNs will be filled with this
restriction.
* None: No fill restriction (default).
* 'inside': Only fill NaNs surrounded by valid values
(interpolate).
* 'outside': Only fill NaNs outside valid values (extrapolate).
.. versionadded:: 0.21.0
downcast : optional, 'infer' or None, defaults to None
Downcast dtypes if possible.
**kwargs
Keyword arguments to pass on to the interpolating function.
Returns
-------
Series or DataFrame
Same-shape object interpolated at the NaN values
See Also
--------
replace : replace a value
fillna : fill missing values
scipy.interpolate.Akima1DInterpolator : piecewise cubic polynomials
(Akima interpolator)
scipy.interpolate.BPoly.from_derivatives : piecewise polynomial in the
Bernstein basis
scipy.interpolate.interp1d : interpolate a 1-D function
scipy.interpolate.KroghInterpolator : interpolate polynomial (Krogh
interpolator)
scipy.interpolate.PchipInterpolator : PCHIP 1-d monotonic cubic
interpolation
scipy.interpolate.CubicSpline : cubic spline data interpolator
Notes
-----
If the selected `method` is one of 'krogh', 'piecewise_polynomial',
'spline', 'pchip', 'akima':
They are wrappers around the scipy interpolation methods of similar
names. These use the actual numerical values of the index.
For more information on their behavior, see the
`scipy documentation
<http://docs.scipy.org/doc/scipy/reference/interpolate.html#univariate-interpolation>`__
and `tutorial documentation
<http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html>`__.
Examples
--------
Filling in `NaN` in a :class:`~pandas.Series` via linear
interpolation.
>>> s = pd.Series([0, 1, np.nan, 3])
>>> s.interpolate()
0 0.0
1 1.0
2 2.0
3 3.0
dtype: float64
Filling in `NaN` in a Series by padding, but filling at most two
consecutive `NaN` at a time.
>>> s = pd.Series([np.nan, "single_one", np.nan,
... "fill_two_more", np.nan, np.nan, np.nan,
... 4.71, np.nan])
>>> s
0 NaN
1 single_one
2 NaN
3 fill_two_more
4 NaN
5 NaN
6 NaN
7 4.71
8 NaN
dtype: object
>>> s.interpolate(method='pad', limit=2)
0 NaN
1 single_one
2 single_one
3 fill_two_more
4 fill_two_more
5 fill_two_more
6 NaN
7 4.71
8 4.71
dtype: object
Filling in `NaN` in a Series via polynomial interpolation or splines:
Both `polynomial` and `spline` methods require that you also specify
an `order` (int).
>>> s = pd.Series([0, 2, np.nan, 8])
>>> s.interpolate(method='polynomial', order=1)
0 0.0
1 2.0
2 5.0
3 8.0
dtype: float64
>>> s.interpolate(method='polynomial', order=2)
0 0.000000
1 2.000000
2 4.666667
3 8.000000
dtype: float64
Create a :class:`~pandas.DataFrame` with missing values to fill it
with different methods.
>>> df = pd.DataFrame([[0,1,2,0,4],[1,2,3,-1,8],
... [2,3,4,-2,12],[3,4,5,-3,16]],
... columns=['a', 'b', 'c', 'd', 'e'])
>>> df
a b c d e
0 0 1 2 0 4
1 1 2 3 -1 8
2 2 3 4 -2 12
3 3 4 5 -3 16
>>> df.loc[1,'a'] = np.nan
>>> df.loc[3,'a'] = np.nan
>>> df.loc[0,'b'] = np.nan
>>> df.loc[1,'d'] = np.nan
>>> df.loc[2,'d'] = np.nan
>>> df.loc[1,'e'] = np.nan
>>> df
a b c d e
0 0.0 NaN 2 0.0 4.0
1 NaN 2.0 3 NaN NaN
2 2.0 3.0 4 NaN 12.0
3 NaN 4.0 5 -3.0 16.0
Fill the DataFrame forward (that is, going down) along each column.
Note how the last entry in column `a` is interpolated differently
(because there is no entry after it to use for interpolation).
Note how the first entry in column `b` remains `NaN` (because there
is no entry before it to use for interpolation).
>>> df.interpolate(method='linear', limit_direction='forward', axis=0)
a b c d e
0 0.0 NaN 2 0.0 4.0
1 1.0 2.0 3 -1.0 8.0
2 2.0 3.0 4 -2.0 12.0
3 2.0 4.0 5 -3.0 16.0
################################################################################
################################## Validation ##################################
################################################################################
Errors found:
Errors in parameters section
Parameters {'kwargs'} not documented
Unknown parameters {'**kwargs'}
Parameter "method" description should start with capital letter
Parameter "method" description should finish with "."
Parameter "limit_area" description should finish with "."
Parameter "**kwargs" has no @type
* 'linear': ignore the index and treat the values as equally |
* 'linear': Ignore the index and treat the values as equally |
spaced. This is the only method supported on MultiIndexes. |
I understand why you added these, but generally do not put punctuation at the end of bullet points. If you get an error as a result OK to ignore
* 'index', 'values': use the actual numerical values of the index. |
---|
* 'pad': Fill in NaNs using existing values. |
* 'nearest', 'zero', 'slinear', 'quadratic', 'cubic', 'spline', |
'barycentric', 'polynomial': Passed to |
``scipy.interpolate.interp1d``. Both 'polynomial' and 'spline' |
I would do the same thing here you did for 'krogh' and move some of the implementation details down to the Notes section
http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html`__ |
---|
* 'from_derivatives' refers to BPoly.from_derivatives which |
* 'krogh', 'piecewise_polynomial', 'spline', 'pchip', 'akima': |
Wrappers around the scipy interpolation methods of similar |
Use SciPy instead of scipy when referring to the package outside of code (couple other places this pops up)
sure
If limit is specified, consecutive NaNs will be filled in this |
---|
direction. |
inplace : bool, default False |
Update the NDFrame in place if possible. |
limit_area : {`None`, 'inside', 'outside'} |
Add ", default None" to the end here and remove the comment about it being the default below
* None: No fill restriction (default). |
---|
* 'inside': Only fill NaNs surrounded by valid values |
(interpolate). |
* 'outside': Only fill NaNs outside valid values (extrapolate). |
Would be good to add an example for 'outside'
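One possible shape for such an example, sketched here under the assumption that a short numeric Series is enough to show the difference between the two `limit_area` values (the data and variable names are illustrative, not taken from the PR):

```python
import numpy as np
import pandas as pd

# Leading NaN, one interior NaN, and two trailing NaNs.
s = pd.Series([np.nan, 1.0, np.nan, 3.0, np.nan, np.nan])

# 'inside': only the NaN surrounded by valid values is filled.
inside = s.interpolate(limit_area='inside')

# 'outside' with the default forward direction: only the trailing
# NaNs are filled (with the last valid value); the interior NaN stays.
outside_fwd = s.interpolate(limit_area='outside')

# 'outside' in both directions also fills the leading NaN.
outside_both = s.interpolate(limit_area='outside', limit_direction='both')

print(inside, outside_fwd, outside_both, sep='\n\n')
```

Note that in this sketch `limit_area` takes effect even though `limit` is not passed.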
If the selected `method` is one of 'krogh', 'piecewise_polynomial', |
---|
'spline', 'pchip', 'akima': |
They are wrappers around the scipy interpolation methods of similar |
names. These use the actual numerical values of the index. |
What does "These use the actual numerical values of the index" mean?
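A small sketch may help with this question. Assuming a Series whose index is numeric but not evenly spaced (made-up data), `method='linear'` ignores the index positions while `method='index'` weights by them:

```python
import numpy as np
import pandas as pd

# Index labels 0, 1, 10 are not equally spaced.
s = pd.Series([0.0, np.nan, 10.0], index=[0, 1, 10])

# 'linear' treats the three values as equally spaced,
# so the gap is filled with the midpoint.
by_position = s.interpolate(method='linear')

# 'index' uses the numerical index values: label 1 lies one tenth
# of the way from 0 to 10, so the filled value is 1.0.
by_index = s.interpolate(method='index')

print(by_position)
print(by_index)
```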
"These use the actual numerical values of the index." Better grammar?
3 8.000000 |
---|
dtype: float64 |
Create a :class:`~pandas.DataFrame` with missing values to fill it |
You are explaining here what the below code is going to do, but not really saying why it's important. Would be better worded as "Interpolation can also be applied to DataFrames" or something to that effect
>>> df = pd.DataFrame([[0,1,2,0,4],[1,2,3,-1,8], |
Thought I had this comment before but just use the NA values in your constructor - no reason to instantiate the DataFrame with values and then assign them missing values after the fact.
Also make sure you put a space after every comma
I changed it so one can see how the columns get created - and we have linear values in 3 columns and quadratic on the 4th.
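For comparison, the reviewer's suggestion of putting the missing values directly in the constructor could look roughly like the sketch below. It assumes the same four columns as the updated docstring (linear patterns in `a`, `b`, `c` and squares in `d`); it is not the code that was merged.

```python
import numpy as np
import pandas as pd

# Missing values are written directly into the constructor,
# with a space after every comma.
df = pd.DataFrame({'a': [0, np.nan, 2, np.nan],
                   'b': [np.nan, 2, 3, 4],
                   'c': [-1, np.nan, np.nan, -4],
                   'd': [1, np.nan, 9, 16]})

# Column-wise linear interpolation, filling forward.
filled = df.interpolate(method='linear', limit_direction='forward', axis=0)
print(filled)
```

The trade-off raised in the thread still applies: this form is shorter, but the reader no longer sees the complete pattern that the interpolated values can be compared against.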
Fill the DataFrame forward (that is, going down) along each column. |
Note how the last entry in column `a` is interpolated differently |
(because there is no entry after it to use for interpolation). |
Don't need the parentheses here (nor on the next line)
Returns |
------- |
Series or DataFrame of same shape interpolated at the NaNs |
Series or DataFrame |
Same-shape object interpolated at the NaN values |
For the description here say "Returns the same object type as the caller" - that wording has been used by a few other PRs so just want to be consistent
Hello @math-and-data! Thanks for updating the PR.
Cheers ! There are no PEP8 issues in this Pull Request. 🍻
Comment last updated on August 19, 2018 at 00:25 Hours UTC
- docstring validation issues below, should be known kwarg and .
################################################################################
################### Docstring (pandas.DataFrame.interpolate) ###################
################################################################################
Interpolate values according to different methods.
Please note that only ``method='linear'`` is supported for
DataFrames/Series with a MultiIndex.
Parameters
----------
method : {'linear', 'time', 'index', 'values', 'nearest', 'zero',
'slinear', 'quadratic', 'cubic', 'barycentric', 'krogh',
'polynomial', 'spline', 'piecewise_polynomial', 'pad',
'from_derivatives', 'pchip', 'akima'}
Interpolation technique to use.
* 'linear': Ignore the index and treat the values as equally
spaced. This is the only method supported on MultiIndexes.
* 'time': Works on daily and higher resolution
data to interpolate given length of interval.
* 'index', 'values': use the actual numerical values of the index.
* 'pad': Fill in NaNs using existing values.
* 'nearest', 'zero', 'slinear', 'quadratic', 'cubic', 'spline',
'barycentric', 'polynomial': Passed to
``scipy.interpolate.interp1d``. Both 'polynomial' and 'spline'
require that you also specify an `order` (int),
e.g. df.interpolate(method='polynomial', order=4).
These use the numerical values of the index.
* 'krogh', 'piecewise_polynomial', 'spline', 'pchip', 'akima':
Wrappers around the SciPy interpolation methods of similar
names. See `Notes`.
* 'from_derivatives': Refers to
``scipy.interpolate.BPoly.from_derivatives`` which
replaces 'piecewise_polynomial' interpolation method in
scipy 0.18.
.. versionadded:: 0.18.1
Added support for the 'akima' method
Added interpolate method 'from_derivatives' which replaces
'piecewise_polynomial' in SciPy 0.18; backwards-compatible with
SciPy < 0.18
axis : {0 or 'index', 1 or 'columns', None}, default None
Axis to interpolate along.
limit : int, optional
Maximum number of consecutive NaNs to fill. Must be greater than
0.
inplace : bool, default False
Update the data in place if possible.
limit_direction : {'forward', 'backward', 'both'}, default 'forward'
If limit is specified, consecutive NaNs will be filled in this
direction.
limit_area : {`None`, 'inside', 'outside'}
If limit is specified, consecutive NaNs will be filled with this
restriction.
* None: No fill restriction.
* 'inside': Only fill NaNs surrounded by valid values
(interpolate).
* 'outside': Only fill NaNs outside valid values (extrapolate).
.. versionadded:: 0.21.0
downcast : optional, 'infer' or None, defaults to None
Downcast dtypes if possible.
**kwargs
Keyword arguments to pass on to the interpolating function.
Returns
-------
Series or DataFrame
Returns the same object type as the caller, interpolated at
some or all `NaN` values
See Also
--------
replace : replace a value
fillna : fill missing values
scipy.interpolate.Akima1DInterpolator : piecewise cubic polynomials
(Akima interpolator)
scipy.interpolate.BPoly.from_derivatives : piecewise polynomial in the
Bernstein basis
scipy.interpolate.interp1d : interpolate a 1-D function
scipy.interpolate.KroghInterpolator : interpolate polynomial (Krogh
interpolator)
scipy.interpolate.PchipInterpolator : PCHIP 1-d monotonic cubic
interpolation
scipy.interpolate.CubicSpline : cubic spline data interpolator
Notes
-----
The 'krogh', 'piecewise_polynomial', 'spline', 'pchip' and 'akima'
methods are wrappers around the respective SciPy implementations of
similar names. These use the actual numerical values of the index.
For more information on their behavior, see the
`SciPy documentation
<http://docs.scipy.org/doc/scipy/reference/interpolate.html#univariate-interpolation>`__
and `SciPy tutorial
<http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html>`__.
Examples
--------
Filling in `NaN` in a :class:`~pandas.Series` via linear
interpolation.
>>> s = pd.Series([0, 1, np.nan, 3])
>>> s.interpolate()
0 0.0
1 1.0
2 2.0
3 3.0
dtype: float64
Filling in `NaN` in a Series by padding, but filling at most two
consecutive `NaN` at a time.
>>> s = pd.Series([np.nan, "single_one", np.nan,
... "fill_two_more", np.nan, np.nan, np.nan,
... 4.71, np.nan])
>>> s
0 NaN
1 single_one
2 NaN
3 fill_two_more
4 NaN
5 NaN
6 NaN
7 4.71
8 NaN
dtype: object
>>> s.interpolate(method='pad', limit=2)
0 NaN
1 single_one
2 single_one
3 fill_two_more
4 fill_two_more
5 fill_two_more
6 NaN
7 4.71
8 4.71
dtype: object
Filling in `NaN` in a Series via polynomial interpolation or splines:
Both `polynomial` and `spline` methods require that you also specify
an `order` (int).
>>> s = pd.Series([0, 2, np.nan, 8])
>>> s.interpolate(method='polynomial', order=2)
0 0.000000
1 2.000000
2 4.666667
3 8.000000
dtype: float64
Filling in `NaN` in a :class:`~pandas.DataFrame` via linear
interpolation.
>>> df = pd.DataFrame({'a': range(0,4),
... 'b': range(1,5),
... 'c': range(-1, -5, -1),
... 'd': [x**2 for x in range(1,5)]})
>>> df
a b c d
0 0 1 -1 1
1 1 2 -2 4
2 2 3 -3 9
3 3 4 -4 16
>>> df.loc[1,'a'] = np.nan
>>> df.loc[3,'a'] = np.nan
>>> df.loc[0,'b'] = np.nan
>>> df.loc[1,'c'] = np.nan
>>> df.loc[2,'c'] = np.nan
>>> df.loc[1,'d'] = np.nan
>>> df
a b c d
0 0.0 NaN -1.0 1.0
1 NaN 2.0 NaN NaN
2 2.0 3.0 NaN 9.0
3 NaN 4.0 -4.0 16.0
Fill the DataFrame forward (that is, going down) along each column.
Note how the last entry in column `a` is interpolated differently,
because there is no entry after it to use for interpolation.
Note how the first entry in column `b` remains `NaN`, because there
is no entry before it to use for interpolation.
>>> df.interpolate(method='linear', limit_direction='forward', axis=0)
a b c d
0 0.0 NaN -1.0 1.0
1 1.0 2.0 -2.0 5.0
2 2.0 3.0 -3.0 9.0
3 2.0 4.0 -4.0 16.0
>>> df['d'].interpolate(method='polynomial', order=2)
0 1.0
1 4.0
2 9.0
3 16.0
Name: d, dtype: float64
################################################################################
################################## Validation ##################################
################################################################################
Errors found:
Errors in parameters section
Parameters {'kwargs'} not documented
Unknown parameters {'**kwargs'}
Parameter "method" description should start with capital letter
Parameter "method" description should finish with "."
Parameter "limit_area" description should finish with "."
Parameter "**kwargs" has no type
@WillAyd if you don't mind reviewing this one too. I made a few minor changes, such as PEP8 fixes in the doctests; also, the type of `method` couldn't list all the options, as Sphinx does not let parameter types span multiple lines, and things like this.
Thanks for the docstring @math-and-data, really good work. And sorry for the long wait.
IIUC needs some fixes around backtick usage
* 'pad': Fill in NaNs using existing values. |
---|
* 'nearest', 'zero', 'slinear', 'quadratic', 'cubic', 'spline', |
'barycentric', 'polynomial': Passed to |
``scipy.interpolate.interp1d``. Both 'polynomial' and 'spline' |
This should just be single backticks no?
Wrappers around the SciPy interpolation methods of similar |
---|
names. See `Notes`. |
* 'from_derivatives': Refers to |
``scipy.interpolate.BPoly.from_derivatives`` which |
Single backtick too?
If limit is specified, consecutive NaNs will be filled with this |
---|
restriction. |
* None: No fill restriction. |
Maybe double backticks here to render literal value?
Series or DataFrame of same shape interpolated at the NaNs |
---|
Series or DataFrame |
Returns the same object type as the caller, interpolated at |
some or all `NaN` values |
Double ticks
Examples |
-------- |
Filling in NaNs |
Filling in `NaN` in a :class:`~pandas.Series` via linear |
Double
dtype: float64 |
---|
Filling in `NaN` in a Series by padding, but filling at most two |
Double ticks (here and next line)
8 4.71 |
---|
dtype: object |
Filling in `NaN` in a Series via polynomial interpolation or splines: |
Double backtick
Note how the last entry in column `a` is interpolated differently, |
because there is no entry after it to use for interpolation. |
Note how the first entry in column `b` remains `NaN`, because there |
Double backticks
Yep, agree. I think they should be all right now. Thanks @WillAyd !
an `order` (int). |
---|
Filling in ``NaN`` in a Series via polynomial interpolation or splines: |
Both 'polynomial' and 'spline' methods require that you also specify |
an ``order`` (int). |
Parameters should be single backticks - double is only for literals and code samples I think
My understanding is that double backticks are for code, including parts like a single variable (`None`) or an assignment (`foo=1`). Single backticks are for things that you can refer (link) to, like a function, class, or module. And for values, just quotes.
For an argument, I'd consider it more code than something you can link to. That's why I added double backticks. But it's very subtle; I'd be happy with any option (no quoting, single backticks, double backticks, or quotes).
Does this make sense?
Yep thanks. I think I've seen other instances where parameters are in single backticks but this is nuanced enough that it shouldn't hold up the PR - can be part of a larger conversation.
I added a bullet point to #20298 to decide a standard for these cases. I think at the moment there is not much consistency.
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request
interpolator). |
---|
scipy.interpolate.PchipInterpolator : PCHIP 1-d monotonic cubic |
interpolation. |
scipy.interpolate.CubicSpline : Cubic spline data interpolator. |
As this method is referenced here, I expected it to be available just like any of the other ones, but I have combed the source code and I am unable to find the place where this method is used. Was it added because it was planned to add future support for CubicSpline (with a wrapper such as Akima's)?
khyox mentioned this pull request
5 tasks