DEPR/CLN: Remove pd.rolling_*, pd.expanding* and pd.ewm* by topper-123 · Pull Request #18723 · pandas-dev/pandas (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation52 Commits3 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

topper-123

jreback

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are removing a lot of tests that should remain

@@ -216,7 +216,9 @@ Removal of prior version deprecations/changes
- The ``pandas.io.wb`` and ``pandas.io.data`` stub modules have been removed (:issue:`13735`)
- ``Categorical.from_array`` has been removed (:issue:`13854`)
- The ``freq`` and ``how`` parameters have been removed from the ``rolling``/``expanding``/``ewm`` methods of DataFrame
and Series (deprecated since v0.18). Instead, resample before calling the methods. (:issue:18601 & :issue:18668)
and Series (deprecated since v0.18). Instead, resample before calling the methods. (:issue:`18601`)
- The top-level functions ``pd.rolling_*``, ``pd.expanding_*`` and ``pd.ewm*`` have been removed (deprecated since v0.18).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don’t remove any issue numbers

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this and some other stuff that’s unrelated to the pd.rolling stuff seems to be a rebase issue. I’ll fix it tomorrow.

If you want, you can limit the review to only the latest commit, the other commits are because of the rebase issue.

@@ -331,5 +333,5 @@ Categorical
Other
^^^^^
- Improved error message when attempting to use a Python keyword as an identifier in a ``numexpr`` backed query (:issue:`18221`)
- Improved error message when attempting to use a Python keyword as an identifier in a numexpr query (:issue:`18221`)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don’t change other things

12.952, np.nan, np.nan])
with catch_warnings(record=True):
rs = mom.rolling_window(vals, 5, 'boxcar', center=True)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don’t remove tests just because we are catching the warning

you may need to rewrite them to use the new syntax
unless they duplicate current tests

@codecov

@codecov

Codecov Report

Merging #18723 into master will increase coverage by 0.07%.
The diff coverage is n/a.

Impacted file tree graph

@@ Coverage Diff @@ ## master #18723 +/- ##

Flag Coverage Δ
#multiple 90.06% <ø> (+0.07%) ⬆️
#single 41.71% <ø> (-0.03%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ca4ae4f...1443818. Read the comment docs.

@topper-123

The rebase issue has been solved now. I'll look at your comments now,. If you've got additional comments, I'd be thakful, and I'll look at them

@pep8speaks

Hello @topper-123! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on February 01, 2018 at 06:53 Hours UTC

@topper-123

Hi @jreback. An issue is that dask apparently still uses pd.rolling_*. This means that pandas/tests/test_downstream.py currently fails. Should this PR be put on hold, or would notifying the dask guys about this removal be enough? This stuff has been deprecated since 0.18, so for quite a long time.

This PR doesn't touch too many files, so it would be ok to wait a bit, but given the simplification to the pandas API that this PR offers, it should IMO be included when 0.22 releases.

I've updated the PR, so be'd grateful for a review, so it'll be ready for an eventual pull.

@jreback

@TomAugspurger

The dask versions are deprecated, just like pandas. I believe the best solution would be to

  1. Use this PR to update all the test to the new syntax
  2. Update dask to remove the dd.rolling, etc. methods (@topper-123 could I convince you to work on that? :) It'd be nice to have you as a contributor if you haven't already.)
  3. Remove them from pandas once dask has a release

Depending on the timing of pandas 0.22, dask may not have a release out.

@topper-123

@TomAugspurger , the test_dask function doesn't explicitly call pd.rolling_*, so changes must be made in the dask code base, correct?

I've never used dask, and have a feeling I'd have to learn some intricancies of that lib to make such a PR, and I could easily go down a rabbit's hole that I never intended to enter... Could someone familiar with dask not take it upon themselves to do those changes? I can sit on this PR until dask is updated, and rebase when this is ready to merge.

TomAugspurger added a commit to TomAugspurger/dask that referenced this pull request

Dec 14, 2017

@TomAugspurger

@TomAugspurger

@topper-123 dask was importing them at the top level if dask.dataframe, which caused the failure. Next time dask wants to mirror a pandas deprecation, we'll want to put it in a try / except block.

dask/dask#2995

@topper-123

@TomAugspurger , thanks for picking up on this, I will wait with my PR a while, until dask is updated.

My PR lowers the number of top-level functions of pandas by some 30, so it would IMO be very beneficial to pandas' users to have this merged by 0.22, as it would make pandas look much more focused. Could dask have a release before pandas 0.22, eventually just a minor release?

jcrist pushed a commit to dask/dask that referenced this pull request

Dec 21, 2017

@TomAugspurger @jcrist

@topper-123

dask 0.16.1 has now been released, and all tests now pass locally with dask version 0.16.1. The travis failure has to do with IO/S3 which I didn't touch. That's the only failure in travis.

So, I think everything is ok with my PR. Is there anything I can do to make this pass that travis error, or is this travis failure unrelated to my PR?

jorisvandenbossche

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question about one of the tests

assert result2.dtype == np.float_
def _check_ew_structures(self, func, name):
def _check_ew(self, name=None):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously this function consisted of _check_ew_ndarray and _check_ew_structures subfunctions to test both interfaces. You removed _check_ew_ndarray and only kept the content of _check_ew_structures as this is testing the new interface.
But, as far as I can see, the actual testing was done in _check_ew_ndarray (_check_ew_structures is only testing the output class type). Shouldn't this testing be kept? (changed to the new interface)

(might good be I am missing something, or that it is covered in other tests)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are right: So rather than test this function call, test the ewm method call on a series. I'll look into it tomorrow.

@topper-123

@jorisvandenbossche , I've now replicated the code from _check_ew_ndarray into check_ew. Could you review it?

@jorisvandenbossche

@topper-123 that specific one looks good now I think.
Did you double check the other tests to see there were not similar problems?

@topper-123

I've added test_expanding_func (parametrized, was called test_parametrized_max previously) and test_expanding_apply. Everything is now included AFAICT.

@topper-123

The tests have been running for 9 hours now and the Travis-CI tests have not yet finished. Same thing happened yesterday where after 20 hours one travis test-suite failed without console output (my guess is it timed out). Is anything wrong with the travis suite?

Everything passes locally and the errors that travis reported two days ago have been fixed (but the fixes not confirmed by travis, because I can't get travis to go through them...).

What should I do? Could a core dev run the travis tests locally and confirm they're ok, and merge them?

@TomAugspurger

@topper-123

Ok, thanks. I'll wait a bit more then.

@topper-123

@jreback

@topper-123

jorisvandenbossche

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry @topper-123, one more question
(it's a very complex file / diff (not your fault!) to review)

result = get_result(arr, 50, min_periods=30)
tm.assert_almost_equal(result[-1], static_comp(arr[10:-10]))
# min_periods is working correctly

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to move parts of _check_ndarray like this part on checking min_periods to _check_moments_func ?

From a superficial look, it seems that _check_ndarray did more than _check_structures (eg in _check_structures I don't see any assertion of NaNs when min_periods is specified like here)

I also don't know if test_stable and test_window make sense for the new interface

@topper-123

jreback

r = obj.rolling(window=window, min_periods=min_periods,
center=center)
return getattr(r, name)(**kwargs)
self._check_series(name, static_comp=static_comp,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure of the point of adding _check_series, it seems that you are already checking the series & frame results (line 1344), yes?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if there IS a difference, IOW you are checking something different, then certainly don't make a new function, just in-line it here.

tp added 2 commits

January 31, 2018 22:36

@topper-123

Hi @jreback. I uploaded a new PR 12 days ago.

There were a few issues that were checked in _check_ndarray but were not in _check_moments previously:

All of the above are inlined in _check_moments now (see 37abd25).

I've rebased a moments ago, without making code changes otherwise, so everything should turn up green & rebased when the tests are finished.

jreback

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks fine. just that 1 question.

@@ -863,72 +848,55 @@ def test_centered_axis_validation(self):
.rolling(window=3, center=True, axis=2).mean())
def test_rolling_sum(self):
self._check_moment_func(mom.rolling_sum, np.nansum, name='sum',

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, shouldn't this be nansum?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. fixed.

@topper-123

jorisvandenbossche

@jorisvandenbossche

harisbal pushed a commit to harisbal/pandas that referenced this pull request

Feb 28, 2018

@topper-123

…18723)