BUG: resample fixes by jreback · Pull Request #12449 · pandas-dev/pandas


make sure .resample(...).plot() warns and returns a correct plotting object
make sure that .groupby(...).resample(....) is hitting warnings when appropriate

closes #12448

jorisvandenbossche

def plot(self, *args, **kwargs):
    # for compat with prior versions, we want to
    # have the warnings shown here and just have this work
    return _maybe_process_deprecations(self, how='mean').plot(*args,


Small remark: by passing how='mean' here, the user gets the message "FutureWarning: how in .resample() is deprecated", even though they did not use 'how' (e.g. in s.resample('15min').plot()).


no, he DID use mean implicitly. Oh you want a more meaningful warning? ok sure


Yep, that is what I mean (a more meaningful warning). The current one is rather confusing to the user IMO, as the warning says something about how while the user has not explicitly used it. With other operations you get a "resample is now a deferred operation, please use df.resample().mean() instead" message, which is more appropriate here.
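To make the request concrete, here is a minimal sketch of a compat path that warns about the deferred operation instead of about `how`. `ToyResampler` and its tuple return value are hypothetical stand-ins made up for illustration; this is not the pandas implementation.

```python
import warnings


class ToyResampler:
    """Hypothetical stand-in for a deferred resample object; not pandas code."""

    def __init__(self, values):
        self.values = list(values)

    def mean(self):
        # The explicit aggregation the user is expected to call.
        return sum(self.values) / len(self.values)

    def plot(self):
        # Compat path: warn about the deferred operation (not about `how`,
        # which the user never passed), then fall back to the mean.
        warnings.warn(
            ".resample() is now a deferred operation; "
            "use .resample(...).mean().plot() instead of .resample(...).plot()",
            FutureWarning,
            stacklevel=2,
        )
        return ("plotted", self.mean())


# Record the warning so it can be inspected instead of printed to stderr.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = ToyResampler([1, 2, 3]).plot()

print(result)
print(caught[0].message)
```

The point of the sketch is only the wording: the message names the call the user actually wrote.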

Is it possible that the previous workaround with a warning ("resample is now a deferred ..") for accessing elements does not work anymore?

In [13]: s.resample('15min').mean()[0]
Out[13]: -0.27773169409289944

In [14]: s.resample('15min')[0]
KeyError: 'Column not found: 0'

I thought this last one worked (gave the same result), but warned?
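For reference, the explicit form that does not rely on the compat path is to aggregate first and then index by position. A small sketch with made-up data (the series below is an assumption, not the `s` from the session above):

```python
import pandas as pd

# One value per minute for an hour; values chosen so the means are easy to check.
index = pd.date_range("2016-01-01", periods=60, freq="min")
s = pd.Series(range(60), index=index, dtype=float)

# Aggregate explicitly, then pick the first bin by position.
resampled = s.resample("15min").mean()
first_bin = resampled.iloc[0]
print(first_bin)  # mean of the values 0..14
```

Using `.iloc[0]` rather than `[0]` keeps the lookup positional regardless of the index type.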

For case 3, I still do not get the same result as in 0.17.1. With this PR:

In [19]: df.groupby('group').resample('1D', fill_method='ffill')
C:\Anaconda\envs\devel\Scripts\ipython-script.py:1: FutureWarning: fill_method is deprecated to .resample()
the new syntax is .resample(...).ffill()
  if __name__ == '__main__':
Out[19]:
            val
date
2016-01-03    5
2016-01-10    6
2016-01-17    7
2016-01-24    8

So the actual grouper has disappeared
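For comparison, a sketch of the new-style syntax that the warning points to, written so that the group key is expected to stay in the result index. The frame below is hypothetical (the original `df` is not shown in the thread), and selecting the `val` column before resampling is my choice for the example, not something the PR prescribes.

```python
import pandas as pd

# Hypothetical data: two groups, each with a gap-free or one-day-gap date range.
df = pd.DataFrame(
    {
        "group": ["a", "a", "b", "b"],
        "val": [5, 6, 7, 8],
    },
    index=pd.DatetimeIndex(
        ["2016-01-03", "2016-01-05", "2016-01-10", "2016-01-11"], name="date"
    ),
)

# Select the value column, resample within each group, then forward-fill.
filled = df.groupby("group")["val"].resample("1D").ffill()
print(filled)
```

The result is indexed by both `group` and `date`, which is the information that went missing in the output above.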

You get the same KeyError in 0.17.1 when you are using a DataFrame

In [5]: s.to_frame('foo').resample('15T')[0]

I will fix the Series behavior

@jorisvandenbossche

see #12486

I can't fix this now. I think it's possible, but it will require some work. This is NOT the same as a multi-groupby of a column / freq, because resampling does some auto-filling. To be honest, this was prob a bit too magical in the past.

The error with the Series is still there (this is not necessarily related to the changes in this PR though):

In [21]: s.resample('2s').mean()[0]
Out[21]: 0.22864360896621477

In [22]: s.resample('2s')[0]
KeyError: 'Column not found: 0'

I think this should give something like iloc does?:

In [23]: s.resample('2s').iloc[0]
ValueError: .resample() is now a deferred operation
        use .resample(...).mean() instead of .resample(...)
        assignment will have no effect as you are working on a copy

BTW, the "assignment will have no effect as you are working on a copy" in that error message is not really clear to me. It's not that it has no effect, as it raises an error.

On the PR itself, the plot change and whatsnew update look good!

Something else with getitem:

In [8]: rs['2010-01-01 09:00:00']

AbstractMethodError: This method must be defined in the concrete class of SeriesGroupBy

But these getitem things are not really related to this PR, so I can open another issue for that, and then this PR can be merged

Are you sure you tested with this PR (for the getitem issue)? This is explicitly fixed/mentioned.

In [5]: s.resample('2s').mean()[0]
Out[5]: -0.0026492975424703812

In [6]: s.resample('2s')[0]
Out[6]: -0.0026492975424703812

took out the assignment part of the message

ok, both those errors fixed (and now they show the deprecation warning as well).

ping if ok and i'll merge (I need to squash)


Hi,

I have a FutureWarning from pandas 0.18.0 related to a resample operation which seems inappropriate:

import pandas as pd
import datetime as dt

df = pd.DataFrame(data=[1, 3], index=[dt.timedelta(), dt.timedelta(minutes=3)])
df.resample('1T').interpolate(method='linear')

The result I get from the last line is correct but I also get the following warning:

FutureWarning: .resample() is now a deferred operation
use .resample(...).mean() instead of .resample(...)

What is wrong with my syntax?
Thanks

@benoit9126 No, I think this warning is correct, and I think what you want is:

df.resample('1T').mean().interpolate(method='linear')

interpolate is not a method that is available on a Resample object. You first have to indicate how you want to resample the values (in this case using mean works because you are upsampling, so each bin holds at most one value).

But that aside, it does maybe make sense to have interpolate available, as this can work similarly to resample().fillna()
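A runnable sketch of the suggested two-step form, using the same two data points as the report. It spells the frequency as '1min', which I am assuming is equivalent to the '1T' alias used in the thread, and builds the index with `pd.TimedeltaIndex` explicitly.

```python
import datetime as dt

import pandas as pd

df = pd.DataFrame(
    data=[1, 3],
    index=pd.TimedeltaIndex([dt.timedelta(0), dt.timedelta(minutes=3)]),
)

# Step 1: say how to resample. Upsampling leaves the new bins empty (NaN).
upsampled = df.resample("1min").mean()

# Step 2: fill the empty bins by linear interpolation.
interpolated = upsampled.interpolate(method="linear")
print(interpolated)
```

The intermediate `upsampled` frame has NaN at 00:01:00 and 00:02:00, which is what `interpolate` then fills.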

@benoit9126 as @jorisvandenbossche points out, in < 0.18.0 there was an implicit how='mean' being done when you said df.resample('1T') w/o any how= arg.

@jorisvandenbossche yes, having a .interpolate method would be fine (and very straightforward to do actually). If you can open an issue with a nice example, that would be great.

@jreback Something else I just noticed, seems like a bug in asfreq, but didn't yet test with master:

In [16]: df=pd.DataFrame(data=[1,3], index=[dt.timedelta(), dt.timedelta(minutes=3)])
In [17]: df
Out[17]:
          0
00:00:00  1
00:03:00  3

In [18]: df.resample('1T').mean()
Out[18]:
            0
00:00:00  1.0
00:01:00  NaN
00:02:00  NaN
00:03:00  3.0

In [19]: df.resample('1T').asfreq()
Out[19]:
            0
00:00:00  1.0
00:01:00  NaN
00:02:00  NaN

Missing the last value in asfreq?

@jorisvandenbossche hmm, that last one does look like a bug, can you create an issue and i'll take a look. ty.
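A small check that could accompany such an issue: compare the index produced by `asfreq()` with the one produced by `mean()` on the same frame. This is a sketch under the assumption that both should cover 00:00:00 through 00:03:00; on a recent pandas release I would expect the indexes to match, but it is worth running against the installed version.

```python
import datetime as dt

import pandas as pd

df = pd.DataFrame(
    data=[1, 3],
    index=pd.TimedeltaIndex([dt.timedelta(0), dt.timedelta(minutes=3)]),
)

via_mean = df.resample("1min").mean()
via_asfreq = df.resample("1min").asfreq()

# True when asfreq() keeps the final bin (00:03:00) that mean() reports.
same_index = via_mean.index.equals(via_asfreq.index)
print(same_index)
print(via_asfreq.tail(1))
```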

Thanks a lot for the explanation.
Nevertheless, I will keep my code as it is, hoping that it won't throw a warning in pandas 0.18.2 😉

@benoit9126 To be clear, there is no guarantee that #12925 will be implemented by 0.19, so it is possible that this code will start raising an error. Your responsibility :-) (or always welcome to put up a PR! I don't think it will be that hard to implement)

@jorisvandenbossche To be honest, I do not have enough time to implement this feature in the near future (especially because I have never carefully inspected the pandas core). If that changes, it will be a pleasure.