BUG: resample fixes by jreback · Pull Request #12449 · pandas-dev/pandas (original) (raw)
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Conversation27 Commits1 Checks0 Files changed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters
[ Show hidden characters]({{ revealButtonHref }})
make sure .resample(...).plot() warns and returns a correct plotting object
make sure that .groupby(...).resample(....) is hitting warnings when appropriate
closes #12448
def plot(self, *args, **kwargs): |
---|
# for compat with prior versions, we want to |
# have the warnings shown here and just have this work |
return _maybe_process_deprecations(self, how='mean').plot(*args, |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small remark, by passing how='mean'
here, the user gets the message "FutureWarning: how in .resample() is deprecated", while he did not use 'how' (in eg s.resample('15min').plot()
)
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, he DID use mean
implicity. Oh you want a more meaningful warning? ok sure
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that is what I mean (a more meaningful warning). As this is now rather confusing to the user IMO, as the warning says something about how
while the user has not explicitely used it (with other operations, you get a "resample is now a deferred operation, please use df.resample().mean() instead
message, which is more appropriate here)
It is possible that the previous workaround with warning ("resample is now a deferred ..") for accessing elements does not work anymore?
In [13]: s.resample('15min').mean()[0]
Out[13]: -0.27773169409289944
In [14]: s.resample('15min')[0]
KeyError: 'Column not found: 0'
I thought this last one worked (gave the same result), but warned?
For case 3, I still not get the same as in 0.17.1. With this PR:
In [19]: df.groupby('group').resample('1D', fill_method='ffill')
C:\Anaconda\envs\devel\Scripts\ipython-script.py:1: FutureWarning: fill_method i
s deprecated to .resample()
the new syntax is .resample(...).ffill()
if __name__ == '__main__':
Out[19]:
val
date
2016-01-03 5
2016-01-10 6
2016-01-17 7
2016-01-24 8
So the actual grouper has disappeared
You get the same KeyError
in 0.17.1 when you are using a DataFrame
In [5]: s.to_frame('foo').resample('15T')[0]
I will fix the Series
behavior
see #12486
I can't fix this now. I think its possible, but will require some work. This is NOT the same as a multi-groupby of a column / freq, because resampling does some auto-filling. To be honest this is prob a bit to magical in the past.
The error with the Series is still there (this is not necessarily related to the changes in this PR though):
In [21]: s.resample('2s').mean()[0]
Out[21]: 0.22864360896621477
In [22]: s.resample('2s')[0]
KeyError: 'Column not found: 0'
I think this should give something like iloc does?:
In [23]: s.resample('2s').iloc[0]
ValueError: .resample() is now a deferred operation
use .resample(...).mean() instead of .resample(...)
assignment will have no effect as you are working on a copy
BTW, the "assignment will have no effect as you are working on a copy" in that error message is not really clear to me. It's not that it has no effect, as it raises an error.
On the PR itself, plot change and whatsnew update looks good!
Something else with getitem:
In [8]: rs['2010-01-01 09:00:00']
AbstractMethodError: This method must be defined in the concrete class of Series
GroupBy
But these getitem things are not really related to this PR, so I can open another issue for that, and then this PR can be merged
Are you sure you did with this PR? (for the getitem issue). This is explicity fixed/mentioned.
In [5]: s.resample('2s').mean()[0]
Out[5]: -0.0026492975424703812
In [6]: s.resample('2s')[0]
Out[6]: -0.0026492975424703812
took out the assignment part of the message
ok, both those errors fixed (and now they show the deprecation warning as well).
ping if ok and i'll merge (I need to squash)
make sure .resample(...).plot() warns and returns a correct plotting object make sure that .groupby(...).resample(....) is hitting warnings when appropriate
closes pandas-dev#12448
Hi,
I have a FutureWarning from pandas 0.18.0 related to a resample
operation which seems inappropriate:
import pandas as pd import datetime as dt
df=pd.DataFrame(data=[1,3], index=[dt.timedelta(), dt.timedelta(minutes=3)]) df.resample('1T').interpolate(method='linear')
The result I get from the last line is correct but I also get the following warning:
FutureWarning: .resample() is now a deferred operation use .resample(...).mean() instead of .resample(...)
What is wrong with my syntax?
Thanks
@benoit9126 No, I think this warning is correct, and I think what you want is:
df.resample('1T').mean().interpolate(method='linear')
interpolate
is not a method that is available on a Resample object. You first have to indicate how you want to resample the values (in this case using mean
does work because you downsample).
But that aside, it does maybe make sense to have interpolate
available, as this can work similarly to to resample().fillna()
@benoit9126 as @jorisvandenbossche points out, in < 0.18.0 there was an implicit how=mean
being done when you said df.resample('1T)
w/o any how=
arg.
@jorisvandenbossche yes, having a .interpolate
method would be fine (and very straightforward to do actually). if you can open an issue with a nice example would be great.
@jreback Something else I just noticed, seems like a bug in asfreq
, but didn't yet test with master:
In [16]: df=pd.DataFrame(data=[1,3], index=[dt.timedelta(), dt.timedelta(minutes
=3)])
In [17]: df
Out[17]:
0
00:00:00 1
00:03:00 3
In [18]: df.resample('1T').mean()
Out[18]:
0
00:00:00 1.0
00:01:00 NaN
00:02:00 NaN
00:03:00 3.0
In [19]: df.resample('1T').asfreq()
Out[19]:
0
00:00:00 1.0
00:01:00 NaN
00:02:00 NaN
Missing the the last value in asfreq?
@jorisvandenbossche hmm, that last does look like a bug, can you create an issue and i'll take a look. ty.
Thanks a lot for the explanation.
Nevertheless, I keep my code as it is hoping that it won't throw a warning in pandas 0.18.2 😉
@benoit9126 To be clear, there is no guarantee that #12925 will be implemented by 0.19, so it is possible that this code will start raising an error. Your responsibility :-) (or always welcome to put up a PR! I don't think it will be that hard to implement)
@jorisvandenbossche To be honest I do not have enough time to implement this feature in a close future (especially because I have never carefully inspected the pandas core). If it changes, it will be a pleasure.