sparse resampling not working with dictionary of columns? · Issue #15386 · pandas-dev/pandas
Hello there,
Have I said that Pandas is awesome? Yes, many times ;-)
I have a question. I am working with a very large dataframe of trades, timestamped with millisecond precision. I'm on the latest pandas, 0.19.2.
I need to resample the dataframe into 200 ms bins. However, my data spans several years and I am only interested in the data between 10:00 and 12:00 each day (selected with `between_time()`). A plain `resample` still creates a bin for every 200 ms interval across the whole span, empty or not, so it will crash and burn my machine.
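To put rough numbers on this: `resample` builds a regular grid from the first to the last timestamp, so rows dropped by `between_time()` come back as empty bins. A small sketch, assuming a recent pandas; the three-row frame is made-up sample data:

```python
import pandas as pd

# One 200 ms bin per interval over a full year, whether or not trades exist.
bins_per_year = pd.Timedelta(days=365) // pd.Timedelta('200ms')

# Bins actually of interest per day: the 10:00-12:00 window only.
bins_per_window = pd.Timedelta(hours=2) // pd.Timedelta('200ms')

# Made-up data: one row at 10:00 on each of three consecutive days.
idx = pd.date_range('2014-01-01 10:00', periods=3, freq='D')
df = pd.DataFrame({'x': [1, 2, 3]}, index=idx)

# resample fills every 200 ms slot between the first and last row,
# including the ~22 hours per day outside the window.
dense = df.resample('200ms').sum()

print(bins_per_year)    # 157680000
print(bins_per_window)  # 36000
print(len(dense))       # 864001
```

Three input rows turn into 864,001 output rows, which is why filtering first does not help a dense `resample`.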
Instead, I tried the sparse resampling approach shown in the docs (http://pandas.pydata.org/pandas-docs/stable/timeseries.html#sparse-resampling), but it fails when I pass it a dictionary of columns.
Is that expected? Is it a bug?
```python
import pandas as pd
import numpy as np
from functools import partial
from pandas.tseries.frequencies import to_offset

# 100 daily timestamps, each one second past midnight
rng = pd.date_range('2014-1-1', periods=100, freq='D') + pd.Timedelta('1s')
ts = pd.DataFrame({'value': range(100)}, index=rng)

# Sparse-resampling helper from the docs: floor a timestamp to a multiple of freq
# (note: this name shadows the builtin round())
def round(t, freq):
    freq = to_offset(freq)
    return pd.Timestamp((t.value // freq.delta.value) * freq.delta.value)

# works
ts.groupby(partial(round, freq='3T')).value.sum()

# does not work
ts.groupby(partial(round, freq='3T')).apply({'value': 'sum'})
```
```
Traceback (most recent call last):
  File "<ipython-input-104-6004b307a469>", line 1, in <module>
    ts.groupby(partial(round, freq='3T')).apply({'value' : 'sum'})
  File "C:\Users\m1hxb02\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\core\groupby.py", line 674, in apply
    func = self._is_builtin_func(func)
  File "C:\Users\m1hxb02\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\core\base.py", line 644, in _is_builtin_func
    return self._builtin_table.get(arg, arg)
TypeError: unhashable type: 'dict'
```
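The traceback shows where it breaks: `apply` first looks the argument up in an internal table of builtins (`self._builtin_table.get(arg, arg)`), and a `dict` cannot serve as a dictionary key, hence `unhashable type: 'dict'`. `apply` expects a single callable; the `{column: function}` form is what `agg` (`aggregate`) accepts. A minimal sketch, assuming a recent pandas, that keeps the sparse grouping but swaps the hand-written `round` helper for `DatetimeIndex.floor`, which likewise floors each timestamp to a multiple of the frequency:

```python
import pandas as pd

rng = pd.date_range('2014-1-1', periods=100, freq='D') + pd.Timedelta('1s')
ts = pd.DataFrame({'value': range(100)}, index=rng)

# Group on the timestamp floored to 3-minute bins. Only bins that contain
# at least one row become groups, so the empty gaps cost nothing.
keys = ts.index.floor('3min')

# agg (not apply) accepts a {column: function} mapping.
result = ts.groupby(keys).agg({'value': 'sum'})

print(len(result))                        # 100
print(result['value'].iloc[:3].tolist())  # [0, 1, 2]
```

A dense `ts.resample('3min').sum()` on the same data would instead return a row for every 3-minute slot across the 99 days, almost all of them empty.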
Problem is: I need to resample several columns of my dataframe at once, possibly with a different function for each (`sum`, `mean`, `max`). Am I doing anything wrong here?
Thanks~