Standard Error of the Mean Unavailable in RollingGroupby

Code Sample, a copy-pastable example if possible

Standard Error of the Mean works on Groupby objects, but not RollingGroupby objects.

Groupby object

    import numpy as np
    import pandas as pd

    data = pd.DataFrame({
        "a": np.linspace(0, 100, 101),
        "b": np.random.random(101).round(1),  # values we can group on
    })

    # print values to 2 decimal places. Just showing it works.
    data.groupby("b").sem().T.round(2)
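As a sanity check on what `sem` returns for a Groupby object, the sketch below compares it with the sample standard deviation divided by the square root of the group size. The seed and the names `manual` and `builtin` are illustrative assumptions, not part of the original report.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumption: fixed seed so the groups are reproducible
data = pd.DataFrame({
    "a": np.linspace(0, 100, 101),
    "b": np.random.random(101).round(1),  # values we can group on
})

grouped = data.groupby("b")["a"]

# Conventional SEM: sample standard deviation over sqrt(number of observations).
manual = grouped.std(ddof=1) / np.sqrt(grouped.count())
builtin = grouped.sem(ddof=1)

# Groups with a single row give NaN in both results.
print(np.allclose(manual, builtin, equal_nan=True))
```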

Problem description

RollingGroupby object raises AttributeError

    data = pd.DataFrame({
        "a": np.linspace(0, 100, 101),
        "b": np.random.random(101).round(1),  # values we can group on
    })

    # raises AttributeError in pandas 0.24.2
    data.rolling(5).sem()

I believe the APIs of Groupby and RollingGroupby are supposed to be as similar as possible. That an optimized aggregation is available in one but not the other is a problem.
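Until a built-in method exists, one possible workaround for the grouped rolling case is to pass a custom function to `apply`. This is only a sketch: `window_sem` is a hypothetical helper, the seed is an assumption, and the helper uses the conventional definition (standard deviation over the square root of the window size), which will be slower than an optimized aggregation.

```python
import numpy as np
import pandas as pd


def window_sem(values, ddof=1):
    """SEM of one window; `values` is a NumPy array because raw=True is used."""
    n = len(values)
    if n <= ddof:
        return np.nan
    return values.std(ddof=ddof) / np.sqrt(n)


np.random.seed(0)  # assumption: fixed seed so the groups are reproducible
data = pd.DataFrame({
    "a": np.linspace(0, 100, 101),
    "b": np.random.random(101).round(1),  # values we can group on
})

# Rolling window of 5 rows within each group of "b".
# The result is indexed by (group key, original row label).
grouped_rolling_sem = data.groupby("b")["a"].rolling(5).apply(window_sem, raw=True)
print(grouped_rolling_sem.dropna().head())
```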

Expected Output

The output should be the equivalent of

data.rolling(5).std() / (data.rolling(5).count() - ddof).pow(0.5)

where ddof is the delta degrees of freedom (pandas uses ddof=1 by default for std).
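The expression above can be wrapped in a small helper as a stopgap. This is a literal transcription of the formula in this report, with `rolling_sem`, `window`, and the example frame as illustrative assumptions. Note that it divides by the square root of `count - ddof`, whereas the conventional SEM divides by the square root of `count`, so the two differ slightly.

```python
import numpy as np
import pandas as pd


def rolling_sem(frame, window, ddof=1):
    """Rolling std divided by sqrt(count - ddof), as written in the report."""
    rolling = frame.rolling(window)
    return rolling.std(ddof=ddof) / (rolling.count() - ddof).pow(0.5)


# Evenly spaced values make the result easy to check by hand.
data = pd.DataFrame({"a": np.linspace(0, 100, 101)})
result = rolling_sem(data, window=5)

# The first window - 1 rows are NaN because the window is not yet full.
print(result.head(6))
```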

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.3.final.0
python-bits: 64
OS: Linux
OS-release: 3.10.0-693.11.6.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.2
pytest: 4.4.1
pip: 19.0.3
setuptools: 41.0.0
Cython: None
numpy: 1.16.3
scipy: 1.2.1
pyarrow: None
xarray: None
IPython: 7.4.0
sphinx: 2.0.1
patsy: None
dateutil: 2.8.0
pytz: 2019.1
blosc: None
bottleneck: 1.2.1
tables: 3.5.1
numexpr: 2.6.9
feather: None
matplotlib: 3.0.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10.1
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None
