Grouping a 2 rows DataFrame by time and columns doesn't work as expected · Issue #11185 · pandas-dev/pandas

Hi guys,

Working with pandas is great, but I may have found a bug when grouping a 2-row DataFrame by time and by a column:

import pandas as pd
import numpy as np
from datetime import datetime

freq = 's'
t1 = np.datetime64(datetime.utcnow(), freq)
index = pd.date_range(start=t1, periods=2, freq=freq)

DatetimeIndex(['2015-09-24 08:55:27', '2015-09-24 08:55:28'], dtype='datetime64[ns]', freq='S', tz=None)

df = pd.DataFrame([['A', 10], ['B', 15]], columns=['metric', 'values'], index=index)

#                    metric  values
#2015-09-24 08:55:27      A      10
#2015-09-24 08:55:28      B      15

grouped = df.groupby([pd.Grouper(level=0, freq=freq), 'metric'])

Here the grouping should output something similar to the input DataFrame, since each row already forms its own group under the keys passed to groupby.

grouped.mean()

                                                    values
<pandas.tseries.resample.TimeGrouper object at ...      10
metric                                                  15

Notice how the index is broken: the first index value is a TimeGrouper object, while the second is the name of the column used as the second grouping key.

Now let's add another row, with a new second and a new metric:

df_2 = pd.DataFrame([['C', 0]], columns=df.columns, index=[df.index.shift(-1, freq)[0]])
df_2 = df_2.append(df)

#                    metric  values
#2015-09-24 08:55:26      C       0
#2015-09-24 08:55:27      A      10
#2015-09-24 08:55:28      B      15

grouped = df_2.groupby([pd.Grouper(level=0, freq=freq), 'metric'])
grouped.mean()

#                           values
#                    metric
#2015-09-24 08:55:26 C           0
#2015-09-24 08:55:27 A          10
#2015-09-24 08:55:28 B          15

This works as expected with 3 rows!

Let's try with 1 row:

df_2.iloc[0:1].groupby([pd.Grouper(level=0, freq=freq), 'metric']).mean()

#                           values
#                    metric
#2015-09-24 08:55:26 C           0

This works as expected too!
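The 1-, 2- and 3-row cases above can be compared side by side with a short sketch. It assumes a fixed start timestamp instead of `utcnow()` so the run is reproducible, and `grouped_mean` is a helper name made up for this illustration. On a pandas release where this report is resolved, every row count should give a two-level index with one entry per input row; on the affected 0.16.2 install, the 2-row case is the one that would differ.

```python
import pandas as pd


def grouped_mean(n_rows, freq="s"):
    """Build an n-row frame (one metric per second) and group it by time and metric."""
    # Fixed start time for reproducibility (the report used datetime.utcnow()).
    index = pd.date_range(start="2015-09-24 08:55:26", periods=n_rows, freq=freq)
    df = pd.DataFrame(
        {"metric": list("CAB")[:n_rows], "values": [0, 10, 15][:n_rows]},
        index=index,
    )
    # Same grouping keys as in the report: the index level binned by freq, plus 'metric'.
    return df.groupby([pd.Grouper(level=0, freq=freq), "metric"])["values"].mean()


# For each row count, record (number of index levels, number of groups).
shapes = {}
for n in (1, 2, 3):
    result = grouped_mean(n)
    shapes[n] = (result.index.nlevels, len(result))

print(shapes)
```

Comparing only the shape of the result index keeps the check independent of how a given pandas version prints the frame.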

I have tried grouping by key instead of by level, and aggregating with another frequency (building the DataFrame with freq='s', then aggregating with freq='T'), but the result is the same.

Did I miss something?

Please note that the resampling API provides the expected result, but I think the grouping API should give consistent results:
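For reference, the "group by key, with a coarser frequency" variant mentioned above might look like the sketch below. The column name `time` and the fixed timestamps are assumptions made for the illustration, and `'min'` is used as the minute alias in place of `'T'`. It shows how the call is spelled, not the output on the affected version.

```python
import pandas as pd

# Fixed timestamps so the example is reproducible (the report used utcnow()).
index = pd.date_range(start="2015-09-24 08:55:27", periods=2, freq="s")
df = pd.DataFrame({"metric": ["A", "B"], "values": [10, 15]}, index=index)

# Move the timestamps into a regular column; "time" is a name chosen for this sketch.
by_key = df.rename_axis("time").reset_index()

# Group on the column (key=...) rather than the index level, binned per minute.
per_minute = by_key.groupby(
    [pd.Grouper(key="time", freq="min"), "metric"]
)["values"].mean()

print(per_minute)
```

Both rows fall into the same minute but carry different metrics, so they should still end up in two separate groups.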

df.groupby(['metric']).resample(how='mean', freq=freq)

                            values
metric
A      2015-09-24 08:55:27      10
B      2015-09-24 08:55:28      15
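The `how=` argument used in the resample call above belongs to the pandas version of this report; later releases removed it in favour of calling the aggregation as a method. A minimal sketch of the same workaround in that newer spelling, assuming fixed timestamps in place of `utcnow()`:

```python
import pandas as pd

# Fixed timestamps so the example is reproducible (the report used utcnow()).
index = pd.date_range(start="2015-09-24 08:55:27", periods=2, freq="s")
df = pd.DataFrame({"metric": ["A", "B"], "values": [10, 15]}, index=index)

# Group by metric first, then resample each group's 'values' per second and average.
resampled = df.groupby("metric")["values"].resample("s").mean()

print(resampled)
```

The result is indexed by metric first and timestamp second, which is the reverse of the level order produced by the `pd.Grouper` approach.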

Here are the dependencies I have installed with pandas (working on Ubuntu 12.04.5 LTS):

from pandas.util.print_versions import show_versions
show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Linux
OS-release: 3.5.0-54-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8

pandas: 0.16.2
nose: 1.3.7
Cython: 0.22.1
numpy: 1.9.2
scipy: 0.15.1
statsmodels: None
IPython: 3.2.0
sphinx: 1.3.1
patsy: 0.3.0
dateutil: 2.4.2
pytz: 2015.4
bottleneck: 1.0.0
tables: 3.2.0
numexpr: 2.4.3
matplotlib: 1.4.3
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 1.0.0
xlsxwriter: 0.7.3
lxml: 3.4.4
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.5
pymysql: None
psycopg2: None