pandas.Grouper — pandas 3.0.0.dev0+2103.g41968a550a documentation (original) (raw)

class pandas.Grouper(*args, **kwargs)[source]#

A Grouper allows the user to specify a groupby instruction for an object.

This specification will select a column via the key parameter, or if the level parameter is given, a level of the index of the target object.

If level is passed as a keyword to both Grouper andgroupby, the values passed to Grouper take precedence.

Parameters:

*args

Currently unused, reserved for future use.

**kwargs

Dictionary of the keyword arguments to pass to Grouper.

Attributes

Returns:

Grouper or pandas.api.typing.TimeGrouper

A TimeGrouper is returned if freq is not None. Otherwise, a Grouper is returned.

Examples

df.groupby(pd.Grouper(key="Animal")) is equivalent to df.groupby('Animal')

df = pd.DataFrame( ... { ... "Animal": ["Falcon", "Parrot", "Falcon", "Falcon", "Parrot"], ... "Speed": [100, 5, 200, 300, 15], ... } ... ) df Animal Speed 0 Falcon 100 1 Parrot 5 2 Falcon 200 3 Falcon 300 4 Parrot 15 df.groupby(pd.Grouper(key="Animal")).mean() Speed Animal Falcon 200.0 Parrot 10.0

Specify a resample operation on the column ‘Publish date’

df = pd.DataFrame( ... { ... "Publish date": [ ... pd.Timestamp("2000-01-02"), ... pd.Timestamp("2000-01-02"), ... pd.Timestamp("2000-01-09"), ... pd.Timestamp("2000-01-16"), ... ], ... "ID": [0, 1, 2, 3], ... "Price": [10, 20, 30, 40], ... } ... ) df Publish date ID Price 0 2000-01-02 0 10 1 2000-01-02 1 20 2 2000-01-09 2 30 3 2000-01-16 3 40 df.groupby(pd.Grouper(key="Publish date", freq="1W")).mean() ID Price Publish date 2000-01-02 0.5 15.0 2000-01-09 2.0 30.0 2000-01-16 3.0 40.0

If you want to adjust the start of the bins based on a fixed timestamp:

start, end = "2000-10-01 23:30:00", "2000-10-02 00:30:00" rng = pd.date_range(start, end, freq="7min") ts = pd.Series(np.arange(len(rng)) * 3, index=rng) ts 2000-10-01 23:30:00 0 2000-10-01 23:37:00 3 2000-10-01 23:44:00 6 2000-10-01 23:51:00 9 2000-10-01 23:58:00 12 2000-10-02 00:05:00 15 2000-10-02 00:12:00 18 2000-10-02 00:19:00 21 2000-10-02 00:26:00 24 Freq: 7min, dtype: int64

ts.groupby(pd.Grouper(freq="17min")).sum() 2000-10-01 23:14:00 0 2000-10-01 23:31:00 9 2000-10-01 23:48:00 21 2000-10-02 00:05:00 54 2000-10-02 00:22:00 24 Freq: 17min, dtype: int64

ts.groupby(pd.Grouper(freq="17min", origin="epoch")).sum() 2000-10-01 23🔞00 0 2000-10-01 23:35:00 18 2000-10-01 23:52:00 27 2000-10-02 00:09:00 39 2000-10-02 00:26:00 24 Freq: 17min, dtype: int64

ts.groupby(pd.Grouper(freq="17min", origin="2000-01-01")).sum() 2000-10-01 23:24:00 3 2000-10-01 23:41:00 15 2000-10-01 23:58:00 45 2000-10-02 00:15:00 45 Freq: 17min, dtype: int64

If you want to adjust the start of the bins with an offset Timedelta, the two following lines are equivalent:

ts.groupby(pd.Grouper(freq="17min", origin="start")).sum() 2000-10-01 23:30:00 9 2000-10-01 23:47:00 21 2000-10-02 00:04:00 54 2000-10-02 00:21:00 24 Freq: 17min, dtype: int64

ts.groupby(pd.Grouper(freq="17min", offset="23h30min")).sum() 2000-10-01 23:30:00 9 2000-10-01 23:47:00 21 2000-10-02 00:04:00 54 2000-10-02 00:21:00 24 Freq: 17min, dtype: int64

To replace the use of the deprecated base argument, you can now use offset, in this example it is equivalent to have base=2:

ts.groupby(pd.Grouper(freq="17min", offset="2min")).sum() 2000-10-01 23:16:00 0 2000-10-01 23:33:00 9 2000-10-01 23:50:00 36 2000-10-02 00:07:00 39 2000-10-02 00:24:00 24 Freq: 17min, dtype: int64