DataFrame self-joins · Issue #2996 · pandas-dev/pandas
Given the following DataFrame:

| area | point | test | value |
|---|---|---|---|
| A | 11 | 0 | 1234234 |
| A | 11 | 1 | 12341234 |
| A | 16 | 0 | 234234 |
| A | 16 | 1 | 2343 |
| A | 16 | 2 | 234234 |
| C | 4 | 0 | 234234 |
| C | 4 | 1 | 234234 |

it would be nice if there were a way of grouping by, say, columns `area` and `point` and comparing the `value` per `test > 1` with the value for `test - 1`.
This can be done by iterating over `df.groupby(['area', 'point', 'test'])` and using the sorting provided by `groupby()` on the specified columns to compare the current and previous `value` entries. However, it would be neat if this could also be done in a more pandas-esque way using something akin to a SQL self-join.
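
For illustration, here is a minimal sketch of how a self-join of this kind might be emulated with `DataFrame.merge`, using the table above. The names `prev`, `prev_value`, `joined`, `change` and `by_diff` are made up for this example, and the approach assumes that each (`area`, `point`, `test`) combination occurs only once. It is one possible workaround, not a dedicated pandas self-join feature.

```python
import pandas as pd

# The example table from this issue.
df = pd.DataFrame({
    "area":  ["A", "A", "A", "A", "A", "C", "C"],
    "point": [11, 11, 16, 16, 16, 4, 4],
    "test":  [0, 1, 0, 1, 2, 0, 1],
    "value": [1234234, 12341234, 234234, 2343, 234234, 234234, 234234],
})

# Right-hand side of the self-join: the same rows, with `test` shifted up
# by one so that each row lines up with the test that follows it.
# `prev_value` is a hypothetical column name chosen for this sketch.
prev = df.rename(columns={"value": "prev_value"})
prev["test"] = prev["test"] + 1

# Inner join on the grouping columns plus `test`; rows without a
# predecessor (test == 0 here) have no match and drop out.
joined = df.merge(prev, on=["area", "point", "test"], how="inner")
joined["change"] = joined["value"] - joined["prev_value"]

# Alternative without a join, assuming tests are consecutive within a group:
# difference between each value and the previous one in its group.
by_diff = (
    df.sort_values(["area", "point", "test"])
      .groupby(["area", "point"])["value"]
      .diff()
)
```

The merge keeps every row that has a `test - 1` counterpart in the same `area` and `point`, so a further filter on `test` can be applied to `joined` if only later tests are of interest. The `groupby(...).diff()` variant compares neighbouring rows rather than matching on `test - 1` explicitly, so the two only agree when no test numbers are missing.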
NB: this request was first made in the pystatsmodels Google Group; I was asked by Wes to create a GitHub issue for it.