Unexpected transform behavior on grouped dataset · Issue #3740 · pandas-dev/pandas
I have a simple longitudinal biomedical dataset that I am grouping according to the patient on which measurements are taken. Here are the first couple of groups:
1
patient obs week site id treat age sex twstrs treatment
0 1 1 0 1 1 5000U 65 F 32 1
1 1 2 2 1 1 5000U 65 F 30 1
2 1 3 4 1 1 5000U 65 F 24 1
3 1 4 8 1 1 5000U 65 F 37 1
4 1 5 12 1 1 5000U 65 F 39 1
5 1 6 16 1 1 5000U 65 F 36 1
2
patient obs week site id treat age sex twstrs treatment
6 2 1 0 1 2 10000U 70 F 60 2
7 2 2 2 1 2 10000U 70 F 26 2
8 2 3 4 1 2 10000U 70 F 27 2
9 2 4 8 1 2 10000U 70 F 41 2
10 2 5 12 1 2 10000U 70 F 65 2
11 2 6 16 1 2 10000U 70 F 67 2
However, when I try to transform these data, say by normalization, I get nonsensical results:
normalize = lambda x: (x - x.mean())/x.std()
normed = cdystonia_grouped.transform(normalize)
normed.head(10)
patient obs week site id \
0 -9223372036854775808 -1 -1 -9223372036854775808 -9223372036854775808
1 -9223372036854775808 0 0 -9223372036854775808 -9223372036854775808
2 -9223372036854775808 0 0 -9223372036854775808 -9223372036854775808
3 -9223372036854775808 0 0 -9223372036854775808 -9223372036854775808
4 -9223372036854775808 0 0 -9223372036854775808 -9223372036854775808
age twstrs treatment
0 -9223372036854775808 0 -9223372036854775808
1 -9223372036854775808 0 -9223372036854775808
2 -9223372036854775808 -1 -9223372036854775808
3 -9223372036854775808 0 -9223372036854775808
4 -9223372036854775808 1 -9223372036854775808
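One pattern in the output above may be worth checking: the columns that come back as -9223372036854775808 (patient, site, id, age, treatment) are exactly the ones that are constant within each patient, while obs, week and twstrs look like float results truncated to integers. The sketch below, which assumes a hypothetical stand-in series for the age values of patient 1, shows that normalize returns NaN for a constant group and that the odd number is the minimum int64 value. That is consistent with float results being cast back to an integer dtype, though nothing in this post confirms the cause.

```python
import numpy as np
import pandas as pd

normalize = lambda x: (x - x.mean()) / x.std()

# Hypothetical stand-in: age is 65 on every row of patient 1 in the groups shown.
age_patient_1 = pd.Series([65] * 6, name="age")

# std() of a constant group is 0, so every element becomes 0/0 = NaN.
constant_result = normalize(age_patient_1)
print(constant_result.isna().all())

# The value seen in the transformed output is the smallest int64.
print(np.iinfo(np.int64).min)
```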
The normalize function is straightforward, and works fine when applied to manually subsetted data:
normalize(cdystonia.twstrs[cdystonia.patient==1])
0 -0.181369
1 -0.544107
2 -1.632322
3 0.725476
4 1.088214
5 0.544107
Name: twstrs, dtype: float64
Any guidance here would be much appreciated. I'm hoping it's something obvious.
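A possible workaround, offered as a sketch rather than a confirmed fix: select the numeric column of interest and cast it to float before grouping, so the normalized values never need to fit an integer dtype. The small frame below is a hypothetical stand-in that copies only the patient, week and twstrs values from the two groups shown above; the real cdystonia data has more columns and patients.

```python
import pandas as pd

# Hypothetical stand-in for the first two patients of `cdystonia`.
cdystonia = pd.DataFrame({
    "patient": [1] * 6 + [2] * 6,
    "week": [0, 2, 4, 8, 12, 16] * 2,
    "twstrs": [32, 30, 24, 37, 39, 36, 60, 26, 27, 41, 65, 67],
})

normalize = lambda x: (x - x.mean()) / x.std()

# Cast to float first, then normalize within each patient.
normed = (
    cdystonia["twstrs"]
    .astype(float)
    .groupby(cdystonia["patient"])
    .transform(normalize)
)
print(normed.head(6))
```

For patient 1 this gives the same values as the manual subset above. Columns that are constant within a patient (such as age) would still normalize to NaN, since their group standard deviation is zero, so they are best left out of the transform.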