Unexpected transform behavior on grouped dataset · Issue #3740 · pandas-dev/pandas

I have a simple longitudinal biomedical dataset that I am grouping by the patient on whom the measurements were taken. Here are the first two groups:

1
   patient  obs  week  site  id  treat  age sex  twstrs  treatment
0        1    1     0     1   1  5000U   65   F      32          1
1        1    2     2     1   1  5000U   65   F      30          1
2        1    3     4     1   1  5000U   65   F      24          1
3        1    4     8     1   1  5000U   65   F      37          1
4        1    5    12     1   1  5000U   65   F      39          1
5        1    6    16     1   1  5000U   65   F      36          1

2
    patient  obs  week  site  id   treat  age sex  twstrs  treatment
6         2    1     0     1   2  10000U   70   F      60          2
7         2    2     2     1   2  10000U   70   F      26          2
8         2    3     4     1   2  10000U   70   F      27          2
9         2    4     8     1   2  10000U   70   F      41          2
10        2    5    12     1   2  10000U   70   F      65          2
11        2    6    16     1   2  10000U   70   F      67          2

However, when I try to transform these data, say by normalization, I get nonsensical results:

normalize = lambda x: (x - x.mean())/x.std()
normed = cdystonia_grouped.transform(normalize)
normed.head(10)

               patient  obs  week                 site                   id  \
0 -9223372036854775808   -1    -1 -9223372036854775808 -9223372036854775808   
1 -9223372036854775808    0     0 -9223372036854775808 -9223372036854775808   
2 -9223372036854775808    0     0 -9223372036854775808 -9223372036854775808   
3 -9223372036854775808    0     0 -9223372036854775808 -9223372036854775808   
4 -9223372036854775808    0     0 -9223372036854775808 -9223372036854775808   

                   age  twstrs            treatment  
0 -9223372036854775808       0 -9223372036854775808  
1 -9223372036854775808       0 -9223372036854775808  
2 -9223372036854775808      -1 -9223372036854775808  
3 -9223372036854775808       0 -9223372036854775808  
4 -9223372036854775808       1 -9223372036854775808  
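
One reading of this output, offered as a guess rather than a confirmed diagnosis: the float results may be getting cast back to the original integer dtype. The sketch below checks three facts that fit this reading, using values copied from the first group. The names `age_group`, `twstrs_group`, `age_normed`, `int64_min` and `twstrs_truncated` are made up for illustration. The link between NaN and the large negative number is an assumption, since the result of casting NaN to an integer depends on the platform.

```python
import numpy as np
import pandas as pd

normalize = lambda x: (x - x.mean()) / x.std()

# Columns that are constant within a patient (e.g. age) have zero standard
# deviation, so normalizing them gives 0/0 = NaN in every row.
age_group = pd.Series([65] * 6)
age_normed = normalize(age_group)

# The value repeated throughout the output is the smallest representable int64.
int64_min = np.iinfo(np.int64).min

# Truncating the float z-scores of patient 1's twstrs toward zero gives
# small integers like those shown in the twstrs column.
twstrs_group = pd.Series([32, 30, 24, 37, 39, 36])
twstrs_truncated = normalize(twstrs_group).astype("int64")

print(age_normed.isna().all(), int64_min, twstrs_truncated.tolist())
```

If this reading is right, the columns showing the large negative value are the ones that do not vary within a patient, and the remaining columns are z-scores with the fractional part dropped.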

The normalize function is straightforward, and works fine when applied to manually subsetted data:

normalize(cdystonia.twstrs[cdystonia.patient==1])

0   -0.181369
1   -0.544107
2   -1.632322
3    0.725476
4    1.088214
5    0.544107
Name: twstrs, dtype: float64
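
For anyone trying to reproduce this, here is a minimal self-contained sketch. It assumes a small frame rebuilt from the two groups printed above, with only the `patient` and `twstrs` columns. The names `cdystonia_small` and `normed_twstrs` are hypothetical. It casts `twstrs` to float before grouping, which is a possible workaround rather than a confirmed fix, so that the normalized values have no integer dtype to be forced into.

```python
import pandas as pd

# Small frame rebuilt from the two patient groups shown above (twstrs only).
cdystonia_small = pd.DataFrame({
    "patient": [1] * 6 + [2] * 6,
    "twstrs": [32, 30, 24, 37, 39, 36, 60, 26, 27, 41, 65, 67],
})

normalize = lambda x: (x - x.mean()) / x.std()

# Cast to float first, then normalize within each patient.
normed_twstrs = (
    cdystonia_small["twstrs"]
    .astype(float)
    .groupby(cdystonia_small["patient"])
    .transform(normalize)
)

print(normed_twstrs.head(6))
```

For patient 1 this should agree with the manually subsetted result above.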

Any guidance here would be much appreciated. I'm hoping it's something obvious.