Combination of groupby with dropna set to True and apply function not returning expected output · Issue #43205 · pandas-dev/pandas
(Using pandas 1.3.1 and numpy 1.20.3)
This repeats the SO (Stack Overflow) question found here. I did not have this issue previously when working with version 1.1.3.
In the following snippet:
import pandas as pd
import numpy as np
df = pd.DataFrame(
    {
        "a": [1, 2, 3, 4, 5, 6, 7, 8, 9],
        "b": [1, np.nan, 1, np.nan, 2, 1, 2, np.nan, 1],
    }
)
df_again = df.groupby("b", dropna=False).apply(lambda x: x)
I was expecting df and df_again to be identical. But they are not:
df
   a    b
0  1  1.0
1  2  NaN
2  3  1.0
3  4  NaN
4  5  2.0
5  6  1.0
6  7  2.0
7  8  NaN
8  9  1.0
df_again
   a    b
0  1  1.0
2  3  1.0
4  5  2.0
5  6  1.0
6  7  2.0
8  9  1.0
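To make the discrepancy explicit, the two printed frames can be compared by index. The sketch below types in `df_again` by hand from the output shown above rather than recomputing it, so it does not depend on which pandas version is installed. The names `df_again_reported` and `missing` are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "a": [1, 2, 3, 4, 5, 6, 7, 8, 9],
        "b": [1, np.nan, 1, np.nan, 2, 1, 2, np.nan, 1],
    }
)

# df_again as printed in this report (typed in by hand, not recomputed).
df_again_reported = pd.DataFrame(
    {"a": [1, 3, 5, 6, 7, 9], "b": [1.0, 1.0, 2.0, 1.0, 2.0, 1.0]},
    index=[0, 2, 4, 5, 6, 8],
)

# Index labels present in df but absent from the reported result.
missing = df.index.difference(df_again_reported.index)
print(missing.tolist())
print(df.loc[missing])
```

The missing labels are exactly the rows where b is NaN, which suggests the NaN group is lost when the result is put back together.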
Now, if I tweak the lambda expression slightly to "see" what is going on, using df.groupby("b", dropna=False).apply(lambda x: print(x)), I can see that the portion of df where b is NaN was also processed.
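Since the NaN group does seem to be handed to the function, one possible way around the problem is to skip `apply` and stitch the groups back together by hand. This is only a sketch, not a confirmed fix. It assumes that iterating over the groupby object yields the NaN group when `dropna=False`, which matches what the `print` experiment shows. The names `pieces` and `df_rebuilt` are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "a": [1, 2, 3, 4, 5, 6, 7, 8, 9],
        "b": [1, np.nan, 1, np.nan, 2, 1, 2, np.nan, 1],
    }
)

# Collect each group explicitly; with dropna=False the NaN key is
# expected to form its own group alongside 1.0 and 2.0.
pieces = [group for _, group in df.groupby("b", dropna=False)]

# Concatenate the groups and restore the original row order.
df_rebuilt = pd.concat(pieces).sort_index()

print(df_rebuilt.equals(df))
```

If a real per-group function is needed instead of the identity, it could be applied to each `group` inside the list comprehension before concatenating.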