BUG: Series.corr/cov raising with masked dtype by lukemanley · Pull Request #51422 · pandas-dev/pandas
what happens here if there are NAs?
Both left and right are filtered for notna here:
valid = notna(a) & notna(b)
so this works:
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: ser1 = pd.Series(np.random.randn(100), dtype="Float64")
In [4]: ser2 = pd.Series(np.random.randn(100), dtype="Float64")
In [5]: ser1[1] = pd.NA
In [6]: ser2[5:7] = pd.NA
In [7]: ser1.corr(ser2)
Out[7]: 0.09774253881093414
Cool. What about if there is a NaN?
Do you mean a NaN within a masked array? E.g.:
In [8]: ser1[1] = 0.0
In [9]: ser1 /= (ser1 != 0)
In [10]: ser1
Out[10]:
0 0.148073
1 NaN
2 0.556972
3 -0.554886
4 1.216938
...
95 0.33919
96 0.528683
97 1.590215
98 0.84015
99 0.333666
Length: 100, dtype: Float64
In [11]: ser1.corr(ser2)
Out[11]: 0.09774253881093414
It works either way, since notna is applied to the ndarray, which captures both np.nan and pd.NA.
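For anyone following along, here is a minimal sketch of that filtering step on small hand-written data. It assumes the masked values reach the ndarray via `to_numpy(dtype=float, na_value=np.nan)`; that conversion, the variable names, and the use of `np.corrcoef` are illustrative assumptions, not a copy of the code in this PR.

```python
import numpy as np
import pandas as pd

# Two small masked-dtype series, each with one missing value.
a = pd.Series([1.0, pd.NA, 3.0, 4.0, 5.0, 6.0], dtype="Float64")
b = pd.Series([2.0, 1.0, 4.0, pd.NA, 7.0, 9.0], dtype="Float64")

# Assumption for illustration: convert to plain float ndarrays,
# turning pd.NA into np.nan along the way.
a_np = a.to_numpy(dtype=float, na_value=np.nan)
b_np = b.to_numpy(dtype=float, na_value=np.nan)

# Same shape of filter as quoted above: keep only positions where
# both sides are non-missing.
valid = pd.notna(a_np) & pd.notna(b_np)

# Pearson correlation over the surviving pairs only.
manual = np.corrcoef(a_np[valid], b_np[valid])[0, 1]

print(valid)   # positions 1 and 3 are dropped
print(manual)
```

Once the data is a float ndarray, a missing value that started as pd.NA and one that started as np.nan look identical to the mask, which is why both cases give the same result.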