BUG: Numpy ufuncs e.g. np.[op](df1, df2) aligns columns in pandas 1.2.0 where it did not before · Issue #39184 · pandas-dev/pandas
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample
import numpy as np
import pandas as pd

df = pd.DataFrame({k: [1, 2, 3, 4, 5] for k in 'abcd'})
np.add(df[['a', 'b']], df[['c', 'd']])
Problem description
This is a regression from pandas 1.1.5 (both versions are using numpy 1.19.5).
Normally, if we add, subtract, multiply or divide DataFrame columns that have different names, we get NaNs because the column labels don't match. E.g.:
df[['a', 'b']] + df[['c', 'd']]
    a   b   c   d
0 NaN NaN NaN NaN
1 NaN NaN NaN NaN
2 NaN NaN NaN NaN
3 NaN NaN NaN NaN
4 NaN NaN NaN NaN
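A minimal sketch of why the operator result is all NaN. It uses `DataFrame.align` (not mentioned in the report) to expose the outer join on column labels that arithmetic performs before computing anything:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({k: [1, 2, 3, 4, 5] for k in 'abcd'})
left, right = df[['a', 'b']], df[['c', 'd']]

# align() performs an outer join on both axes, like the + operator does.
left_aligned, right_aligned = left.align(right)
print(list(left_aligned.columns))  # ['a', 'b', 'c', 'd']

# Each cell pairs a real value with a NaN filler, so every sum is NaN.
result = left + right
print(result.isna().all().all())  # True
```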
To get around this, we would use np.[op](df1, df2), which ignored the column labels and operated by position.
However, in pandas 1.2.0 this returns the same all-NaN output as above:
np.add(df[['a', 'b']], df[['c', 'd']])
    a   b   c   d
0 NaN NaN NaN NaN
1 NaN NaN NaN NaN
2 NaN NaN NaN NaN
3 NaN NaN NaN NaN
4 NaN NaN NaN NaN
Expected Output
Using pandas 1.1.5:
np.add(df[['a', 'b']], df[['c', 'd']])
    a   b
0   2   2
1   4   4
2   6   6
3   8   8
4  10  10
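To reproduce the 1.1.5 result regardless of the installed pandas version, one option is to apply the ufunc to the underlying arrays and re-wrap the result. `apply_positional` is a hypothetical helper, not a pandas API; it assumes both operands have the same shape and that the left operand's index and columns should be kept:

```python
import numpy as np
import pandas as pd


def apply_positional(ufunc, left, right):
    """Apply a binary NumPy ufunc by position, ignoring labels (hypothetical helper).

    Both frames must have the same shape; the result reuses the left frame's labels.
    """
    values = ufunc(left.to_numpy(), right.to_numpy())
    return pd.DataFrame(values, index=left.index, columns=left.columns)


df = pd.DataFrame({k: [1, 2, 3, 4, 5] for k in 'abcd'})
out = apply_positional(np.add, df[['a', 'b']], df[['c', 'd']])
print(out)
```

Because the ufunc only ever sees plain ndarrays, no label alignment can take place, so the result does not depend on how a given pandas release implements `__array_ufunc__`.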
Temporary solution
Converting the right operand to a NumPy array skips label alignment, though the conversion may incur a copy:
df[['a', 'b']] + df[['c', 'd']].to_numpy()
    a   b
0   2   2
1   4   4
2   6   6
3   8   8
4  10  10
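A label-based variant of the workaround, not taken from the original report, assuming the columns should be paired in order (a with c, b with d). It relabels the right operand with `DataFrame.set_axis` so that the operator's own alignment produces the intended pairs. `set_axis` returns a new object, so this is not necessarily cheaper than `.to_numpy()`:

```python
import pandas as pd

df = pd.DataFrame({k: [1, 2, 3, 4, 5] for k in 'abcd'})
left, right = df[['a', 'b']], df[['c', 'd']]

# Give the right operand the left operand's column labels (by position),
# so alignment pairs 'a' with the old 'c' and 'b' with the old 'd'.
right_relabelled = right.set_axis(left.columns, axis=1)
result = left + right_relabelled
print(result)
```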
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit : 3e89b4c
python : 3.9.1.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252
pandas : 1.2.0
numpy : 1.19.5
pytz : 2020.5
dateutil : 2.8.1
pip : 20.3.3
setuptools : 49.6.0.post20210108
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 7.19.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None
Just my 2 cents: I was more than willing to test this on a nightly / master release, but it doesn't appear you release those. It could be quite beneficial to publish nightlies to PyPI so we don't report issues that have already been fixed. For some, it might be easier to test a nightly than to peruse recent open and closed issues.