BUG: Index alignment behaviour · Issue #39931 · pandas-dev/pandas


Code Sample

import numpy as np
import pandas as pd

np.random.seed(1)
df = pd.DataFrame(np.random.randint(0, 100, (10, 2)), columns=['A', 'B'])

df
    A   B
0  37  12
1  72   9
2  75   5
3  79  64
4  16   1
5  76  71
6   6  25
7  50  20
8  18  84
9  11  28

mask = df['A'] >= 40

df[mask]
    A   B
1  72   9
2  75   5
3  79  64
5  76  71
7  50  20

df[mask].sort_values('A')
    A   B
7  50  20
1  72   9
2  75   5
5  76  71
3  79  64

CASE 1: Assignment with .loc

No problem here. This is the expected behaviour, because pandas aligns data on the index during assignment.

df.loc[mask] = df[mask].sort_values('A')

df[mask]
    A   B
1  72   9
2  75   5
3  79  64
5  76  71
7  50  20
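To make the alignment in CASE 1 concrete, here is a minimal sketch. It hard-codes the values printed above instead of calling `np.random`, and the names `original`, `sorted_slice` and `realigned` are only illustrative. The `reindex` call is an assumption about what "aligning on the index" amounts to for this example, not a description of the pandas internals.

```python
import pandas as pd

# Values copied from the DataFrame printed in the report.
df = pd.DataFrame(
    {
        "A": [37, 72, 75, 79, 16, 76, 6, 50, 18, 11],
        "B": [12, 9, 5, 64, 1, 71, 25, 20, 84, 28],
    }
)
original = df.copy()
mask = df["A"] >= 40

# Sorting reorders the rows, but every row keeps its original label.
sorted_slice = df[mask].sort_values("A")

# Alignment puts the rows back into the label order of the target rows.
realigned = sorted_slice.reindex(df.index[mask])

# Assignment through .loc matches rows by label, not by position.
df.loc[mask] = sorted_slice

print(sorted_slice.index.tolist())
print(realigned.index.tolist())
print(df.equals(original))
```

Because every sorted row still carries its own label, writing the slice back by label returns each row to where it came from, which is why the output of CASE 1 is identical to the unsorted `df[mask]`.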

CASE 2: Assignment without .loc

The problem is here. In pandas 1.2.x, assignment without .loc no longer respects index alignment, which it did in 1.1.x.

df[mask] = df[mask].sort_values('A')

df[mask]
    A   B
1  50  20
2  72   9
3  75   5
5  76  71
7  79  64
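The CASE 2 output looks like what a purely positional assignment would produce. The sketch below reproduces both results explicitly and independently of the pandas version, by either keeping the index (aligned) or stripping it with `to_numpy()` (positional). The variable names `aligned` and `positional` are illustrative, and this is not a claim about what 1.2.x does internally.

```python
import pandas as pd

# Values copied from the DataFrame printed in the report.
df = pd.DataFrame(
    {
        "A": [37, 72, 75, 79, 16, 76, 6, 50, 18, 11],
        "B": [12, 9, 5, 64, 1, 71, 25, 20, 84, 28],
    }
)
mask = df["A"] >= 40
sorted_slice = df[mask].sort_values("A")

# Label-aligned: each row goes back to the label it came from.
aligned = df.copy()
aligned.loc[mask] = sorted_slice

# Positional: dropping the index writes the rows in sorted order.
positional = df.copy()
positional.loc[mask] = sorted_slice.to_numpy()

print(aligned.loc[mask, "A"].tolist())     # [72, 75, 79, 76, 50]
print(positional.loc[mask, "A"].tolist())  # [50, 72, 75, 76, 79]
```

Spelling out the intent this way avoids depending on how `df[mask] = ...` behaves in a particular release.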

Problem description

Let's say we want to sort a slice of the DataFrame and assign the sorted values back to the original DataFrame in place.

Before pandas 1.2.x (for example, in 1.1.5), this assignment respected index alignment even without .loc.

Expected Output

In CASE 2 the expected output should be:

df[mask]
    A   B
1  72   9
2  75   5
3  79  64
5  76  71
7  50  20

The assignment operation should respect pandas' index alignment behaviour.
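A small check along these lines could help confirm which behaviour a given installation shows. The helper name `getitem_assignment_behaviour` is hypothetical, and the data is hard-coded from the report so the result does not depend on the random seed.

```python
import pandas as pd


def getitem_assignment_behaviour() -> str:
    """Classify how `df[mask] = frame` treats the index (hypothetical helper)."""
    df = pd.DataFrame(
        {
            "A": [37, 72, 75, 79, 16, 76, 6, 50, 18, 11],
            "B": [12, 9, 5, 64, 1, 71, 25, 20, 84, 28],
        }
    )
    mask = df["A"] >= 40
    before = df.loc[mask, "A"].tolist()

    # The assignment from CASE 2, without .loc.
    df[mask] = df[mask].sort_values("A")
    after = df.loc[mask, "A"].tolist()

    if after == before:
        return "aligned"      # rows matched by label, so nothing changed
    if after == sorted(before):
        return "positional"   # rows written in sorted order
    return "other"


print(pd.__version__, getitem_assignment_behaviour())
```

Going by this report, 1.1.5 would be expected to print "aligned" and 1.2.2 "positional".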

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 7d32926
python : 3.8.6.final.0
python-bits : 64
OS : Linux
OS-release : 5.8.0-44-generic
Version : #50-Ubuntu SMP Tue Feb 9 06:29:41 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_IN
LOCALE : en_IN.ISO8859-1

pandas : 1.2.2
numpy : 1.18.5
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 44.0.0
Cython : None
pytest : 6.2.1
hypothesis : None
sphinx : 3.4.3
blosc : None
feather : None
xlsxwriter : 1.3.5
lxml.etree : 4.6.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.18.1
pandas_datareader: 0.9.0
bs4 : 4.9.1
bottleneck : None
fsspec : 0.8.5
fastparquet : 0.5.0
gcsfs : None
matplotlib : 3.3.2
numexpr : None
odfpy : None
openpyxl : 3.0.5
pandas_gbq : None
pyarrow : 2.0.0
pyxlsb : None
s3fs : None
scipy : 1.5.2
sqlalchemy : 1.3.22
tables : None
tabulate : 0.8.7
xarray : 0.16.2
xlrd : 1.2.0
xlwt : None
numba : 0.50.1