BUG: ensure reindex / getitem to select columns properly copies data for extension dtypes by jorisvandenbossche · Pull Request #51197 · pandas-dev/pandas

I encountered this while writing more tests for Copy-on-Write. Currently, the general rule is that selecting columns with a list-like indexer using getitem gives a copy:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.random.randn(10, 4), columns=['a', 'b', 'c', 'd'])
    subset = df[["a", "b"]]

    # subset is a copy
    subset.iloc[0, 0] = 0
    assert df.iloc[0, 0] != 0

However, that doesn't seem to be the case when the columns we select are extension dtypes. When using dtype="Float64" in the above example, the original df gets updated because subset isn't a copy.
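To make the comparison concrete, here is a minimal sketch that runs the same select-then-write check for both a numpy dtype and an extension dtype. The helper name `parent_changes_after_subset_write` is hypothetical and exists only for this illustration. The frame is built from fixed non-zero values (an assumption, replacing the random data above) so that a zero in the parent can only come from the write to the subset. No expected output is shown for `"Float64"`, since the result depends on the pandas version and on whether Copy-on-Write is enabled.

```python
import warnings

import numpy as np
import pandas as pd


def parent_changes_after_subset_write(dtype):
    """Return True if writing into df[["a", "b"]] also changes df.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Fixed, non-zero values so a 0 in df can only come from the write below.
    values = np.arange(1.0, 41.0).reshape(10, 4)
    df = pd.DataFrame(values, columns=["a", "b", "c", "d"], dtype=dtype)

    subset = df[["a", "b"]]

    with warnings.catch_warnings():
        # Older pandas versions may emit SettingWithCopyWarning here.
        warnings.simplefilter("ignore")
        subset.iloc[0, 0] = 0

    return bool(df.iloc[0, 0] == 0)


for dtype in ["float64", "Float64"]:
    print(dtype, parent_changes_after_subset_write(dtype))
```

If the two dtypes print different results, the extension dtype path is returning a view where the numpy path returns a copy, which is the inconsistency described above.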

While I am not sure this is an explicitly documented rule (AFAIK it is de facto behaviour, described as such only in the discussions related to copy/view and CoW), I do think extension dtypes would be expected to behave the same as numpy dtypes on this front.
(It also makes writing tests for copy/view behaviour harder if the behaviour depends not only on whether CoW is enabled, but also on numpy vs extension dtypes. This is where I encountered the issue.)