CoW: add readonly flag to ExtensionArrays, return read-only EA/ndarray in .array/EA.to_numpy() by jorisvandenbossche · Pull Request #61925 · pandas-dev/pandas

asked in the first place because it seems most of the code complexity in this PR is driven by to_numpy changes.

Looking at the diff again, I think it is about 50/50 between to_numpy() and __array__. But to_numpy() also reuses the result from __array__ in some cases, so if we wanted to_numpy() to consistently not return readonly data, that would also require some changes in to_numpy(). So regarding the implementation, I'm not entirely sure this would be a lot simpler (but I didn't look in detail).

The main reason I can think of to treat to_numpy differently from .array and .values is that it has an explicit copy keyword. With copy=False, the user ideally understands that they are getting a view on existing data.

Yeah, we could potentially also change the default of copy to None instead of False, with the same meaning (i.e. avoid a copy if possible). Then, if someone explicitly passes copy=False, we wouldn't set the readonly flag.
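To make that idea concrete, here is a minimal sketch on a plain NumPy buffer. It is not pandas code: to_numpy_sketch is a hypothetical helper, and the copy=None default is only the proposal floated above, not current behaviour.

```python
import numpy as np


def to_numpy_sketch(data, copy=None):
    """Hypothetical model of the proposed keyword semantics (not pandas API)."""
    if copy:
        # copy=True: a fresh array that the caller owns, always writable.
        return data.copy()
    result = data.view()  # zero-copy view on the same buffer
    if copy is None:
        # Proposed default: avoid the copy, but protect the parent data.
        result.flags.writeable = False
    # Explicit copy=False: the caller opted into a view, so leave it writable.
    return result


data = np.array([1, 2, 3])
default = to_numpy_sketch(data)               # read-only view
explicit = to_numpy_sketch(data, copy=False)  # writable view
copied = to_numpy_sketch(data, copy=True)     # independent, writable copy
```

In this model, only the explicit copy=False result can mutate the parent buffer, which is the opt-in the comment describes.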

From previous discussions (maybe #52823), I seem to remember that at some point we brought up whether it would be worth having a keyword to control this behaviour, i.e. a way to ask for a numpy array that is guaranteed to be mutable. Of course you could do to_numpy(copy=True), which also guarantees that, but it doesn't cover the case where you want the data zero-copy if possible and you know that mutating it is fine (for example because the DataFrame or Series holding it is discarded after converting).
At the moment, the documentation (https://pandas.pydata.org/docs/dev/user_guide/copy_on_write.html#read-only-numpy-arrays) suggests manually resetting the readonly flag:

arr = ser.to_numpy()
arr.flags.writeable = True

instead of adding a keyword like arr = ser.to_numpy(ensure_writable=True). So in theory, copy=False could also cover that.
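For reference, this is what resetting the flag does at the NumPy level. The sketch uses a plain array as a stand-in for the data held by the Series (the name parent is illustrative only), so it does not depend on which pandas version returns read-only results.

```python
import numpy as np

parent = np.array([1.0, 2.0, 3.0])  # stands in for the data held by the Series

arr = parent.view()
arr.flags.writeable = False  # what a read-only result looks like

try:
    arr[0] = 99.0
    write_blocked = False
except ValueError:
    write_blocked = True  # NumPy refuses writes to a read-only array

# Resetting the flag (the documented workaround) re-enables writes,
# and those writes go straight through to the parent buffer.
arr.flags.writeable = True
arr[0] = 99.0
```

After the reset, parent sees the new value too, which is why this is only safe when the owning object is no longer used.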

(but this is probably a discussion for #52823)