ENH: change BlockManager pickle format to work with dup items by immerrr · Pull Request #7370 · pandas-dev/pandas

The old BlockManager pickle format stored items, which are ambiguous when non-unique and therefore not enough for unpickling.

After the recent BlockManager overhaul, #6745, it's no longer necessary to share items/ref-items between Blocks and their respective managers. Blocks still cannot be safely pickled/unpickled on their own, however, because they a) are part of the internal API and are subject to change and b) were never intended to be serialized like that, and their __setstate__ is written in a non-forward-compatible manner.

So I've taken the liberty of making the format more "extendable" by storing the state as the fourth element of the BlockManager state tuple:

(axes_array, block_values, block_items,
 {
     "0.14.1": {
         "axes": axes_array,
         "blocks": [{"values": blk0.values,
                     "mgr_locs": blk0.mgr_locs.indexer},
                    ...]
     }
 })
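To make the layout above concrete, here is a minimal standalone sketch of writing and reading such a tuple. It is not the pandas implementation: `pack_state`, `unpack_state` and `CURRENT_VERSION` are hypothetical names, plain lists stand in for the axes and block values, and the legacy fallback (resolving items by label lookup) is only an assumption about how a three-element state could be read.

```python
# Hypothetical sketch of a versioned state tuple; NOT pandas code.
CURRENT_VERSION = "0.14.1"


def pack_state(axes, blocks):
    """Build the 4-tuple. `blocks` is a list of (values, mgr_locs) pairs,
    where mgr_locs are integer positions into axes[0]."""
    legacy_values = [values for values, _ in blocks]
    # Legacy slot: item labels per block (ambiguous if labels repeat).
    legacy_items = [[axes[0][i] for i in locs] for _, locs in blocks]
    extra = {
        CURRENT_VERSION: {
            "axes": axes,
            "blocks": [
                {"values": values, "mgr_locs": list(locs)}
                for values, locs in blocks
            ],
        }
    }
    return (axes, legacy_values, legacy_items, extra)


def unpack_state(state):
    """Return (axes, blocks), preferring the versioned fourth element."""
    if len(state) >= 4 and CURRENT_VERSION in state[3]:
        payload = state[3][CURRENT_VERSION]
        blocks = [(b["values"], b["mgr_locs"]) for b in payload["blocks"]]
        return payload["axes"], blocks

    # Assumed legacy path: positions are recovered from labels, which
    # only works when the labels are unique.
    axes, values, items = state[:3]
    labels = list(axes[0])
    if len(set(labels)) != len(labels):
        raise ValueError("legacy state is ambiguous with duplicate items")
    blocks = [
        (vals, [labels.index(label) for label in blk_items])
        for vals, blk_items in zip(values, items)
    ]
    return axes, blocks


# Duplicate item "a": positions survive because mgr_locs are stored.
axes = [["a", "a", "b"], [0, 1]]
blocks = [([[1, 2], [5, 6]], [0, 2]), ([[3.0, 4.0]], [1])]
restored_axes, restored_blocks = unpack_state(pack_state(axes, blocks))
```

Because each block records integer positions rather than labels, the round trip does not depend on item uniqueness, and a later format revision could be added under a new key in the dict without changing the tuple shape.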

I was aiming for the following design goals:

This PR should close #7329.