ENH: change BlockManager pickle format to work with dup items by immerrr · Pull Request #7370 · pandas-dev/pandas
The old BlockManager pickle format stored items, which are ambiguous and not enough for unpickling when they are non-unique.
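As a minimal illustration of why labels alone are ambiguous, the sketch below uses plain Python lists rather than pandas internals. The names `items`, `old_block_items`, `new_block_locs` and `locate_by_label` are hypothetical, and the label lookup only models the general problem, not the exact code path of the old unpickler.

```python
# Hypothetical, simplified model: a manager's column labels ("items")
# and two blocks that each own some of those columns.
items = ["a", "b", "a"]  # non-unique labels

# Label-based state: each block records the labels of its columns.
old_block_items = [["a", "b"], ["a"]]


def locate_by_label(items, block_labels):
    """Recover column positions from labels (first match wins)."""
    return [items.index(label) for label in block_labels]


# Both blocks claim position 0 for "a"; position 2 is never recovered.
print([locate_by_label(items, b) for b in old_block_items])

# Position-based state: each block records integer positions instead,
# similar in spirit to mgr_locs.
new_block_locs = [[0, 1], [2]]
covered = sorted(loc for locs in new_block_locs for loc in locs)
print(covered == list(range(len(items))))
```

Storing integer positions sidesteps the question of which "a" a block owns, which is why the new state keeps `mgr_locs` rather than item labels.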
After the recent BlockManager overhaul (#6745), it's no longer necessary to share items/ref-items between Blocks and their respective managers. Blocks still cannot be safely pickled/unpickled on their own, though, because they a) are part of the internal API and subject to change, and b) were never intended to be serialized like that, and their `__setstate__` is written in a non-forward-compatible manner.
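To make point b) concrete, here is a hypothetical sketch (not the actual Block code) of a `__setstate__` that unpacks a fixed-size tuple. `OldStyleBlock` and the extra third field are invented for illustration; the point is only that rigid unpacking rejects any state a newer version writes with more fields.

```python
class OldStyleBlock:
    """Hypothetical block whose __setstate__ expects exactly two fields."""

    def __setstate__(self, state):
        # Rigid unpacking: any extra field written by a newer version fails here.
        self.values, self.items = state


block = OldStyleBlock.__new__(OldStyleBlock)
block.__setstate__(([1, 2], ["a", "b"]))  # state from the same version: fine

failed = False
try:
    # State as a newer version might write it, with one extra field.
    block.__setstate__(([1, 2], ["a", "b"], [0, 1]))
except ValueError as exc:
    failed = True
    print("cannot read newer state:", exc)
```

Because the unpacking fails before any assignment, the block keeps its old attributes, but the newer pickle is simply unreadable.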
So I've taken the liberty of making the format more "extendable" by storing the state as the fourth element of the BlockManager state tuple:
    (axes_array, block_values, block_items,
     {
         "0.14.1": {
             "axes": axes_array,
             "blocks": [{"values": blk0.values,
                         "mgr_locs": blk0.mgr_locs.indexer},
                        ...]
         }
     })
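A minimal sketch of how a reader and writer of such a state tuple might look, assuming a toy class rather than the real BlockManager. `ToyManager`, `SUPPORTED_VERSIONS` and the `"0.15.0"` key are hypothetical; blocks are modelled as `(values, locs)` pairs of plain lists, and the legacy fallback path is omitted.

```python
import pickle


class ToyManager:
    """Hypothetical stand-in for BlockManager: axes plus (values, locs) blocks."""

    # Versions this reader understands, newest first (hypothetical list).
    SUPPORTED_VERSIONS = ("0.14.1",)

    def __init__(self, axes, blocks):
        self.axes = axes      # axes[0] holds the (possibly non-unique) items
        self.blocks = blocks  # list of (values, locs); locs index into axes[0]

    def __getstate__(self):
        # Legacy elements, kept so older readers still find what they expect.
        legacy_values = [values for values, _ in self.blocks]
        legacy_items = [[self.axes[0][i] for i in locs] for _, locs in self.blocks]
        # Versioned fourth element; it references the same axes/values
        # objects, which pickle writes once and then refers back to.
        versioned = {
            "0.14.1": {
                "axes": self.axes,
                "blocks": [{"values": values, "mgr_locs": locs}
                           for values, locs in self.blocks],
            }
        }
        return (self.axes, legacy_values, legacy_items, versioned)

    def __setstate__(self, state):
        versioned = state[3] if len(state) > 3 else {}
        # No version-specific parsing rules: take the first known version.
        for version in self.SUPPORTED_VERSIONS:
            if version in versioned:
                payload = versioned[version]
                self.axes = payload["axes"]
                self.blocks = [(b["values"], b["mgr_locs"])
                               for b in payload["blocks"]]
                return
        # A real reader would fall back to parsing the legacy elements here.
        raise ValueError("no supported state version found")


mgr = ToyManager(
    axes=[["a", "b", "a"], [0, 1]],
    blocks=[([[1, 2], [3, 4]], [0, 1]), ([[5, 6]], [2])],
)
restored = pickle.loads(pickle.dumps(mgr))

# A newer writer may add its own key next to the old one; this reader
# ignores the unknown key and still finds "0.14.1".
future_state = mgr.__getstate__()
future_state[3]["0.15.0"] = {"something": "new"}
other = ToyManager.__new__(ToyManager)
other.__setstate__(future_state)
```

The duplicated `"a"` label survives the round trip because each block is restored from its positions, not from `legacy_items`.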
I was aiming for the following design goals:
- use the fact that pickle doesn't store the same object multiple times (repeated references are memoized), so adding an extra element to the tuple should add minimal overhead
- make the state as forward compatible as possible: new versions simply add new entries to that dictionary alongside the old ones
- make the state as backward compatible as possible: no extra "if this, parse like that, else ..." rules; simply look up a supported version in the dictionary and use it
- don't store classes that are subject to change: I almost got bitten by this one when I tried to serialize blocks as a whole, and 0.13.1, which had a different (and incompatible) `__getstate__` implementation, croaked.
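The first design goal relies on pickle's memo. The sketch below, using only the standard library and a hypothetical `payload` list, compares pickling one reference to a large object, a second reference to the same object, and a real copy; it shows the second reference costs only a few bytes and that shared identity is preserved on load.

```python
import pickle

payload = list(range(10_000))  # hypothetical stand-in for block values

once = pickle.dumps((payload,))
# Same object referenced again inside a versioned dict.
twice = pickle.dumps((payload, {"0.14.1": {"values": payload}}))
# An equal but distinct object, which pickle must serialize in full.
copied = pickle.dumps((payload, {"0.14.1": {"values": list(payload)}}))

print("extra bytes for a second reference:", len(twice) - len(once))
print("extra bytes for a copy:", len(copied) - len(once))

# Unpickling restores the shared reference, not two separate lists.
restored = pickle.loads(twice)
print(restored[0] is restored[1]["0.14.1"]["values"])
```

This is why repeating `axes_array` and the block values inside the versioned dictionary does not double the pickle size, as long as the exact same objects are referenced.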
This PR should close #7329.
- io pickle tests: verify 0.14.0 pickles are handled properly
- ensure forward compatibility with 0.13.1
- generate 0.14.1 pickles (for which platforms?)
- ensure forward compatibility with 0.14.0
- ensure forward compatibility with 0.12.0? (not necessary)
- make sure sparse containers with non-unique items work in those versions too