read_pickle much slower in v0.13 (not using cPickle when compat=False) · Issue #6899 · pandas-dev/pandas
Description
See discussion here http://stackoverflow.com/questions/23122180/is-pandas-read-pickle-performance-crippled-in-version-0-13
My test dataset has a pickled file size of 1.6 GB and contains about 13 million records.
With 0.12 the file takes 146s to load; with 0.13 it takes 982s (about 6.7x as long).
Using cProfile, you can see that v0.13 always uses the pure-Python pickle module to load, even when compat=False. In 0.12 it uses cPickle to load. It seems something is wrong with the logic in pandas/compat/pickle_compat.py
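For anyone who wants to reproduce the profiling, here is a minimal sketch of one way to do it. The helper name `profile_load` is hypothetical (not part of pandas), and it is written for Python 3; pass `pandas.read_pickle` as the loader to profile the call discussed here.

```python
import cProfile
import io
import pstats


def profile_load(loader, path, top=10):
    """Run loader(path) under cProfile; return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(loader, path)

    # Collect the report in memory instead of printing to stdout.
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()
```

Usage would look like `_, report = profile_load(pandas.read_pickle, 'foo.pickle'); print(report)`. If the report is dominated by a large number of calls into functions defined in `pickle.py`, the pure-Python unpickler is most likely doing the work.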
A workaround (if you know you don't need compatibility mode) is to use cPickle.load(open('foo.pickle', 'rb')) instead of pandas.read_pickle('foo.pickle'). Opening the file in binary mode ('rb') is the safe choice for pickle data.
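The workaround can be sketched as a small helper. This assumes the file was written with a standard pickle protocol and needs no pandas compatibility handling; the name `load_pickle_fast` is hypothetical, and the import fallback is only there so the same snippet also runs on Python 3, where there is no separate `cPickle` module.

```python
try:
    import cPickle as fast_pickle  # Python 2: C implementation
except ImportError:
    import pickle as fast_pickle   # Python 3: no separate cPickle module


def load_pickle_fast(path):
    """Load a pickle directly, bypassing pandas.read_pickle."""
    # Binary mode avoids newline translation corrupting the stream.
    with open(path, "rb") as handle:
        return fast_pickle.load(handle)
```

Calling `load_pickle_fast('foo.pickle')` then stands in for `pandas.read_pickle('foo.pickle')` when compatibility mode is not needed.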