read_pickle much slower in v0.13 (not using cPickle when compat=False) · Issue #6899 · pandas-dev/pandas

@patricksurry

Description

See the discussion here: http://stackoverflow.com/questions/23122180/is-pandas-read-pickle-performance-crippled-in-version-0-13

My test dataset has a pickled file size of 1.6 GB and contains about 13 million records.

With 0.12 the file takes 146s to load; with 0.13 it takes 982s (about 6.7x longer).

Using cProfile, you can see that v0.13 always uses the pure-Python pickle module to load, even when compat=False. In 0.12 it uses cPickle to load. It seems like something is wrong with the logic in pandas/compat/pickle_compat.py.
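For anyone who wants to reproduce the observation, here is a rough sketch of how a load can be profiled with cProfile. The helper names `profile_load` and `plain_load` and the small demo pickle are made up for illustration and are not part of pandas; the real test used a 1.6 GB file.

```python
import cProfile
import io
import os
import pickle
import pstats
import tempfile


def profile_load(path, loader):
    """Run loader(path) under cProfile; return (result, text report).

    Hypothetical helper, not a pandas API. The report is sorted by
    cumulative time, so the functions doing the unpickling rise to the top.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(loader, path)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(15)
    return result, buf.getvalue()


def plain_load(path):
    """Load a pickle with the standard library only (illustrative loader)."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Small stand-in file so the sketch runs anywhere.
fd, demo_path = tempfile.mkstemp(suffix=".pickle")
os.close(fd)
with open(demo_path, "wb") as fh:
    pickle.dump(list(range(100000)), fh, protocol=pickle.HIGHEST_PROTOCOL)

data, report = profile_load(demo_path, plain_load)
os.remove(demo_path)
print(report)
```

Passing `pandas.read_pickle` as the `loader` argument with a real file path should show which pickle implementation ends up doing the work: many calls into functions from `pickle.py` point to the pure-Python loader, while a single built-in `load` call points to the C implementation.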

A workaround (if you know you don't need compatibility mode) is to use cPickle.load(open('foo.pickle', 'rb')) instead of pandas.read_pickle('foo.pickle').
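A minimal sketch of that workaround as a small helper, assuming the file was written by a compatible pandas version so the compatibility path is not needed. The name `fast_load` and the tiny round-trip file are hypothetical; the import fallback is only there so the snippet also runs on Python 3, where there is no cPickle module and the standard pickle module uses its C accelerator when available.

```python
import os
import tempfile

try:
    import cPickle as pickle_impl  # Python 2: C implementation
except ImportError:
    import pickle as pickle_impl   # Python 3: no separate cPickle module


def fast_load(path):
    """Load a pickle directly, bypassing pandas' compatibility handling.

    Hypothetical helper. The file is opened in binary mode and closed
    once loading finishes.
    """
    with open(path, "rb") as fh:
        return pickle_impl.load(fh)


# Round-trip a tiny object to show the helper in use.
sample = {"rows": list(range(5)), "label": "demo"}
fd, tmp_path = tempfile.mkstemp(suffix=".pickle")
os.close(fd)
try:
    with open(tmp_path, "wb") as fh:
        pickle_impl.dump(sample, fh, protocol=2)
    restored = fast_load(tmp_path)
finally:
    os.remove(tmp_path)
```

With a real file this would be called as `fast_load('foo.pickle')`. Because it skips the compatibility handling, it may fail on pickles written by older pandas versions.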