ENH: add gzip/bz2 compression to read_pickle() (and perhaps other read_*() methods) · Issue #11666 · pandas-dev/pandas
Description
Right now, read_csv() has a compression option, which allows the user to pass a gzip- or bz2-compressed CSV file directly to Pandas to be read. It would be great if read_pickle() supported the same option. Pickles actually compress surprisingly well; I have a 567M Pandas pickle (resulting from DataFrame.to_pickle()) that packs down to 45M with pigz --best. An order of magnitude difference in size is pretty significant. This makes storing static pickles long-term as gzipped archives a very attractive option. The workflow would be easier if Pandas could natively handle my dataframe.pickle.gz files in the same way it does compressed CSV files.
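To make the request concrete, here is a minimal sketch of the behaviour being asked for, written with only the Python standard library. The names read_compressed_pickle, infer_compression, and the "infer" default are hypothetical and are not part of the Pandas API; the sketch assumes the file is an ordinary pickle that was compressed afterwards with gzip or bz2.

```python
import bz2
import gzip
import pickle

# Hypothetical helper tables, not part of pandas.
# Maps a compression name to a function that opens the file for reading.
_OPENERS = {"gzip": gzip.open, "bz2": bz2.open, None: open}
# Maps a file extension to a compression name.
_EXTENSIONS = {".gz": "gzip", ".bz2": "bz2"}


def infer_compression(path):
    """Guess the compression from the file extension (None if unrecognized)."""
    for ext, name in _EXTENSIONS.items():
        if path.endswith(ext):
            return name
    return None


def read_compressed_pickle(path, compression="infer"):
    """Unpickle an object from a plain, gzip-, or bz2-compressed file."""
    if compression == "infer":
        compression = infer_compression(path)
    if compression not in _OPENERS:
        raise ValueError(f"unsupported compression: {compression!r}")
    # Each opener returns a binary file object that pickle can read from.
    with _OPENERS[compression](path, "rb") as handle:
        return pickle.load(handle)
```

With something like this, read_compressed_pickle("dataframe.pickle.gz") would return the stored object directly, provided pandas is importable so the DataFrame can be reconstructed. As with any pickle, this should only be used on files from a trusted source.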
More generally, a compression option should probably be allowed for most read_* methods. Many of the read_* methods involve formats that compress very well.