to_csv / to_pickle optionally use fast gzip compressionlevel=1 · Issue #33196 · pandas-dev/pandas
Code Sample, a copy-pastable example if possible
Enhancement proposal
```python
df.to_csv('data.csv.gz')      # Awfully slow
df.to_pickle('data.pkl.bz2')  # Awfully slow

df.to_csv('data.csv.gz', fast=True)  # Uses fast compressionlevel=1
```
or, better:
```python
pd.options.io.compressionlevel = 1
df.to_pickle('data.pkl.bz2')  # Uses fast compressionlevel=1
```
...
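In the meantime, a minimal workaround sketch (assuming a plain gzip target; the DataFrame and path below are placeholders) is to open the compressed handle manually and pass the file object to `to_csv`, so the compression level stays under user control:

```python
import gzip

import pandas as pd

df = pd.DataFrame({"a": range(1_000_000)})  # placeholder data

# Open the gzip stream ourselves with the fast level (1) instead of the
# gzip module default (9), then hand the text-mode file object to to_csv.
with gzip.open("data.csv.gz", "wt", compresslevel=1) as f:
    df.to_csv(f)
```

The same pattern should work for `to_pickle` with `bz2.open("data.pkl.bz2", "wb", compresslevel=1)`, since both writers accept open file objects.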
Problem description
Compression of large objects in pandas is slow.
Popular benchmarks comparing compression levels on typical payloads [1] [2] show that compressed size usually varies far less than compression time, which often spans several folds: `compressionlevel=1` is orders of magnitude faster, whereas `compressionlevel=9` is only 10% smaller.
One extreme optimizes for size, the other for speed, and far fewer people ever need something in between.
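For a rough, concrete illustration of that trade-off, the sketch below times zlib at levels 1 and 9 on a synthetic CSV payload; it is not a reproduction of the benchmarks cited in [1] [2], and the exact ratios depend on the data:

```python
import time
import zlib

import numpy as np
import pandas as pd

# Synthetic CSV-like payload; the sizes and timings are only illustrative.
frame = pd.DataFrame(np.random.default_rng(0).standard_normal((200_000, 4)))
payload = frame.to_csv().encode()

for level in (1, 9):
    start = time.perf_counter()
    compressed = zlib.compress(payload, level)
    elapsed = time.perf_counter() - start
    print(f"level={level}: {len(compressed) / len(payload):.1%} of original "
          f"size in {elapsed:.2f}s")
```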
Expected Output
Output of pd.show_versions()
1.1.0.dev0+786.gec7734169