to_csv / to_pickle optionally use fast gzip compressionlevel=1 · Issue #33196 · pandas-dev/pandas

Code Sample, a copy-pastable example if possible

Enhancement proposal

df.to_csv('data.csv.gz') # Awfully slow

df.to_pickle('data.pkl.bz2') # Awfully slow

df.to_csv('data.csv.gz', fast=True) # Uses fast compressionlevel=1

or, better:

pd.options.io.compressionlevel = 1

df.to_pickle('data.pkl.bz2') # Uses fast compressionlevel=1

...
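
For reference, something close to the fast path can already be achieved today by opening the compressed stream manually and handing the file object to to_csv, rather than letting pandas compress at its default level. The snippet below is only a sketch of that workaround (the file name and DataFrame are made up for illustration), not part of the proposed API:

import gzip
import pandas as pd

df = pd.DataFrame({'a': range(1_000_000)})

# Open the gzip stream ourselves at compresslevel=1 instead of relying on
# pandas' default compression level, then pass the handle to to_csv.
with gzip.open('data.csv.gz', 'wt', compresslevel=1) as f:
    df.to_csv(f)

The proposal is essentially to make this convenient via a keyword argument or a global option, without users having to manage file handles themselves.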

Problem description

Compression of large objects in pandas is slow.

Published benchmarks comparing compression levels on typical payloads [1] [2] show that compressed size varies far less than compression time, which often differs by several fold across levels: compressionlevel=1 is dramatically faster, while the output of compressionlevel=9 is typically only about 10% smaller.

One optimizes for size, the other for speed, and far fewer users ever need anything in between.
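
As a rough illustration of that trade-off (a quick sketch, not a rigorous benchmark; the payload here is arbitrary), compressing the same buffer at level 1 and level 9 with zlib shows the pattern:

import time
import zlib

# Arbitrary, moderately compressible payload, purely for illustration.
payload = b'some,repeated,csv,row,with,numbers,123456\n' * 500_000

for level in (1, 9):
    start = time.perf_counter()
    compressed = zlib.compress(payload, level)
    elapsed = time.perf_counter() - start
    print(f'level={level}: {len(compressed)} bytes in {elapsed:.3f}s')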

Expected Output

Output of pd.show_versions()

1.1.0.dev0+786.gec7734169