Fix zlib and blosc imports in to_msgpack by invisibleroads · Pull Request #9783 · pandas-dev/pandas

Here are some performance results using the following code saved as compress.py.

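# compress.py (Python 2): load a month of USGS earthquake data into a DataFrame
from cStringIO import StringIO
from pandas import read_csv, read_msgpack
from urllib2 import urlopen

url = 'http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_month.csv'
# Download the CSV and parse it from an in-memory buffer
table = read_csv(StringIO(urlopen(url).read()))
# Print the command to profile once the interactive session starts
print 'prun z = read_msgpack(table.to_msgpack())'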

When zlib and blosc are imported globally
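Here "imported globally" means the compressor modules are imported at module level, so the import cost is paid once, when the module itself is loaded. A minimal sketch of that pattern, assuming a hypothetical `compress_bytes` helper (this is not the actual code in pandas' packers.py) and treating blosc as an optional dependency:

```python
# Hypothetical sketch of "global" (module-level) imports: the codecs are
# imported once, when this module is first imported.
import zlib

try:
    import blosc  # optional third-party dependency; may be absent
except ImportError:
    blosc = None


def compress_bytes(data, compress=None):
    """Compress `data` (bytes) with the named codec, or return it unchanged."""
    if compress == 'zlib':
        return zlib.compress(data)
    if compress == 'blosc':
        if blosc is None:
            raise ImportError('blosc is not installed')
        # typesize=1 treats the buffer as plain bytes
        return blosc.compress(data, typesize=1)
    return data
```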

$ ipython -i compress.py
In [1]: prun z = read_msgpack(table.to_msgpack())

500 function calls (492 primitive calls) in 0.030 seconds
Ordered by: internal time
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.022    0.022    0.025    0.025 packers.py:134(read)
     1    0.002    0.002    0.003    0.003 {method 'pack' of 'pandas.msgpack.Packer' objects}
    15    0.001    0.000    0.001    0.000 {method 'get' of 'dict' objects}
    15    0.001    0.000    0.001    0.000 {numpy.core.multiarray.array}
     2    0.000    0.000    0.000    0.000 {method 'encode' of 'unicode' objects}
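IPython's `%prun` runs the statement under the standard-library profiler (cProfile) and by default sorts by internal time (`tottime`), which is why `packers.py:134(read)` heads the list. A stdlib-only sketch of the same kind of report follows; the synthetic payload and the zlib round-trip standing in for `to_msgpack`/`read_msgpack` are assumptions, not the PR's setup:

```python
import cProfile
import io
import pstats
import zlib

# Synthetic stand-in payload; the PR profiles a pandas DataFrame round-trip.
payload = b'time,latitude,longitude,depth,mag\n' * 20000


def round_trip(data):
    """Compress then decompress, mimicking a compressed pack/unpack cycle."""
    return zlib.decompress(zlib.compress(data))


profiler = cProfile.Profile()
profiler.enable()
result = round_trip(payload)
profiler.disable()

# "Ordered by: internal time" corresponds to sorting on tottime.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats('tottime').print_stats(5)
report = out.getvalue()
print(report)
```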

$ ipython -i compress.py
In [1]: prun z = read_msgpack(table.to_msgpack(compress='zlib'))

508 function calls (500 primitive calls) in 0.059 seconds
Ordered by: internal time
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     2    0.027    0.013    0.027    0.013 {zlib.compress}
     1    0.024    0.024    0.027    0.027 packers.py:134(read)
     1    0.003    0.003    0.031    0.031 {method 'pack' of 'pandas.msgpack.Packer' objects}
     2    0.002    0.001    0.002    0.001 {zlib.decompress}
    15    0.001    0.000    0.001    0.000 {method 'get' of 'dict' objects}

$ ipython -i compress.py
In [1]: prun z = read_msgpack(table.to_msgpack(compress='blosc'))

532 function calls (524 primitive calls) in 0.053 seconds
Ordered by: internal time
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.042    0.042    0.046    0.046 packers.py:134(read)
     1    0.003    0.003    0.005    0.005 {method 'pack' of 'pandas.msgpack.Packer' objects}
     2    0.001    0.001    0.001    0.001 {blosc.blosc_extension.compress}
    15    0.001    0.000    0.001    0.000 {method 'get' of 'dict' objects}
    15    0.001    0.000    0.001    0.000 {numpy.core.multiarray.array}

$ ipython -i compress.py
In [1]: timeit read_msgpack(table.to_msgpack())
10 loops, best of 3: 27.7 ms per loop

$ ipython -i compress.py 
In [1]: timeit read_msgpack(table.to_msgpack(compress='zlib'))
10 loops, best of 3: 49.4 ms per loop

$ ipython -i compress.py 
In [1]: timeit read_msgpack(table.to_msgpack(compress='blosc'))
10 loops, best of 3: 28.7 ms per loop
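These lines use the older IPython `%timeit` output format: "10 loops, best of 3" means the statement ran in 3 batches of 10 calls, and the fastest batch was divided by 10. The same measurement can be sketched with the standard-library `timeit.repeat`; a plain zlib round-trip on a synthetic payload stands in for the pandas call here (an assumption for illustration):

```python
import timeit
import zlib

# Synthetic stand-in payload, not the USGS data used in the PR
payload = b'time,latitude,longitude,depth,mag\n' * 20000

# 3 repeats of 10 calls each; keep the fastest repeat, as "best of 3" does
times = timeit.repeat(lambda: zlib.decompress(zlib.compress(payload)),
                      number=10, repeat=3)
best_ms = min(times) / 10 * 1000  # seconds per batch -> ms per call
print('10 loops, best of 3: %.1f ms per loop' % best_ms)
```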

When zlib and blosc are imported locally
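Here the imports move inside the function body, so zlib or blosc is only loaded when a caller actually requests that codec. A minimal sketch of the pattern, using hypothetical `compress_bytes`/`decompress_bytes` helpers (not the actual packers.py change); once a module has been imported, Python finds it in `sys.modules`, so later `import` statements only bind a local name:

```python
def compress_bytes(data, compress=None):
    """Hypothetical sketch: import each codec only when it is requested."""
    if compress == 'zlib':
        import zlib  # loaded on first use, then served from sys.modules
        return zlib.compress(data)
    if compress == 'blosc':
        import blosc  # optional; raises ImportError only if actually needed
        return blosc.compress(data, typesize=1)
    return data


def decompress_bytes(data, compress=None):
    """Inverse of compress_bytes, with the same local-import pattern."""
    if compress == 'zlib':
        import zlib
        return zlib.decompress(data)
    if compress == 'blosc':
        import blosc
        return blosc.decompress(data)
    return data
```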

$ ipython -i compress.py
In [1]: prun z = read_msgpack(table.to_msgpack())

500 function calls (492 primitive calls) in 0.029 seconds
Ordered by: internal time
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.022    0.022    0.025    0.025 packers.py:133(read)
     1    0.002    0.002    0.003    0.003 {method 'pack' of 'pandas.msgpack.Packer' objects}
    15    0.001    0.000    0.001    0.000 {method 'get' of 'dict' objects}
    15    0.001    0.000    0.001    0.000 {numpy.core.multiarray.array}
     2    0.001    0.000    0.001    0.000 index.py:1508(get_indexer)

$ ipython -i compress.py
In [1]: prun z = read_msgpack(table.to_msgpack(compress='zlib'))
In [2]: prun z = read_msgpack(table.to_msgpack(compress='zlib'))
In [3]: prun z = read_msgpack(table.to_msgpack(compress='zlib'))

509 function calls (501 primitive calls) in 0.081 seconds
Ordered by: internal time
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.043    0.043    0.048    0.048 packers.py:133(read)
     2    0.027    0.013    0.027    0.013 {zlib.compress}
     1    0.004    0.004    0.031    0.031 {method 'pack' of 'pandas.msgpack.Packer' objects}
     2    0.002    0.001    0.002    0.001 {zlib.decompress}
    15    0.001    0.000    0.001    0.000 {numpy.core.multiarray.array}

479 function calls (471 primitive calls) in 0.053 seconds
Ordered by: internal time
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     2    0.022    0.011    0.022    0.011 {zlib.compress}
     1    0.022    0.022    0.025    0.025 packers.py:133(read)
     1    0.002    0.002    0.025    0.025 {method 'pack' of 'pandas.msgpack.Packer' objects}
     2    0.002    0.001    0.002    0.001 {zlib.decompress}
     1    0.001    0.001    0.053    0.053 <string>:1(<module>)

479 function calls (471 primitive calls) in 0.053 seconds
Ordered by: internal time
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     2    0.022    0.011    0.022    0.011 {zlib.compress}
     1    0.022    0.022    0.025    0.025 packers.py:133(read)
     1    0.002    0.002    0.025    0.025 {method 'pack' of 'pandas.msgpack.Packer' objects}
     2    0.002    0.001    0.002    0.001 {zlib.decompress}
     1    0.002    0.002    0.053    0.053 <string>:1(<module>)
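The three zlib profiles above come from repeating the same command in one session, so later runs exercise the cached path of a function-level import: once a module is in `sys.modules`, a local `import` statement is a lookup plus a name binding. A stdlib-only sketch for estimating that per-call difference on your own machine; the `checksum_*` functions are hypothetical and `zlib.crc32` is just a cheap stand-in workload:

```python
import sys
import timeit
import zlib  # module-level ("global") import, done once


def checksum_global(data):
    # Uses the module object bound when this file was imported
    return zlib.crc32(data)


def checksum_local(data):
    # Function-level ("local") import: the first call loads zlib if needed,
    # later calls find it in sys.modules and only bind the local name
    import zlib as zlib_local
    return zlib_local.crc32(data)


data = b'abc' * 1000
n = 20000

t_global = timeit.timeit(lambda: checksum_global(data), number=n)
t_local = timeit.timeit(lambda: checksum_local(data), number=n)
# Difference per call in microseconds; may be slightly negative due to noise
print('local minus global: %.3f us per call' % ((t_local - t_global) / n * 1e6))
```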

$ ipython -i compress.py
In [1]: prun z = read_msgpack(table.to_msgpack(compress='blosc'))
In [2]: prun z = read_msgpack(table.to_msgpack(compress='blosc'))
In [3]: prun z = read_msgpack(table.to_msgpack(compress='blosc'))

596 function calls (588 primitive calls) in 0.056 seconds
Ordered by: internal time
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.043    0.043    0.047    0.047 packers.py:133(read)
     1    0.003    0.003    0.007    0.007 {method 'pack' of 'pandas.msgpack.Packer' objects}
     2    0.002    0.001    0.002    0.001 {blosc.blosc_extension.compress}
    15    0.001    0.000    0.001    0.000 {method 'get' of 'dict' objects}
    15    0.001    0.000    0.001    0.000 {numpy.core.multiarray.array}

503 function calls (495 primitive calls) in 0.055 seconds
Ordered by: internal time
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.043    0.043    0.046    0.046 packers.py:133(read)
     1    0.003    0.003    0.006    0.006 {method 'pack' of 'pandas.msgpack.Packer' objects}
     1    0.002    0.002    0.055    0.055 <string>:1(<module>)
     2    0.002    0.001    0.002    0.001 {blosc.blosc_extension.compress}
    13    0.001    0.000    0.001    0.000 {method 'get' of 'dict' objects}

503 function calls (495 primitive calls) in 0.054 seconds
Ordered by: internal time
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.042    0.042    0.045    0.045 packers.py:133(read)
     1    0.003    0.003    0.006    0.006 {method 'pack' of 'pandas.msgpack.Packer' objects}
     1    0.002    0.002    0.054    0.054 <string>:1(<module>)
     2    0.002    0.001    0.002    0.001 {blosc.blosc_extension.compress}
    13    0.001    0.000    0.001    0.000 {method 'get' of 'dict' objects}

$ ipython -i compress.py
In [1]: timeit read_msgpack(table.to_msgpack())
10 loops, best of 3: 28.2 ms per loop

$ ipython -i compress.py 
In [1]: timeit read_msgpack(table.to_msgpack(compress='zlib'))
10 loops, best of 3: 49.1 ms per loop

$ ipython -i compress.py 
In [1]: timeit read_msgpack(table.to_msgpack(compress='blosc'))
10 loops, best of 3: 29.5 ms per loop
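To put the two sets of `timeit` figures side by side, this sketch only recomputes differences from the numbers reported above (no new measurements); the key `'none'` stands for the call without a `compress` argument:

```python
# ms per loop from the "10 loops, best of 3" lines above
global_ms = {'none': 27.7, 'zlib': 49.4, 'blosc': 28.7}
local_ms = {'none': 28.2, 'zlib': 49.1, 'blosc': 29.5}

changes = {}  # percent change from global to local imports
for codec in ('none', 'zlib', 'blosc'):
    delta = local_ms[codec] - global_ms[codec]
    changes[codec] = 100.0 * delta / global_ms[codec]
    print('%-5s global %.1f ms, local %.1f ms, change %+.1f ms (%+.1f%%)'
          % (codec, global_ms[codec], local_ms[codec], delta, changes[codec]))
```

With these figures every case changes by less than 3% (about +1.8% without compression, -0.6% for zlib and +2.8% for blosc).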