Fix zlib and blosc imports in to_msgpack by invisibleroads · Pull Request #9783 · pandas-dev/pandas
Here are some performance results using the following code saved as compress.py.
# Python 2 script: download the USGS earthquake feed and load it as a DataFrame.
from cStringIO import StringIO
from pandas import read_csv, read_msgpack
from urllib2 import urlopen
url = 'http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_month.csv'
table = read_csv(StringIO(urlopen(url).read()))
# Print the command to paste at the IPython prompt.
print 'prun z = read_msgpack(table.to_msgpack())'
When zlib and blosc are imported globally
$ ipython -i compress.py
In [1]: prun z = read_msgpack(table.to_msgpack())
500 function calls (492 primitive calls) in 0.030 seconds
Ordered by: internal time
ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
     1    0.022    0.022    0.025    0.025  packers.py:134(read)
     1    0.002    0.002    0.003    0.003  {method 'pack' of 'pandas.msgpack.Packer' objects}
    15    0.001    0.000    0.001    0.000  {method 'get' of 'dict' objects}
    15    0.001    0.000    0.001    0.000  {numpy.core.multiarray.array}
     2    0.000    0.000    0.000    0.000  {method 'encode' of 'unicode' objects}
$ ipython -i compress.py
In [1]: prun z = read_msgpack(table.to_msgpack(compress='zlib'))
508 function calls (500 primitive calls) in 0.059 seconds
Ordered by: internal time
ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
     2    0.027    0.013    0.027    0.013  {zlib.compress}
     1    0.024    0.024    0.027    0.027  packers.py:134(read)
     1    0.003    0.003    0.031    0.031  {method 'pack' of 'pandas.msgpack.Packer' objects}
     2    0.002    0.001    0.002    0.001  {zlib.decompress}
    15    0.001    0.000    0.001    0.000  {method 'get' of 'dict' objects}
$ ipython -i compress.py
In [1]: prun z = read_msgpack(table.to_msgpack(compress='blosc'))
532 function calls (524 primitive calls) in 0.053 seconds
Ordered by: internal time
ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
     1    0.042    0.042    0.046    0.046  packers.py:134(read)
     1    0.003    0.003    0.005    0.005  {method 'pack' of 'pandas.msgpack.Packer' objects}
     2    0.001    0.001    0.001    0.001  {blosc.blosc_extension.compress}
    15    0.001    0.000    0.001    0.000  {method 'get' of 'dict' objects}
    15    0.001    0.000    0.001    0.000  {numpy.core.multiarray.array}
$ ipython -i compress.py
In [1]: timeit read_msgpack(table.to_msgpack())
10 loops, best of 3: 27.7 ms per loop
$ ipython -i compress.py
In [1]: timeit read_msgpack(table.to_msgpack(compress='zlib'))
10 loops, best of 3: 49.4 ms per loop
$ ipython -i compress.py
In [1]: timeit read_msgpack(table.to_msgpack(compress='blosc'))
10 loops, best of 3: 28.7 ms per loop
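For readers unfamiliar with the distinction being measured, the two import styles can be sketched roughly as below. This is an illustration only, not the code in packers.py: `compress_global` and `compress_local` are hypothetical names, and zlib stands in for both compressors because it ships with Python (blosc is a third-party package but would follow the same pattern). The sketch is written for Python 3.

```python
import sys
import zlib  # global style: imported once, when this module is loaded


def compress_global(data):
    """Compress bytes using the module-level zlib import (hypothetical helper)."""
    return zlib.compress(data)


def compress_local(data):
    """Compress bytes, importing zlib only when the function is called (hypothetical helper)."""
    # Local style: the import statement executes on every call, but once the
    # module has been loaded Python only looks it up in sys.modules.
    import zlib as _zlib
    return _zlib.compress(data)


payload = b"earthquake," * 1000

# Both styles produce identical output and round-trip correctly.
assert compress_global(payload) == compress_local(payload)
assert zlib.decompress(compress_local(payload)) == payload

# The module is cached after the first import.
print("zlib" in sys.modules)
```

The practical difference is that the local style does not require the compressor to be importable unless compression is actually requested.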
When zlib and blosc are imported locally
$ ipython -i compress.py
In [1]: prun z = read_msgpack(table.to_msgpack())
500 function calls (492 primitive calls) in 0.029 seconds
Ordered by: internal time
ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
     1    0.022    0.022    0.025    0.025  packers.py:133(read)
     1    0.002    0.002    0.003    0.003  {method 'pack' of 'pandas.msgpack.Packer' objects}
    15    0.001    0.000    0.001    0.000  {method 'get' of 'dict' objects}
    15    0.001    0.000    0.001    0.000  {numpy.core.multiarray.array}
     2    0.001    0.000    0.001    0.000  index.py:1508(get_indexer)
$ ipython -i compress.py
In [1]: prun z = read_msgpack(table.to_msgpack(compress='zlib'))
In [2]: prun z = read_msgpack(table.to_msgpack(compress='zlib'))
In [3]: prun z = read_msgpack(table.to_msgpack(compress='zlib'))
509 function calls (501 primitive calls) in 0.081 seconds
Ordered by: internal time
ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
     1    0.043    0.043    0.048    0.048  packers.py:133(read)
     2    0.027    0.013    0.027    0.013  {zlib.compress}
     1    0.004    0.004    0.031    0.031  {method 'pack' of 'pandas.msgpack.Packer' objects}
     2    0.002    0.001    0.002    0.001  {zlib.decompress}
    15    0.001    0.000    0.001    0.000  {numpy.core.multiarray.array}
479 function calls (471 primitive calls) in 0.053 seconds
Ordered by: internal time
ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
     2    0.022    0.011    0.022    0.011  {zlib.compress}
     1    0.022    0.022    0.025    0.025  packers.py:133(read)
     1    0.002    0.002    0.025    0.025  {method 'pack' of 'pandas.msgpack.Packer' objects}
     2    0.002    0.001    0.002    0.001  {zlib.decompress}
     1    0.001    0.001    0.053    0.053  <string>:1(<module>)
479 function calls (471 primitive calls) in 0.053 seconds
Ordered by: internal time
ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
     2    0.022    0.011    0.022    0.011  {zlib.compress}
     1    0.022    0.022    0.025    0.025  packers.py:133(read)
     1    0.002    0.002    0.025    0.025  {method 'pack' of 'pandas.msgpack.Packer' objects}
     2    0.002    0.001    0.002    0.001  {zlib.decompress}
     1    0.002    0.002    0.053    0.053  <string>:1(<module>)
$ ipython -i compress.py
In [1]: prun z = read_msgpack(table.to_msgpack(compress='blosc'))
In [2]: prun z = read_msgpack(table.to_msgpack(compress='blosc'))
In [3]: prun z = read_msgpack(table.to_msgpack(compress='blosc'))
596 function calls (588 primitive calls) in 0.056 seconds
Ordered by: internal time
ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
     1    0.043    0.043    0.047    0.047  packers.py:133(read)
     1    0.003    0.003    0.007    0.007  {method 'pack' of 'pandas.msgpack.Packer' objects}
     2    0.002    0.001    0.002    0.001  {blosc.blosc_extension.compress}
    15    0.001    0.000    0.001    0.000  {method 'get' of 'dict' objects}
    15    0.001    0.000    0.001    0.000  {numpy.core.multiarray.array}
503 function calls (495 primitive calls) in 0.055 seconds
Ordered by: internal time
ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
     1    0.043    0.043    0.046    0.046  packers.py:133(read)
     1    0.003    0.003    0.006    0.006  {method 'pack' of 'pandas.msgpack.Packer' objects}
     1    0.002    0.002    0.055    0.055  <string>:1(<module>)
     2    0.002    0.001    0.002    0.001  {blosc.blosc_extension.compress}
    13    0.001    0.000    0.001    0.000  {method 'get' of 'dict' objects}
503 function calls (495 primitive calls) in 0.054 seconds
Ordered by: internal time
ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
     1    0.042    0.042    0.045    0.045  packers.py:133(read)
     1    0.003    0.003    0.006    0.006  {method 'pack' of 'pandas.msgpack.Packer' objects}
     1    0.002    0.002    0.054    0.054  <string>:1(<module>)
     2    0.002    0.001    0.002    0.001  {blosc.blosc_extension.compress}
    13    0.001    0.000    0.001    0.000  {method 'get' of 'dict' objects}
$ ipython -i compress.py
In [1]: timeit read_msgpack(table.to_msgpack())
10 loops, best of 3: 28.2 ms per loop
$ ipython -i compress.py
In [1]: timeit read_msgpack(table.to_msgpack(compress='zlib'))
10 loops, best of 3: 49.1 ms per loop
$ ipython -i compress.py
In [1]: timeit read_msgpack(table.to_msgpack(compress='blosc'))
10 loops, best of 3: 29.5 ms per loop
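To make the timeit comparison easier to read, the six best-of-3 figures from this comment can be put side by side with a few lines of Python 3. This is only a convenience sketch: the numbers are copied from the timeit runs in this comment, while `timings_ms`, `relative_change` and `summary` are names introduced here for illustration. Each figure comes from a single timeit session, so differences of a few percent may simply be run-to-run noise.

```python
# Best-of-3 timeit results in milliseconds, copied from the runs in this comment.
timings_ms = {
    "none": {"global": 27.7, "local": 28.2},
    "zlib": {"global": 49.4, "local": 49.1},
    "blosc": {"global": 28.7, "local": 29.5},
}


def relative_change(global_ms, local_ms):
    """Return the local-minus-global difference as a percentage of the global time."""
    return (local_ms - global_ms) / global_ms * 100.0


# Percentage change per compression setting, rounded to one decimal place.
summary = {
    name: round(relative_change(t["global"], t["local"]), 1)
    for name, t in timings_ms.items()
}

for name in ("none", "zlib", "blosc"):
    t = timings_ms[name]
    print("%-5s global=%.1f ms  local=%.1f ms  change=%+.1f%%"
          % (name, t["global"], t["local"], summary[name]))
```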