Performance issue with timeseries plotting on py3? · Issue #11831 · pandas-dev/pandas
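I noticed a performance issue when plotting time series. After testing several environments (different pandas, matplotlib and Python versions), the problem appears to be specific to Python 3: df.plot() on a DataFrame with a DatetimeIndex is up to 10x slower than on Python 2.7, while the same plot with the default integer index takes about the same time on both.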
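For anyone who wants to rerun the comparison outside IPython, here is a minimal sketch of a standalone timing script. The helper names `best_time` and `compare_index_types` are hypothetical. The Agg backend and the `plt.close("all")` call are assumptions added so the script runs headless and does not accumulate figures. They are not part of the original sessions, so absolute numbers may differ somewhat from `%timeit df.plot()`.

```python
import timeit


def best_time(func, repeat=3, number=1):
    """Return the best per-call wall time in seconds (similar in spirit to %timeit)."""
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number


def compare_index_types(n_rows=2000, n_cols=5):
    """Time DataFrame.plot() with a DatetimeIndex and with the default integer index."""
    # Imported lazily so the timing helper above works without the plotting stack.
    import matplotlib
    matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd

    values = np.random.randn(n_rows, n_cols)
    frames = {
        "datetime index": pd.DataFrame(
            values, index=pd.date_range("1/1/1975", periods=n_rows)
        ),
        "integer index": pd.DataFrame(values),
    }

    def plot_and_close(df):
        df.plot()
        plt.close("all")  # assumption: free the figure so repeated runs do not pile up

    return {
        label: best_time(lambda df=df: plot_and_close(df))
        for label, df in frames.items()
    }


# Example use (requires numpy, pandas and matplotlib):
#     for label, seconds in compare_index_types().items():
#         print("%s: %.3f s" % (label, seconds))
```

Running the same script under each interpreter and comparing the "datetime index" value against the "integer index" value should show whether the slowdown is tied to the datetime axis.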
With Python 2, pandas 0.16.2 and 0.17.1:
In [2]: sys.version
Out[2]: '2.7.10 |Anaconda 1.7.0 (64-bit)| (default, Oct 21 2015, 19:35:23) [MSC v.1500 64 bit (AMD64)]'
In [3]: pd.__version__
Out[3]: '0.16.2'
In [4]: matplotlib.__version__
Out[4]: '1.4.3'
In [6]: N = 2000
In [7]: df = pd.DataFrame(np.random.randn(N, 5), index=pd.date_range('1/1/1975', periods=N))
In [8]: %timeit df.plot()
1 loops, best of 3: 228 ms per loop
In [10]: df = pd.DataFrame(np.random.randn(N, 5))
In [11]: %timeit df.plot()
10 loops, best of 3: 110 ms per loop
In [1]: import sys
In [2]: sys.version
Out[2]: '2.7.11 |Continuum Analytics, Inc.| (default, Dec 7 2015, 14:10:42) [MSC v.1500 64 bit (AMD64)]'
In [3]: pd.__version__
Out[3]: u'0.17.1'
In [4]: matplotlib.__version__
Out[4]: '1.5.0'
In [5]: N = 2000
In [6]: df = pd.DataFrame(np.random.randn(N, 5), index=pd.date_range('1/1/1975', periods=N))
In [7]: %timeit df.plot()
1 loops, best of 3: 269 ms per loop
In [8]: df = pd.DataFrame(np.random.randn(N, 5))
In [9]: %timeit df.plot()
10 loops, best of 3: 139 ms per loop
With Python 3, pandas 0.16.2 and 0.17.1:
In [2]: sys.version
Out[2]: '3.5.0 |Continuum Analytics, Inc.| (default, Dec 1 2015, 11:46:22) [MSC v.1900 64 bit (AMD64)]'
In [3]: pd.__version__
Out[3]: '0.16.2'
In [4]: matplotlib.__version__
Out[4]: '1.5.0'
In [5]: N = 2000
In [6]: df = pd.DataFrame(np.random.randn(N, 5), index=pd.date_range('1/1/1975', periods=N))
In [7]: %timeit df.plot()
1 loops, best of 3: 1.02 s per loop
In [9]: df = pd.DataFrame(np.random.randn(N, 5))
In [10]: %timeit df.plot()
10 loops, best of 3: 143 ms per loop
In [1]: import sys
In [2]: sys.version
Out[2]: '3.5.0 |Anaconda 2.4.0 (64-bit)| (default, Nov 7 2015, 13:15:24) [MSC v.1900 64 bit (AMD64)]'
In [3]: pd.__version__
Out[3]: '0.17.1'
In [4]: matplotlib.__version__
Out[4]: '1.5.0'
In [5]: N = 2000
In [6]: df = pd.DataFrame(np.random.randn(N, 5), index=pd.date_range('1/1/1975', periods=N))
In [7]: %timeit df.plot()
1 loops, best of 3: 2.37 s per loop    # <-- roughly 10x slower than on Python 2.7
In [8]: df = pd.DataFrame(np.random.randn(N, 5))
In [9]: %timeit df.plot()
10 loops, best of 3: 132 ms per loop
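To put numbers on the slowdown, the short sketch below computes the ratios from the DatetimeIndex timings reported above. The timings are copied from the sessions. The dictionary layout and the `slowdown` helper are only illustrative.

```python
# Best-of-3 timings in seconds for df.plot() with a DatetimeIndex,
# copied from the sessions above. Keys are (Python version, pandas version).
datetime_timings = {
    ("2.7.10", "0.16.2"): 0.228,
    ("2.7.11", "0.17.1"): 0.269,
    ("3.5.0", "0.16.2"): 1.02,
    ("3.5.0", "0.17.1"): 2.37,
}


def slowdown(py3_key, py2_key, timings=datetime_timings):
    """Return how many times slower the Python 3 run is than the Python 2 run."""
    return timings[py3_key] / timings[py2_key]


# Compare like with like: same pandas version on both interpreters.
same_pandas = {
    "0.16.2": slowdown(("3.5.0", "0.16.2"), ("2.7.10", "0.16.2")),
    "0.17.1": slowdown(("3.5.0", "0.17.1"), ("2.7.11", "0.17.1")),
}

# Worst case across all runs: slowest Python 3 run vs fastest Python 2 run.
worst_case = slowdown(("3.5.0", "0.17.1"), ("2.7.10", "0.16.2"))

for version, ratio in sorted(same_pandas.items()):
    print("pandas %s: %.1fx slower on Python 3" % (version, ratio))
print("worst case: %.1fx" % worst_case)
# pandas 0.16.2: 4.5x slower on Python 3
# pandas 0.17.1: 8.8x slower on Python 3
# worst case: 10.4x
```

The integer-index timings (110 to 143 ms) are nearly identical across all four environments, which is why the datetime axis handling looks like the likely culprit.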