Performance issue with timeseries plotting on py3? · Issue #11831 · pandas-dev/pandas

I noticed a performance issue when plotting time series. After trying several environments (different pandas, matplotlib and Python versions), it seems there is a problem on Python 3: plotting a DataFrame with a DatetimeIndex is up to 10x slower than on Python 2.7, while a DataFrame with a plain integer index plots at roughly the same speed on both:

With Python 2.7, pandas 0.16.2 and 0.17.1:

In [2]: sys.version
Out[2]: '2.7.10 |Anaconda 1.7.0 (64-bit)| (default, Oct 21 2015, 19:35:23) [MSC v.1500 64 bit (AMD64)]'

In [3]: pd.__version__
Out[3]: '0.16.2'

In [4]: matplotlib.__version__
Out[4]: '1.4.3'

In [6]: N = 2000

In [7]: df = pd.DataFrame(np.random.randn(N, 5), index=pd.date_range('1/1/1975', periods=N))

In [8]: %timeit df.plot()
1 loops, best of 3: 228 ms per loop

In [10]: df = pd.DataFrame(np.random.randn(N, 5))

In [11]: %timeit df.plot()
10 loops, best of 3: 110 ms per loop

In [1]: import sys

In [2]: sys.version
Out[2]: '2.7.11 |Continuum Analytics, Inc.| (default, Dec  7 2015, 14:10:42) [MSC v.1500 64 bit (AMD64)]'

In [3]: pd.__version__
Out[3]: u'0.17.1'

In [4]: matplotlib.__version__
Out[4]: '1.5.0'

In [5]: N = 2000

In [6]: df = pd.DataFrame(np.random.randn(N, 5), index=pd.date_range('1/1/1975', periods=N))

In [7]: %timeit df.plot()
1 loops, best of 3: 269 ms per loop

In [8]: df = pd.DataFrame(np.random.randn(N, 5))

In [9]: %timeit df.plot()
10 loops, best of 3: 139 ms per loop

With Python 3.5, pandas 0.16.2 and 0.17.1:

In [2]: sys.version
Out[2]: '3.5.0 |Continuum Analytics, Inc.| (default, Dec  1 2015, 11:46:22) [MSC v.1900 64 bit (AMD64)]'

In [3]: pd.__version__
Out[3]: '0.16.2'

In [4]: matplotlib.__version__
Out[4]: '1.5.0'

In [5]: N = 2000

In [6]: df = pd.DataFrame(np.random.randn(N, 5), index=pd.date_range('1/1/1975', periods=N))

In [7]: %timeit df.plot()
1 loops, best of 3: 1.02 s per loop

In [9]: df = pd.DataFrame(np.random.randn(N, 5))

In [10]: %timeit df.plot()
10 loops, best of 3: 143 ms per loop

In [1]: import sys

In [2]: sys.version
Out[2]: '3.5.0 |Anaconda 2.4.0 (64-bit)| (default, Nov  7 2015, 13:15:24) [MSC v.1900 64 bit (AMD64)]'

In [3]: pd.__version__
Out[3]: '0.17.1'

In [4]: matplotlib.__version__
Out[4]: '1.5.0'

In [5]: N = 2000

In [6]: df = pd.DataFrame(np.random.randn(N, 5), index=pd.date_range('1/1/1975', periods=N))

In [7]: %timeit df.plot()
1 loops, best of 3: 2.37 s per loop    <------------ !!!! 10x slower than on py2.7

In [8]: df = pd.DataFrame(np.random.randn(N, 5))

In [9]: %timeit df.plot()
10 loops, best of 3: 132 ms per loop
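The sessions above depend on IPython's `%timeit` and on imports that are not shown. As a rough sketch, assuming NumPy, pandas and matplotlib are installed, the same comparison can be run as a plain script under both Python 2.7 and Python 3 using the standard-library `timeit` module. The helper name `best_plot_time`, the non-interactive Agg backend, and closing each figure inside the timed call are assumptions added here; they are not part of the original report, so absolute numbers will not match `%timeit` exactly.

```python
# Standalone sketch of the %timeit comparison from the sessions above.
# Assumptions (not in the original report): Agg backend, hypothetical
# helper name best_plot_time, and plt.close() inside each timed run.
from __future__ import print_function  # lets the same script run on py2.7 and py3

import sys
import timeit

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no GUI windows are opened
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def best_plot_time(use_dates, n=2000, repeat=3):
    """Return the best time in seconds for a single df.plot() call.

    use_dates=True  -> DatetimeIndex built with pd.date_range('1/1/1975', periods=n)
    use_dates=False -> default integer index
    """
    index = pd.date_range("1/1/1975", periods=n) if use_dates else None
    df = pd.DataFrame(np.random.randn(n, 5), index=index)

    def run():
        df.plot()
        plt.close("all")  # free the figure so repeated runs don't accumulate figures

    # timeit.repeat accepts a callable; keep the fastest run, like "best of 3"
    return min(timeit.repeat(run, repeat=repeat, number=1))


if __name__ == "__main__":
    print(sys.version)
    print("pandas %s | matplotlib %s" % (pd.__version__, matplotlib.__version__))
    for label, use_dates in [("DatetimeIndex", True), ("integer index", False)]:
        print("%s: %.0f ms" % (label, best_plot_time(use_dates) * 1000))
```

For reference, the numbers reported above put the DatetimeIndex case at 228 to 269 ms on Python 2.7 versus 1.02 s (pandas 0.16.2) and 2.37 s (pandas 0.17.1) on Python 3.5, i.e. about 4.5x and 8.8x slower for the matching pandas version (about 10x against the fastest Python 2.7 run), while the integer-index case stays between 110 and 143 ms everywhere.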