df.plot() very slow compared to explicit matplotlib on large dataframes · Issue #18236 · pandas-dev/pandas
import numpy as np
import pandas as pd
import matplotlib
#matplotlib.use('tkagg')
import matplotlib.pyplot as plt
import time
#plt.ion()
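class Timer():
    # prints the time elapsed since the previous call (or since creation)
    def __init__(self):
        self.time0=time.time()
    def __call__(self, str_):
        print(str_+str(time.time()-self.time0))
        # reset the clock for the next measurement
        self.time0=time.time()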
df=pd.DataFrame({'a':np.random.randn(1000000)})
fig=plt.figure(1)
fig.clf()
timer=Timer()
ax0=fig.add_subplot(2,1,1)
ax0.plot(df.index,df['a'])
timer('matplotlib took: ')
ax2=fig.add_subplot(2,1,2)
df.plot(legend=None, ax=ax2)
timer('pandas took: ')
fig.canvas.draw()
Result on my slow computer:
matplotlib took: 0.48404979705810547
pandas took: 47.75511360168457
Why is df.plot() so much slower than the explicit matplotlib plot when plotting the same dataframe?
The above is a simple example, but I encountered the same difference on real-world data with times as the index when run on a much faster computer with more memory (a recent anaconda download).
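The timings above only cover the plot calls themselves, since fig.canvas.draw() runs after the last timer call. A minimal sketch that reports the two phases separately for each code path is below. It is not part of the original report: the function name time_plot_paths, the Agg backend, the smaller default row count and the use of time.perf_counter are all assumptions made for illustration.

```python
import time

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no GUI overhead
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def time_plot_paths(n_rows=100_000):
    """Return seconds spent building and drawing the plot for each code path.

    Hypothetical helper: the name and the default n_rows are illustrative.
    """
    df = pd.DataFrame({"a": np.random.randn(n_rows)})
    timings = {}
    for label in ("matplotlib", "pandas"):
        fig, ax = plt.subplots()
        start = time.perf_counter()
        if label == "matplotlib":
            ax.plot(df.index, df["a"])      # explicit matplotlib call
        else:
            df.plot(legend=False, ax=ax)    # pandas plotting wrapper
        built = time.perf_counter()
        fig.canvas.draw()                   # force the actual rendering
        drawn = time.perf_counter()
        timings[label] = {"build": built - start, "draw": drawn - built}
        plt.close(fig)                      # release the figure
    return timings


if __name__ == "__main__":
    for label, t in time_plot_paths().items():
        print(f"{label}: build {t['build']:.3f}s, draw {t['draw']:.3f}s")
```

Calling time_plot_paths(1000000) would match the dataframe size used above. Comparing the "build" and "draw" numbers for the two paths should show whether the extra time is spent inside df.plot() or in the later rendering step.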
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.2.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: AMD64 Family 20 Model 2 Stepping 0, AuthenticAMD
byteorder: little
LC_ALL: None
LANG: en
LOCALE: None.None
pandas: 0.20.3
pytest: 3.2.1
pip: 9.0.1
setuptools: 36.5.0.post20170921
Cython: 0.26.1
numpy: 1.13.1
scipy: 0.19.1
xarray: None
IPython: 6.1.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.1.0
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 0.9.8
lxml: 3.8.0
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None