PERF: regression in MultiIndex get_loc performance · Issue #16319 · pandas-dev/pandas


import time

import numpy as np
import pandas as pd

print(pd.__version__)

# Build an 8-column MultiIndex (4 first-level labels x 2 second-level labels).
iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]
multind = pd.MultiIndex.from_product(iterables, names=['first', 'second'])

df = pd.DataFrame(np.random.randn(4, 8), columns=multind)
df2 = pd.DataFrame(np.random.randn(4, 8), columns=multind)

# Time a single call to combine_first.
t2 = time.time()
df.combine_first(df2)
print("%f" % (time.time() - t2))
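A single `time.time()` measurement can be noisy, so the same comparison may be easier to reproduce with `timeit`. The following is only a sketch: the `repeat` and `number` values are arbitrary choices that are not part of the original report, and the absolute timings will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]
multind = pd.MultiIndex.from_product(iterables, names=['first', 'second'])

df = pd.DataFrame(np.random.randn(4, 8), columns=multind)
df2 = pd.DataFrame(np.random.randn(4, 8), columns=multind)

# Assumed settings (not from the report): 5 repeats of 10 calls each.
number = 10
runs = timeit.repeat(lambda: df.combine_first(df2), repeat=5, number=number)

# Keep the fastest repeat and convert it to milliseconds per call.
best_ms = min(runs) / number * 1000
print("pandas %s: combine_first best of 5: %.3f ms per call"
      % (pd.__version__, best_ms))
```

Running this under both 0.19.2 and 0.20.1 should give figures that can be compared directly with the ones quoted in the problem description.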

Problem description

Running this code takes 116 ms in version 0.20.1, but only 3.6 ms in version 0.19.2.
This makes combine_first in version 0.20.1 more than 30 times slower (roughly 32x) than in 0.19.2.
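The issue title points at `MultiIndex.get_loc`, while the sample above only exercises `combine_first`. To check whether the lookup itself is the slow part, one could time `get_loc` directly on the same index. This is a hedged sketch: the helper `lookup_all` and the repeat counts are illustrative assumptions, and the report itself does not confirm that `get_loc` accounts for the whole slowdown.

```python
import timeit

import pandas as pd

iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]
multind = pd.MultiIndex.from_product(iterables, names=['first', 'second'])

# A full key on a unique MultiIndex resolves to a single integer position.
pos = multind.get_loc(('foo', 'two'))


def lookup_all():
    """Hypothetical helper: look up every key of the index once."""
    return [multind.get_loc(key) for key in multind]


# Assumed settings (not from the report): 5 repeats of 100 passes each.
number = 100
runs = timeit.repeat(lookup_all, repeat=5, number=number)

# Convert the fastest repeat to microseconds per single get_loc call.
per_lookup_us = min(runs) / (number * len(multind)) * 1e6
print("pandas %s: get_loc about %.2f us per lookup"
      % (pd.__version__, per_lookup_us))
```

If the per-lookup time differs between the two versions by a factor similar to the one seen for `combine_first`, that would be consistent with the lookup being the bottleneck.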

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: C
LANG: fr_FR.UTF-8
LOCALE: fr_FR.UTF-8

pandas: 0.20.1
pytest: 2.8.5
pip: 9.0.1
setuptools: 19.6.2
Cython: 0.24.1
numpy: 1.11.3
scipy: 0.18.1
xarray: None
IPython: 5.1.0
sphinx: 1.3.5
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: 1.2.0
tables: 3.2.2
numexpr: 2.6.1
feather: None
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.4
lxml: 3.5.0
bs4: 4.4.1
html5lib: 0.9999999
sqlalchemy: 1.0.13
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext)
jinja2: 2.8
s3fs: 0.0.9
pandas_gbq: None
pandas_datareader: None