KeyError shows incorrect column name when DataFrame has duplicate columns (original) (raw)

import pandas as pd df = pd.DataFrame({'x': [1.], 'y': [2.], 'z': [3.]}) df.columns = ['x', 'x', 'z'] df[['x', 'y', 'z']] KeyError: "['z'] not in index"

I expected to see KeyError: "['y'] not in index".

I've tested this on the latest code in master (and on 0.16):

>>> pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.3.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.18-400.1.1.el5
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB
LOCALE: None.None

pandas: 0.18.1+279.g31f8e4d
nose: None
pip: 1.3.1
setuptools: 0.6
Cython: 0.22
numpy: 1.9.2
scipy: None
statsmodels: None
xarray: None
IPython: 3.2.0-1
sphinx: None
patsy: None
dateutil: 2.4.2
pytz: 2015.6
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None