BUG: drop_duplicates() throws exception if DF has duplicate column names · Issue #17836 · pandas-dev/pandas

Code Sample, a copy-pastable example if possible

import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=list('aa'))  # NOTICE: TWO COLUMNS, EACH CALLED 'a'
df.drop_duplicates()
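The same reproduction can be written as a self-checking regression test. This is a sketch; the test name is hypothetical, and it assumes a pandas version that provides `pandas.testing.assert_frame_equal` (0.20 or later). It is expected to fail on affected versions such as 0.18.1 and to pass only where this bug has been fixed.

```python
import pandas as pd
import pandas.testing as tm


def test_drop_duplicates_with_duplicate_column_names():
    # Two columns that share the label 'a'; neither row repeats.
    df = pd.DataFrame([[1, 2], [3, 4]], columns=list('aa'))
    # With no repeated rows, the frame should come back unchanged.
    tm.assert_frame_equal(df.drop_duplicates(), df)

    # Add a genuine duplicate row to check that repeats are still removed.
    dup = pd.DataFrame([[1, 2], [1, 2], [3, 4]], columns=list('aa'))
    # keep='first' (the default) retains rows 0 and 2.
    expected = dup.iloc[[0, 2]]
    tm.assert_frame_equal(dup.drop_duplicates(), expected)
```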

Problem description

This may be related to #10161.

The code above raises the following exception:

[...]
/opt/virtualenvs/luigi/local/lib/python2.7/dist-packages/pandas/core/frame.pyc in f(vals)
3095 labels, shape = algos.factorize(vals,
3096 size_hint=min(len(self),
-> 3097 _SIZE_HINT_LIMIT))
3098 return labels.astype('i8', copy=False), len(shape)
3099

/opt/virtualenvs/luigi/local/lib/python2.7/dist-packages/pandas/core/algorithms.pyc in factorize(values, sort, order, na_sentinel, size_hint)
183 table = hash_klass(size_hint or len(vals))
184 uniques = vec_klass()
--> 185 labels = table.get_labels(vals, uniques, 0, na_sentinel, True)
186
187 labels = com._ensure_platform_int(labels)

pandas/hashtable.pyx in pandas.hashtable.Int64HashTable.get_labels (pandas/hashtable.c:7941)()

ValueError: Buffer has wrong number of dimensions (expected 1, got 2)
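One plausible reading of the traceback, not confirmed against the pandas 0.18.1 source: the hashing step in `factorize` expects each column's values as a 1-D array, but looking up the label `'a'` on this frame returns both matching columns at once, which would explain "expected 1, got 2". The snippet below only illustrates that label-lookup behavior; it does not reproduce the internals of `drop_duplicates`.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=list('aa'))

# Selecting a label that appears twice returns every matching column,
# i.e. a 2-D DataFrame rather than a 1-D Series.
by_label = df['a']
print(type(by_label).__name__, by_label.values.ndim)  # DataFrame 2

# Positional selection always yields a single 1-D column.
by_position = df.iloc[:, 0]
print(type(by_position).__name__, by_position.values.ndim)  # Series 1
```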

Expected Output

I was expecting the DataFrame to be returned unchanged, since it contains no duplicate rows:

a a
0 1 2
1 3 4
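Until a fix is available, a possible workaround is to detect duplicates on a copy whose columns are relabeled by position, then apply the resulting mask to the original frame so the repeated `'a'` labels are preserved. This is a sketch; the helper name `drop_duplicate_rows_positional` is hypothetical, and `to_numpy()` assumes pandas 0.24 or later (use `.values` on older versions).

```python
import pandas as pd


def drop_duplicate_rows_positional(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop repeated rows, ignoring column labels."""
    # Shallow copy with columns 0..n-1, so no label is repeated.
    positional = df.copy(deep=False)
    positional.columns = range(df.shape[1])
    # duplicated() flags every repeat of an earlier row (keep='first').
    mask = positional.duplicated()
    # Index the original frame so its column labels survive unchanged.
    return df.loc[~mask.to_numpy()]


df = pd.DataFrame([[1, 2], [3, 4]], columns=list('aa'))
print(drop_duplicate_rows_positional(df))
```

Because only the copy is relabeled, the returned frame keeps the original `a a` header, matching the expected output above.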

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.32-15.41.amzn1.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: None
pip: 9.0.1
setuptools: 36.4.0
Cython: None
numpy: 1.11.0
scipy: 0.17.1
statsmodels: 0.8.0
xarray: None
IPython: 5.4.1
sphinx: None
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999999999
httplib2: None
apiclient: None
sqlalchemy: 1.1.14
pymysql: 0.7.11.None
psycopg2: None
jinja2: 2.7.3
boto: 2.48.0
pandas_datareader: None