Regression from 0.19.2 to 0.20.1 in pandas.unique() when applied to list of tuples · Issue #16519 · pandas-dev/pandas

Code Sample, a copy-pastable example if possible

import pandas as pd

input = [(0, 0), (0, 1), (1, 0), (1, 1), (0, 0), (0, 1), (1, 0), (1, 1)]
print(pd.unique(input))

Problem description

The call to pd.unique() raises a ValueError instead of returning the unique tuples:

Traceback (most recent call last):
  File "pandas_bug.py", line 6, in <module>
    pd.unique(input)
  File "/Users/johannes/.virtualenvs/pandas/lib/python2.7/site-packages/pandas/core/algorithms.py", line 351, in unique
    uniques = table.unique(values)
  File "pandas/_libs/hashtable_class_helper.pxi", line 1271, in pandas._libs.hashtable.PyObjectHashTable.unique (pandas/_libs/hashtable.c:21384)
ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

Expected Output

The same code works on pandas 0.19.2 and produces the expected output:

[(0, 0) (0, 1) (1, 0) (1, 1)]

Moreover, this problem is not limited to Mac OS X; it was also encountered on an Ubuntu CI server.
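A possible workaround, sketched here under an assumption: the error message suggests the list of tuples is being converted to a 2-D array before it reaches the hash table, but that cause is a guess and not confirmed in this report. If so, building a 1-D object array by hand and passing that to pd.unique() should avoid the conversion. The helper name to_1d_object_array and the variable name items are made up for illustration (items replaces input to avoid shadowing the builtin).

```python
import numpy as np
import pandas as pd


def to_1d_object_array(values):
    """Hypothetical helper: store each tuple as a single element of a
    1-D object array, so NumPy does not expand the tuples into columns."""
    arr = np.empty(len(values), dtype=object)
    for i, value in enumerate(values):
        arr[i] = value  # element-wise assignment keeps the tuple intact
    return arr


items = [(0, 0), (0, 1), (1, 0), (1, 1), (0, 0), (0, 1), (1, 0), (1, 1)]

# A plain conversion turns the list of 2-tuples into a 2-D integer array.
print(np.asarray(items).ndim)

# The hand-built array is 1-D, with one tuple per element.
arr = to_1d_object_array(items)
print(arr.ndim)

result = pd.unique(arr)
print(list(result))
```

The explicit loop is deliberate: assigning the whole list at once with arr[:] = values would make NumPy try to broadcast an (8, 2) array into an (8,) array and fail.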

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.20.1
pytest: None
pip: 9.0.1
setuptools: 35.0.2
Cython: None
numpy: 1.12.1
scipy: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
pandas_gbq: None
pandas_datareader: None
None