DataFrame.nunique is incorrect for DataFrame with no columns · Issue #21959 · pandas-dev/pandas

(edit by @TomAugspurger)

Current output:

In [33]: pd.DataFrame(index=[0, 1]).nunique()
Out[33]:
Empty DataFrame
Columns: []
Index: [0, 1]

Expected output is an empty Series:

Out[34]: Series([], dtype: float64)

Not sure what the expected dtype of that Series should be... probably object.
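Until this is fixed, one possible workaround is to count distinct values column by column, so the result is always a Series indexed by `df.columns`, even when there are no columns. This is a sketch under two assumptions: column names are unique, and int64 is an acceptable dtype for the empty case (an arbitrary choice, given the open dtype question above). `nunique_per_column` is a hypothetical helper, not a pandas API.

```python
import pandas as pd


def nunique_per_column(df: pd.DataFrame, dropna: bool = True) -> pd.Series:
    """Count distinct values in each column; always indexed by df.columns.

    Hypothetical helper, not part of pandas. Assumes unique column names
    (with duplicates, df[col] would return a DataFrame, not a Series).
    """
    counts = {col: df[col].nunique(dropna=dropna) for col in df.columns}
    # Pin the dtype so an empty result does not depend on what pandas
    # infers for an empty Series; int64 is an assumption, not a pandas rule.
    return pd.Series(counts, index=df.columns, dtype="int64")


empty = nunique_per_column(pd.DataFrame(index=[0, 1]))
print(empty)  # an empty Series with no index labels
```

Because the result is built from `df.columns`, a column-less frame yields an empty Series instead of an object carrying the row index.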

Original post below:


Code Sample, a copy-pastable example if possible

With pandas 0.20.3:

Create a DataFrame with 3 rows:

import pandas as pd
df = pd.DataFrame({'a': ['A', 'B', 'C']})

Look up the number of unique values in each column, excluding 'a':

unique = df.loc[:, (df.columns != 'a')].nunique()

This results in an empty Series whose index is also empty:

unique.index.tolist()

[]

and

unique[unique == 1].index.tolist()

[]

With pandas 0.23.3:

Create a DataFrame with 3 rows:

import pandas as pd
df = pd.DataFrame({'a': ['A', 'B', 'C']})

Look up the number of unique values in each column, excluding 'a':

unique = df.loc[:, (df.columns != 'a')].nunique()

This results in an empty DataFrame rather than a Series, and its index is not empty:

unique.index.tolist()

[0, 1, 2]

Also:

unique[unique == 1].index.tolist()

[0, 1, 2]

Note:

If the selection isn't empty, the behavior of nunique() seems fine:

df = pd.DataFrame({'a': ['A', 'B', 'C'], 'b': [1, 1, 1]})
unique = df.loc[:, (df.columns != 'a')].nunique()

unique[unique == 1]

b    1
dtype: int64

and

unique[unique == 1].index.tolist()

['b']

Problem description

This change in behavior is a bit disturbing and seems like a bug. nunique() should return a Series indexed by the DataFrame's columns, but that isn't the case here: instead, the result picks up the DataFrame's row index.
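The expectation above can be written down as a small check: `df.nunique()` should come back as a Series indexed exactly by `df.columns`. This is a sketch; `check_nunique_contract` is a hypothetical name, and it encodes the behavior this report expects rather than a check taken from pandas itself.

```python
import pandas as pd


def check_nunique_contract(df: pd.DataFrame) -> bool:
    """Return True if df.nunique() is a Series indexed exactly by df.columns.

    Hypothetical helper encoding the behavior this report expects.
    """
    result = df.nunique()
    return isinstance(result, pd.Series) and bool(result.index.equals(df.columns))


df = pd.DataFrame({'a': ['A', 'B', 'C'], 'b': [1, 1, 1]})
print(check_nunique_contract(df))  # True

# The column-less selection from this report. Per the report, pandas 0.23.3
# returns an empty DataFrame indexed by the rows here, so the check fails there.
df_a = pd.DataFrame({'a': ['A', 'B', 'C']})
print(check_nunique_contract(df_a.loc[:, df_a.columns != 'a']))
```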

This is likely related to:

#21932
#21255

I am posting this because in my use case I use this list to drop columns, but I end up with column names that do not exist in the df.
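For the drop-columns use case, a defensive sketch checks the type of the nunique() result and only drops labels that really are columns. It assumes the goal is to drop every column other than `a` that holds a single distinct value; `drop_single_valued_columns` and its `exclude` parameter are hypothetical names.

```python
import pandas as pd


def drop_single_valued_columns(df: pd.DataFrame, exclude=("a",)) -> pd.DataFrame:
    """Drop columns with exactly one distinct value, skipping `exclude`.

    Hypothetical helper, not a pandas API.
    """
    candidates = df.loc[:, ~df.columns.isin(exclude)]
    counts = candidates.nunique()
    if not isinstance(counts, pd.Series):
        # Reported on pandas 0.23.3: a column-less selection returns an empty
        # DataFrame indexed by the rows, so there is nothing to drop.
        return df.copy()
    to_drop = counts[counts == 1].index
    # Defensive: only drop labels that are actually columns of df.
    to_drop = [label for label in to_drop if label in df.columns]
    return df.drop(columns=to_drop)


df = pd.DataFrame({'a': ['A', 'B', 'C'], 'b': [1, 1, 1], 'c': [1, 2, 3]})
print(drop_single_valued_columns(df).columns.tolist())  # ['a', 'c']
```

The type check covers the reported 0.23.3 behavior; the membership filter is a second guard in case row labels ever leak into the result again.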

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Darwin
OS-release: 17.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.3
pytest: None
pip: 10.0.1
setuptools: 39.2.0
Cython: None
numpy: 1.13.3
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: None
sphinx: 1.5.5
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2015.7
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: 0.9.4
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: 2.7.3.2 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None