DataFrame.nunique is incorrect for DataFrame with no columns · Issue #21959 · pandas-dev/pandas
(edit by @TomAugspurger)
Current output:
In [33]: pd.DataFrame(index=[0, 1]).nunique()
Out[33]:
Empty DataFrame
Columns: []
Index: [0, 1]
Expected output is an empty Series:
Out[34]: Series([], dtype: float64)
Not sure what the expected dtype of that Series should be... probably object.
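To make the expected result concrete, here is a minimal sketch of a per-column count that always returns a Series indexed by the column labels, even when the frame has no columns. The helper name `nunique_by_column` is hypothetical (it is not part of pandas), and the `int64` dtype is an assumption, since the dtype question above is left open.

```python
import pandas as pd


def nunique_by_column(df, dropna=True):
    """Count distinct values per column; always returns a Series.

    Hypothetical helper for illustration, not a pandas API.
    """
    # Select columns by position so duplicate column labels are handled too.
    counts = [df.iloc[:, i].nunique(dropna=dropna) for i in range(df.shape[1])]
    # Index the result by the column labels, never by the row index.
    # dtype="int64" is an assumption; the issue leaves the dtype open.
    return pd.Series(counts, index=df.columns, dtype="int64")


result = nunique_by_column(pd.DataFrame(index=[0, 1]))
print(type(result).__name__, result.index.tolist())
```

With a frame that has no columns, the result is an empty Series whose index is also empty, which is the shape described as expected above.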
original post below:
Code Sample, a copy-pastable example if possible
With pandas 0.20.3:
create a DataFrame with 3 rows
df = pd.DataFrame({'a': ['A','B','C']})
look up the unique-value count for each column, excluding 'a'
unique = df.loc[:, (df.columns != 'a')].nunique()
this results in an empty Series; the index is also empty
unique.index.tolist()
[]
and
unique[unique == 1].index.tolist()
[]
With pandas 0.23.3:
create a DataFrame with 3 rows
df = pd.DataFrame({'a': ['A','B','C']})
look up the unique-value count for each column, excluding 'a'
unique = df.loc[:, (df.columns != 'a')].nunique()
this results in an empty Series, but the index is not empty
unique.index.tolist()
[1,2,3]
also:
unique[unique == 1].index.tolist()
[1,2,3]
Note:
if we don't have an empty df, the behavior of nunique() seems fine:
df = pd.DataFrame({'a': ['A','B','C'], 'b': [1,1,1]})
unique = df.loc[:, (df.columns != 'a')].nunique()
unique[unique == 1]
b    1
dtype: int64
and
unique[unique == 1].index.tolist()
['b']
Problem description
The change of behavior is a bit disturbing, and it looks like a bug: nunique() should return a Series indexed by the df columns, but that doesn't seem to be the case here. Instead, the result picks up the index of the df.
This is likely related to:
I am posting this because in my use case I use the resulting list to drop columns, but I end up with column names that do not exist in the df.
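One way to protect the column-dropping use case, regardless of what nunique() returns for a frame with no columns, is to check for that case before calling it. This is only a sketch: `single_valued_columns` is a hypothetical helper name, not a pandas API, and it assumes column labels are unique.

```python
import pandas as pd


def single_valued_columns(df):
    """Return the labels of columns that hold exactly one distinct value.

    Hypothetical helper for illustration, not a pandas API.
    """
    # With no columns there is nothing to report, so skip nunique() entirely.
    if df.shape[1] == 0:
        return []
    counts = df.nunique()
    return counts[counts == 1].index.tolist()


df = pd.DataFrame({'a': ['A', 'B', 'C'], 'b': [1, 1, 1]})
# Drop the constant columns; only labels that exist in df are ever returned.
trimmed = df.drop(columns=single_valued_columns(df))
print(trimmed.columns.tolist())
```

The guard means the list passed to `drop` can only contain column labels, never row labels taken from the index.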
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Darwin
OS-release: 17.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.3
pytest: None
pip: 10.0.1
setuptools: 39.2.0
Cython: None
numpy: 1.13.3
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: None
sphinx: 1.5.5
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2015.7
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: 0.9.4
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: 2.7.3.2 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None