Filtering by exclusion of duplicate rows does not preserve column list for an empty dataframe · Issue #25184 · pandas-dev/pandas

Code Sample, a copy-pastable example if possible

import pandas as pd

x_df = pd.DataFrame(columns=['a', 'b'])
series = x_df.duplicated(subset=['a'])

list(x_df[~series])

Expected output on Pandas 0.23.4: ['a', 'b']

But Pandas 0.24.1 returns: []
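One way to narrow this down might be to look at the mask itself, since an empty mask is the edge case involved. The snippet below is only a diagnostic sketch: it prints the installed version, the dtype of the mask returned by `duplicated`, and the column list after filtering, so the values can be compared between 0.23.4 and 0.24.1. The name `mask` is chosen for illustration, and no particular output is assumed.

```python
import pandas as pd

x_df = pd.DataFrame(columns=["a", "b"])
mask = x_df.duplicated(subset=["a"])

# Record which pandas version produced the values below.
print(pd.__version__)

# The frame has no rows, so the mask is empty; its dtype may differ by version.
print(mask.dtype)

# Column list after filtering with the inverted mask, as in the report.
result_columns = list(x_df[~mask])
print(result_columns)
```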

Problem description

We have been using this approach to remove duplicate rows from a dataframe, where rows are compared by one column only. It worked until we upgraded Pandas to the latest version: if the original dataframe is empty, the resulting dataframe now loses its column list.

Expected Output

We would expect the column list to be preserved.
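Until the behaviour is clarified, a possible workaround is sketched below. It is not verified against 0.24.1 and rests on two assumptions: that `drop_duplicates` leaves the columns of an empty dataframe intact, and that casting the mask to `bool` and selecting through `.loc` makes pandas treat it as a row selector rather than a list of column keys. The names `deduped`, `mask` and `filtered` are illustrative only.

```python
import pandas as pd

x_df = pd.DataFrame(columns=["a", "b"])

# Option 1: let pandas drop the duplicates directly,
# keeping the first row for each value of column 'a'.
deduped = x_df.drop_duplicates(subset=["a"])

# Option 2: keep the mask approach, but force a boolean dtype
# and index with .loc so the mask is applied to rows.
mask = x_df.duplicated(subset=["a"]).astype(bool)
filtered = x_df.loc[~mask]

print(list(deduped.columns))
print(list(filtered.columns))
```

Option 1 is the shorter of the two and avoids building the mask at all. Option 2 stays closer to the original code if the mask is needed elsewhere.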

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.15.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-44-generic
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: None.None

pandas: 0.24.1
pytest: None
pip: 18.0
setuptools: 40.0.0
Cython: 0.22
numpy: 1.12.1
scipy: None
pyarrow: None
xarray: None
IPython: 4.2.0
sphinx: 1.4.4
patsy: None
dateutil: 2.5.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: 2.5.14
xlrd: 1.2.0
xlwt: 1.3.0
xlsxwriter: None
lxml.etree: 3.7.3
bs4: 4.5.3
html5lib: 1.