Filtering by exclusion of duplicate rows does not preserve column list for an empty dataframe · Issue #25184 · pandas-dev/pandas
Code Sample, a copy-pastable example if possible
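import pandas as pd
x_df = pd.DataFrame(columns=['a', 'b'])  # empty dataframe with two columns
series = x_df.duplicated(subset=['a'])   # duplicate mask, comparing rows by column 'a' only
list(x_df[~series])                      # column names of the filtered dataframe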
Output on pandas 0.23.4 (expected): ['a', 'b']
Output on pandas 0.24.1: []
Problem description
We have been using this approach to remove duplicate rows from a dataframe, comparing rows by a single column only. It worked perfectly until we upgraded pandas to 0.24.1: if the original dataframe is empty, the resulting dataframe now loses its column list.
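One way to narrow this down is to inspect the mask returned by `duplicated` before using it for selection. The sketch below rests on an assumption not confirmed in this report: that the mask for an empty dataframe may come back with a non-boolean dtype, in which case `x_df[...]` could treat it as a list of column labels rather than a row filter. Casting the mask to `bool` and selecting with `.loc` makes the row-filtering intent explicit.

```python
import pandas as pd

x_df = pd.DataFrame(columns=['a', 'b'])
mask = x_df.duplicated(subset=['a'])

# Inspect the mask: a boolean dtype is what row filtering needs.
print(mask.dtype, len(mask))

# Assumption: forcing a boolean dtype and using .loc selects rows only,
# so the column list should not be affected by the (empty) mask.
result = x_df.loc[~mask.astype(bool)]
print(list(result))
```

Whether this avoids the problem on 0.24.1 specifically has not been verified here; it is meant as a diagnostic aid rather than a confirmed fix.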
Expected Output
We would expect the column list to be preserved.
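As a possible alternative to building the mask by hand, the same "keep the first row per value of one column" logic can be written with `DataFrame.drop_duplicates`. This is a minimal sketch; the helper name `drop_duplicates_by` and the sample data are illustrative and not part of the original report, and behaviour on 0.24.1 has not been checked here.

```python
import pandas as pd


def drop_duplicates_by(df, column):
    """Keep the first row for each distinct value in `column` (hypothetical helper)."""
    return df.drop_duplicates(subset=[column])


# Empty dataframe: the column list is what we want to see preserved.
empty = pd.DataFrame(columns=['a', 'b'])
print(list(drop_duplicates_by(empty, 'a')))

# Non-empty dataframe: rows are compared by column 'a' only.
filled = pd.DataFrame({'a': [1, 1, 2], 'b': ['x', 'y', 'z']})
print(drop_duplicates_by(filled, 'a'))
```

For non-empty input this should match `df[~df.duplicated(subset=[column])]`, since both keep the first occurrence by default.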
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.15.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-44-generic
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: None.None
pandas: 0.24.1
pytest: None
pip: 18.0
setuptools: 40.0.0
Cython: 0.22
numpy: 1.12.1
scipy: None
pyarrow: None
xarray: None
IPython: 4.2.0
sphinx: 1.4.4
patsy: None
dateutil: 2.5.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: 2.5.14
xlrd: 1.2.0
xlwt: 1.3.0
xlsxwriter: None
lxml.etree: 3.7.3
bs4: 4.5.3
html5lib: 1.