BUG: dropna('rows') changes index type even when no row was dropped · Issue #41965 · pandas-dev/pandas
After dropna('rows'), the DataFrame index changes from RangeIndex to Int64Index even when no rows are dropped and the data is otherwise unchanged.
Calling reset_index(drop=True) after dropna restores the RangeIndex, but the conversion should not happen in the first place.
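A minimal sketch of two ways to keep the RangeIndex, written against the repro's single-column frame. The first is the reset_index workaround described above; `drop=True` matters because plain `reset_index()` would insert the old labels as a new column. The second, `dropna_rows_keep_range`, is a hypothetical helper (not a pandas API) that skips `dropna` when nothing is missing, so the original index is never rebuilt.

```python
import pandas as pd

df = pd.DataFrame({"a": range(1000)})

# Workaround from the report: renumber rows after dropna.
# drop=True discards the old labels instead of adding them as a column.
fixed = df.dropna(axis="rows").reset_index(drop=True)
print(type(fixed.index).__name__)  # RangeIndex


def dropna_rows_keep_range(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: only call dropna when some value is missing,
    so a frame with nothing to drop keeps its original index object type."""
    if not frame.isna().to_numpy().any():
        # Nothing to drop: return a copy, as dropna would return a new object.
        return frame.copy()
    return frame.dropna(axis="rows")


kept = dropna_rows_keep_range(df)
print(type(kept.index).__name__)  # RangeIndex
```

Note that reset_index(drop=True) renumbers the rows, so it is only label-preserving when no rows were actually dropped; the guard approach keeps the original labels in both cases.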
Code Sample, a copy-pastable example
import pandas as pd

df = pd.DataFrame({'a': range(1000)})
df.info()

df = df.dropna(axis='rows')  # same as dropna('rows'), passed by keyword
print('\nafter dropna\n')
df.info()
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   a       1000 non-null   int64
dtypes: int64(1)
memory usage: 7.9 KB
after dropna
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1000 entries, 0 to 999
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   a       1000 non-null   int64
dtypes: int64(1)
memory usage: 15.6 KB
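The same observation can be checked without reading `info()` output. This sketch repeats the sample above and asserts only what should hold on any pandas version: no row is dropped and the labels compare equal. The index type is printed rather than asserted, because it depends on the installed version (Int64Index itself was removed in pandas 2.0).

```python
import pandas as pd

df = pd.DataFrame({"a": range(1000)})
result = df.dropna(axis="rows")

# Column 'a' has no missing values, so nothing should be dropped.
assert len(result) == len(df)
# Index.equals compares values, so this holds even if the index type changed.
assert result.index.equals(df.index)

# Per this report, pandas 1.1.3 prints "RangeIndex -> Int64Index";
# other versions may print something different.
print(type(df.index).__name__, "->", type(result.index).__name__)
```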
Problem description
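Changing from RangeIndex to Int64Index roughly doubles the DataFrame's memory usage here (7.9 KB to 15.6 KB) for no good reason: a RangeIndex stores only its start, stop and step, while an Int64Index materializes all 1000 labels as 8-byte integers.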
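A small sketch of where the extra memory comes from, comparing a RangeIndex with an index holding the same labels as an explicit int64 array. Exact byte counts vary across pandas and Python versions, so only the relationship between the two is checked, not specific numbers.

```python
import numpy as np
import pandas as pd

n = 1000
# Lazy representation: only start, stop and step are stored.
range_index = pd.RangeIndex(n)
# Same labels, materialized as n 8-byte integers (Int64Index in pandas 1.x).
materialized = pd.Index(np.arange(n, dtype=np.int64))

print("RangeIndex bytes:  ", range_index.memory_usage())
print("materialized bytes:", materialized.memory_usage())
```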
Output of pd.show_versions()
commit : db08276
python : 3.8.5.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 13, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_Israel.1252
pandas : 1.1.3
numpy : 1.19.2
pytz : 2020.1
dateutil : 2.8.1
pip : 20.2.4
setuptools : 50.3.1.post20201107
Cython : 0.29.21
pytest : 6.1.1
hypothesis : None
sphinx : 3.2.1
blosc : None
feather : None
xlsxwriter : 1.3.7
lxml.etree : 4.6.1
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.10.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : 1.3.2
fsspec : 0.8.3
fastparquet : None
gcsfs : None
matplotlib : 3.3.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.5
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.6.2
sqlalchemy : 1.3.20
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.51.2