Index name assignment resolution logic changes depending on whether DataFrame has any rows · Issue #31368 · pandas-dev/pandas
Code Sample, a copy-pastable example if possible
import pandas as pd

def get_name_of_assigned_index(df_size, series_size):
    df = pd.DataFrame({}, index=pd.RangeIndex(df_size, name='df_index'))
    series = pd.Series(1.23, index=pd.RangeIndex(series_size, name='series_index'))
    df['series'] = series
    return df.index.name
Problem description
get_name_of_assigned_index(n, m) returns 'df_index' unless n = 0, in which case it returns 'series_index'.
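To make the comparison easy to reproduce, here is a small sketch that calls the function for a few frame sizes and prints the resulting index name for each. The sizes (0, 1, 2, 5) and the fixed series size of 3 are arbitrary choices for illustration, and the name reported for the empty frame may differ between pandas versions.

```python
import pandas as pd


def get_name_of_assigned_index(df_size, series_size):
    df = pd.DataFrame({}, index=pd.RangeIndex(df_size, name='df_index'))
    series = pd.Series(1.23, index=pd.RangeIndex(series_size, name='series_index'))
    df['series'] = series
    return df.index.name


# Illustrative sizes only; the series size is held fixed at 3.
names_by_df_size = {n: get_name_of_assigned_index(n, 3) for n in (0, 1, 2, 5)}

for n, name in names_by_df_size.items():
    print(f"df_size={n}: index name = {name!r}")

# True when every frame size yields the same index name.
is_consistent = len(set(names_by_df_size.values())) == 1
print("consistent across sizes:", is_consistent)
```

If the last line prints False, the installed pandas version shows the size-dependent behavior described here.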
I don't know which of these is necessarily the better behavior, but the resulting index name ought to be consistent regardless of how many - or few - rows are in the DataFrame. Subsequent operations may rely on expectations about that name, so this behavior, wherein the name changes as an unintuitive consequence of a rowless frame, could cause bugs.
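Until the behavior is made consistent, one possible defensive pattern is to restore the frame's index name explicitly after the assignment. This is only a sketch: the helper name assign_preserving_index_name is hypothetical (not part of pandas), and it assumes the caller wants the DataFrame's own index name to win.

```python
import pandas as pd


def assign_preserving_index_name(df, column, series):
    """Assign `series` to df[column], then restore the frame's original index name.

    Hypothetical helper for illustration; it is not a pandas API.
    """
    original_name = df.index.name
    df[column] = series
    # Index.rename returns a new Index, so the series' own index is left untouched.
    df.index = df.index.rename(original_name)
    return df


empty = pd.DataFrame({}, index=pd.RangeIndex(0, name='df_index'))
series = pd.Series(1.23, index=pd.RangeIndex(3, name='series_index'))

result = assign_preserving_index_name(empty, 'series', series)
print(result.index.name)
```

With this wrapper the frame keeps whatever index name it had before the assignment, whether or not it had any rows.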
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.7.final.0
python-bits: 64
OS: Linux
OS-release: 4.3.5-smp-821.23.0.0
machine: x86_64
processor:
byteorder: little
LC_ALL: en_US.UTF-8
LANG: None
LOCALE: en_US.UTF-8
pandas: 0.24.2
pytest: None
pip: None
setuptools: unknown
Cython: None
numpy: 1.16.4
scipy: 1.2.1
pyarrow: None
xarray: None
IPython: 2.0.0
sphinx: None
patsy: 0.4.1
dateutil: 2.8.1
pytz: 2019.3
blosc: None
bottleneck: None
tables: 3.5.2
numexpr: 2.6.10dev0
feather: None
matplotlib: 3.0.3
openpyxl: None
xlrd: 1.2.0
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10.3
s3fs: None
fastparquet: None
pandas_gbq: 0+unknown
pandas_datareader: None
gcsfs: None