Creating a new row with np.inf as an index causes an OverflowError · Issue #16957 · pandas-dev/pandas

Code Sample, a copy-pastable example if possible

make a dataframe with one column

df = pd.DataFrame(columns=[0])

pandas is great because we can use arbitrary objects to insert new rows and columns dynamically.

df.loc[11] = 'a'
df.loc['y', 0] = 'b'
df.loc[object(), 0] = 'c'

BUT np.inf is not one of these objects

df.loc[np.inf, 0] = 'OverflowError'

This raises: OverflowError: cannot convert float infinity to integer

HOWEVER, that doesn't stop us from using np.inf statically.

obj = object()
index = pd.Index([11, 'y', obj, np.inf])
df = pd.DataFrame([['a'], ['b'], ['c'], ['d']], index=index)
df.loc[np.inf, 0]

This seems to work when we use it to look up a specific entry

df.loc[np.inf, 0]

BUT it breaks when trying to lookup a whole row

df.loc[np.inf]

However, this is doable with other indices

df.loc['y']
df.loc[obj]

Problem description

I came across this bug when using different threshold values to index my DataFrame.
When I tried to dynamically insert a np.inf value, it failed with OverflowError: cannot convert float infinity to integer.

At first I thought that np.inf might just not be a valid hashable object, but {np.inf: 2} is valid code. Furthermore, I found that I could create a pd.Index object with np.inf as a value and then set that to be the index of my DataFrame. When I did that, most behavior worked as expected, as long as both the row and column were specified in df.loc[r, c]. Calling it with just a row fails when the row is np.inf.
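A minimal sketch of that distinction, using only the standard library. It relies on np.inf being an ordinary Python float equal to float("inf"). That the failure inside pandas comes from an integer conversion like the one below is an assumption based on the error text, not something checked against the pandas source.

```python
# np.inf is a plain Python float, so float("inf") stands in for it here.
inf = float("inf")

# Infinity is hashable, so it is a perfectly good dictionary key.
lookup = {inf: 2}
value = lookup[inf]

# Converting infinity to an integer is what fails.
try:
    int(inf)
    message = None
except OverflowError as exc:
    # Same wording as the error reported in this issue.
    message = str(exc)

print(value)
print(message)
```

So hashing is not the problem; the error message matches what Python raises when a float infinity is cast to int.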

Expected Output

I think it is perfectly sensible that np.inf should be allowed as a valid index.

I think the trouble is that when you try to insert np.inf into an integer or object index, pandas attempts to turn it into an integer. What probably should happen is that the index gets converted into a float or object index.

I just tested this idea:

>>> df = pd.DataFrame(columns=[0])
>>> print('df.index = {!r}'.format(df.index))
df.index = Index([], dtype='object')
>>> df.loc[3.3, 0] = 3
>>> print('df.index = {!r}'.format(df.index))
df.index = Float64Index([3.3], dtype='float64')
>>> df.loc[np.inf, 0] = 3
>>> print(df)
        0
3.3000  3
inf     3
>>> df.loc[np.inf]
0    3
Name: inf, dtype: object

It seems that if you ensure the index is a float index, then np.inf becomes a valid key.

So maybe the fix is just doing whatever is done when a float is given to an integer index.
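Until something like that is in place, a possible workaround is to make the index a float index up front. The following is a sketch under that assumption, not a fix for the underlying bug; the threshold values and the `thresholds` name are made up for illustration, and behavior may differ between pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical threshold labels; float64 is requested explicitly so the
# index never starts out as an integer or object index.
thresholds = pd.Index([0.5, 3.3], dtype="float64")
df = pd.DataFrame({0: ["a", "b"]}, index=thresholds)

# Insert a new row labelled np.inf (setting with enlargement).
df.loc[np.inf, 0] = "c"

# Both the single-cell lookup and the whole-row lookup.
cell = df.loc[np.inf, 0]
row = df.loc[np.inf]

print(df.index.dtype)
print(cell)
print(row)
```

This mirrors the session above: once the index holds floats, np.inf is just another float label.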

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-59-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.3
pytest: 3.0.3
pip: 9.0.1
setuptools: 34.1.1
Cython: 0.25.2
numpy: 1.13.1
scipy: 0.19.0
xarray: None
IPython: 5.2.2
sphinx: None
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.5.3
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.5
s3fs: None
pandas_gbq: None
pandas_datareader: None