Creating a new row with np.inf as an index causes an OverflowError · Issue #16957 · pandas-dev/pandas
Code Sample, a copy-pastable example if possible
```python
import numpy as np
import pandas as pd

# Make a DataFrame with one column.
df = pd.DataFrame(columns=[0])

# pandas is great because we can use arbitrary objects to insert new
# rows and columns dynamically.
df.loc[11] = 'a'
df.loc['y', 0] = 'b'
df.loc[object(), 0] = 'c'

# BUT np.inf is not one of these objects.
df.loc[np.inf, 0] = 'OverflowError'
# This raises: OverflowError: cannot convert float infinity to integer

# HOWEVER, that doesn't stop us from using np.inf statically.
obj = object()
index = pd.Index([11, 'y', obj, np.inf])
df = pd.DataFrame([['a'], ['b'], ['c'], ['d']], index=index)

# This seems to work when we use it to look up a specific entry.
df.loc[np.inf, 0]

# BUT it breaks when trying to look up a whole row.
df.loc[np.inf]

# However, this is doable with other indices.
df.loc['y']
df.loc[obj]
```
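The error message above is the one CPython itself raises when a float infinity is passed to `int()`. The stdlib-only sketch below reproduces it; this suggests, though it does not prove, that pandas is trying to coerce the new label to an integer before inserting it. The helper name `int_conversion_error` is hypothetical and only used for illustration.

```python
def int_conversion_error(value):
    """Hypothetical helper: return the OverflowError message raised by
    int(value), or None if the conversion succeeds."""
    try:
        int(value)
    except OverflowError as exc:
        return str(exc)
    return None


# A finite float converts (it is truncated); infinity has no integer value.
print(int_conversion_error(3.3))           # None
print(int_conversion_error(float("inf")))  # cannot convert float infinity to integer
```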
Problem description
I came across this bug while using different threshold values to index my DataFrame. When I tried to dynamically insert an `np.inf` label, it failed with `OverflowError: cannot convert float infinity to integer`.

At first I thought that `np.inf` might not be a valid hashable object, but `{np.inf: 2}` is valid code. Furthermore, I found that I could create a `pd.Index` object with `np.inf` as a value and then set that as the index of my DataFrame. When I did that, most behavior worked as expected as long as both the row and the column were specified, as in `df.loc[r, c]`. Calling `df.loc` with just a row label fails when that label is `np.inf`.
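The hashability point can be checked in plain Python. A `dict` is used here only as an analogy for hash-based label lookup; it is not how pandas stores an index internally. `math.inf` stands in for `np.inf` so the sketch needs no NumPy (both are the same float value), and the mixed keys mirror the labels from the code sample.

```python
import math

obj = object()

# Mixed, hashable labels like those in pd.Index([11, 'y', obj, np.inf]).
lookup = {11: "a", "y": "b", obj: "c", math.inf: "d"}

# Inserting the same labels one at a time never changes a key's type,
# so infinity is stored and found like any other label.
grown = {}
for label, value in [(11, "a"), ("y", "b"), (obj, "c"), (math.inf, "d")]:
    grown[label] = value

print(grown == lookup)                        # True
print(grown[math.inf])                        # d
print(hash(math.inf) == hash(float("inf")))   # True
```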
Expected Output
I think it is perfectly sensible that `np.inf` should be allowed as a valid index label.

I suspect the problem is that when you insert `np.inf` into an integer or object index, pandas attempts to convert it to an integer. What should probably happen instead is that the index is converted to a float or object index.
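A minimal sketch of the proposed rule, using a hypothetical `promote_index_kind` helper rather than any pandas API: an integer label keeps an integer index, a float label (including `inf`) promotes an integer index to float instead of being passed to `int()`, and any non-numeric label produces an object index. It only models the decision and is not how pandas actually chooses index dtypes.

```python
import math


def promote_index_kind(current_kind, label):
    """Return the index kind needed to hold the existing labels plus `label`.

    Kinds are 'empty', 'integer', 'float' and 'object'. This is a hypothetical
    model of the proposed behaviour, not pandas' own dtype logic.
    """
    # bool is a subclass of int in Python; treat it as a non-numeric label here.
    is_int = isinstance(label, int) and not isinstance(label, bool)
    is_float = isinstance(label, float)

    if current_kind == "object" or not (is_int or is_float):
        return "object"  # strings, arbitrary objects, or an already mixed index
    if current_kind == "empty":
        return "integer" if is_int else "float"
    if current_kind == "integer" and is_int:
        return "integer"
    # A float index, or an integer index receiving a float such as 3.3 or inf:
    # promote to float rather than converting the label with int().
    return "float"


kind = "empty"
for label in [11, math.inf]:
    kind = promote_index_kind(kind, label)
print(kind)  # float
```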
I just tested this idea:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(columns=[0])
print('df.index = {!r}'.format(df.index))
# df.index = Index([], dtype='object')

df.loc[3.3, 0] = 3
print('df.index = {!r}'.format(df.index))
# df.index = Float64Index([3.3], dtype='float64')

df.loc[np.inf, 0] = 3
print(df)
#         0
# 3.3000  3
# inf     3

df.loc[np.inf]
# 0    3
# Name: inf, dtype: object
```
It seems that if you ensure the index is a float index, then `np.inf` becomes a valid key. So maybe the fix is simply to do whatever is already done when a float is given to an integer index.
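Until the coercion is fixed, one workaround consistent with the experiment above is to give the empty frame an explicit float index up front, so that no integer conversion of the label should be needed. A minimal sketch, assuming a pandas version where setting with enlargement through `.loc` works on an empty `DataFrame` that already has the target column; the threshold values are illustrative.

```python
import numpy as np
import pandas as pd

# Start with an explicitly float64 index instead of relying on the
# default index that pandas infers for an empty frame.
df = pd.DataFrame(columns=[0], index=pd.Index([], dtype="float64"))

# Enlarge the frame one row at a time; inf is a valid float64 value.
for threshold in [0.5, 3.3, np.inf]:
    df.loc[threshold, 0] = 3

print(df.index.dtype)     # float64
print(df.loc[np.inf, 0])  # single-entry lookup
print(df.loc[np.inf])     # whole-row lookup by the inf label
```

Declaring the index dtype explicitly also makes the result independent of which dtype a given pandas version infers for an empty index.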
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-59-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.20.3
pytest: 3.0.3
pip: 9.0.1
setuptools: 34.1.1
Cython: 0.25.2
numpy: 1.13.1
scipy: 0.19.0
xarray: None
IPython: 5.2.2
sphinx: None
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.5.3
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.5
s3fs: None
pandas_gbq: None
pandas_datareader: None