OverflowError in read_csv when specifying certain na_values · Issue #17128 · pandas-dev/pandas
Code Sample, a copy-pastable example if possible
import pandas as pd
from pandas.compat import StringIO

data = StringIO("a,b,c\n1,2,3\n4,5,6\n7,8,9")
na_values = ['-inf']
index_col = 0
df = pd.read_csv(data, na_values=na_values, index_col=index_col)
Problem description
read_csv() fails with the following traceback when specifying certain na_values with index_col:
Traceback (most recent call last):
  File "run.py", line 9, in <module>
    df = pd.read_csv(data, na_values=na_values, index_col=index_col)
  File "/home/liauys/Code/pandas/pandas/io/parsers.py", line 660, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/liauys/Code/pandas/pandas/io/parsers.py", line 416, in _read
    data = parser.read(nrows)
  File "/home/liauys/Code/pandas/pandas/io/parsers.py", line 1010, in read
    ret = self._engine.read(nrows)
  File "/home/liauys/Code/pandas/pandas/io/parsers.py", line 1837, in read
    index, names = self._make_index(data, alldata, names)
  File "/home/liauys/Code/pandas/pandas/io/parsers.py", line 1347, in _make_index
    index = self._agg_index(index)
  File "/home/liauys/Code/pandas/pandas/io/parsers.py", line 1440, in _agg_index
    arr, _ = self._infer_types(arr, col_na_values | col_na_fvalues)
  File "/home/liauys/Code/pandas/pandas/io/parsers.py", line 1524, in _infer_types
    mask = algorithms.isin(values, list(na_values))
  File "/home/liauys/Code/pandas/pandas/core/algorithms.py", line 408, in isin
    values, _, _ = _ensure_data(values, dtype=dtype)
  File "/home/liauys/Code/pandas/pandas/core/algorithms.py", line 74, in _ensure_data
    return _ensure_int64(values), 'int64', 'int64'
  File "pandas/_libs/algos_common_helper.pxi", line 3227, in pandas._libs.algos.ensure_int64
  File "pandas/_libs/algos_common_helper.pxi", line 3232, in pandas._libs.algos.ensure_int64
OverflowError: cannot convert float infinity to integer
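The last frames suggest that an integer conversion is being applied to a value that is an infinite float. As a rough illustration only, the same message can be produced in plain Python without pandas. The helper name cast_error is hypothetical, and this sketch does not claim to mirror what ensure_int64 does internally:

```python
def cast_error(value):
    """Return the OverflowError message raised by int(value), or None.

    Hypothetical helper for illustration; not part of pandas.
    """
    try:
        int(value)
    except OverflowError as exc:
        return str(exc)
    return None


# An infinite float cannot be represented as an integer.
print(cast_error(float("-inf")))
# A finite float converts without error.
print(cast_error(1.0))
```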
Any of the following makes the error go away:
- The index column contains the NA value in question
- Using na_values of ['inf'] instead of ['-inf']
- Not specifying index_col
- Using version 0.19 or older
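Since leaving out index_col avoids the error, one possible workaround is to read the file without it and set the index afterwards. This is a minimal sketch, not a fix for the underlying bug. It assumes io.StringIO as the buffer (instead of pandas.compat.StringIO from the sample above) and that the first column is the intended index:

```python
from io import StringIO  # assumption: stdlib buffer instead of pandas.compat.StringIO

import pandas as pd

data = StringIO("a,b,c\n1,2,3\n4,5,6\n7,8,9")

# Read without index_col, so the index-building step in the traceback is not used.
df = pd.read_csv(data, na_values=["-inf"])

# Promote the first column to the index after parsing.
df = df.set_index(df.columns[0])
print(df)
```

The resulting frame should have the column a as its index and b, c as data columns, which is the shape the original call was meant to produce.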
Expected Output
There should not be any error.
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Linux
OS-release: 4.11.9-1-ARCH
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None
pandas: 0.21.0.dev+316.gf2b0bdc9b
pytest: None
pip: 9.0.1
setuptools: 36.2.5
Cython: 0.26
numpy: 1.13.1
scipy: None
pyarrow: None
xarray: None
IPython: 5.4.1
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
pandas_gbq: None
pandas_datareader: None