OverflowError in read_csv when specifying certain na_values · Issue #17128 · pandas-dev/pandas

Code Sample, a copy-pastable example if possible

import pandas as pd
from pandas.compat import StringIO

data = StringIO("a,b,c\n1,2,3\n4,5,6\n7,8,9")
na_values = ['-inf']
index_col = 0
df = pd.read_csv(data, na_values=na_values, index_col=index_col)

Problem description

read_csv() fails with the following traceback when certain na_values (here '-inf') are specified together with index_col:

Traceback (most recent call last):
  File "run.py", line 9, in <module>
    df = pd.read_csv(data, na_values=na_values, index_col=index_col)
  File "/home/liauys/Code/pandas/pandas/io/parsers.py", line 660, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/liauys/Code/pandas/pandas/io/parsers.py", line 416, in _read
    data = parser.read(nrows)
  File "/home/liauys/Code/pandas/pandas/io/parsers.py", line 1010, in read
    ret = self._engine.read(nrows)
  File "/home/liauys/Code/pandas/pandas/io/parsers.py", line 1837, in read
    index, names = self._make_index(data, alldata, names)
  File "/home/liauys/Code/pandas/pandas/io/parsers.py", line 1347, in _make_index
    index = self._agg_index(index)
  File "/home/liauys/Code/pandas/pandas/io/parsers.py", line 1440, in _agg_index
    arr, _ = self._infer_types(arr, col_na_values | col_na_fvalues)
  File "/home/liauys/Code/pandas/pandas/io/parsers.py", line 1524, in _infer_types
    mask = algorithms.isin(values, list(na_values))
  File "/home/liauys/Code/pandas/pandas/core/algorithms.py", line 408, in isin
    values, _, _ = _ensure_data(values, dtype=dtype)
  File "/home/liauys/Code/pandas/pandas/core/algorithms.py", line 74, in _ensure_data
    return _ensure_int64(values), 'int64', 'int64'
  File "pandas/_libs/algos_common_helper.pxi", line 3227, in pandas._libs.algos.ensure_int64
  File "pandas/_libs/algos_common_helper.pxi", line 3232, in pandas._libs.algos.ensure_int64
OverflowError: cannot convert float infinity to integer
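
The last frames suggest that the na_values entry '-inf' ends up as a float infinity that is then coerced to int64 to match the integer index column. The sketch below reproduces only that final coercion in plain Python. It is not the pandas code path, and the helper name to_int_or_error is made up for illustration.

```python
def to_int_or_error(text):
    """Parse text as a float, then coerce it to int.

    Returns the integer on success, or the OverflowError message on failure.
    Hypothetical helper for illustration only; pandas does not define it.
    """
    value = float(text)  # "-inf" parses to negative infinity
    try:
        return int(value)
    except OverflowError as exc:
        # Infinity has no integer representation, so int() raises here.
        return str(exc)


print(to_int_or_error("3"))
print(to_int_or_error("-inf"))
```

The second call hits the same OverflowError that appears at the bottom of the traceback, which is consistent with the failure depending on an infinite value being present in na_values.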

Any of the following makes the error go away:

Expected Output

There should not be any error.
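
The expected behavior could be written as a small regression check along the following lines. This is a sketch, not a test taken from the pandas test suite: it assumes Python 3 and uses io.StringIO instead of pandas.compat.StringIO, and the function name read_with_inf_na is hypothetical.

```python
import io

import pandas as pd


def read_with_inf_na():
    """Read the sample CSV with '-inf' as an NA marker and column 'a' as the index.

    Hypothetical helper mirroring the reproducer above; io.StringIO is an
    assumption that replaces the Python 2 era pandas.compat.StringIO.
    """
    data = io.StringIO("a,b,c\n1,2,3\n4,5,6\n7,8,9")
    return pd.read_csv(data, na_values=["-inf"], index_col=0)


df = read_with_inf_na()

# No value in the data matches '-inf', so nothing should be masked as NA
# and the index should simply hold the integers from column 'a'.
assert list(df.index) == [1, 4, 7]
assert list(df.columns) == ["b", "c"]
assert not df.isna().any().any()
```

On an affected build the call to read_csv would raise before any assertion runs, so the check fails for the reason reported here.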

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Linux
OS-release: 4.11.9-1-ARCH
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.21.0.dev+316.gf2b0bdc9b
pytest: None
pip: 9.0.1
setuptools: 36.2.5
Cython: 0.26
numpy: 1.13.1
scipy: None
pyarrow: None
xarray: None
IPython: 5.4.1
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
pandas_gbq: None
pandas_datareader: None