read_csv errors when low_memory=True, index_col is not None, and nrows=0 · Issue #21141 · pandas-dev/pandas

Code Sample

import pandas as pd
from pandas.compat import StringIO

data = """A,B,C,D,E
2000-01-03 00:00:00,0.980268513777,3.68573087906,-0.364216805298,-1.15973806169,foo
2000-01-04 00:00:00,1.04791624281,-0.0412318367011,-0.16181208307,0.212549316967,bar
2000-01-05 00:00:00,0.498580885705,0.731167677815,-0.537677223318,1.34627041952,baz
2000-01-06 00:00:00,1.12020151869,1.56762092543,0.00364077397681,0.67525259227,qux
2000-01-07 00:00:00,-0.487094399463,0.571454623474,-1.6116394093,0.103468562917,foo2"""

df = pd.read_csv(StringIO(data), low_memory=True, index_col=0, nrows=0)

Problem description

The above code raises TypeError: 'NoneType' object is not iterable. read_csv behaves correctly if any one of the three arguments is changed: low_memory=False, index_col=None, or nrows>0.
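For comparison, here is a minimal sketch of the three variants described above as working. Each one changes exactly one argument of the failing call. It uses io.StringIO instead of pandas.compat.StringIO so that it does not depend on the compat module, and it keeps only the first two data rows of the sample; the `variants` and `results` names are illustrative and not part of the original report.

```python
from io import StringIO

import pandas as pd

# First two rows of the sample data: 5 header names, 6 fields per row.
data = """A,B,C,D,E
2000-01-03 00:00:00,0.980268513777,3.68573087906,-0.364216805298,-1.15973806169,foo
2000-01-04 00:00:00,1.04791624281,-0.0412318367011,-0.16181208307,0.212549316967,bar
"""

# Each variant differs from the failing call
# (low_memory=True, index_col=0, nrows=0) in exactly one argument.
variants = {
    "low_memory=False": dict(low_memory=False, index_col=0, nrows=0),
    "index_col=None": dict(low_memory=True, index_col=None, nrows=0),
    "nrows=1": dict(low_memory=True, index_col=0, nrows=1),
}

results = {
    label: pd.read_csv(StringIO(data), **kwargs)
    for label, kwargs in variants.items()
}

for label, df in results.items():
    # Print the shape of each resulting frame for a quick visual check.
    print(label, df.shape)
```

None of these three calls goes through the combination that triggers the error, so they can serve as temporary workarounds when only the column layout of a file is needed.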

Traceback:

Traceback (most recent call last):
  File "/home/peter/workspace/ray_env/lib/python3.6/site-packages/pandas/io/parsers.py", line 1848, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 876, in pandas._libs.parsers.TextReader.read
  File "pandas/_libs/parsers.pyx", line 916, in pandas._libs.parsers.TextReader._read_low_memory
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "pd_bug.py", line 12, in <module>
    df = pd.read_csv(StringIO(data), low_memory=True, index_col=0, nrows=0)
  File "/home/peter/workspace/ray_env/lib/python3.6/site-packages/pandas/io/parsers.py", line 678, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/peter/workspace/ray_env/lib/python3.6/site-packages/pandas/io/parsers.py", line 446, in _read
    data = parser.read(nrows)
  File "/home/peter/workspace/ray_env/lib/python3.6/site-packages/pandas/io/parsers.py", line 1036, in read
    ret = self._engine.read(nrows)
  File "/home/peter/workspace/ray_env/lib/python3.6/site-packages/pandas/io/parsers.py", line 1855, in read
    dtype=self.kwds.get('dtype'))
  File "/home/peter/workspace/ray_env/lib/python3.6/site-packages/pandas/io/parsers.py", line 3215, in _get_empty_meta
    data = [Series([], dtype=dtype[name]) for name in index_names]
TypeError: 'NoneType' object is not iterable
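The last frame of the traceback iterates over index_names, and the error message suggests that index_names is None at that point. The following is a pandas-free sketch of that failure mode. Both functions are hypothetical stand-ins written for illustration; they are not pandas source code, and the guarded version is only one possible way to avoid the error, not necessarily the fix pandas adopted.

```python
def empty_index_columns(index_names):
    """Hypothetical stand-in for the list comprehension in the traceback."""
    # Iterating over None raises TypeError, just like the original line.
    return [(name, []) for name in index_names]


def empty_index_columns_guarded(index_names):
    """Same logic, but a missing list of index names is treated as empty."""
    names = [] if index_names is None else index_names
    return [(name, []) for name in names]


try:
    empty_index_columns(None)
except TypeError as exc:
    # Keep the message so it can be compared with the reported one.
    message = str(exc)
    print("unguarded:", message)

print("guarded:", empty_index_columns_guarded(None))
print("guarded with names:", empty_index_columns_guarded(["A"]))
```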

Expected Output

No error.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.0-36-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.0
pytest: 3.3.1
pip: 9.0.1
setuptools: 38.4.0
Cython: 0.27.3
numpy: 1.14.3
scipy: 1.0.0
pyarrow: 0.8.0
xarray: None
IPython: 6.2.1
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: 0.4.0
matplotlib: 2.1.1
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml: 4.2.0
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None