read_csv errors when low_memory=True, index_col is not None, and nrows=0 · Issue #21141 · pandas-dev/pandas
Code Sample
import pandas as pd
from pandas.compat import StringIO

data = """A,B,C,D,E
2000-01-03 00:00:00,0.980268513777,3.68573087906,-0.364216805298,-1.15973806169,foo
2000-01-04 00:00:00,1.04791624281,-0.0412318367011,-0.16181208307,0.212549316967,bar
2000-01-05 00:00:00,0.498580885705,0.731167677815,-0.537677223318,1.34627041952,baz
2000-01-06 00:00:00,1.12020151869,1.56762092543,0.00364077397681,0.67525259227,qux
2000-01-07 00:00:00,-0.487094399463,0.571454623474,-1.6116394093,0.103468562917,foo2"""

df = pd.read_csv(StringIO(data), low_memory=True, index_col=0, nrows=0)
Problem description
The above code results in TypeError: 'NoneType' object is not iterable. read_csv behaves correctly if low_memory=False, index_col=None or nrows>0.
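Until this is fixed, one possible workaround is to change one of the conditions listed above. The sketch below switches to low_memory=False and keeps index_col=0 and nrows=0. It uses the first two data rows from the sample, imports StringIO from io rather than pandas.compat (an assumption made so the snippet also runs on newer pandas releases, which no longer provide pandas.compat.StringIO), and the helper name read_header_only is hypothetical.

```python
from io import StringIO  # stand-in for pandas.compat.StringIO (assumption, see above)

import pandas as pd

# First two data rows of the sample from this report.
data = """A,B,C,D,E
2000-01-03 00:00:00,0.980268513777,3.68573087906,-0.364216805298,-1.15973806169,foo
2000-01-04 00:00:00,1.04791624281,-0.0412318367011,-0.16181208307,0.212549316967,bar
"""


def read_header_only(text):
    """Read zero data rows while avoiding the failing argument combination.

    low_memory=False is the only change relative to the failing call.
    """
    return pd.read_csv(StringIO(text), low_memory=False, index_col=0, nrows=0)


header_only = read_header_only(data)
print(len(header_only))  # number of data rows that were read
```

Dropping index_col or requesting nrows>0 should work as well, but both change what is returned, so low_memory=False looks like the least intrusive option here.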
Traceback:
Traceback (most recent call last):
  File "/home/peter/workspace/ray_env/lib/python3.6/site-packages/pandas/io/parsers.py", line 1848, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 876, in pandas._libs.parsers.TextReader.read
  File "pandas/_libs/parsers.pyx", line 916, in pandas._libs.parsers.TextReader._read_low_memory
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "pd_bug.py", line 12, in <module>
    df = pd.read_csv(StringIO(data), low_memory=True, index_col=0, nrows=0)
  File "/home/peter/workspace/ray_env/lib/python3.6/site-packages/pandas/io/parsers.py", line 678, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/peter/workspace/ray_env/lib/python3.6/site-packages/pandas/io/parsers.py", line 446, in _read
    data = parser.read(nrows)
  File "/home/peter/workspace/ray_env/lib/python3.6/site-packages/pandas/io/parsers.py", line 1036, in read
    ret = self._engine.read(nrows)
  File "/home/peter/workspace/ray_env/lib/python3.6/site-packages/pandas/io/parsers.py", line 1855, in read
    dtype=self.kwds.get('dtype'))
  File "/home/peter/workspace/ray_env/lib/python3.6/site-packages/pandas/io/parsers.py", line 3215, in _get_empty_meta
    data = [Series([], dtype=dtype[name]) for name in index_names]
TypeError: 'NoneType' object is not iterable
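Judging by the last frame, the comprehension in _get_empty_meta iterates over index_names, and the error message suggests index_names is None at that point. This is an inference from the traceback, not something verified against the pandas source. A plain-Python sketch of the same failure mode, with a hypothetical function name and no pandas involved:

```python
def build_empty_index(index_names):
    # Same shape as the failing comprehension: it iterates over index_names
    # without first checking whether it is None.
    return [name for name in index_names]


try:
    build_empty_index(None)
except TypeError as exc:
    message = str(exc)

print(message)  # 'NoneType' object is not iterable
```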
Expected Output
No error.
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.0-36-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.23.0
pytest: 3.3.1
pip: 9.0.1
setuptools: 38.4.0
Cython: 0.27.3
numpy: 1.14.3
scipy: 1.0.0
pyarrow: 0.8.0
xarray: None
IPython: 6.2.1
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: 0.4.0
matplotlib: 2.1.1
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml: 4.2.0
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None