BUG: read_csv with empty header row raising · Issue #12494 · pandas-dev/pandas

Code Sample, a copy-pastable example if possible

"""Example of Pandas bug."""
from pandas import read_csv

try:
    from StringIO import StringIO
except ImportError:
    from io import StringIO

s = StringIO(',,')

df = read_csv(s)

print(list(df))

s = StringIO(',,,')

df = read_csv(s)

print(list(df))

Expected Output

Current output:

['Unnamed: 0', 'Unnamed: 1', 'Unnamed: 2']
Traceback (most recent call last):
  File "pandas_bug4.py", line 17, in <module>
    df = read_csv(s)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/io/parsers.py", line 498, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/io/parsers.py", line 275, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/io/parsers.py", line 590, in __init__
    self._make_engine(self.engine)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/io/parsers.py", line 731, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/io/parsers.py", line 1103, in __init__
    self._reader = _parser.TextReader(src, **kwds)
  File "pandas/parser.pyx", line 515, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:4948)
  File "pandas/parser.pyx", line 632, in pandas.parser.TextReader._get_header (pandas/parser.c:6493)
  File "pandas/parser.pyx", line 829, in pandas.parser.TextReader._tokenize_rows (pandas/parser.c:8838)
  File "pandas/parser.pyx", line 1833, in pandas.parser.raise_parser_error (pandas/parser.c:22649)
pandas.parser.CParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.

Expected output:

['Unnamed: 0', 'Unnamed: 1', 'Unnamed: 2']
['Unnamed: 0', 'Unnamed: 1', 'Unnamed: 2', 'Unnamed: 3']
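
To make the expected naming concrete, here is a minimal sketch that derives the column names from a header row using only the standard library `csv` module. The helper name `expected_columns` is hypothetical and not part of pandas. It only mirrors the `Unnamed: i` pattern shown in the expected output above and does not reproduce how `read_csv` handles headers internally.

```python
import csv
from io import StringIO


def expected_columns(text):
    """Return header names, using 'Unnamed: i' for each empty field.

    Hypothetical helper for illustration only; not a pandas API.
    """
    # Parse just the first row of the input as the header.
    header = next(csv.reader(StringIO(text)))
    return [
        name if name else "Unnamed: {}".format(i)
        for i, name in enumerate(header)
    ]


print(expected_columns(",,"))
print(expected_columns(",,,"))
```

Under these assumptions, a header row of four empty fields should be treated the same way as one of three empty fields, just with one more generated name.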

output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Darwin
OS-release: 15.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.17.1
nose: None
pip: 7.1.2
setuptools: 18.3.2
Cython: None
numpy: 1.10.1
scipy: 0.16.1
statsmodels: None
IPython: 4.0.1
sphinx: None
patsy: None
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.0
openpyxl: 2.3.2
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
Jinja2: 2.8
None