Core dumped in read_csv (C engine) when reading multiple corrupted gzip files · Issue #12098 · pandas-dev/pandas
I am using read_csv to read some gzip compressed log files. Some of these files are corrupted and they cannot be uncompressed.
At varying iterations of the loop that reads these files, my script crashes with a core dump message:
*** Error in `/usr/bin/python': corrupted double-linked list: 0x0000000003836790 ***
or just:
Segmentation fault (core dumped)
This is a stripped-down version (just looping over one of the corrupted files) of the code where this error occurs:
import pandas as pd

for i in xrange(n):
    try:
        pd.read_csv(fPath, delim_whitespace=True, header=None, compression='gzip')
    except Exception, e:
        continue
The traceback of the caught exception is:
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 498, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 285, in _read
    return parser.read()
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 747, in read
    ret = self._engine.read(nrows)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1197, in read
    data = self._reader.read(nrows)
  File "pandas/parser.pyx", line 766, in pandas.parser.TextReader.read (pandas/parser.c:7988)
  File "pandas/parser.pyx", line 788, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:8244)
  File "pandas/parser.pyx", line 842, in pandas.parser.TextReader._read_rows (pandas/parser.c:8970)
  File "pandas/parser.pyx", line 829, in pandas.parser.TextReader._tokenize_rows (pandas/parser.c:8838)
  File "pandas/parser.pyx", line 1833, in pandas.parser.raise_parser_error (pandas/parser.c:22649)
CParserError: Error tokenizing data. C error: Calling read(nbytes) on source failed. Try engine='python'.
If I remove the delim_whitespace argument, the loop completes without a segmentation fault. I also tried adding low_memory=False, but the program still crashes.
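As a possible stopgap (a sketch only, not a fix for the underlying crash), corrupted files could be detected with the standard-library gzip module before they ever reach read_csv. The helper name is_readable_gzip and the chunk_size default are hypothetical and do not come from pandas. The check decompresses the whole file once, so it costs an extra pass over each file.

```python
import gzip
import zlib


def is_readable_gzip(path, chunk_size=1024 * 1024):
    """Return True if the whole file at `path` decompresses without error.

    Hypothetical helper: it only tests gzip integrity, not CSV validity.
    """
    try:
        with gzip.open(path, "rb") as fh:
            # Read to the end so truncation and checksum errors surface.
            while fh.read(chunk_size):
                pass
    except (IOError, EOFError, zlib.error):
        # IOError covers "not a gzipped file" and CRC failures,
        # EOFError covers a stream that ends early.
        return False
    return True
```

In the loop above, the read_csv call could then be guarded with `if is_readable_gzip(fPath):` so that only files that decompress cleanly are handed to the C parser.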
I am using pandas version 0.17.1 on Ubuntu 14.04.
This looks similar to issue #5664, but those problems should have been resolved in v0.16.1.
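For reproducing this without the original log files, a corrupted input can be generated synthetically. The sketch below assumes the corruption is a truncated gzip stream, which this report does not actually confirm. The function name write_truncated_gzip and its parameters are hypothetical. Whether a file truncated this way triggers the same crash is untested.

```python
import gzip


def write_truncated_gzip(path, n_rows=10000, keep_fraction=0.5):
    """Write a whitespace-delimited gzip file, then cut off its tail.

    Hypothetical helper; truncation is an assumed corruption mode.
    """
    rows = "".join("%d %d %d\n" % (i, i * 2, i * 3) for i in range(n_rows))
    with gzip.open(path, "wb") as fh:
        fh.write(rows.encode("ascii"))

    # Re-read the compressed bytes and keep only the leading part.
    with open(path, "rb") as fh:
        data = fh.read()
    with open(path, "wb") as fh:
        fh.write(data[: int(len(data) * keep_fraction)])
    return path
```

The resulting path can be used as fPath in the loop from the stripped-down script.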