Pandas core dumps when reading large CSV file using read_csv(..., low_memory=False) · Issue #16798 · pandas-dev/pandas

Code Sample, a copy-pastable example if possible

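import pandas as pd

csv_file_name = "large_file.csv"  # placeholder: path to a multi-GB CSV

try:
    table = pd.read_csv(csv_file_name, low_memory=False)
except:  # bare except kept from the failing script; it only re-raises
    raise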
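
For anyone without a multi-GB file to hand, a synthetic CSV may be enough to attempt a reproduction. The sketch below is only a guess at what "large" needs to look like: the helper name write_synthetic_csv, the column layout, and the row count are all hypothetical and are not taken from the failing file.

```python
import csv
import os
import tempfile


def write_synthetic_csv(path, n_rows, n_cols=10):
    """Write a header row plus n_rows rows of integers; return the file size in bytes.

    Hypothetical helper for reproduction attempts, not part of pandas.
    """
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["col%d" % i for i in range(n_cols)])
        for row in range(n_rows):
            writer.writerow([row * n_cols + i for i in range(n_cols)])
    return os.path.getsize(path)


if __name__ == "__main__":
    # Small by default; raise n_rows until the file reaches the multi-GB range.
    target = os.path.join(tempfile.gettempdir(), "synthetic.csv")
    size = write_synthetic_csv(target, n_rows=1000)
    print("wrote %d bytes to %s" % (size, target))
```

Whether an all-integer file triggers the same failure is an open question; the real file may contain mixed types or very long fields.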

Problem description

From the stack trace in the core file, pandas appears to raise an "out of memory" exception, even though the machine is not out of memory: it has 64 GB of RAM and the interpreter was using roughly 5 GB. During cleanup of that exception, it attempts to double free the self->error_msg pointer (according to gdb), which results in a SIGSEGV.

Expected Output

Pandas successfully converts the CSV into a DataFrame.

Output of pd.show_versions()

>>> pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-81-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.19.2
nose: None
pip: 9.0.1
setuptools: 36.0.1
Cython: 0.25.2
numpy: 1.13.0
scipy: None
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.1.4
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext lo64)
jinja2: 2.9.6
boto: 2.47.0
pandas_datareader: None

I can provide the source CSV if necessary. The crash happens reliably with "large" CSVs; I haven't pinned down exactly how large, but the files are in the multi-GB range. Below is the stack trace:

#0 0x00007f278e970532 in __GI___libc_free (mem=0x7f2757218b0e) at malloc.c:2967
#1 0x00007f275720fd2a in free_if_not_null (ptr=0x3cf5ee8) at pandas/src/parser/tokenizer.c:94
#2 parser_cleanup (self=self@entry=0x3cf5df0) at pandas/src/parser/tokenizer.c:189
#3 0x00007f275720ff09 in parser_free (self=0x3cf5df0) at pandas/src/parser/tokenizer.c:285
#4 0x00007f27571b0562 in __pyx_pf_6pandas_6parser_10TextReader_4__dealloc__ (__pyx_v_self=0x7f2757166bc8) at pandas/parser.c:6330
#5 __pyx_pw_6pandas_6parser_10TextReader_5__dealloc__ (__pyx_v_self=) at pandas/parser.c:6313
#6 __pyx_tp_dealloc_6pandas_6parser_TextReader (o=) at pandas/parser.c:45130
#7 0x000000000055dbea in dict_dealloc.lto_priv.164 (mp=0x7f2753a0bbc8) at ../Objects/dictobject.c:1594
#8 subtype_dealloc () at ../Objects/typeobject.c:1193
#9 0x000000000055dbea in dict_dealloc.lto_priv.164 (mp=0x7f2753c765c8) at ../Objects/dictobject.c:1594
#10 subtype_dealloc () at ../Objects/typeobject.c:1193
#11 0x00000000004e9137 in frame_dealloc.lto_priv () at ../Objects/frameobject.c:431
#12 0x0000000000541457 in tb_dealloc.lto_priv.286 (tb=0x7f27537b8048) at ../Python/traceback.c:55
#13 0x000000000054146d in tb_dealloc.lto_priv.286 (tb=0x7f27537b8088) at ../Python/traceback.c:54
#14 0x000000000054146d in tb_dealloc.lto_priv.286 (tb=0x7f27537b80c8) at ../Python/traceback.c:54
#15 0x000000000054146d in tb_dealloc.lto_priv.286 (tb=0x7f27537b8108) at ../Python/traceback.c:54
#16 0x0000000000526b99 in PyEval_EvalFrameEx () at ../Python/ceval.c:2132
#17 0x0000000000528814 in fast_function (nk=, na=, n=, pp_stack=0x7ffdc3b87590, func=) at ../Python/ceval.c:4803
#18 call_function (oparg=, pp_stack=0x7ffdc3b87590) at ../Python/ceval.c:4730
#19 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#20 0x0000000000528814 in fast_function (nk=, na=, n=, pp_stack=0x7ffdc3b876c0, func=) at ../Python/ceval.c:4803
#21 call_function (oparg=, pp_stack=0x7ffdc3b876c0) at ../Python/ceval.c:4730
#22 PyEval_EvalFrameEx () at ../Python/ceval.c:3236
#23 0x000000000052d2e3 in _PyEval_EvalCodeWithName () at ../Python/ceval.c:4018
#24 0x000000000052dfdf in PyEval_EvalCodeEx () at ../Python/ceval.c:4039
#25 PyEval_EvalCode (co=, globals=, locals=) at ../Python/ceval.c:777
#26 0x00000000005fd2c2 in run_mod () at ../Python/pythonrun.c:976
#27 0x00000000005ff76a in PyRun_FileExFlags () at ../Python/pythonrun.c:929
#28 0x00000000005ff95c in PyRun_SimpleFileExFlags () at ../Python/pythonrun.c:396
#29 0x000000000063e7d6 in run_file (p_cf=0x7ffdc3b87930, filename=0x1e96260 L"/home/jknupp/lion/components/api/scripts/parquet_export.py", fp=0x1fa1310) at ../Modules/main.c:318
#30 Py_Main () at ../Modules/main.c:768
#31 0x00000000004cfe41 in main () at ../Programs/python.c:65
#32 0x00007f278e90c830 in __libc_start_main (main=0x4cfd60 , argc=2, argv=0x7ffdc3b87b48, init=, fini=, rtld_fini=, stack_end=0x7ffdc3b87b38) at ../csu/libc-start.c:291
#33 0x00000000005d5f29 in _start ()
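
The native trace above shows only C frames. The standard-library faulthandler module (Python 3.3+) can also dump the Python-level traceback when the interpreter receives SIGSEGV, which might help show which Python call was active at the time. This is a minimal sketch: the wrapper name call_with_faulthandler is hypothetical, and the read_csv call appears only as a comment so the block runs without pandas.

```python
import faulthandler


def call_with_faulthandler(func, *args, **kwargs):
    """Enable faulthandler if possible, then call func (hypothetical wrapper)."""
    if not faulthandler.is_enabled():
        try:
            # On SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL, write the Python
            # tracebacks of all threads to sys.stderr.
            faulthandler.enable(all_threads=True)
        except (AttributeError, ValueError, OSError, RuntimeError):
            # stderr has no usable file descriptor (for example, it was
            # replaced by a capture object); continue without the handler.
            pass
    return func(*args, **kwargs)


# Intended usage with the call from the code sample (requires pandas):
#   import pandas as pd
#   table = call_with_faulthandler(pd.read_csv, csv_file_name, low_memory=False)
```

Running the script with `python -X faulthandler` should have a similar effect without code changes.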