Segmentation fault when constructing DataFrame with specified datetime dtype of one column · Issue #5191 · pandas-dev/pandas
Description
When building a DataFrame with specified column names and dtypes, one might expect one of two possible behaviours:
- The column names and dtypes specs are perfectly cromulent, and Pandas goes on to build the object.
- The column names or dtypes don't match the data shape, or the dtypes are badly specified, and Pandas gives an error message.
Instead, I have encountered a segmentation fault.
I am not sure whether my column names and dtype spec are written correctly, or whether my data is well-formed (see the example below). In either case, Pandas should not crash.
Reproducing
To reproduce, please run:
import pandas as pd
import datetime as dt
import itertools as it

df_test = pd.DataFrame(data = list(it.repeat((dt.datetime(2001, 1, 1), "aa", 20), 9)),
                       columns=["A", "B", "C"],
                       dtype=[("A","datetime64[h]"), ("B","str"), ("C","int32")])
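For comparison, here is a minimal sketch of one way to get per-column dtypes without passing a list of tuples as dtype. It assumes a pandas release much newer than the 0.12.0 in this report, where DataFrame.astype accepts a column-to-dtype mapping. It leaves column A at whatever datetime resolution pandas infers rather than requesting the hour unit, and it is not a fix for the crash itself.

```python
import datetime as dt
import itertools as it

import pandas as pd

# Same nine identical rows as in the report.
rows = list(it.repeat((dt.datetime(2001, 1, 1), "aa", 20), 9))

# Build the frame with no dtype argument so pandas infers each column,
# then convert only the column that needs a specific width.
df_test = pd.DataFrame(rows, columns=["A", "B", "C"])
df_test = df_test.astype({"C": "int32"})

print(df_test.dtypes)
```

Converting after construction keeps each column's dtype independent, which seems to be what the dtype list in the report was trying to express.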
Modes of failure
I have found that the above script always crashes on my machine (see the configuration section below for details). It fails in one of two ways:
First mode of failure: hanging
Python 2.7.5 (default, Sep 6 2013, 09:55:21)
[GCC 4.8.1 20130725 (prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> import datetime as dt
>>> import itertools as it
>>>
>>> df_test = pd.DataFrame(data = list(it.repeat((dt.datetime(2001, 1, 1), "aa", 20), 9)),
... columns=["A", "B", "C"],
... dtype=[("A","datetime64[h]"), ("B","str"), ("C","int32")])
*** Error in `python': corrupted double-linked list: 0x0000000001bfd8e0 ***
After that line, the terminal is dead.
Second mode of failure: segfault
Python 2.7.5 (default, Sep 6 2013, 09:55:21)
[GCC 4.8.1 20130725 (prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> import datetime as dt
>>> import itertools as it
>>>
>>> df_test = pd.DataFrame(data = list(it.repeat((dt.datetime(2001, 1, 1), "aa", 20), 9)),
... columns=["A", "B", "C"],
... dtype=[("A","datetime64[h]"), ("B","str"), ("C","int32")])
*** Error in `python2': double free or corruption (!prev): 0x00000000027161d0 ***
======= Backtrace: =========
/usr/lib/libc.so.6(+0x72ecf)[0x7f2bd7ab9ecf]
/usr/lib/libc.so.6(+0x7869e)[0x7f2bd7abf69e]
/usr/lib/libc.so.6(+0x79377)[0x7f2bd7ac0377]
/home/agravier/metahome/.local-common/share/python2.7/venvs/finance64/lib/python2.7/site-packages/numpy/core/multiarray.so(_field_transfer_data_free+0x2e)[0x7f2bd634d47e]
/home/agravier/metahome/.local-common/share/python2.7/venvs/finance64/lib/python2.7/site-packages/numpy/core/multiarray.so(+0x9a1c9)[0x7f2bd63a61c9]
/home/agravier/metahome/.local-common/share/python2.7/venvs/finance64/lib/python2.7/site-packages/numpy/core/multiarray.so(+0xa4a3a)[0x7f2bd63b0a3a]
/home/agravier/metahome/.local-common/share/python2.7/venvs/finance64/lib/python2.7/site-packages/numpy/core/multiarray.so(+0xab0a1)[0x7f2bd63b70a1]
/home/agravier/metahome/.local-common/share/python2.7/venvs/finance64/lib/python2.7/site-packages/numpy/core/multiarray.so(+0xb838b)[0x7f2bd63c438b]
/home/agravier/metahome/.local-common/share/python2.7/venvs/finance64/lib/python2.7/site-packages/numpy/core/multiarray.so(+0xb8643)[0x7f2bd63c4643]
/usr/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4c2f)[0x7f2bd80ec2ef]
/usr/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x850)[0x7f2bd80ed290]
/usr/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4dc9)[0x7f2bd80ec489]
/usr/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x850)[0x7f2bd80ed290]
/usr/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4dc9)[0x7f2bd80ec489]
/usr/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x850)[0x7f2bd80ed290]
/usr/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4dc9)[0x7f2bd80ec489]
/usr/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x850)[0x7f2bd80ed290]
/usr/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x4dc9)[0x7f2bd80ec489]
/usr/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x850)[0x7f2bd80ed290]
/usr/lib/libpython2.7.so.1.0(+0x6dbdd)[0x7f2bd807cbdd]
/usr/lib/libpython2.7.so.1.0(PyObject_Call+0x43)[0x7f2bd8058c13]
/usr/lib/libpython2.7.so.1.0(+0x5841d)[0x7f2bd806741d]
/usr/lib/libpython2.7.so.1.0(PyObject_Call+0x43)[0x7f2bd8058c13]
/usr/lib/libpython2.7.so.1.0(+0x9de57)[0x7f2bd80ace57]
/usr/lib/libpython2.7.so.1.0(+0x9cbcf)[0x7f2bd80abbcf]
/usr/lib/libpython2.7.so.1.0(PyObject_Call+0x43)[0x7f2bd8058c13]
/usr/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x1321)[0x7f2bd80e89e1]
/usr/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x850)[0x7f2bd80ed290]
/usr/lib/libpython2.7.so.1.0(PyEval_EvalCode+0x32)[0x7f2bd80ed392]
/usr/lib/libpython2.7.so.1.0(+0xf708f)[0x7f2bd810608f]
/usr/lib/libpython2.7.so.1.0(PyRun_InteractiveOneFlags+0x140)[0x7f2bd8107fb0]
/usr/lib/libpython2.7.so.1.0(PyRun_InteractiveLoopFlags+0x4e)[0x7f2bd810819e]
/usr/lib/libpython2.7.so.1.0(PyRun_AnyFileExFlags+0x3e)[0x7f2bd81087fe]
/usr/lib/libpython2.7.so.1.0(Py_Main+0xc7f)[0x7f2bd8118c2f]
/usr/lib/libc.so.6(__libc_start_main+0xf5)[0x7f2bd7a68bc5]
python2[0x400741]
======= Memory map: ========
00400000-00401000 r-xp 00000000 08:11 1886483 /home/agravier/metahome/.local-common/share/python2.7/venvs/finance64/bin/python2
00600000-00601000 r--p 00000000 08:11 1886483 /home/agravier/metahome/.local-common/share/python2.7/venvs/finance64/bin/python2
00601000-00602000 rw-p 00001000 08:11 1886483 /home/agravier/metahome/.local-common/share/python2.7/venvs/finance64/bin/python2
012d1000-029b7000 rw-p 00000000 00:00 0 [heap]
7f2bced0d000-7f2bced11000 r-xp 00000000 08:01 923895 /usr/lib/python2.7/lib-dynload/termios.so
7f2bced11000-7f2bcef10000 ---p 00004000 08:01 923895 /usr/lib/python2.7/lib-dynload/termios.so
7f2bcef10000-7f2bcef11000 r--p 00003000 08:01 923895 /usr/lib/python2.7/lib-dynload/termios.so
7f2bcef11000-7f2bcef13000 rw-p 00004000 08:01 923895 /usr/lib/python2.7/lib-dynload/termios.so
7f2bcef13000-7f2bcef26000 r-xp 00000000 08:11 57747 /home/agravier/metahome/.local-common/share/python2.7/venvs/finance64/lib/python2.7/site-packages/pandas/json.so
7f2bcef26000-7f2bcf125000 ---p 00013000 08:11 57747 /home/agravier/metahome/.local-common/share/python2.7/venvs/finance64/lib/python2.7/site-packages/pandas/json.so
7f2bcf125000-7f2bcf126000 r--p 00012000 08:11 57747 /home/agravier/metahome/.local-common/share/python2.7/venvs/finance64/lib/python2.7/site-packages/pandas/json.so
7f2bcf126000-7f2bcf127000 rw-p 00013000 08:11 57747 /home/agravier/metahome/.local-common/share/python2.7/venvs/finance64/lib/python2.7/site-packages/pandas/json.so
7f2bcf127000-7f2bcf171000 r-xp 00000000 08:11 57858 /home/agravier/metahome/.local-common/share/python2.7/venvs/finance64/lib/python2.7/site-packages/pandas/parser.so
7f2bcf171000-7f2bcf370000 ---p 0004a000 08:11 57858 /home/agravier/metahome/.local-common/share/python2.7/venvs/finance64/lib/python2.7/site-packages/pandas/parser.so
7f2bcf370000-7f2bcf371000 r--p 00049000 08:11 57858 /home/agravier/metahome/.local-common/share/python2.7/venvs/finance64/lib/python2.7/site-packages/pandas/parser.so
7f2bcf371000-7f2bcf376000 rw-p 0004a000 08:11 57858 /home/agravier/metahome/.local-common/share/python2.7/venvs/finance64/lib/python2.7/site-packages/pandas/parser.so
7f2bcf376000-7f2bcf377000 rw-p 00000000 00:00 0
7f2bcf377000-7f2bcf3d9000 r-xp 00000000 08:01 798526 /usr/lib/libssl.so.1.0.0
7f2bcf3d9000-7f2bcf5d8000 ---p 00062000 08:01 798526 /usr/lib/libssl.so.1.0.0
7f2bcf5d8000-7f2bcf5dc000 r--p 00061000 08:01 798526 /usr/lib/libssl.so.1.0.0
7f2bcf5dc000-7f2bcf5e3000 rw-p 00065000 08:01 798526 /usr/lib/libssl.so.1.0.0
7f2bcf5e3000-7f2bcf5eb000 r-xp 00000000 08:01 923889 /usr/lib/python2.7/lib-dynload/_ssl.so
Aborted (core dumped)
Configuration information
Python: 2.7.5 (default, Sep 6 2013, 09:55:21) [GCC 4.8.1 20130725 (prerelease)] on linux2
uname -a:
Linux agravier-archvm 3.10.10-1-ARCH #1 SMP PREEMPT Fri Aug 30 11:30:06 CEST 2013 x86_64 GNU/Linux
pip freeze --local:
QSTK==0.2.6
matplotlib==1.3.0
nose==1.3.0
numpy==1.7.1
pandas==0.12.0
pyparsing==2.0.1
python-dateutil==2.1
pytz==2013.7
scikit-learn==0.14.1
scipy==0.12.1
six==1.4.1
yolk==0.4.3
Concluding remarks
Note that the number of rows in the line that creates the data, list(it.repeat((dt.datetime(2001, 1, 1), "aa", 20), 9)), influences whether Python crashes. With fewer than 9 rows, I get this output instead:
Python 2.7.5 (default, Sep 6 2013, 09:55:21)
[GCC 4.8.1 20130725 (prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> import datetime as dt
>>> import itertools as it
>>>
>>> df_test = pd.DataFrame(data = list(it.repeat((dt.datetime(2001, 1, 1), "aa", 20), 8)),
... columns=["A", "B", "C"],
... dtype=[("A","datetime64[h]"), ("B","str"), ("C","int32")])
>>> df_test
A B C
0 2001-01-01 00:00:00 (1972-11-04 17:00:00, , 0) (1970-01-01 20:00:00, , 20)
1 2001-01-01 00:00:00 (1972-11-04 17:00:00, , 0) (1970-01-01 20:00:00, , 20)
2 2001-01-01 00:00:00 (1972-11-04 17:00:00, , 0) (1970-01-01 20:00:00, , 20)
3 2001-01-01 00:00:00 (1972-11-04 17:00:00, , 0) (1970-01-01 20:00:00, , 20)
4 2001-01-01 00:00:00 (1972-11-04 17:00:00, , 0) (1970-01-01 20:00:00, , 20)
5 2001-01-01 00:00:00 (1972-11-04 17:00:00, , 0) (1970-01-01 20:00:00, , 20)
6 2001-01-01 00:00:00 (1972-11-04 17:00:00, , 0) (1970-01-01 20:00:00, , 20)
7 2001-01-01 00:00:00 (1972-11-04 17:00:00, , 0) (1970-01-01 20:00:00, , 20)
This output doesn't make much sense to me: it doesn't seem to respect the dtype spec that I gave. It is very possible, though, that I don't understand the dtype spec well and that this is actually perfectly sensible output.
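To see how the dtype list is read outside of pandas, the sketch below passes the same list to NumPy alone. NumPy treats a list of (name, type) tuples as a single structured (record) dtype rather than as one dtype per column, which may explain why each cell in the output above prints as a three-field tuple. This is an illustration under that assumption, not a diagnosis of the crash.

```python
import numpy as np

# The dtype list from the report, interpreted by NumPy on its own.
spec = [("A", "datetime64[h]"), ("B", "str"), ("C", "int32")]
record_dtype = np.dtype(spec)

# A structured dtype has named fields, each with its own type.
print(record_dtype.names)
print(record_dtype["A"], record_dtype["C"])

# "str" with no length can yield a zero-width field, which would match
# the empty middle entry in the printed tuples.
print(record_dtype["B"].itemsize)
```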