| msg76119 - (view) |
Author: Giuseppe Ottaviano (ot) |
Date: 2008-11-20 17:36 |
| Hi all, trying to compile Python 2.6 I got a segmentation fault while byte-compiling the modules. The segmentation fault happened in ast_for_atom, parsing an Unicode entity. I found out that another problem prevented unicodedata to be imported, so unicode_decode raised an exception. I think the problem is in the following lines: if (PyErr_ExceptionMatches(PyExc_UnicodeError)){ PyObject *type, *value, *tback, *errstr; PyErr_Fetch(&type, &value, &tback); errstr = ((PyUnicodeErrorObject *)value)->reason; I'm not an expert of CPython internals, but the exception is raised with PyErr_SetString, so value is a PyStringObject, and that cast is invalid. Changing the last line to errstr = value; everything works. The patch is attached. |
|
|
| msg76124 - (view) |
Author: Benjamin Peterson (benjamin.peterson) *  |
Date: 2008-11-20 18:26 |
| Can you provide the code that caused the seg fault? |
|
|
| msg76125 - (view) |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *  |
Date: 2008-11-20 18:39 |
| I think I got the point: to decode strings like "\N{CHARACTER NAME}" PyUnicode_DecodeUnicodeEscape imports the unicodedata module. If this fails, PyErr_SetString(PyExc_UnicodeError, "some message") is called. The exception will eventually be caught by ast_for_atom (in Python/ast.c), but the exception there is not normalized: type is PyExc_UnicodeError when the value is a string. Hence the invalid cast. The exception cannot be normalized there: UnicodeError.__init__ needs 5 arguments. I think the error was to call PyErr_SetString in the first place. The attached patch replaces it with PyErr_SetObject and a full UnicodeDecodeError object. This deserves a unit test, but I wonder how to reliably make the import fail. |
|
|
| msg76130 - (view) |
Author: Giuseppe Ottaviano (ot) |
Date: 2008-11-20 19:08 |
| @amaury: Yes, this is exactly what I noticed. I didn't know how to create an instance of a PyUnicodeErrorObject. BTW, isn't PyErr_SetString used throughout the code? Should all those calls replaced with PyErr_SetObject? @benjamin: The bug can be easily reproduced: brian:tmp ot$ echo 'raise Exception()' > unicodedata.py brian:tmp ot$ python2.6 -c "print u'\N{SOFT HYPHEN}'" Segmentation fault I don't know if this can be used as an unit test. |
|
|
| msg76163 - (view) |
Author: Benjamin Peterson (benjamin.peterson) *  |
Date: 2008-11-21 03:42 |
| Here's an alternative patch which simply calls PyObject_Str on the value of the exception. It also has a test. |
|
|
| msg76166 - (view) |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *  |
Date: 2008-11-21 08:39 |
| Yes, it's better to simply call str() on the exception. I was confused by the fact that UnicodeError is a simple exception, with a single message, unlike UnicodeDecodeError which has more attributes. About the patch: - The test does not work if test_ucn.py is run before: the unicodedata module is imported only once on the first \N{} decoding. You could try to run the test in a subprocess. - In ast.c, errstr should be DECREF'd. |
|
|
| msg76206 - (view) |
Author: Benjamin Peterson (benjamin.peterson) *  |
Date: 2008-11-21 22:04 |
| This new patch address review issues. |
|
|
| msg76210 - (view) |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *  |
Date: 2008-11-21 22:21 |
| Everything looks good. (20 lines of a complex test for two simple lines of code that nobody will ever run... unit tests is a religion) |
|
|
| msg76211 - (view) |
Author: Benjamin Peterson (benjamin.peterson) *  |
Date: 2008-11-21 22:27 |
| On Fri, Nov 21, 2008 at 4:21 PM, Amaury Forgeot d'Arc <report@bugs.python.org> wrote: > > Amaury Forgeot d'Arc <amauryfa@gmail.com> added the comment: > > Everything looks good. Thanks for the review. Fixed in r67320. > > (20 lines of a complex test for two simple lines of code that nobody > will ever run... unit tests is a religion) One can't say we didn't try. :) |
|
|