Issue 23367: integer overflow in unicodedata.normalize
Bug
---
static PyObject*
unicodedata_normalize(PyObject *self, PyObject *args)
{
    ...
    if (strcmp(form, "NFKC") == 0) {
        if (is_normalized(self, input, 1, 1)) {
            Py_INCREF(input);
            return input;
        }
        return nfc_nfkc(self, input, 1);
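To reach nfc_nfkc, the input must get past the is_normalized() check, i.e. the
check must report it as not yet normalized (a string of repeated \xa0
characters takes care of that). nfc_nfkc then calls: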
static PyObject*
nfd_nfkd(PyObject *self, PyObject *input, int k)
{
    ...
    Py_ssize_t space, isize;
    ...
    isize = PyUnicode_GET_LENGTH(input);
    /* Overallocate at most 10 characters. */
    space = (isize > 10 ? 10 : isize) + isize;
    osize = space;
1   output = PyMem_Malloc(space * sizeof(Py_UCS4));
1. If isize = 2^30, then space = 2^30 + 10, so on a 32-bit build
space * sizeof(Py_UCS4) = (2^30 + 10) * 4 == 40 (modulo 2^32). PyMem_Malloc
therefore allocates a 40-byte buffer, far too small to hold the result.
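The arithmetic in note 1 can be reproduced outside the interpreter. The sketch below is only an illustration, not code from unicodedata.c: it models a 32-bit size_t with uint32_t, and the names UCS4_SIZE, wrapped_byte_count and alloc_ucs4_checked are hypothetical. The second function shows one common way to reject a request whose byte count would wrap; CPython's PyMem_New macro performs a check of this general kind.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Size of one output code point; Py_UCS4 is a 4-byte type. */
#define UCS4_SIZE 4u

/* Byte count as a 32-bit size_t would compute it.
 * The product is reduced modulo 2^32, which is the wraparound
 * described in note 1. (Hypothetical helper, not CPython code.) */
static uint32_t wrapped_byte_count(uint32_t isize)
{
    uint32_t space = (isize > 10u ? 10u : isize) + isize;
    return (uint32_t)(space * UCS4_SIZE);
}

/* Hypothetical guarded allocator: refuse any element count whose
 * byte size cannot be represented in size_t, instead of letting the
 * multiplication wrap. The caller owns the returned buffer. */
static void *alloc_ucs4_checked(size_t count)
{
    if (count > SIZE_MAX / UCS4_SIZE)
        return NULL;
    return malloc(count * UCS4_SIZE);
}
```

With isize = 2^30, wrapped_byte_count returns 40, matching the value gdb prints for space*4 in the crash log, while alloc_ucs4_checked returns NULL for any count above SIZE_MAX / 4.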
Crash
-----
nfd_nfkd (self=<module at remote 0x4056e574>, input='...', k=1) at /home/p/Python-3.4.1/Modules/unicodedata.c:552
552 stackptr = 0;
(gdb) n
553 isize = PyUnicode_GET_LENGTH(input);
(gdb) n
555 space = (isize > 10 ? 10 : isize) + isize;
(gdb) n
556 osize = space;
(gdb) n
557 output = PyMem_Malloc(space * sizeof(Py_UCS4));
(gdb) print space
$9 = 1073741834
(gdb) print space*4
$10 = 40
(gdb) c
Continuing.
Program received signal SIGSEGV, Segmentation fault.
0x40579cbb in nfd_nfkd (self=<module at remote 0x4056e574>, input='', k=1) at /home/p/Python-3.4.1/Modules/unicodedata.c:614
614 output[o++] = code;
OS info
-------
% ./python -V
Python 3.4.1
% uname -a
Linux ubuntu 3.8.0-29-generic #42~precise1-Ubuntu SMP Wed Aug 14 15:31:16 UTC 2013 i686 i686 i386 GNU/Linux
import unicodedata as ud
s="\xa0"*(2**30)
ud.normalize("NFKC", s)
True, but that could change and is not true in Python 2. I suppose we could revert the change and add a static assertion.

On Mon, Mar 2, 2015, at 14:24, Serhiy Storchaka wrote:
> Serhiy Storchaka added the comment:
>
> Because isize is the size of a real PyUnicode object. Its maximum value is PY_SSIZE_T_MAX - sizeof(PyASCIIObject) - 1.
Python tracker <report@bugs.python.org> <http://bugs.python.org/issue23367>
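The static assertion suggested in the reply might look roughly like the sketch below. It is an assumption-laden illustration, not the committed patch: fake_header is a hypothetical stand-in for PyASCIIObject, PTRDIFF_MAX stands in for PY_SSIZE_T_MAX, and OVERALLOC, max_len and padded_len are made-up names. It only covers the addition isize + 10; the multiplication by sizeof(Py_UCS4) still needs its own overflow check.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the fixed-size string header.
 * This is NOT the real PyASCIIObject layout. */
typedef struct {
    void *type;
    ptrdiff_t refcount;
    ptrdiff_t length;
    ptrdiff_t hash;
    unsigned int state;
} fake_header;

/* Matches the "Overallocate at most 10 characters" comment. */
#define OVERALLOC 10

/* Compile-time guard: the headroom reserved for the header (plus the
 * terminator) must cover the over-allocation, otherwise isize + 10
 * could exceed the signed maximum. Requires C11. */
_Static_assert(sizeof(fake_header) + 1 >= OVERALLOC,
               "string header too small to absorb the over-allocation");

/* Largest length a string can have under the assumption above. */
static ptrdiff_t max_len(void)
{
    return PTRDIFF_MAX - (ptrdiff_t)sizeof(fake_header) - 1;
}

/* The padded size, computed the same way as space in nfd_nfkd. */
static ptrdiff_t padded_len(ptrdiff_t isize)
{
    return (isize > OVERALLOC ? OVERALLOC : isize) + isize;
}
```

If the header ever shrank below the over-allocation, the build would fail instead of the addition silently overflowing at run time, which is the point of making the assumption explicit.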