Issue 2620: Multiple buffer overflows in unicode processing (original) (raw)

Created on 2008-04-11 22:35 by jnferguson, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
python-2.5.2-unicode_resize-utf7.py jnferguson,2008-04-11 22:35
python-2.5.2-unicode_resize-utf8.py jnferguson,2008-04-11 22:35
python-2.5.2-unicode_resize-utf16.py jnferguson,2008-04-11 22:36
issue2620-gps02-patch.txt gregory.p.smith,2008-07-06 06:42 always check for overflow in _New and _Resize
Messages (21)
msg65379 - (view) Author: Justin Ferguson (jnferguson) Date: 2008-04-11 22:35
174 static 175 int unicode_resize(register PyUnicodeObject *unicode, 176 Py_ssize_t length) 177 { [...] 201 202 oldstr = unicode->str; 203 PyMem_RESIZE(unicode->str, Py_UNICODE, length + 1); [...] 209 unicode->str[length] = 0; 210 unicode->length = length; 211 95 #define PyMem_RESIZE(p, type, n) \ 96 ( assert((n) <= PY_SIZE_MAX / sizeof(type)) , \ 97 ( (p) = (type *) PyMem_REALLOC((p), (n) * sizeof(type)) ) ) The unicode_resize() function acts essentially as a wrapper to realloc(), it accomplishes this via the PyMem_RESIZE() macro which factors the size with the size of the type, in this case it multiplies by two as Py_UNICODE is typedef'd to a wchar_t. When resizing large strings, this results in an incorrect allocation that in turn leads to buffer overflow. This is specific to the Unicode objects, however I would not be surprised to see that other types have this complication as well. Please see attached proof of concepts.
msg65382 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2008-04-11 23:29
You are probably referring to 32-bit platforms. At least on 64-bit platforms, there's no problem with your test cases: >>> # this is to get the unicode_freelist initialized ... # the length of the string must be <= 9 to keep ... # unicode->str from being deallocated and set to ... # NULL ... bla = unicode('IOActive') >>> del bla >>> >>> >>> msg = 'A'*2147483647 >>> >>> msg.decode('utf7') Traceback (most recent call last): File "", line 1, in MemoryError The code does check for success of the realloc(): PyMem_RESIZE(unicode->str, Py_UNICODE, length + 1); if (!unicode->str) { unicode->str = (Py_UNICODE *)oldstr; PyErr_NoMemory(); return -1; } Are you after the integer overflow and the fact that realloc() would (if possible) allocate a buffer smaller than needed ?
msg65384 - (view) Author: Justin Ferguson (jnferguson) Date: 2008-04-12 00:20
Yes, excuse me-- this should be 32-bit specific as I believe Python will not let me get a string long enough to overflow the integer on 64-bit. It's a big string, the only realistic scenario I can see is XML parsing or similar. theory$ ./python -V Python 2.5.2 theory$ cat /proc/cpuinfo | grep -i model model : 4 model name : Intel(R) Pentium(R) 4 CPU 3.00GHz theory$ ./python python-2.5.2-unicode_resize-utf7.py Segmentation fault
msg65386 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2008-04-12 03:26
Note that in r61458 Neal replaced PyMem_RESIZE with a direct call to PyMem_REALLOC thus eliminating integer overflow check even from the debug builds.
msg65387 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2008-04-12 03:42
Justin, Where did you find the definition that you cited: 95 #define PyMem_RESIZE(p, type, n) \ 96 ( assert((n) <= PY_SIZE_MAX / sizeof(type)) , \ 97 ( (p) = (type *) PyMem_REALLOC((p), (n) * sizeof(type)) ) ) ? Current Include/pymem.h does not have the assert: 94 #define PyMem_RESIZE(p, type, n) \ 95 ( (p) = (type *) PyMem_REALLOC((p), (n) * sizeof(type)) ) and it did not change for a while.
msg65389 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2008-04-12 04:18
The following simple change should be enough for this issue, but I would consider implementing the overflow check in the PyMem_RESIZE and PyMem_NEW macros and de-deprecate their use. =================================================================== --- Objects/unicodeobject.c (revision 62237) +++ Objects/unicodeobject.c (working copy) @@ -261,8 +261,8 @@ it contains). */ oldstr = unicode->str; - unicode->str = PyObject_REALLOC(unicode->str, - sizeof(Py_UNICODE) * (length + 1)); + unicode->str = SIZE_MAX/sizeof(Py_UNICODE) - 1 < length ? NULL : + PyObject_REALLOC(unicode->str, sizeof(Py_UNICODE) * (length + 1)); if (!unicode->str) { unicode->str = (Py_UNICODE *)oldstr; PyErr_NoMemory();
msg65393 - (view) Author: Justin Ferguson (jnferguson) Date: 2008-04-12 06:04
i pulled the Macros out of pymem.h in a Vanille 2.5.2?
msg65395 - (view) Author: Justin Ferguson (jnferguson) Date: 2008-04-12 06:16
sorry didnt mean to change components and version-- I'm typing this from my phone and its being uncooperative at the moment
msg65397 - (view) Author: Justin Ferguson (jnferguson) Date: 2008-04-12 10:54
just fixing the modifications my phone made earlier tonight
msg65398 - (view) Author: Justin Ferguson (jnferguson) Date: 2008-04-12 12:13
Additionally-- the PyMem_NEW()/PyMem_New() macro's need to be fixed: 231 static 232 PyUnicodeObject *_PyUnicode_New(Py_ssize_t length) 233 { 234 register PyUnicodeObject *unicode; 235 236 /* Optimization for empty strings */ 237 if (length == 0 && unicode_empty != NULL) { 238 Py_INCREF(unicode_empty); 239 return unicode_empty; 240 } 241 242 /* Unicode freelist & memory allocation */ 243 if (unicode_freelist) { 244 unicode = unicode_freelist; 245 unicode_freelist = *(PyUnicodeObject **)unicode; 246 unicode_freelist_size--; 247 if (unicode->str) { 248 /* Keep-Alive optimization: we only upsize the buffer, 249 never downsize it. */ 250 if ((unicode->length < length) && 251 unicode_resize(unicode, length) < 0) { 252 PyMem_DEL(unicode->str); 253 goto onError; 254 } 255 } 256 else { 257 unicode->str = PyMem_NEW(Py_UNICODE, length + 1); 258 } 85 #define PyMem_New(type, n) \ 86 ( assert((n) <= PY_SIZE_MAX / sizeof(type)) , \ 87 ( (type *) PyMem_Malloc((n) * sizeof(type)) ) ) 88 #define PyMem_NEW(type, n) \ 89 ( assert((n) <= PY_SIZE_MAX / sizeof(type)) , \ 90 ( (type *) PyMem_MALLOC((n) * sizeof(type)) ) ) 91 92 #define PyMem_Resize(p, type, n) \ 93 ( assert((n) <= PY_SIZE_MAX / sizeof(type)) , \ 94 ( (p) = (type *) PyMem_Realloc((p), (n) * sizeof(type)) ) ) 95 #define PyMem_RESIZE(p, type, n) \ 96 ( assert((n) <= PY_SIZE_MAX / sizeof(type)) , \ 97 ( (p) = (type *) PyMem_REALLOC((p), (n) * sizeof(type)) ) ) It looks like much of this is reachable from modules, unicode objects, dict objects, set objects, et cetera. Also, if a 2G string across the network seems unrealistic consider what 2 billion A's compresses down to.(Macros again taken from 2.5.2 vanilla)
msg65441 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2008-04-13 12:22
On 32-bit platforms, it's probably best to add a size check. I don't it's worth doing that on 64-bit platforms - overflows are rather unlikely there.
msg65457 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008-04-14 03:12
Here's a patch that fixes this by making both Python's malloc and realloc return NULL if (0 <= size <= PY_SSIZE_T_MAX). A side effect of this is that strings on 32bit platforms can no longer be allocated up to 2**31-1 in length as the malloc includes the internal python object structure overhead. The maximum string size becomes 2147483609 with an optimized build on this system. I do not think that is a problem. A 32-bit process by definition can only ever have one such object allocated at a time anyways. ;) any objections?
msg65458 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2008-04-14 03:29
On Sun, Apr 13, 2008 at 11:12 PM, Gregory P. Smith <report@bugs.python.org> wrote: .. > Here's a patch that fixes this by making both Python's malloc and > realloc return NULL if (0 <= size <= PY_SSIZE_T_MAX). > This will not solve the original problem completely: multiplicative overflow may produce size in the 0 to PY_SSIZE_T_MAX range. Furthemore, malloc and realloc take unsigned arguments and I believe there are cases when they are called with unsigned arguments in python code. Using the proposed macro definitions in these cases will lead to compiler warnings. I don't object to limiting the allowed malloc/realoc size, but the check should be expressed as unsigned comparison: (size_t)(n) > (size_t)PY_SSIZE_T_MAX and multiplications by n > 2 should still be checked for overflow before the result can be used for malloc.
msg69319 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008-07-06 06:42
here's an updated patch. It makes PyMem_NEW and PyMem_RESIZE macros always do explicit an overflow check before doing the multiplication. Uses of the the macros have been cleaned up in the couple places I noticed that would leak memory or corrupt their own state by replacing the original pointer to their memory with NULL on error before raising MemoryError. This bug was already present in the existing code if realloc ever returned NULL. (IMHO PyMem_RESIZE & PyMem_Resize are a poorly designed macros. The blind pointer assignment should never have been done within the macro. But given that it is exposed as an API and presumably used by third party extension modules the broken API must be maintained.)
msg70061 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008-07-19 23:17
diff up for review on http://codereview.appspot.com/2599
msg70103 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2008-07-21 11:52
The patch looks good to me.
msg70137 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008-07-22 04:47
Commited to trunk. r65182. This still needs backporting to release25-maint. (and release24-maint if anyone is maintaining that)
msg70337 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2008-07-28 05:26
Committed revision 65261 for 2.5 Committed revision 65262 for 2.4.
msg92146 - (view) Author: Boya Sun (boya) Date: 2009-09-01 22:59
In Python/pyarena.c: block_new(size_t size) { /* Allocate header and block as one unit. ab_mem points just past header. */ block *b = (block *)malloc(sizeof(block) + size); ... } Should a check for overflow of "size" also be performed before calling "malloc"?
msg105545 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2010-05-11 21:01
Neal committed changes for 2.4,2.5, so I removed those. 3.0 is dead. Is this an issue for 3.1,3.2 or should it be closed?
msg107420 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2010-06-09 21:34
Brett, open and fixed are contradictory? for what version did you reopen this?
History
Date User Action Args
2022-04-11 14:56:33 admin set github: 46872
2010-08-03 18:06:26 terry.reedy set status: open -> closedversions: - Python 2.4, Python 3.0
2010-06-09 21:34:23 terry.reedy set messages: +
2010-05-12 11:54:48 r.david.murray set nosy: + vstinner
2010-05-11 21:01:12 terry.reedy set nosy: + terry.reedymessages: +
2009-09-02 20:42:30 brett.cannon set status: closed -> open
2009-09-01 22:59:04 boya set nosy: + boyamessages: +
2008-09-17 01:05:11 brett.cannon set resolution: fixed
2008-07-28 05:26:53 nnorwitz set status: open -> closedmessages: +
2008-07-22 04:47:46 gregory.p.smith set keywords: + patchmessages: + versions: + Python 3.0, - Python 2.6
2008-07-21 11:52:02 lemburg set messages: +
2008-07-19 23:17:52 gregory.p.smith set messages: +
2008-07-06 06:42:31 gregory.p.smith set files: + issue2620-gps02-patch.txtmessages: +
2008-07-06 05:37:19 gregory.p.smith set files: - issue2620-gps01-patch.txt
2008-04-14 03:29:33 belopolsky set messages: +
2008-04-14 03:12:43 gregory.p.smith set files: + issue2620-gps01-patch.txtmessages: +
2008-04-14 01:08:58 gregory.p.smith set assignee: gregory.p.smithversions: + Python 2.6
2008-04-13 12:22:16 lemburg set messages: +
2008-04-12 12:13:56 jnferguson set messages: +
2008-04-12 10:54:57 jnferguson set messages: + components: - Library (Lib), Noneversions: - Python 3.0
2008-04-12 06:16:05 jnferguson set messages: + components: + Library (Lib)
2008-04-12 06:04:21 jnferguson set messages: + components: + Noneversions: + Python 3.0
2008-04-12 04🔞46 belopolsky set messages: +
2008-04-12 03:55:42 gregory.p.smith set priority: highnosy: + gregory.p.smithversions: + Python 2.4
2008-04-12 03:42:10 belopolsky set messages: +
2008-04-12 03:26:51 belopolsky set nosy: + belopolsky, nnorwitzmessages: +
2008-04-12 00:20:34 jnferguson set messages: +
2008-04-11 23:29:29 lemburg set nosy: + lemburgmessages: +
2008-04-11 22:36:03 jnferguson set files: + python-2.5.2-unicode_resize-utf16.py
2008-04-11 22:35:51 jnferguson set files: + python-2.5.2-unicode_resize-utf8.py
2008-04-11 22:35:37 jnferguson create