Issue 5308: cannot marshal objects with more than 2**31 elements (original) (raw)

Created on 2009-02-18 22:45 by mark.dickinson, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
marshal_overflow.patch serhiy.storchaka,2013-02-06 15:51 Patch for 3.x review
marshal_overflow-2.7.patch serhiy.storchaka,2013-02-06 15:52 Patch for 2.7 review
marshal_string_leak.patch serhiy.storchaka,2013-07-01 14:23 review
Messages (22)
msg82437 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-02-18 22:45
Two closely related issues in Python/marshal.c, involving writing and reading of variable-length objects (lists, strings, long integers, ...) (1) The w_object function in marshal contains many instances of code like the following: else if (PyList_CheckExact(v)) { w_byte(TYPE_LIST, p); n = PyList_GET_SIZE(v); w_long((long)n, p); for (i = 0; i < n; i++) { w_object(PyList_GET_ITEM(v, i), p); } } On a 64-bit platform there's potential loss of information here either in the cast "(long)n" (if sizeof(long) is 4), or in w_long itself (if sizeof(long) is 8). Note that w_long, despite its name, always writes exactly 4 bytes. There should at least be an exception raised here if n is not in the range [-2**31, 2**31). This would make marshalling of large objects illegal (rather than just wrong). A more involved fix would allow marshalling of objects of size >= 2**31. This would obviously involve changing the marshal format, and would make it impossible to marshal a large object on a 64-bit platform and then unmarshal it on a 32-bit platform. The latter may not really be a problem, since memory considerations ought to rule that out anyway. (2) In r_object (and possibly elsewhere) there are corresponding checks of the form: case TYPE_LIST: n = r_long(p); if (n < 0 | n > INT_MAX) { PyErr_SetString(PyExc_ValueError, "bad marshal data"); retval = NULL; break; } ... if we allow marshalling of objects with more than 2**31-1 elements then these error checks can be relaxed. (And as a matter of principle, INT_MAX isn't really right here: an int might be only 16 bits long on some strange platforms...).
msg82440 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-02-18 23:02
Given that marshal is primarily about supporting pyc files, do we care?
msg82570 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-02-21 17:32
It wouldn't hurt to add the overflow checks though, would it?
msg82583 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-02-21 21:08
Why not. Besides it ought to be fun to write the test case for this one :-)
msg181535 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-02-06 15:51
Here is a patch for 3.x which adds checks for size overflow (only on 64-bit platform).
msg181536 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-02-06 15:52
And here is a patch for 2.7.
msg181537 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-02-06 15:56
> (And as a matter of principle, > INT_MAX isn't really right here: an int might be only 16 bits long on > some strange platforms...). AFAIK Python doesn't support such platforms (and C standard requites at least 32-bit ints). However there are strange platforms with 64-bit ints. That's why I used the explicit constant 0x7FFFFFFF.
msg181554 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-02-06 18:33
Perhaps you could add a bigmem test for this? (assuming you have enough memory somewhere to test it)
msg181560 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2013-02-06 19:08
> (and C standard requites at least 32-bit ints) No, that's not true: see Annex E of the standard, where the minimum for INT_MAX is declared to be 32767.
msg181561 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2013-02-06 19:10
Theoretically int may be 16 bits (C99): 5.2.4.2.1 Sizes of integer types <limits.h> -- maximum value for an object of type int INT_MAX +32767 // 2**15 − 1 I agree that Python wouldn't compile on such a platform anyway.
msg181562 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2013-02-06 19:11
Hmm, Mark was faster. ;)
msg181564 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-02-06 19:19
> Perhaps you could add a bigmem test for this? > (assuming you have enough memory somewhere to test it) I think this is possible. I now have some experience in the writing of bigmem tests. ;)
msg181570 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-02-06 20:10
> No, that's not true: see Annex E of the standard, where the minimum for > INT_MAX is declared to be 32767. My fault. Do I have to support such a case in the code?
msg181572 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2013-02-06 20:25
> Do I have to support such a case in the code? No, I don't see any need for that: after all, you're making the code *more* portable by replacing those occurrences of INT_MAX with 0x7fffffff. :-)
msg181573 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-02-06 20:29
Perhaps we should change signatures of w_string() and r_string() -- replace "int" by at least "long".
msg181574 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2013-02-06 20:38
Hmm, yes, I think that would also make sense. I missed those uses of int. I'll give in, though, and accept that >= 32-bit ints are a de facto standard, even if not de jure.
msg182011 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-02-13 10:15
New changeset 385d982ce641 by Serhiy Storchaka in branch '2.7': Issue #5308: Raise ValueError when marshalling too large object (a sequence http://hg.python.org/cpython/rev/385d982ce641 New changeset e0464fa28c85 by Serhiy Storchaka in branch '3.2': Issue #5308: Raise ValueError when marshalling too large object (a sequence http://hg.python.org/cpython/rev/e0464fa28c85 New changeset b48e1cd2d3be by Serhiy Storchaka in branch '3.3': Issue #5308: Raise ValueError when marshalling too large object (a sequence http://hg.python.org/cpython/rev/b48e1cd2d3be New changeset ea36478a36ee by Serhiy Storchaka in branch 'default': Issue #5308: Raise ValueError when marshalling too large object (a sequence http://hg.python.org/cpython/rev/ea36478a36ee
msg182014 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-02-13 10:34
New changeset 72e75ea25d00 by Serhiy Storchaka in branch '2.7': Fix tests for issue #5308. http://hg.python.org/cpython/rev/72e75ea25d00 New changeset 0407e5e5915e by Serhiy Storchaka in branch '3.3': Cleanup a test for issue #5308. http://hg.python.org/cpython/rev/0407e5e5915e New changeset e45f2fcf202c by Serhiy Storchaka in branch 'default': Cleanup a test for issue #5308. http://hg.python.org/cpython/rev/e45f2fcf202c
msg182015 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-02-13 10:38
> Perhaps you could add a bigmem test for this? > (assuming you have enough memory somewhere to test it) Some tests require more than 252 GiB of memory.
msg192093 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2013-06-30 23:44
The macro W_SIZE at http://hg.python.org/cpython/file/dbdb6f7f9a1a/Python/marshal.c#l130 has introduced a reference leak at http://hg.python.org/cpython/file/dbdb6f7f9a1a/Python/marshal.c#l390 . Because W_SIZE returns in an error case the reference count of PyObject *utf8 isn't decreased. CID 1040640 (#1 of 1): Resource leak (RESOURCE_LEAK) leaked_storage: Variable "utf8" going out of scope leaks the storage it points to.
msg192131 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-07-01 14:23
Good catch. There is another resource leak for general bytes-like object (the buffer can be not released). Here is a patch.
msg192878 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-07-11 16:21
New changeset 1cf2c42af815 by Serhiy Storchaka in branch '2.7': Fix reference leaks introduced by the patch for issue #5308. http://hg.python.org/cpython/rev/1cf2c42af815 New changeset 8b99f2224c3a by Serhiy Storchaka in branch '3.3': Fix reference leaks introduced by the patch for issue #5308. http://hg.python.org/cpython/rev/8b99f2224c3a New changeset 19ed630d8d75 by Serhiy Storchaka in branch 'default': Fix reference leaks introduced by the patch for issue #5308. http://hg.python.org/cpython/rev/19ed630d8d75
History
Date User Action Args
2022-04-11 14:56:45 admin set github: 49558
2013-07-14 12:39:32 serhiy.storchaka set status: pending -> closedstage: commit review -> resolved
2013-07-11 16:23:08 serhiy.storchaka set status: open -> pendingstage: needs patch -> commit reviewresolution: fixedversions: - Python 3.2
2013-07-11 16:21:54 python-dev set messages: +
2013-07-01 14:23:45 serhiy.storchaka set files: + marshal_string_leak.patchmessages: +
2013-06-30 23:44:31 christian.heimes set status: closed -> opennosy: + christian.heimesmessages: + resolution: fixed -> (no value)stage: resolved -> needs patch
2013-02-15 08:25:27 serhiy.storchaka set status: open -> closedresolution: fixedstage: patch review -> resolved
2013-02-13 10:38:38 serhiy.storchaka set messages: +
2013-02-13 10:34:26 python-dev set messages: +
2013-02-13 10:15:18 python-dev set nosy: + python-devmessages: +
2013-02-12 23:44:44 serhiy.storchaka set assignee: serhiy.storchaka
2013-02-06 20:38:01 mark.dickinson set messages: +
2013-02-06 20:29:13 serhiy.storchaka set messages: +
2013-02-06 20:25:23 mark.dickinson set messages: +
2013-02-06 20:10:10 serhiy.storchaka set messages: +
2013-02-06 19:19:39 serhiy.storchaka set messages: +
2013-02-06 19:11:51 skrah set messages: +
2013-02-06 19:10:56 skrah set nosy: + skrahmessages: +
2013-02-06 19:09:01 eric.snow set nosy: + eric.snow
2013-02-06 19:08:42 mark.dickinson set messages: +
2013-02-06 18:33:29 pitrou set nosy: + pitroumessages: +
2013-02-06 15:56:03 serhiy.storchaka set messages: +
2013-02-06 15:52:09 serhiy.storchaka set files: + marshal_overflow-2.7.patchmessages: +
2013-02-06 15:51:10 serhiy.storchaka set files: + marshal_overflow.patchkeywords: + patchmessages: + stage: needs patch -> patch review
2013-02-06 10:19:22 serhiy.storchaka set nosy: + serhiy.storchakaversions: + Python 3.4
2011-06-26 20:46:43 terry.reedy set stage: needs patchversions: + Python 3.2, Python 3.3, - Python 3.1
2009-06-07 14:54:00 mark.dickinson set assignee: mark.dickinson -> (no value)
2009-02-21 21:08:12 rhettinger set messages: +
2009-02-21 17:32:40 mark.dickinson set priority: normalassignee: mark.dickinsonmessages: +
2009-02-18 23:02:02 rhettinger set nosy: + rhettingermessages: +
2009-02-18 22:45:03 mark.dickinson create