This was mentioned during the review of issue #9410 (http://codereview.appspot.com/1694050/diff/2001/3001#newcode347), but we forgot to fix it. The new array-based memo for the Unpickler class incorrectly assumes that memo indices are always contiguous. They are not, so the following pickle causes Unpickler to use about 3GB of memory to store the memo array:
    ./python -c "import pickle; pickle.loads(b'\x80\x02]r\xff\xff\xff\x06.')"
To fix this, we can add code to fall back to a dictionary-based memo when the memo keys are not contiguous.
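The fallback proposed above could look roughly like the following. This is only a pure-Python sketch of the idea: the real memo lives in the C implementation of Unpickler, and the class name `HybridMemo` and its `put`/`get` methods are hypothetical names made up for illustration. Contiguous keys go into a list, and any key that would leave a gap goes into a dictionary instead of forcing the list to grow.

```python
class HybridMemo:
    """Hypothetical memo: a dense list for contiguous keys, a dict for the rest."""

    def __init__(self):
        self._dense = []   # index i holds the object memoized under key i
        self._sparse = {}  # keys that would leave a gap in the dense list

    def put(self, key, obj):
        if key < 0:
            raise ValueError("memo keys must be non-negative")
        if key < len(self._dense):
            # Overwrite an existing dense slot.
            self._dense[key] = obj
        elif key == len(self._dense):
            # Next contiguous key: extend the list.
            self._dense.append(obj)
            # Move over any sparse entries that have become contiguous.
            nxt = len(self._dense)
            while nxt in self._sparse:
                self._dense.append(self._sparse.pop(nxt))
                nxt += 1
        else:
            # Non-contiguous key: store it without allocating the gap.
            self._sparse[key] = obj

    def get(self, key):
        if 0 <= key < len(self._dense):
            return self._dense[key]
        return self._sparse[key]  # raises KeyError for unknown keys


memo = HybridMemo()
memo.put(0, "first")
memo.put(0x06FFFFFF, "far away")  # the index used by the pickle above
```

With this layout the huge index from the example costs one dictionary entry rather than an array of more than a hundred million slots.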
I don't think there's any point doing this. Pickle is insecure by construction; it shouldn't crash when used legitimately, but trying to make it robust in the face of hand-crafted pickle strings sounds like an uphill battle (*). (*) e.g. http://nadiana.com/python-pickle-insecure
As an example of a malicious pickle causing "excessive" memory usage, you can simply write:
    >>> s = b'\x80\x03cbuiltins\nbytearray\nq\x00J\x00\x00\x00\x7f\x85q\x01Rq\x02.'
    >>> _ = pickle.loads(s)
This will allocate an almost 2GB bytearray. You can of course change the size as you like. Here is the disassembly:
    >>> pickletools.dis(s)
        0: \x80 PROTO      3
        2: c    GLOBAL     'builtins bytearray'
       22: q    BINPUT     0
       24: J    BININT     2130706432
       29: \x85 TUPLE1
       30: q    BINPUT     1
       32: R    REDUCE
       33: q    BINPUT     2
       35: .    STOP
    highest protocol among opcodes = 2
Therefore, I would recommend closing this issue.
I was going to say that this method, http://docs.python.org/dev/py3k/library/pickle.html#restricting-globals, could be used to prevent this kind of attack on bytearray. But then I came up with this fun thing:
    pickle.loads(b'\x80\x03cbuiltins\nlist\ncbuiltins\nrange\nJ\xff\xff\xff\x03\x85R\x85R.')
Sigh... you are right about pickle being insecure by design. The only solution is to use HMAC to check the integrity and the authenticity of incoming pickles.
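For reference, the HMAC approach might be sketched like this. It is a minimal illustration, not an established API: the helper names `dumps_signed` and `loads_signed`, the choice of SHA-256, and the tag-then-payload layout are all assumptions made for the example. The point is that the signature is verified before `pickle.loads` ever sees the data.

```python
import hashlib
import hmac
import pickle

DIGEST_SIZE = hashlib.sha256().digest_size  # length of the tag in bytes


def dumps_signed(obj, key):
    """Pickle obj and prepend an HMAC-SHA256 tag computed with key (bytes)."""
    payload = pickle.dumps(obj)
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload


def loads_signed(blob, key):
    """Verify the tag first; unpickle only if it matches."""
    tag, payload = blob[:DIGEST_SIZE], blob[DIGEST_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking the mismatch position through timing.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("pickle signature mismatch")
    return pickle.loads(payload)


key = b"shared secret known only to sender and receiver"
blob = dumps_signed({"answer": 42}, key)
restored = loads_signed(blob, key)
```

This only shows that the pickle came from someone holding the key and was not modified in transit; it does nothing to make the pickle format itself safe.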