[Python-Dev] Is core dump always a bug? Advice requested

Christian Tismer tismer at stackless.com
Thu May 13 08:58:30 EDT 2004


Michael Hudson wrote:

...

I don't know what we can do about this. Armin suggested another hack: stick the address of a stack variable in PyMain() in a global, and compare the address of another stack variable against it when you want to see how much stack you have left. Obvious problems include knowing what's safe and which direction the stack is growing in... Even more scarily, what SBCL (a Common Lisp implementation) does is mprotect() a VM page at the end of the stack and deal with overflow in a SIGSEGV handler. It's hard to see what else could be really safe (have I mentioned that I hate C recently?).

Option 3, I guess, is integrate stackless :-)

Just as a note: Even Stackless can get into deep recursions and then uses a hack similar to Armin's suggestion. As a side effect, this made it quite simple to convince cPickle to pickle very deeply nested structures.
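For illustration, here is how stock CPython (without Stackless or any such hack) behaves when pickling a very deeply nested structure. The depth and the exception type reflect a modern CPython, which is an assumption relative to the 2004 context:

```python
import pickle

# Build a deeply nested list: [[[[...]]]], far deeper than the
# default recursion limit.
nested = []
for _ in range(100_000):
    nested = [nested]

# The stock pickler recurses once per nesting level, so without a
# Stackless-style workaround it hits the recursion limit.
try:
    pickle.dumps(nested)
    deep_pickle_ok = True
except RecursionError:
    deep_pickle_ok = False

print(deep_pickle_ok)
```

Shallow structures pickle fine, of course; it is only the per-level recursion in the pickler that limits nesting depth.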

On verification:

I think I'm all against writing a bytecode verifier, because everything needed is already there. Hartmut Goebel has written a nice Python decompyler, based upon John Aycock's spark and prior work. It produces output that often looks better than the original source. And they verify the decompiled bytecode by compiling it again. The drawback is that this appears to be no longer an open source project, see http://www.crazy-compilers.com/decompyle/ Maybe we should talk to Hartmut...

On sending code over the network:

Bytecode verification is fine, but you don't want to execute foreign bytecode, even if it is valid. This is like executing any binary program, which might do anything, Python or not. Since people are going to send programs over the network, we will need some way to exchange compiled code in a trusted manner, and I believe this takes more consideration; it is a matter for the crypto people.

Anyway, here is

My Proposal (TM)

Let's assume that you don't trust any bytecode that has not yet been verified for your machine. Further, you generate a private key for your machine, or yourself.

For .pyc archives which you create yourself, your private key is used to initialize a SHA digest, the .pyc is run through the digest, and the resulting digest is appended to the .pyc. When loading any .pyc, your private key is used again to compute the digest, and the result is verified against the digest stored in the .pyc (at its end, I guess).
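A minimal sketch of that scheme, using the stdlib hmac module to key the digest (HMAC-SHA256 here; the exact keyed construction and the 32-byte trailing tag are assumptions, not something the proposal pins down):

```python
import hashlib
import hmac

DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes


def sign_pyc(data: bytes, key: bytes) -> bytes:
    """Append a keyed digest of the .pyc contents."""
    tag = hmac.new(key, data, hashlib.sha256).digest()
    return data + tag


def verify_pyc(blob: bytes, key: bytes) -> bytes:
    """Recompute the digest over everything but the trailing tag and
    compare. Returns the original data, or raises ValueError."""
    data, tag = blob[:-DIGEST_SIZE], blob[-DIGEST_SIZE:]
    expected = hmac.new(key, data, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("untrusted or tampered .pyc")
    return data


key = b"per-machine secret key"          # hypothetical per-machine key
blob = sign_pyc(b"\x00fake pyc bytes", key)
print(verify_pyc(blob, key) == b"\x00fake pyc bytes")
```

Checking the tag on load is a single pass over the file, so it stays fast even if done on every external access to bytecode.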

For code objects which come over a network, I suggest a similar check, given that you have obtained the other side's key and can use it to verify foreign code. This works in trusted networks only. In public networks you would need public/private key pairs... much harder.

Now, any bytecode that is not yet verified can be run through decompyle just once, and the result is compiled again, signed with your key, and stored. This is more than requested, since it doesn't allow for any bytecode sequence, but just something that has an existing possible source code. Well, actually I think we want this, because bytecode is an optimization artifact.
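Assuming the decompiled source is already in hand (decompyle itself isn't modeled here), the recompile-and-sign step might be sketched as follows; the key and the HMAC-SHA256 tagging are the same assumptions as above:

```python
import hashlib
import hmac
import marshal


def recompile_and_sign(source: str, filename: str, key: bytes) -> bytes:
    # Only code that round-trips through real Python source gets signed,
    # so arbitrary hand-crafted bytecode sequences are never accepted.
    code = compile(source, filename, "exec")
    payload = marshal.dumps(code)
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return payload + tag


key = b"per-machine secret key"  # hypothetical per-machine key
signed = recompile_and_sign("x = 1 + 1", "<decompiled>", key)

# On load: split off the 32-byte tag, verify, then unmarshal and exec.
payload, tag = signed[:-32], signed[-32:]
assert hmac.compare_digest(
    tag, hmac.new(key, payload, hashlib.sha256).digest())
ns = {}
exec(marshal.loads(payload), ns)
print(ns["x"])
```

The point of the compile() step is exactly the restriction described above: the signed artifact is guaranteed to correspond to some possible source code.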

Summarizing, I think adding SHA digests to bytecode and .pyc files, at least as an option, makes some sense and is fast to check on every external access to bytecode. Verification of healthy code on the VM level could be made easy with decompyle.

cheers - chris

Christian Tismer             :^)   <mailto:tismer at stackless.com>
Mission Impossible 5oftware  :     Have a break! Take a ride on Python's
Johannes-Niemeyer-Weg 9a     :    Starship http://starship.python.net/
14109 Berlin                 :     PGP key -> http://wwwkeys.pgp.net/
work +49 30 89 09 53 34  home +49 30 802 86 56  mobile +49 173 24 18 776
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
     whom do you want to sponsor today?   http://www.stackless.com/
