Issue 7829: dis module documentation gives no indication of the dangers of bytecode inspection

Created on 2010-02-01 13:25 by exarkun, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (8)

msg98661

Author: Jean-Paul Calderone (exarkun) * (Python committer)

Date: 2010-02-01 13:25

From python-dev:

On Fri, Jan 29, 2010 at 15:04, <exarkun@twistedmatrix.com> wrote:

On 10:47 pm, tjreedy@udel.edu wrote:

On 1/29/2010 4:19 PM, Collin Winter wrote:

On Fri, Jan 29, 2010 at 7:22 AM, Nick Coghlan<ncoghlan@gmail.com> wrote:

Agreed. We originally switched Unladen Swallow to wordcode in our 2009Q1 release, and saw a performance improvement from this across the board. We switched back to bytecode for the JIT compiler to make upstream merger easier. The Unladen Swallow benchmark suite should provide a thorough assessment of the impact of the wordcode -> bytecode switch. This would be complementary to a JIT compiler, rather than a replacement for it.

I would note that the switch will introduce incompatibilities with libraries like Twisted. IIRC, Twisted has a traceback prettifier that removes its trampoline functions from the traceback, parsing CPython's bytecode in the process. If running under CPython, it assumes that the bytecode is as it expects. We broke this in Unladen's wordcode switch. I think parsing bytecode is a bad idea, but any switch to wordcode should be advertised widely.

Several years ago, there was serious consideration of switching to a register-based vm, which would have been even more of a change. Since I learned 1.4, Guido has consistently insisted that the CPython vm is not part of the language definition and, as far as I know, he has rejected any bytecode hackery in the stdlib. While he is not one to, say, randomly permute the codes just to frustrate such hacks, I believe he has always considered vm details private and subject to change and any usage thereof 'at one's own risk'.

Language to such effect might be a useful addition to this page (amongst others, perhaps):

http://docs.python.org/library/dis.html

which very clearly and helpfully lays out quite a number of APIs which can be used to get pretty deep into the bytecode. If all of this is subject to being discarded at the first sign that doing so might be beneficial for some reason, don't keep it a secret that people need to join python-dev to learn.

Can you file a bug and assign it to me?

-Brett

msg98906

Author: Terry J. Reedy (terry.reedy) * (Python committer)

Date: 2010-02-05 20:44

The doc begins:

"30.12. dis — Disassembler for Python bytecode

The dis module supports the analysis of Python bytecode by disassembling it. Since there is no Python assembler, this module defines the Python assembly language. The Python bytecode which this module takes as an input is defined in the file Include/opcode.h and used by the compiler and the interpreter."

This goes back to when python.exe (CPython) was the only implementation. "Python bytecode" is no longer appropriate. It should be changed to CPython bytecode. My suggestion for a possible update:

"30.12. dis — Disassembler for CPython bytecode

CPython currently compiles Python source code to a custom bytecode that is defined by the CPython source file Include/opcode.h and explained below. While such implementation details are subject to change in any CPython x.y version, the dis module supports the analysis of current bytecode by disassembling it to a format similar to assembly language."
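The proposed wording says bytecode details may change in any x.y release. The following sketch is illustrative and assumes CPython 3.4 or later. It shows one way a tool could avoid baking those details in. It ties any results derived from bytecode to a fingerprint of the running interpreter, and it looks up opcode numbers at run time through `dis.opmap` instead of hard-coding them. The helper names `bytecode_fingerprint` and `opcode_number` are hypothetical. `importlib.util.MAGIC_NUMBER` is the real `.pyc` header value, and CPython is expected to bump it when its bytecode format changes.

```python
import dis
import importlib.util
import sys


def bytecode_fingerprint():
    """Return a tuple identifying the running interpreter's bytecode format.

    Hypothetical helper. sys.implementation.cache_tag (e.g. 'cpython-312')
    may be None if bytecode caching is disabled; MAGIC_NUMBER is the 4-byte
    .pyc header value for this interpreter's bytecode format.
    """
    return (
        sys.implementation.name,
        sys.implementation.cache_tag,
        importlib.util.MAGIC_NUMBER,
    )


def opcode_number(name):
    """Look up an opcode's numeric value at run time instead of hard-coding it.

    Raises KeyError if the running interpreter has no opcode with that name.
    """
    return dis.opmap[name]


if __name__ == "__main__":
    print(bytecode_fingerprint())
    print("NOP =", opcode_number("NOP"))
```

A cache of analysis results keyed on `bytecode_fingerprint()` would be discarded automatically after an interpreter upgrade, rather than being reused on bytecode it no longer describes.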

Calling it an actual assembly language, as the current doc does, implies to me that there is/should be an assembler (which Guido has said there should not be).

"30.12.1. Python Bytecode Instructions

The Python compiler ..."

Python -> CPython

In the glossary: "bytecode: Python source code is compiled into bytecode, the internal representation of a Python program in the interpreter." => something like "bytecode: CPython currently compiles Python source code to an internal bytecode representation that it uses to execute the program. Some other implementations do something similar."
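Here is a short illustrative sketch of what the glossary wording describes on CPython. It assumes CPython 3.6 or later. The listing printed by `dis.dis()` differs between CPython versions, which is exactly the instability the proposed wording is meant to flag, so no expected output is shown.

```python
import dis

# compile() turns source text into a code object, the unit CPython executes.
code = compile("total = 1 + 2", "<example>", "exec")

# co_code is the raw instruction stream. CPython 3.6+ uses fixed-width
# 2-byte "wordcode"; earlier versions used variable-length instructions.
print(type(code.co_code).__name__, len(code.co_code))

# dis.dis() renders the same stream as a readable listing. The opcodes it
# prints depend on the CPython version running this code.
dis.dis(code)
```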

These suggestions touch on the larger issue of differentiating and disentangling language doc from CPython implementation doc. I support this even though I have never used any of the other implementations.

msg109126

Author: Terry J. Reedy (terry.reedy) * (Python committer)

Date: 2010-07-02 20:11

Brett, should this be reassigned to docs@python? I gave a suggested text change months ago. The need for a change like this was mentioned again today on pydev in the thread "Can Python implementations reject semantically invalid expressions?". Since the current deficiency has been noted repeatedly, I think the priority should be at least normal.

Unless someone suggests something even better, I think my proposed replacement should be accepted, formatted, and applied. I think it is definitely better than the current text.

msg109133

Author: Brett Cannon (brett.cannon) * (Python committer)

Date: 2010-07-02 21:20

Sorry, Terry, I didn't even notice the corrections in the issue since they were inlined in a comment instead of as an attached file. I will have a look right now.

msg109143

Author: Brett Cannon (brett.cannon) * (Python committer)

Date: 2010-07-02 22:04

Fixed in r82456. I decided to make a warning directive so that it's really obvious that people should not consider the dis module and bytecode as stable.

Once Python 2.7.0final is out the door I will backport the patch.

msg109277

Author: Terry J. Reedy (terry.reedy) * (Python committer)

Date: 2010-07-05 00:01

I believe Brett meant to leave this open until he finished it by backporting to 2.7 (which is now reopened for patches). Otherwise, it might get forgotten.

Sidenote: this is the issue I referred to in #9132 re the 'patch' keyword.

msg109278

Author: Brett Cannon (brett.cannon) * (Python committer)

Date: 2010-07-05 00:28

On Sun, Jul 4, 2010 at 17:01, Terry J. Reedy <report@bugs.python.org> wrote:

I believe Brett meant to leave this open until he finished it by backporting to 2.7 (which is now reopened for patches). Otherwise, it might get forgotten.

Exactly right, Terry.

msg111030

Author: Brett Cannon (brett.cannon) * (Python committer)

Date: 2010-07-21 09:52

r83012 for 3.1, r83013 for 2.7

History

Date | User | Action | Args
2022-04-11 14:56:57 | admin | set | github: 52077
2010-07-21 09:52:29 | brett.cannon | set | status: open -> closed; messages: +
2010-07-19 02:44:39 | belopolsky | set | keywords: - needs review
2010-07-05 00:28:05 | brett.cannon | set | messages: +
2010-07-05 00:01:28 | terry.reedy | set | status: closed -> open; messages: +
2010-07-04 14:13:07 | eric.araujo | set | status: open -> closed
2010-07-02 22:04:37 | brett.cannon | set | resolution: fixed; stage: patch review -> resolved; messages: +; versions: + Python 2.7
2010-07-02 21:20:18 | brett.cannon | set | keywords: + needs review; messages: +; stage: needs patch -> patch review
2010-07-02 20:11:32 | terry.reedy | set | priority: low -> normal; messages: +
2010-02-06 15:23:33 | eric.araujo | set | nosy: + eric.araujo
2010-02-05 20:44:27 | terry.reedy | set | nosy: + terry.reedy; messages: +
2010-02-03 08:20:11 | brett.cannon | set | priority: low; stage: needs patch
2010-02-01 13:28:10 | pitrou | set | assignee: georg.brandl -> brett.cannon; nosy: + brett.cannon
2010-02-01 13:25:14 | exarkun | create |