Issue 26448: dis.findlabels ignores EXTENDED_ARG (original) (raw)

Created on 2016-02-27 03:54 by eric.fahlgren, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (24)

msg260913 - (view)

Author: Eric Fahlgren (eric.fahlgren) *

Date: 2016-02-27 03:54

When trying out dis.dis on some synthetically long functions, I noted that spurious branch targets were being generated in the output. First one is at address 8:

157 0 LOAD_CONST 1 (1) 3 DUP_TOP 4 STORE_FAST 0 (z) 7 DUP_TOP >> 8 STORE_FAST 1 (a) 11 DUP_TOP

I dug into findlabels and notices that it pays no attention to EXTENDED_ARG. The fix is pretty simple, basically copy pasta from dis._get_instructions_bytes, at line 369, in the 3.5.1 release code add all the "extended_arg" bits:

extended_arg = 0
while i < n:
    op = code[i]
    i = i+1
    if op >= HAVE_ARGUMENT:
        arg = code[i] + code[i+1]*256 + extended_arg
        extended_arg = 0
        i = i+2
        if op == EXTENDED_ARG:
             extended_arg = arg*65536
        label = -1

msg260914 - (view)

Author: Barun Parruck (Barun Parruck) *

Date: 2016-02-27 04:01

I'll take a look at it! :) Would you like unit tests as well?

msg260915 - (view)

Author: Barun Parruck (Barun Parruck) *

Date: 2016-02-27 04:12

Hi, to check if I've done things right, which functions did you try out dis.dis on?

msg260916 - (view)

Author: Eric Fahlgren (eric.fahlgren) *

Date: 2016-02-27 04:17

My test case:

def long(): z = a = b = c = d = e = f = g = h = 1 while x: x = x if x and x or not x else x above line repeated 2999 more times

import dis print(dis.findlabels(long.code.co_code)[:10])

Buggy output: [35510, 35509, 62, 69, 78, 81, 93, 100, 109, 112]

Correct output: [101046, 101045, 62, 69, 78, 81, 93, 100, 109, 112]

msg260917 - (view)

Author: Eric Fahlgren (eric.fahlgren) *

Date: 2016-02-27 04:20

Our paths crossed, I don't know exactly how you'd add a test case for this, maybe construct the monster function in a string, eval the string, the use the synthesized function in dis.findlabels?

msg260918 - (view)

Author: Eric Fahlgren (eric.fahlgren) *

Date: 2016-02-27 04:34

Well, now that I'm thinking about it, you could synthesize a bytecode stream trivially and have a much better test. This is completely off the top of my head, so take it is guaranteed to (probably) not work as written, but it should get you started:

from opcodes import * import dis bytecode = ( chr(EXTENDED_ARG) + chr(1) + chr(0) + chr(JUMP_IF_TRUE_OR_POP) + chr(0) + chr(0) ) print(dis.findlabels(bytecode))

msg260919 - (view)

Author: Barun Parruck (Barun Parruck) *

Date: 2016-02-27 04:34

Allright, so I'm uploading a preliminary patch, please look through it to see if I understood what you meant me to do about dis.findlabels.

The tests seem to mostly pass, except oddly enough, urlstdlib2, which is probably system-specific?

I get the correct output that you specify for that particular test function.

I'll see about building a unit test.

msg260920 - (view)

Author: Barun Parruck (Barun Parruck) *

Date: 2016-02-27 04:36

The patch I forgot to attach. Ha ha.

msg260921 - (view)

Author: Barun Parruck (Barun Parruck) *

Date: 2016-02-27 05:36

Hi, I'm a little confused as to how to write a test using bytecode streams...probably due to my lack of clarity as to what exactly dis.disassemble does. Is there any way you could give me a bit more information? :)

msg260925 - (view)

Author: Eric Fahlgren (eric.fahlgren) *

Date: 2016-02-27 06:48

The findlabels function takes a bytecode array of type bytes, usually the actual code from a function. My original test case uses the full Python compiler from source code to a CodeType object to create the bytecodes (plus all that other stuff that makes up a function), then extracts just the interesting part and passes that into findlabels.

The good part is that you can pretend you're the compiler by just putting the correct bytes into a array and feed it into the various dis functions. The EXTENDED_ARG operator plays with the operand of the succeeding instruction, everything else either doesn't have an argument or has two bytes.

Here's a real test case, I don't know how you write unit tests for the stdlib, but you can compare the output of the findlabels call with a known value, and that should get you pretty close.

from opcode import * code = bytes( chr(opmap["JUMP_FORWARD"]) + chr(0) + chr(0) + chr(EXTENDED_ARG) + chr(1) + chr(0) + chr(opmap["JUMP_FORWARD"]) + chr(0) + chr(0) + chr(opmap["RETURN_VALUE"]), encoding="latin-1" ) import dis dis.dis(code) print(dis.findlabels(code)) if dis.findlabels(code) == [0x0000+3, 0x00010000+9]: print("Test passed")

Take a look in the stdlib opcode.py and find the various "JUMP" operators, those are the guys we care about for this. Try out a bunch of cases by augmenting the above definition of "code" and you'll soon get a feel for what's going on.

As real, executable bytecode the above is of course non-sensical, but for a test, it's great because you can predict exactly what should be produced. ​

msg260929 - (view)

Author: Barun Parruck (Barun Parruck) *

Date: 2016-02-27 07:44

Hi, yes, that works much better, and I've definitely understood!

Now, about the tests, how large would you like them to be? For instance :

code = bytes ( chr(opmap["JUMP_FORWARD"]) + chr(0) + chr(255) + chr(EXTENDED_ARG) + chr(1) + chr(1) + chr(opmap["JUMP_FORWARD"]) + chr(0) + chr(0) + chr(EXTENDED_ARG) + chr(1) + chr(0) + chr(opmap["JUMP_ABSOLUTE"]) + chr(0) + chr(1) + chr(opmap["RETURN_VALUE"]), encoding="latin-1" )

Would return [65283, 16842761, 65792]

And I'll make sure I test all the jump instances (I'm shuddering to imagine what a real computer would do if fed with those byte codes)

Do you want the tests to be large? (array size > 10^2/2 as a result), or is the example above fine, if I add three or four more example testing each jump instance?

msg260931 - (view)

Author: Barun Parruck (Barun Parruck) *

Date: 2016-02-27 08:18

Four test cases have been included in the unittests, using the module unittest. One is the original one you gave me, the others are some that I played around with it, and thought would be useful to include.

I would love some code reviews at this point!

msg260939 - (view)

Author: Barun Parruck (Barun Parruck) *

Date: 2016-02-27 10:26

Sorry, the previous file had a bug.

msg260940 - (view)

Author: Eric Fahlgren (eric.fahlgren) *

Date: 2016-02-27 14:02

Lookin' good so far. How about we try it on all the opcodes that have arguments?

See attached example for which I can see two obvious improvements:

  1. It could be improved by taking apart that "args" list and using it to synthesize "sample_code" rather than having to hand duplicate the values in two places, albeit with different byte order.
  2. Likewise, my hard-coded "offsets" table is pretty awful. :)

Also, is there already a test for the dis module in which you could just add this as a case?

msg260977 - (view)

Author: Barun Parruck (Barun Parruck) *

Date: 2016-02-28 13:46

Hi, sorry, been pretty busy, and will be for a few days, so my replies may not be instant!

Having looked through the dis test module, to be honest, I can't even find a call to dis.findlabels...odd. Maybe I've got it wrong, and I wouldn't mind somebody else taking a look! But either way, a quick overview shows me that it basically picks up functions, figures out the disassembled function, and compares expected output, in a similar way that we're working, actually.

I'll look through your example, which looks pretty good! It works as expected on my repo (fails and passes appropriately), and I'll see about improving it, but for now, it seems to do its job.

msg260979 - (view)

Author: Eric Fahlgren (eric.fahlgren) *

Date: 2016-02-28 16:11

Oh, don't worry about time between responses, we all have lives (well, some of us anyhow :) ).

After looking at Lib/test/test_dis.py, I think it's just an oversight that there is no specific findlabels test (it's tested implicitly by dis.dis after all), and it would be good thing for you to add one. The obvious place seems like a new method at the bottom of the DisTests class, but we'll have to rely on (probably) Yury to make that call.

msg260981 - (view)

Author: Eric Fahlgren (eric.fahlgren) *

Date: 2016-02-28 18:20

I just remembered that code can have more than one (up to three?) EXTENDED_ARG operators before the real opcode, so I added that generalization to build code around the args list...

msg260987 - (view)

Author: Eric Fahlgren (eric.fahlgren) *

Date: 2016-02-28 21:02

Two things:

  1. Verified this has always been a problem and still is in the development branch, so I added 2.7 and 3.6 to the versions list.

  2. Couldn't tolerate the duplicate code handling the extended args operator, so in the interests of DRY, I moved the code scanner to a generator function (see attached dis_with_code_scanner.diff).

The patch is definitely not required to fix this bug, but it does isolate the original problem area to just one piece of code.

Could we get a get a quick review from a core dev saying either "go with Barun's less invasive preliminarypatch.diff" or "go with Eric's greater-churn dis_with_code_scanner.diff patch?"

(We still need to flesh out the testing a bit, so don't call us, we'll call you. :) )

msg260988 - (view)

Author: Eric Fahlgren (eric.fahlgren) *

Date: 2016-02-28 23:26

Oops, wrong/bad patch, delete line 310 "arg = None" in _get_instruction_bytes...

msg261112 - (view)

Author: Barun Parruck (Barun Parruck) *

Date: 2016-03-02 12:58

So...firstly hi, and sorry for disappearing like that!

You want tests for the dis.findlabels library? What sort of tests are you looking for? Similiarly sized bytecode streams like the ones we worked with, or did you have something else in mind?

B

msg261142 - (view)

Author: Eric Fahlgren (eric.fahlgren) *

Date: 2016-03-02 23:44

Barun, take a look at the latest version of the testfindlabels.py, see what you think. If it works for you, maybe move the test function into Lib/test/test_dis.py as part of the standard dis module tests. Still need to look at the code that's being tested and find out what cases could cause problems and then augment the test to make sure it covers those.

msg261590 - (view)

Author: Barun Parruck (Barun Parruck) *

Date: 2016-03-11 18:15

Hi!

Okay, sorry for disappearing like that(this is becoming rather a habit isn't it!), but with Gsoc applications, I've been insanely busy trying to get patches and bugs accepted for my first Gsoc ever. I do apologize for leaving you hanging, it was not my intention!

I'll try my best to give you an update by the end of this week and hopefully be able to keep working on this!

msg267419 - (view)

Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)

Date: 2016-06-05 15:30

This bug was fixed in with similar patch. Sorry, I didn't know about this issue. Your patches look good. In any case thank you for your effort.

msg267477 - (view)

Author: Eric Fahlgren (eric.fahlgren) *

Date: 2016-06-05 21:15

Thanks, Serhiy. I sort of figured that it would get fixed with the wordcode rework, and was going to verify that it was when 3.6 settled down.

On Sun, Jun 5, 2016 at 8:30 AM, Serhiy Storchaka <report@bugs.python.org> wrote:

Serhiy Storchaka added the comment:

This bug was fixed in with similar patch. Sorry, I didn't know about this issue. Your patches look good. In any case thank you for your effort.


nosy: +serhiy.storchaka resolution: -> out of date stage: -> resolved status: open -> closed superseder: -> modulefinder should reuse the dis module


Python tracker <report@bugs.python.org> <http://bugs.python.org/issue26448>


History

Date

User

Action

Args

2022-04-11 14:58:28

admin

set

github: 70635

2016-06-05 21:15:47

eric.fahlgren

set

messages: +

2016-06-05 15:30:34

serhiy.storchaka

set

status: open -> closed

superseder: modulefinder should reuse the dis module

nosy: + serhiy.storchaka
messages: +
resolution: out of date
stage: resolved

2016-03-11 18:15:06

Barun Parruck

set

messages: +

2016-03-02 23:44:42

eric.fahlgren

set

files: + dis_with_code_scanner.diff

messages: +

2016-03-02 20:22:34

vstinner

set

nosy: + vstinner

2016-03-02 12:58:04

Barun Parruck

set

messages: +

2016-02-28 23:26:39

eric.fahlgren

set

messages: +

2016-02-28 21:02:23

eric.fahlgren

set

files: + bad.diff

messages: +
versions: + Python 2.7, Python 3.6

2016-02-28 18:20:55

eric.fahlgren

set

files: + testfindlabels.py

messages: +

2016-02-28 16:11:42

eric.fahlgren

set

messages: +

2016-02-28 13:46:10

Barun Parruck

set

messages: +

2016-02-28 06:36:30

ned.deily

set

nosy: + ncoghlan, yselivanov

2016-02-27 14:02:38

eric.fahlgren

set

files: + testfindlabels.py

messages: +

2016-02-27 10:26:50

Barun Parruck

set

files: + testfindlabels2.py

messages: +

2016-02-27 08🔞27

Barun Parruck

set

files: + wrongtestfindlabels.py

messages: +

2016-02-27 07:44:58

Barun Parruck

set

messages: +

2016-02-27 06:48:44

eric.fahlgren

set

messages: +

2016-02-27 05:36:45

Barun Parruck

set

messages: +

2016-02-27 04:36:09

Barun Parruck

set

files: + preliminarypatch.diff
keywords: + patch
messages: +

2016-02-27 04:34:29

Barun Parruck

set

messages: +

2016-02-27 04:34:10

eric.fahlgren

set

messages: +

2016-02-27 04:20:31

eric.fahlgren

set

messages: +

2016-02-27 04:17:51

eric.fahlgren

set

messages: +

2016-02-27 04:12:30

Barun Parruck

set

messages: +

2016-02-27 04:01:44

Barun Parruck

set

nosy: + Barun Parruck
messages: +

2016-02-27 03:59:31

llllllllll

set

nosy: + llllllllll

2016-02-27 03:54:10

eric.fahlgren

create