[Python-Dev] [OT] stats on type-inferencing atomic types in local variables in the stdlib (original) (raw)

Brett C. bac at OCF.Berkeley.EDU
Thu Sep 30 22:02:34 CEST 2004


My thesis (which, for those who don't know, was to come up with a way to do type inferencing in the compiler without requiring any semantic changes; basically type inferencing atomic types assigned to local variables) is now far enough long that I have the algorithm done and I can generate statistics on what opcodes are called with the most common types that I can specifically infer (can also do static type checking on occasion; only triggers were actual unit tests making sure TypeError was raised for certain things like ~4.2 and such). Thought some of you might get a kick out of this since the numbers are rather blatent for certain opcodes and methods.

To read the stats, the number to the left is the number of times the opcode was compiled (not executed!) with the specific type(s) known for the opcode (if it took two args, then both types are listed; order was considered irrelevant). Now they are listed as integers, so here is the conversion::

Basestring 4 IntegralType 8 FloatType 16 ImagType 32 DictType 64 ListType 128 TupleType 256

For the things named "meth_" that is the method being called immediately on the type.

Now realize these numbers are only for opcodes where I could definitely infer the type; ones where it could be more than one type, regardless if those possibilities were very specific, I just ignored it and did not include in the stats.

I also tweaked some opcodes knowing how they are more often used. So, for instance, BINARY_MODULO checks specifically for the case of when the left side is a basestring and then just doesn't worry about the other args. Other ones I just didn't bother with all the args since it was not interesting to me in terms of deciding what type-specific opcodes I want to come up with.

Anyway, here are the numbers on Lib sans Lib/test (129,814 lines according to SLOCCount) for the ones above 100::

(101, ('BINARY_MULTIPLY', (8, 4))), (106, ('BINARY_SUBSCR', 128)), (118, ('GET_ITER', 128)), (124, ('BINARY_MODULO', None)), (195, ('meth_join', 4)), (204, ('BINARY_ADD', (8, 8))), (331, ('BINARY_ADD', (4, 4))), (513, ('BINARY_LSHIFT', (8, 8))), (840, ('meth_append', 128)), (1270, ('PRINT_ITEM', 4)), (1916, ('BINARY_MODULO', 4)), (12302, ('STORE_SUBSCR', 64))]

We sure like our dictionaries (for those that don't know, dictionaries are created by making an empty dict and then basically doing an indivual assignment for each value). We also seem to love to use string interpolation, and printing stuff. Using list.append is also popular. Now the BINARY_LSHIFT is rather interesting, and that ties into the whole issue of how much I can actually infer; since binary work tends to be with all constants I can infer it really easily and so its frequency is rather high. Its actual frequency of use, though, compared to other things probably is not high, though. Plus I doubt Plone, for instance, uses << very often so I suspect the opcode will get weeded out when I incorporate stats from the other apps I am taking stats from.

As for the stuff I cut out, the surprising thing from those numbers was how few mathematical expressions could be inferred. I checked my numbers with grep and there really is only 3 times where a float constant is divided by a float constant (and they are all in colorsys). I was not expecting that at all. Guess global variables or object attributes tend to have them or I just can't infer the values. Either way I just wasn't expecting that.

Anyway, as I said I just thought some people might find this interesting. Don't read into this too much since I am just using these numbers as guidelines for type-specific opcodes to write for use as a quantifiable measurement of the usefulness of type inferencing like this.

-Brett

P.S.: anyone who is really interested I can send you the full stats for the apps I have run my modified version of compile.c against.



More information about the Python-Dev mailing list