[Python-Dev] Speeding up 2to3: Results from a GSOC Project (original) (raw)

Guido van Rossum guido at python.org
Fri Jul 30 22:43:47 CEST 2010


Great! CS FTW!

--Guido

On Fri, Jul 30, 2010 at 1:28 PM, George Boutsioukis <gboutsioukis at gmail.com> wrote:

Hi everyone, I haven't had a chance to introduce myself yet. I'm George Boutsioukis, a CS student from Greece and I'm currently enrolled as a GSOC student for the PSF. The task I was involved with for the past few weeks was speeding up the 2to3 tool.

For those who are not aware of 2to3's internals, the tool matches a series of tree patterns to a python syntax tree and applies a code transformation on each match. The real bottleneck of this tool is the tree pattern matching algorithm, since the current version traverses all nodes of the tree, top-to-bottom, and checks each node against a set of tree patterns. The new algorithm I've implemented reduces the candidate nodes for matching. The process works in three steps: 1) Each pattern tree is reduced to the most 'characteristic' path, i.e. the rarest path you'll encounter on a real code tree. (this is simpler than it sounds) 2) From the set of the above linear patterns an Aho-Corasick automaton is constructed that is capable of inclusively matching all patterns(meaning that it may mark a wrong candidate but will never miss an instance) 3) The automaton is run on each leaf until the root of the tree is reached and the resulting match set is returned. And finally we apply each fixer for each node in the match set. Of course the process is a bit more complicated since we have to recheck the transformed code for new matches etc. Since it is quite a hard concept to illustrate in a post, tell me if you have any questions or need more info. The benefit of this somewhat complex process is substantial; the total run time is reduced by roughly 50%. The new module is still experimental code, but mature enough to pass all tests. I propose to include the new module with the next version of Python, perhaps as an explicit option for 2to3 until it can be thoroughly tested. The code repository can be found here: https://code.google.com/p/2to3-speedup2/source/browse/ (the code is a fork from py3k's 2to3 and was tested with version 3.1.2) The blog I've been using for GSOC(containing more details on the matching algorithm): http://george-gsoc.blogspot.com/ Regards, George


Python-Dev mailing list Python-Dev at python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org

-- --Guido van Rossum (python.org/~guido)



More information about the Python-Dev mailing list