Issue 7826: support caching for 2to3 (original) (raw)

Created on 2010-02-01 00:59 by ferringb, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
allow_alternate_output_tools.patch	ferringb,2010-02-01 00:59	allow alternate RefactoringTools w/in main.main
2to3-caching.patch	ferringb,2010-02-01 09:35	2to3-caching.patch

Messages (14)
msg98639 - (view)	Author: Ferringb (ferringb) *	Date: 2010-02-01 00:59
Bit like unittest, right now it's rather hard to extend 2to3 for caching support (and other outputs) w/out duplicating the main function. Attached is a patch that allows the refactoring tool class to be passed in- at the very least, this patch is enough to remove some of the nastier monkey patching tricks I had to level to inline caching support into 2to3. Actual caching patch will follow shortly; roughly what it does is track the md5 of the original source and use that as a lookup to the transformed version. That caching support isn't useful to the majority of users, but for developers w/ buildslaves it's a very useful reduction in runtime- for example, for 6 buildslaves I run for pkgcore (a py2k source that we translate upon install to py3k if the target is py3k), if the cache is primed this is a reduction of 20s to 1.8s. Two minutes cpu time across the slaves brought down to ~11s. As said, I understand it's a corner case, so getting the caching into 2to3 directly may not be possible. This initial patch however makes it way easier to do inline the caching into 2to3 without having to copy/paste main (or do some heinous monkeypatching). Patch is against svn trunk; doesn't apply cleanly to 3.1/2.6, but that's just a minor modification to make it so.
msg98641 - (view)	Author: Ferringb (ferringb) *	Date: 2010-02-01 01:24
Attached is a derivative of http://pkgcore.org/trac/pkgcore/browser/ferringb/snakeoil-dev/snakeoil/caching_2to3.py As you can see in that function, some nastyness is required right now to slip it in w/out duplicating code (hence the first patch). Patch attached here inlines the support directly into a new module lib2to3.caching, and derives on the fly a caching version of the RefactoringTool. The on the fly bit is somewhat fancy I admit, but was done so that anyone using an alternate RefactoringTool can get the same benefits. The sole con to this functionality is that it assumes the transformation will always be the same; in other words, it doesn't account for --fix/--nofix, nor changes in the transformation algorithms that would result in differing output. Personally for my uses this was completely valid- when changing the python version (even a minor upgrade) on my buildslaves I'd wipe the caches to be safe anyways. That said, if there is interest in getting caching into stdlib for this, I can poke through the code and incorporate awareness of nofix/fix/python version into the cache key. I skipped doing that however since I didn't think there was a chance in hell of caching going into mainline however ;) Either way, for folks targeting py2k and using buildslaves to validate their code and the translation of said code for py3k, either of these patches really is needed to support caching and stop burning cycles.
msg98642 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2010-02-01 01:28
Not sure what your use case is, but I always call the refactor method of my subclassed RefactoringTool, instead of calling main. See distutils.util.run_2to3 for an example (and distutils.command.build_py.build_py_2to3 for an application of that).
msg98646 - (view)	Author: Ferringb (ferringb) *	Date: 2010-02-01 03:22
@martin: Yeah... that's probably the better approach, although seperation of processes (in terms of setup.py triggering conversion) has some benefits. The reason I used process seperation and invocation of main was to protect my distutils extensions against having to know the innards of lib2to3- either it could invoke the normal 2to3 or invoke the caching version for the buildslaves. Essentially I wanted the caching version to behave the same as normal 2to3 invocations. Not great I'll admit, but so it goes. I'd still like to see the caching integrated in some form w/in stdlib since it has definite benefits- and folk are more likely to use something from stdlib than go raiding from a backwater pkg like snakeoil. That said, we'll see what folks think ;)
msg98649 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2010-02-01 06:42
If you meant to propose a patch that does caching, you should include the actual caching code in the patch.
msg98651 - (view)	Author: Ferringb (ferringb) *	Date: 2010-02-01 09:31
Cough. Ya, going to blame the bugzie on that one. right, it's at fault rather than me being a complete freaking moron. Patch attached, hopefully with far less idiocy than I've demonstrated thus far.... Pardon.
msg98652 - (view)	Author: Ferringb (ferringb) *	Date: 2010-02-01 09:35
Related note, don't be drunk when posting the missing patch- sorry for the noise, here is the caching version,, daftly presumed the early patch contained lib2to3.caching Now I'm going to go crawl in a corner, if you need me, I'll be there. Doubly pardon. As for the algo, sha1 might be wiser- as said, integrating fix/nofix is definitely needed, let alone writing tests. This is more a proof of concept I was using for speed reasons- assuming folk are game, I'm game to write the tests for it.
msg98653 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2010-02-01 09:53
Please submit a contributor agreement, see http://www.python.org/psf/contrib/ I'm still not clear on what the specific use case is: how exactly are you going to run 2to3 when you have that patch integrated? In particular, where do you put the output (3.x) files? I'd rather recommend to create a --outdir option (assuming that would also fit your needs). I'm -0 on using hashing. The whole point of this caching is speed, so wasting time in recomputing hash codes is, well, wasted. If you do think you need hashing, please read the source file in binary, rather than first decoding and then encoding it. Don't use print(). If you must output progress, use the same approach as the rest of 2to3 (i.e. logging).
msg99070 - (view)	Author: Ferringb (ferringb) *	Date: 2010-02-08 22:16
Contributor agreement form is enroute via snail mail... kind of wish y'all accepted pdf's via email though ;) As for the print("cache hit"), that should be punted- not very useful in it's current form (doesn't indicate what 'hit') and slipped in accidentally.
msg99081 - (view)	Author: Benjamin Peterson (benjamin.peterson) *	Date: 2010-02-09 01:25
I'm apprehensive about accepting this patch for several reasons. First of all, it has no tests. Secondly, and more importantly, I dislike the hacks that it adds. I would prefer 2to3 to grow a more general plugin API, which people could use to add new fixer etc as well and customizes behavior like this.
msg99246 - (view)	Author: Ferringb (ferringb) *	Date: 2010-02-12 01:05
@benjamin: Tests can be written; the reason this patch doesn't bundle tests up front is that I wasn't going to burn the time till I knew they were needed since I expected the concept to require some debate. As for the hacks angle, there isn't anything hackish there- hackish is what you have to do w/out either of these patches (http://www.pkgcore.org/trac/pkgcore/browser/snakeoil/snakeoil/caching_2to3.py). I understand not everyone likes classes mixed on the fly, but it's a perfectly valid technique w/ many, many valid uses. Converting the innards of 2to3 over to a generalized plugin interface frankly is overkill from where I'm standing- being able to plugin new fixers is one thing (very useful in my opinion but orthogonal to what this ticke tis about), being able to plugin on the fly different refactoring outputers is another thing. Personally what's in place w/in allow_alternate_output_tools.patch is more than enough for the crazy ideas I've got kicking around. If you've got a specific complaint in the patch, an area that is dodgy/hacky, state it please- I'm more than open to actionable criticisms.
msg99333 - (view)	Author: Benjamin Peterson (benjamin.peterson) *	Date: 2010-02-13 23:05
2010/2/11 Brian Harring <report@bugs.python.org>: > > Brian Harring <ferringb@gmail.com> added the comment: > > @benjamin: > > Tests can be written; the reason this patch doesn't bundle tests up front is that I wasn't going to burn the time till I knew they were needed since I expected the concept to require some debate. > > As for the hacks angle, there isn't anything hackish there- hackish is what you have to do w/out either of these patches (http://www.pkgcore.org/trac/pkgcore/browser/snakeoil/snakeoil/caching_2to3.py). I understand not everyone likes classes mixed on the fly, but it's a perfectly valid technique w/ many, many valid uses. While it may be a valid pattern, I still think there are more elegant solutions. > > Converting the innards of 2to3 over to a generalized plugin interface frankly is overkill from where I'm standing- being able to plugin new fixers is one thing (very useful in my opinion but orthogonal to what this ticke tis about), being able to plugin on the fly different refactoring outputers is another thing. I plan to work on this this spring or summer if I have time. > > Personally what's in place w/in allow_alternate_output_tools.patch is more than enough for the crazy ideas I've got kicking around. > > If you've got a specific complaint in the patch, an area that is dodgy/hacky, state it please- I'm more than open to actionable criticisms. I think delegation would be more appropriate than adding mixin classes at runtime.
msg120222 - (view)	Author: Benjamin Peterson (benjamin.peterson) *	Date: 2010-11-02 13:17
I'm going to reject this for now.
msg137845 - (view)	Author: Éric Araujo (eric.araujo) *	Date: 2011-06-07 16:00
FTR: https://github.com/pv/lib2to3cache

History
Date	User	Action	Args
2022-04-11 14:56:57	admin	set	github: 52074
2011-06-07 16:00:49	eric.araujo	set	nosy: + eric.araujomessages: +
2010-11-02 13:17:34	benjamin.peterson	set	status: open -> closedresolution: rejectedmessages: +
2010-02-13 23:05:15	benjamin.peterson	set	messages: +
2010-02-12 01:05:58	ferringb	set	messages: +
2010-02-09 01:25:46	benjamin.peterson	set	priority: normal -> lowkeywords: - needs reviewmessages: +
2010-02-08 22:16:32	ferringb	set	messages: +
2010-02-07 09:37:34	georg.brandl	set	assignee: benjamin.petersonnosy: + benjamin.peterson
2010-02-01 09:53:11	loewis	set	messages: +
2010-02-01 09:35:44	ferringb	set	files: - 2to3-caching.patch
2010-02-01 09:35:38	ferringb	set	files: - 2to3-caching.patch
2010-02-01 09:35:03	ferringb	set	files: + 2to3-caching.patchmessages: +
2010-02-01 09:31:30	ferringb	set	files: + 2to3-caching.patchmessages: +
2010-02-01 06:42:00	loewis	set	messages: +
2010-02-01 03:30:38	brian.curtin	set	priority: normalkeywords: + needs reviewstage: test needed
2010-02-01 03:22:51	ferringb	set	messages: +
2010-02-01 01:28:47	loewis	set	nosy: + loewismessages: +
2010-02-01 01:24:32	ferringb	set	files: + 2to3-caching.patchmessages: +
2010-02-01 00:59:09	ferringb	create