msg70277 - (view) |
Author: Nick Edds (nedds) |
Date: 2008-07-25 21:06 |
Here is a working, multiprocess version of 2to3 with a few caveats. First, you need to already have the processing module installed for this to work. If we don't want to include processing in some way, I think I can modify this to only import processing and use the Process method if the user wants to run it with more than one process. Also, I know that logger is supposed to be thread safe, so I am correct in assuming that this means it is also multi-process safe? This fix delegates the fixing of files to a designated number of processes, so on a multi-core machine, it is significantly faster. It may be appropriate to add a test suite for it, but I have not yet done so. I've tested it on my dual-core laptop running Ubuntu and a quad-core mac and it appears to be working fine. I don't know if the use of tempfile was the right choice, but it seemed reasonable. Another possibility is to instead using a processing Queue and pass it the output, filename, and input rather than calling write_file. Also, the join timeout I used is completely arbitrary because I don't really have a sense of what a reasonable value for it should be. |
|
|
msg70280 - (view) |
Author: Nick Edds (nedds) |
Date: 2008-07-25 21:20 |
Here is a version that only imports processing if the multi-process option is specified. I don't know if this is the most efficient way it can be done, and I think there's a better way to do it, but this works. |
|
|
msg74108 - (view) |
Author: Nick Edds (nedds) |
Date: 2008-10-01 02:10 |
Is there still any interest in this Collin? Is there anything else you need me to do for it? |
|
|
msg74109 - (view) |
Author: Benjamin Peterson (benjamin.peterson) *  |
Date: 2008-10-01 03:08 |
Nick, is there a way you could isolate the process functionality in a RefactoringTool subclass? It's an interesting idea, but I don't it needs to infect the main library. |
|
|
msg74121 - (view) |
Author: Collin Winter (collinwinter) *  |
Date: 2008-10-01 09:05 |
I believe the only issue I recall was that the patch didn't work out-of-the-box for Python 2.6 (changed imports, PEP 8 compliance changes in the multiprocess module). Has that been fixed? I disagree with Benjamin: this is an import speed increase, and I don't see the point in adding the needless complexity of a subclass. The user won't notice any difference, and anyone wanting to use this as a library will want the faster version. |
|
|
msg74144 - (view) |
Author: Benjamin Peterson (benjamin.peterson) *  |
Date: 2008-10-01 20:35 |
I don't think we should force people using it as a library to go multiprocess. Also, it's trivial to just change the name of the class used if that is wanted. |
|
|
msg74145 - (view) |
Author: Nick Edds (nedds) |
Date: 2008-10-01 20:43 |
The currently attached patch works in Python2.5 not Python2.6, so I will update it for 2.6 when I get the chance. But as it is currently written, the default behavior is not multiprocess. Instead, if you want multiprocess, you specify how many processes you want when you run 2to3 from the command line. And as long as you've got multiple files to run 2to3 on and multiple cores, doing multiprocess 2to3 is significantly faster. |
|
|
msg74155 - (view) |
Author: Collin Winter (collinwinter) *  |
Date: 2008-10-02 08:40 |
"I don't think we should force people using it as a library to go multiprocess." I don't understand this. What downsides do you perceive in multiprocessing support? Multiprocessing is a significant speed-up for 2to3 on multicore systems. |
|
|
msg74200 - (view) |
Author: Benjamin Peterson (benjamin.peterson) *  |
Date: 2008-10-02 22:07 |
I'm just saying that a client of lib2to3 shouldn't have to use multiple processes. Just as long as the multiple processes are optional, I'm happy. :) |
|
|
msg74226 - (view) |
Author: Collin Winter (collinwinter) *  |
Date: 2008-10-03 06:52 |
You have yet to articulate a reason for that preference. |
|
|
msg74272 - (view) |
Author: Benjamin Peterson (benjamin.peterson) *  |
Date: 2008-10-03 21:01 |
Because I would like to use lib2to3 without the complexity of multiple processes. That it is a good performance boost is excellent, but I don't think it should be a required part of using the library. |
|
|
msg74273 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2008-10-03 21:04 |
I think it's reasonable to only enable multiprocessing if the adequate command-line option has been set. It's how `make` already works (next time you compile Python, try `make -jN` where N is your number of CPU cores). |
|
|
msg74429 - (view) |
Author: Collin Winter (collinwinter) *  |
Date: 2008-10-07 09:09 |
Benjamin, what "complexity" did you encounter when trying to use lib2to3 in your own work? Unless there's a concrete use-case where the mere existence of multiprocessing support (as opposed to actually enabling that support) made a tangible project harder, I say we leave it in. I can't imagine someone making the conscious choice to slow down their own lib2to3-based refactoring tool by 2x; if those people appear later, we'll deal with their concerns at that time. Nick: if you don't have time to make this work in 2.6, I can do so. The patch will also need doc fixes and tests, though the core functionality is fine. |
|
|
msg74437 - (view) |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *  |
Date: 2008-10-07 11:38 |
I suggest that when using lib2to3 as a library, multiprocessing is not enabled by default; there may be uses of the library that are incompatible with multiprocessing. It may be enabled by default when using it from the command line (or the lib2to3.main module). But which default number of processes would this use? Concerning the patch itself: - the line "from processing import Process" seems suspect. - Did you consider using something as simple as: pool = multiprocessing.Pool(self.options.num_processes) pool.map(self.refactor_file, fullnames) It should do all the job: start processes, queue tasks, wait for results. |
|
|
msg74486 - (view) |
Author: Benjamin Peterson (benjamin.peterson) *  |
Date: 2008-10-07 21:07 |
I'm not opposed to having the support available. I just don't what it enabled by default. |
|
|
msg74615 - (view) |
Author: Nick Edds (nedds) |
Date: 2008-10-10 02:57 |
I had very little experience with the processing module prior to the creation of this patch, and because pool objects are listed last in the documentation, I did not read about them because I saw a way to achieve what I wanted using Process. But having looked at the documentation for Pool, I think you are correct that it would be a much cleaner solution to the problem, as I was essentially implementing a Pool myself. I will post a new patch to reflect this change tomorrow. I will also submit the patch to take advantage of the fact that the multiprocessing module is included in Python2.6, as opposed to my prior patch which was designed for Python2.5 which required the user to get the processing module. And in the patch, as before, multiprocess will not be default, but it will be something the user can enable via the command line. |
|
|
msg85103 - (view) |
Author: Benjamin Peterson (benjamin.peterson) *  |
Date: 2009-04-01 22:41 |
I added some support for concurrent running in r70999. |
|
|