[Python-Dev] parallelizing

Chris Barker chris.barker at noaa.gov
Wed Sep 13 20:07:07 EDT 2017


On Wed, Sep 13, 2017 at 12:11 PM, Matthieu Bec <mdcb808 at gmail.com> wrote:

Regarding your example, I think it gives the illusion of working because sleep() is GIL aware under the hood.

It'll work for anything -- it just may not buy you any performance.

I don't know off the top of my head whether file I/O holds the GIL, which matters for your example of file parsing.

I don't think it works for process() that mainly runs bytecode, because of the GIL. If you are trying to get around the GIL then that is a totally different question.

But the easy way is to use multiprocessing instead:

import time
import random
import multiprocessing

def process(infile, outfile):
    "fake function to simulate a process that takes a random amount of time"
    time.sleep(random.random())
    print("processing: {} to make {}".format(infile, outfile))

# the guard is needed on platforms that start child processes by re-importing this module
if __name__ == "__main__":
    for i in range(10):
        multiprocessing.Process(target=process, args=("file%i.xml" % i, "file%i.xml" % i)).start()

More overhead creating the processes, but no more GIL issues.
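If the per-process startup cost matters, one option is to reuse a small pool of worker processes instead of starting one per file. This is only a rough sketch: it assumes the same fake process() as above (changed to return its message rather than print it), and the pool size of 4, the run_all() helper, and the .json output names are made up for illustration.

```python
import random
import time
from concurrent.futures import ProcessPoolExecutor


def process(infile, outfile):
    """Fake function to simulate a process that takes a random amount of time."""
    time.sleep(random.random() / 10)
    return "processing: {} to make {}".format(infile, outfile)


def run_all(n_files=10, workers=4):
    """Hypothetical helper: push every file pair through a shared process pool."""
    infiles = ["file%i.xml" % i for i in range(n_files)]
    outfiles = ["file%i.json" % i for i in range(n_files)]  # assumed output names
    # The pool starts at most `workers` processes and reuses them for every job.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process, infiles, outfiles))


if __name__ == "__main__":
    for line in run_all():
        print(line)
```

pool.map() returns results in input order, so the messages come back sorted by file number even though the workers finish at different times.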

Sorry if I wrongly thought that was a language level discussion.

This list is for discussion of the development of the CPython interpreter. So this kind of discussion doesn't belong here unless/until it gets to the point of actually implementing something.

If you have an idea as to how to improve Python, then python-ideas is the place for that discussion.

But "there should be a way to run threads without the GIL" isn't a well-enough formed idea to get far there....

If you want to discuss further, let's take this offline.

Can't there be a way to capture that idiom and multi thread it in the language itself?

Example:

loop:
    read an XML
    produce a JSON like

A note about this -- code like this would be using all sorts of shared modules. The code in those modules is going to be touched by all the threads. There is no way the Python interpreter can know which Python objects are used by what, or how. The GIL is there for good (and complex) reasons, and avoiding it is not an easy task. It's all using the same interpreter.

Also -- it's not easy to know what code may work OK with the GIL. Intensive computation in pure Python is bad, but Python is a poor choice for that anyway.

And code that does a lot of its work in C (numpy, text processing, etc.) may not hold the GIL. The same goes for I/O.

So for your example of parsing XML and writing JSON -- it may well do a lot of work without holding the GIL.

No way to know but to profile it.
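A crude way to do that profiling is to time the same batch of work serially and with threads, and see whether the threaded run is any faster. The sketch below is only an illustration under stated assumptions: cpu_bound(), sleepy(), and timed() are made-up stand-ins, and you would swap in your real XML-to-JSON function. On a standard (GIL) CPython build the pure-bytecode loop should gain little or nothing from threads, while the sleep-based one should.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def cpu_bound(n):
    """Pure-bytecode loop: under the GIL only one thread runs it at a time."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def sleepy(seconds):
    """Stand-in for blocking work; time.sleep() releases the GIL in CPython."""
    time.sleep(seconds)
    return seconds


def timed(func, args, threads):
    """Run func once per item in args; return (elapsed_seconds, results)."""
    start = time.perf_counter()
    if threads > 1:
        with ThreadPoolExecutor(max_workers=threads) as pool:
            results = list(pool.map(func, args))
    else:
        results = [func(a) for a in args]
    return time.perf_counter() - start, results


if __name__ == "__main__":
    for name, func, args in [
        ("cpu_bound", cpu_bound, [200_000] * 4),
        ("sleepy", sleepy, [0.2] * 4),
    ]:
        serial, _ = timed(func, args, threads=1)
        threaded, _ = timed(func, args, threads=4)
        print("{}: serial {:.2f}s, 4 threads {:.2f}s".format(name, serial, threaded))
```

The exact numbers depend on the machine and the Python build, so treat them as a hint about where the time goes, not a benchmark.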

-CHB

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov