[Python-Dev] Making python C-API thread safe (try 2)
Harri Pesonen fuerte at sci.fi
Fri Sep 12 08:56:55 EDT 2003
- Previous message: [Python-Dev] Making python C-API thread safe (try 2)
- Next message: [Python-Dev] python/dist/src/Lib/bsddb __init__.py,1.5,1.6
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Phillip J. Eby wrote:
Please do not CC: my mail to Python-Dev again; I intentionally did not include python-dev on my CC: because it was asked that we move this thread elsewhere.
At 10:16 PM 9/11/03 +0300, Harri Pesonen wrote:
Phillip J. Eby wrote:
At 08:47 PM 9/11/03 +0300, Harri Pesonen wrote:
But my basic message is this: Python needs to be made thread safe. Making the individual interpreters thread safe is trivial, and benefits many people, and is a necessary first step;
It's far from trivial - you're talking about invalidating every piece of C code written for Python over a multi-year period by dozens upon dozens of extension authors.
The change is trivial in the Python C API. I already said that it would break everything outside the Python distribution, but the change in other applications is also trivial.
How do you propose that C code called from Python receive the threadstate pointer?
Exactly like that. Is there a problem? I'm suggesting that every function call gets that pointer, unless the function can get it from some other argument that contains a pointer to it.
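A minimal sketch of what that might look like at the C level (all names here, MPyState, MPy_CallObject and so on, are made up for illustration; this is not the real CPython API):

    /* Hypothetical per-thread interpreter state, owned by one thread. */
    typedef struct MPyObject MPyObject;
    typedef struct MPyState {
        MPyObject *mPy_None;     /* per-interpreter None singleton */
        /* ... per-thread heaps, module dicts, etc. ... */
    } MPyState;

    /* Every API function would take the state as its first argument. */
    MPyObject *MPy_CompileString(MPyState *tState, const char *source);
    MPyObject *MPy_CallObject(MPyState *tState, MPyObject *callable,
                              MPyObject *args);

    /* An extension function would receive it the same way. */
    static MPyObject *my_extension_func(MPyState *tState,
                                        MPyObject *self, MPyObject *args)
    {
        (void)self; (void)args;
        return tState->mPy_None;   /* no global singletons touched */
    }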
It doesn't benefit many people: only those using isolated interpreters embedded in a multithreaded C program.
I don't know how many people are writing threads in Python, either. I guess not that many. In my case I only need a thread-safe interpreter; I don't create threads in Python code. So just having what I described would be enough for me: no need for the global interpreter lock, and Python would be truly multithreaded. It would benefit many people, I'm sure.
Obviously, it's enough for you, or you wouldn't be proposing it. What does it do for me? Nothing whatsoever, except add needless overhead and make me rewrite every C extension I've ever written for Python. So, by and large, you're not going to get much support for your change from Python developers, especially those who write C extensions or who depend on extensions written by others.
Probably. That's why I'm thinking now that the language should be called something else, like MPython for "multi-threading Python". It would be 99% compatible with the existing Python syntax, but have different internals.
Yes, I'm aware of the None problem at least (only one instance of it). Please enlighten me about the other critical sections?
Object allocation/freeing? Data structure manipulations, e.g. all use of dictionaries. Python spends most of its time doing dictionary lookups or modifications, all of which need to be protected.
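To make that cost concrete, here is a rough sketch, in plain C with pthreads rather than real CPython code, of what guarding every dictionary access with a lock amounts to; the lock/unlock pair lands on the interpreter's hottest path:

    #include <pthread.h>
    #include <string.h>

    /* Illustrative only: a tiny shared mapping guarded by its own mutex. */
    struct entry { const char *key; void *value; };
    struct shared_dict {
        pthread_mutex_t lock;
        struct entry items[64];
        int count;
    };

    void *dict_lookup(struct shared_dict *d, const char *key)
    {
        void *value = 0;
        pthread_mutex_lock(&d->lock);      /* paid on every name lookup */
        for (int i = 0; i < d->count; i++) {
            if (strcmp(d->items[i].key, key) == 0) {
                value = d->items[i].value;
                break;
            }
        }
        pthread_mutex_unlock(&d->lock);
        return value;
    }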
After sleeping on it overnight, I think I got it. :-) The simple solution is that each thread created in Python gets its own independent interpreter state as well. And there could be a separate thread-global interpreter state for shared memory access. Access to this global state would always be synchronized. There could even be multiple named global states, so that the thread interlocking could be minimized. The Python syntax for creating objects in this global state would have to be invented, for example:
synchronize a = "abcd"
Also, when creating a new thread, its arguments would be copied from the creating state to the new state.
How does that sound? Of course it would be incompatible with the current threading system in Python, but it would be truly multithreaded, with no global interpreter lock needed. It would be faster than current Python: there would be no need to release or acquire the lock when calling OS functions, and no need to check how many byte codes have been processed, and so on.
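A rough C-level model of that idea (all names are hypothetical; this is a sketch of the proposal, not working interpreter code): each thread owns its own state outright and touches it without locking, and only the named shared state goes through a mutex.

    #include <pthread.h>

    typedef struct MPyObject MPyObject;

    /* Owned by exactly one thread; no locking needed to touch it. */
    typedef struct MPyState {
        MPyObject *mPy_None;
        /* ... per-thread objects, module dicts, allocator state ... */
    } MPyState;

    /* The "synchronize" namespace: every access goes through its mutex. */
    typedef struct MPySharedState {
        pthread_mutex_t lock;
        /* ... shared name -> object table ... */
    } MPySharedState;

    /* Hypothetical API behind a  synchronize a = "abcd"  style assignment. */
    void MPyShared_Set(MPySharedState *shared, const char *name, MPyObject *obj)
    {
        pthread_mutex_lock(&shared->lock);
        /* a real implementation would copy obj into the table under name */
        (void)name; (void)obj;
        pthread_mutex_unlock(&shared->lock);
    }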
I'm guessing you haven't done much writing of C extensions for Python (or Python core C), or else you'd realize why trying to make INCREF/DECREF thread-safe would absolutely decimate performance. Reference count updates happen way too often in normal code flow.
I knew that already, too. But how else can you do it?
The way it's done now! :)
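As a rough illustration of the cost being argued about (plain C, assuming GCC/Clang atomic builtins; classic CPython's refcount field is a plain non-atomic integer):

    typedef struct {
        long ob_refcnt;
        /* ... */
    } obj_t;

    /* Today: a plain increment, roughly one instruction, no bus traffic. */
    static inline void incref_plain(obj_t *o)
    {
        o->ob_refcnt++;
    }

    /* Thread-safe variant: an atomic read-modify-write on every
       INCREF/DECREF. These happen constantly (every argument pass, every
       container item, ...), so the extra cost shows up everywhere. */
    static inline void incref_atomic(obj_t *o)
    {
        __atomic_add_fetch(&o->ob_refcnt, 1, __ATOMIC_SEQ_CST);
    }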
I understand why the current Python works the way it does. But I think it's time for the next generation. Even if you don't do it, and I have no time to do it now, I'm still sure that it will be done at some point, sooner rather than later.
Of course, changing Python to not have a single None would help a lot. Or perhaps it could have a single None, but in the case of None the reference count would have no meaning; it would never be deallocated, because that would be checked in code. Maybe it does that already, I don't know.
I really don't mean to be rude (another reason I'm writing this to you privately), but this paragraph shows you are really new to Python, both at the level of coding in Python and coding with Python's C API. I wish I could explain in detail why, but there's really far too much that you don't understand and it would take me too long. I will attempt to summarize a very few points, however: first, identity (pointer comparison) is a part of the Python language, so you can't have multiple None instances any more than you can have more than one value be NULL in C. Second, at the C level, all Python objects (including None) have an absolutely uniform API, so having refcount behavior be different for different kinds of objects is not at all practical. Third, if you had more than one Py_None at the C level, you'd either have to make Py_None a macro, or rewrite all the C. If you don't think that's a problem, you have absolutely no idea how much C code out there is written to the Python API.
Yes, Py_None would be a macro. All access to interpreter state would go through the interpreter state pointer that is always on the stack, the first argument each C API function gets. That pointer should be named so that the macros always work ("tState", for example, so that the Py_None macro would expand to tState->mPy_None).
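Roughly like this, as a sketch of the convention being described (the struct and names are illustrative only):

    typedef struct MPyObject MPyObject;
    typedef struct MPyState { MPyObject *mPy_None; } MPyState;

    /* The macro works because every function's state parameter is
       spelled "tState", so that name is always in scope. */
    #define Py_None (tState->mPy_None)

    static MPyObject *returns_none(MPyState *tState)
    {
        return Py_None;   /* expands to tState->mPy_None */
    }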
I'm also wondering why this problem has not been addressed before?
It has; the cure is worse than the disease. A few years ago, somebody wrote a "free-threading" version of Python, which locked individual data objects rather than use the global interpreter lock. The performance for single-threaded programs was abominable, and the performance gain even on multiprocessor machines was not thought worth the cost. So the project was scrapped.
There would be no locking in my proposal, except when accessing the shared memory global thread state.
I don't know; I got mail about writing a PEP. It is clear that it would not be accepted, because it would break the existing API. The change is so big that I think it has to be called a different language.
This is the last message I will make about this matter (before actually starting to code it), so I'm posting this to python-list as well, because this is too important to be ignored. Python needs to be free-threading...
Harri