[Python-Dev] Making python C-API thread safe (try 2)

Fri Sep 12 00:56:55 EDT 2003

Phillip J. Eby wrote:

> Please do not CC: my mail to Python-Dev again; I intentionally did not 
> include python-dev on my CC: because it was asked that we move this 
> thread elsewhere.
>
> At 10:16 PM 9/11/03 +0300, Harri Pesonen wrote:
>
>> Phillip J. Eby wrote:
>>
>>> At 08:47 PM 9/11/03 +0300, Harri Pesonen wrote:
>>>
>>>> But my basic message is this: Python needs to be made thread safe. 
>>>> Making the individual interpreters thread safe is trivial, and 
>>>> benefits many people, and is a necessary first step;
>>>
>>>
>>>
>>> It's far from trivial - you're talking about invalidating every 
>>> piece of C code written for Python over a multi-year people by 
>>> dozens upon dozens of extension authors.
>>
>>
>> The change is trivial in Python C API. I already said that it would 
>> break everything outside the Python distribution, but the change in 
>> other applications is also trivial.
>
>
> How do you propose that C code called *from* Python *receive* the 
> threadstate pointer? 

Exactly like that. Is there a problem? I'm suggesting that every 
function call gets that pointer, unless the function can get it from 
some other argument that contains a pointer to it.

>
>>> It doesn't benefit many people: only those using isolated 
>>> interpreters embedded in a multithreaded C program.
>>
>>
>> I don't know how many people are writing threads in Python, either. I 
>> guess that not so many. In my case I only need a thread safe 
>> interpreter, I don't create threads in Python code. So just having 
>> what I described would be enough for me: no need for global 
>> interpreter lock, and Python would be really multithreading. It would 
>> benefit many people, I'm sure.
>
>
> Obviously, it's enough for you, or you wouldn't be proposing it.  What 
> does it do for me?  Nothing whatsoever, except add needless overhead 
> and make me rewrite every C extension I've ever written for Python.  
> So, by and large, you're not going to get much support for your change 
> from Python developers, especially those who write C extensions, or 
> depend on extensions written by others. 

Probably. That's why I'm thinking now that the language should be called 
something else, like MPython for "multi-threading Python". It would be 
99% compatible with the existing Python syntax, but have different 
internals.

>
>> Yes, I'm aware of the None problem at least (only one instance of 
>> it). Please enlighten me about the other critical sections? Object 
>> allocation/freeing?
>
>
> Data structure manipulations, e.g. all use of dictionaries.  Python 
> spends most of its time doing dictionary lookups or modifications, all 
> of which need to be protected. 

After sleeping on it overnight, I think I've got it. :-) The simple 
solution is that each thread created in Python gets its own independent 
interpreter state as well. And there could be a separate thread-global 
interpreter state for shared memory access. Access to this global state 
would always be synchronized. There could even be multiple named global 
states, so that thread interlocking could be minimized. A Python 
syntax for creating objects in this global state would have to be invented:

synchronize a = "abcd"

Also, when creating a new thread, its arguments would be copied from 
the creating state to the new state.
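
To make the sharing rules concrete, here is a rough sketch in today's 
Python. It only mimics the rules (copy the arguments, lock every access to 
the shared state); it still runs under the global interpreter lock and does 
not give each thread its own interpreter. The names SharedState and 
start_isolated are made up for illustration, and shared.put("a", ...) 
stands in for the not-yet-invented "synchronize" syntax.

```python
import copy
import threading


class SharedState:
    """Hypothetical named global state; every access goes through one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._values = {}

    def put(self, name, value):
        # Store a private copy so the writer cannot mutate it afterwards.
        with self._lock:
            self._values[name] = copy.deepcopy(value)

    def get(self, name):
        # Hand back a copy so the reader owns what it receives.
        with self._lock:
            return copy.deepcopy(self._values[name])


def start_isolated(target, *args):
    """Start a thread whose arguments are copies, not shared references."""
    private_args = copy.deepcopy(args)
    thread = threading.Thread(target=target, args=private_args)
    thread.start()
    return thread


shared = SharedState()
shared.put("a", "abcd")  # stands in for: synchronize a = "abcd"


def worker(items, result_name):
    items.append(len(items))        # touches only this thread's private copy
    shared.put(result_name, items)  # the only synchronized step


original = [10, 20]
threads = [start_isolated(worker, original, f"result{i}") for i in range(3)]
for thread in threads:
    thread.join()
```

After the threads finish, the creator's list is unchanged, because each 
worker only ever saw its own copy; the only place the threads meet is 
inside the lock of the shared state.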

How does that sound? Of course it would be incompatible with the current 
threading system in Python, but it would be fully multithreaded, with no 
global interpreter lock needed. It would be faster than current Python: 
there would be no need to release or acquire the lock when calling OS 
functions, and no need to check how many byte codes have been processed, 
etc.

>
>>> I'm guessing you haven't done much writing of C extensions for 
>>> Python (or Python core C), or else you'd realize why trying to make 
>>> INCREF/DECREF threadsafe would absolutely decimate performance.
>>> Reference count updates happen *way* too often in normal code flow.
>>
>> I also knew that already. But how else can you do it?
>
>
> The way it's done now!  :) 

I understand why the current Python works like it does. But I think that 
it's time for the next generation. Even if you don't do it, and I have no 
time to do it now, I'm still sure that it will be done at some point, 
sooner rather than later.

>
>> Of course, changing Python to not have a single None would help a 
>> lot. Or, perhaps it could have a single None, but in case of None, 
>> the reference count would have no meaning, it would never be 
>> deallocated, because it would be checked in code. Maybe it does it 
>> already, I don't know.
>
>
> I really don't mean to be rude (another reason I'm writing this to you 
> privately), but this paragraph shows you are *really* new to Python 
> both at the level of coding in Python and coding with Python's C API.  
> I wish I could explain in detail why, but there's really far too much 
> that you don't understand and it would take me too long.  I will 
> attempt to summarize a very few points, however: first, identity 
> (pointer comparison) is a part of the Python language, so you can't 
> have multiple None instances any more than you can have more than one 
> value be NULL in C.  Second, at the C level, all Python objects 
> (including None) have an absolutely uniform API, so having refcount 
> behavior be different for different kinds of objects is not at all 
> practical.  Third, if you had more than one Py_None at the C level, 
> you'd either have to make Py_None a macro, or rewrite all the C.  If 
> you don't think that's a problem, you have absolutely no idea how much 
> C code out there is written to the Python API. 

Yes, Py_None would be a macro. All access to interpreter state would go 
through the interpreter state pointer that is always on the stack, as the 
first argument each C API function gets. That pointer would have to be 
named consistently so that the macros always work (if it were called 
"tState", the Py_None macro would expand to tState->mPy_None).

>
>> I'm also wondering why this problem has not been addressed before?
>
>
> It has; the cure is worse than the disease.  A few years ago, somebody 
> wrote a "free-threading" version of Python, which locked individual 
> data objects rather than use the global interpreter lock.  The 
> performance for single-threaded programs was abominable, and the 
> performance gain even on multiprocessor machines was not thought worth 
> the cost.  So the project was scrapped. 

There would be no locking in my proposal, except when accessing the 
shared memory global thread state.

I don't know. I got mail suggesting that I write a PEP. It is clear that 
it would not be accepted, because it would break the existing API. The 
change is so big that I think it has to be called a different language.

This is the last message I will post about this matter (before actually 
starting to code it), so I'm posting this to python-list as well, 
because this is too important to be ignored. Python *needs* to be 
free-threading...

Harri