GIL in the new glossary

Luis P Caamano lpc at racemi.com
Thu Oct 2 15:20:46 EDT 2003


Skip, I appreciate your reply.  This is
the first time I've gotten real information
about the issue rather than a buzz-off
type response.  :-)

> -----Original Message-----
> From: Skip Montanaro [mailto:skip at pobox.com]
> Sent: Thursday, October 02, 2003 2:13 PM
> To: Luis P Caamano
> Cc: python-list at python.org
> Subject: Re: GIL in the new glossary
> 
> 
> 
>     >> The lock used by Python threads to assure that only one thread can be
>     >> run at a time. This simplifies Python by assuring that no two
>     >> processes can access the same memory at the same time. Locking the
>     >> entire interpreter makes it easier for the interpreter to be
>     >> multi-threaded, at the expense of some parallelism on multi-processor
>     >> machines.
> 
>     Luis> Some parallelism???  Wouldn't it be more accurate to say "at the
>     Luis> expense of parallelism?"  The GIL doesn't eliminate "some"
>     Luis> parallelism, it completely eliminates any chance of parallelism
>     Luis> within the same interpreter.
> 
> The global interpreter lock can explicitly be released from C code.  Lots of
> I/O code does this already.  Only one thread can be executing Python byte
> code at a time, but multiple threads can execute non-interpreter C code.
> (I'm sure someone more knowledgeable about threads will correct me if I'm
> off-base here.)


Although I understand that this is possible and how it works,
it still doesn't help me at all if all I write is
Python code.  My multi-threaded program works great, then I
take it to a big HP SuperDome with lots of processors and it
just doesn't scale at all.  I have to go back to the drawing
board and start working around the problem.
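For what it's worth, the effect is easy to show in pure
Python.  Here's a rough sketch (the function names are just
made up for illustration) using the standard threading and
time modules: the CPU-bound loop holds the interpreter lock
the whole time, so adding threads buys nothing on an MP box,
while blocking calls like time.sleep() release the lock and
really do overlap:

```python
import threading
import time

def cpu_work(n, out, i):
    # Pure-Python arithmetic; holds the interpreter lock the whole time.
    total = 0
    for k in range(n):
        total += k * k
    out[i] = total

def run_threads(n_threads, n):
    # Run the same CPU-bound work in several threads; the GIL
    # serializes them, so this is no faster than doing it in one.
    out = [0] * n_threads
    threads = [threading.Thread(target=cpu_work, args=(n, out, i))
               for i in range(n_threads)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start, out

def run_sleepers(n_threads):
    # I/O-style blocking calls release the lock, so these DO overlap:
    # n threads each sleeping 0.2s finish in about 0.2s total.
    threads = [threading.Thread(target=lambda: time.sleep(0.2))
               for _ in range(n_threads)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start

if __name__ == '__main__':
    elapsed, results = run_threads(4, 200000)
    print('4 CPU-bound threads:', round(elapsed, 2), 's')
    print('4 sleeping threads: ', round(run_sleepers(4), 2), 's')
```

The sleeping threads come back in roughly one sleep interval,
the CPU-bound ones in roughly the sum of their work.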

Eventually we won't be able to work around the problem:
a big customer will be involved, and we will have enough
justification to sink in the time and resources to fix it.
Until then, though ...


> 
>     >> Efforts have been made in the past to create a "free-threaded"
>     >> interpreter (one which locks shared data at a much finer
>     >> granularity), but performance suffered in the common single-processor
>     >> case.
> 
>     Luis> This is true for kernels too, which is why you see at least
>     Luis> two versions of the linux kernel in red hat distributions,
>     Luis> one with SMP enabled and one without it.
> 
>     Luis> Why not have a python_smp interpreter that allows parallelism
>     Luis> (and scalability) on SMP machines and another for uniprocessor
>     Luis> machines where parallelism is not possible?
> 
>     Luis> Yeah, yeah, I know what you're going to say ... 
> 
>     Luis> "Please submit a patch."
> 
>     Luis> Sigh, ... if only I had the time ... :-(
> 
> I think you'd need a lot more time than you think (hint: don't think in
> terms of "a patch"; think more in terms of "code fork"). ;-)

I was afraid of that.

> 
> Greg Stein (an extremely bright guy who doesn't seem to hang out in the
> Python niches much anymore - our loss) put in a fair amount of effort on
> this when 1.4 was the current Python version, so you know this is something
> people have been asking about for quite some time.  Here's a relevant
> message from Greg regarding the performance he saw:
> 
>     http://mail.python.org/pipermail/python-dev/2001-August/017099.html

Thanks for the pointer.  I just read it and it clarifies my
view of the problem.  It confirms that performance did
increase up to three processors, and that it dropped off
after that.

> 
> Based upon his numbers, I'm not sure it would be worth the effort, even if
> you could get it right.  Going from 1x to 1.2x performance when adding a
> second processor hardly seems worth it.

This scenario is not new.  When I was at HP, we went
through exactly the same thing in the HPUX kernel
with what were called "empire locks" in the initial
stages of SMP support, somewhere around 1995.

Those empire locks were exactly like the GIL.  Whenever
you wanted to do some filesystem thing in the kernel,
you'd have to acquire the fs empire lock, and so on for
process stuff, vm, io, etc.

Anyway, it's a long story, but it's the SAME story, and
in the end, it's either justifiable or not.  If it's not
justifiable because the interpreter is too complex or we
don't have the resources or whatever, then so be it, but
there is a solution.  I bet that if somebody needed
scalability really badly and had the resources for it, we
would have scalable free threading.  It's just software.
What I don't like is when people claim that it's not
solvable or, worse, not needed.  :-)

> 
> To make matters worse, Python has only gotten more complex in the time since
> Greg did his work.  In addition, a number of things added to the language or
> the library since then have quietly assumed the presence of the global
> interpreter lock - "quietly", as in there's no big old "XXX" comment in the
> code alerting would-be free threaders.  Taken together, this means there are
> many more bits of code which would need attention.  It's probably a lot
> easier to wait for the next fastest uniprocessor machine or overclock the
> one you have.

Let me put it this way ... if Python supported free
threads and parallelism on multi-processor machines,
I don't think I'd ever have to write a user-space
application in any other language, ever.  As it is right
now, if I need to run a server app on a big MP machine
and I need it to scale accordingly, I have two choices:

a) play with IPC again (as if we were back in the 80s
   before pthreads) to have separate interpreters
   per processor, or

b) write the application in C or C++, where I can achieve
   parallelism.
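Option (a) is clunky, but it does work today.  A rough
sketch, using nothing but os.fork and a pipe (the worker
count and the work function here are made up for
illustration): each child gets its own interpreter, and
with it its own interpreter lock, so the children really
do run in parallel on an MP machine:

```python
import os
import struct

def partial_sum(lo, hi):
    # The CPU-bound work each worker does; illustrative only.
    total = 0
    for k in range(lo, hi):
        total += k
    return total

def parallel_sum(n, n_workers):
    # Fork one worker per "processor"; each computes a slice and
    # writes its 8-byte result back through a pipe.
    chunk = n // n_workers
    pipes = []
    for i in range(n_workers):
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: separate interpreter, separate lock.
            os.close(r)
            lo = i * chunk
            hi = n if i == n_workers - 1 else lo + chunk
            os.write(w, struct.pack('q', partial_sum(lo, hi)))
            os._exit(0)
        os.close(w)
        pipes.append(r)
    total = 0
    for r in pipes:
        total += struct.unpack('q', os.read(r, 8))[0]
        os.close(r)
        os.wait()
    return total

if __name__ == '__main__':
    print(parallel_sum(1000000, 4))
```

The parallelism is real, but all the shared-state
convenience of threads is gone: every result has to be
marshalled back through a pipe by hand.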

I don't like any of those choices.  I'd like to write C code
only when writing kernel code, OS commands or utilities,
or Python extensions. :-)

Once again, thanks for taking the time to reply and
for clarifying my view of the issue.  Much appreciated.

--
lpc






