another thread on Python threading

sturlamolden sturlamolden at yahoo.no
Mon Jun 4 07:32:42 EDT 2007


On Jun 4, 3:10 am, Josiah Carlson <josiah.carl... at sbcglobal.net>
wrote:
> cgwalt... at gmail.com wrote:
> > I've recently been working on an application[1] which does quite a bit
> > of searching through large data structures and string matching, and I
> > was thinking that it would help to put some of this CPU-intensive work
> > in another thread, but of course this won't work because of Python's
> > GIL.
>
> If you are doing string searching, implement the algorithm in C, and
> call out to the C (remembering to release the GIL).
>
> > There's a lot of past discussion on this, and I want to bring it up
> > again because with the work on Python 3000, I think it is worth trying
> > to take a look at what can be done to address portions of the problem
> > through language changes.
>
> Not going to happen.  All Python 3000 PEPs had a due-date at least a
> month ago (possibly even 2), so you are too late to get *any*
> substantial change in.
>
> > I remember reading (though I can't find it now) one person's attempt
> > at true multithreaded programming involved adding a mutex to all
> > object access.  The obvious question though is - why don't other true
> > multithreaded languages like Java need to lock an object when making
> > changes?
>
>  From what I understand, the Java runtime uses fine-grained locking on
> all objects.  You just don't notice it because you don't need to write
> the acquire()/release() calls.  It is done for you.  (in a similar
> fashion to Python's GIL acquisition/release when switching threads)

The problem is CPython's reference counting. Access to reference
counts must be synchronized.

Java, IronPython and Jython use a different garbage collection scheme
and do not need a GIL.
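The reference counts in question are easy to observe from Python
itself (CPython-specific; note that sys.getrefcount's own temporary
argument reference inflates the number by one):

```python
import sys

x = []
before = sys.getrefcount(x)   # references from x plus the call's temporary
y = x                         # bind one more name to the same list
after = sys.getrefcount(x)
assert after == before + 1
```

Every one of these increments and decrements would have to become an
atomic or locked update in a free-threaded CPython.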

Changing CPython's garbage collection from reference counting to a
generational GC would be a major undertaking. There are also pros and
cons to using reference counts instead of 'modern' garbage collectors.
For example, unless there are cyclic references, one always knows
when an object is garbage collected. One also avoids periodic pauses
while garbage is collected, and memory use can be more modest when a
lot of small temporary objects are being used.
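The deterministic destruction can be seen directly (a hypothetical
Tracked class; this behaviour relies on CPython's reference counting
and is not guaranteed by Jython or IronPython):

```python
class Tracked(object):
    alive = 0                 # count of live instances
    def __init__(self):
        Tracked.alive += 1
    def __del__(self):
        Tracked.alive -= 1

t = Tracked()
assert Tracked.alive == 1
del t   # refcount drops to zero; __del__ runs immediately in CPython
assert Tracked.alive == 0
```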

Also beware that the GIL is only a problem for CPU-bound code. IO-bound
code is not slowed by the GIL. The Python runtime itself is a bigger
problem for CPU-bound code.
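A minimal sketch of why IO-bound threads still scale: blocking calls
release the GIL, so four 0.2-second waits overlap instead of adding
up (time.sleep stands in for a real blocking IO call here):

```python
import threading
import time

def io_task():
    time.sleep(0.2)   # blocking "IO": the GIL is released while waiting

threads = [threading.Thread(target=io_task) for _ in range(4)]
start = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start   # well under the 0.8 s a serial run would take
```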

In C or Fortran, writing parallel algorithms for multiprocessor
systems typically involves using OpenMP or MPI. Parallelizing
algorithms with manual threading should be discouraged. It is far
better to insert a compiler directive (#pragma omp) and let an OpenMP
compiler do the job.

There are a number of different options for exploiting multiple CPUs
from CPython, including:

- MPI (e.g. mpi4py or PyMPI)
- PyPar
- os.fork() on Linux or Unix
- subprocess.Popen
- C extensions that use OpenMP
- C extensions that spawn threads (should be discouraged!)
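As a rough sketch of the os.fork() option (POSIX only; the pipe-based
result passing used here is just one of several ways to get data back
to the parent):

```python
import os
import pickle

def parallel_sum(chunks):
    # Fork one child per chunk; each child writes its pickled partial
    # sum back to the parent through a dedicated pipe.
    pipes = []
    for chunk in chunks:
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:                # child: compute one partial sum
            os.close(r)
            os.write(w, pickle.dumps(sum(chunk)))
            os.close(w)
            os._exit(0)
        os.close(w)                 # parent keeps only the read end
        pipes.append((pid, r))
    total = 0
    for pid, r in pipes:
        f = os.fdopen(r, 'rb')
        total += pickle.loads(f.read())
        f.close()
        os.waitpid(pid, 0)
    return total
```

Each child is a separate process with its own interpreter, so there is
no GIL contention between them.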

> They also have a nice little decorator-like thingy (I'm not a Java guy,
> so I don't know the name exactly) called 'synchronize', which locks and
> unlocks the object when accessing it through a method.

A similar Python 'synchronized' function decorator may look like this:

def synchronized(fun):
   from threading import RLock
   rl = RLock()
   def wrapper(*args, **kwargs):
      # serialize all calls to fun through one re-entrant lock
      with rl:
         return fun(*args, **kwargs)
   return wrapper

It is not possible to define a 'synchronized' block though, as Python
does not have Lisp macros :(
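Usage looks like this (a hypothetical shared counter; the decorator is
repeated so the sketch is self-contained, and without it the
unprotected read-modify-write in bump() could lose updates under
threading):

```python
from threading import RLock, Thread

def synchronized(fun):          # same decorator as above
    rl = RLock()
    def wrapper(*args, **kwargs):
        with rl:
            return fun(*args, **kwargs)
    return wrapper

counter = {'n': 0}

@synchronized
def bump():
    counter['n'] += 1           # read-modify-write, now done under the lock

def worker():
    for _ in range(1000):
        bump()

threads = [Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# counter['n'] is now exactly 4000
```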

>
>   - Josiah
>
> > == Why hasn't __slots__ been successful? ==
>
> > I very rarely see Python code use __slots__.  I think there are
> > several reasons for this.  The first is that a lot of programs don't
> > need to optimize on this level.  The second is that it's annoying to
> > use, because it means you have to type your member variables *another*
> > time (in addition to __init__ for example), which feels very un-
> > Pythonic.
>
> > == Defining object attributes ==
>
> > In my Python code, one restriction I try to follow is to set all the
> > attributes I use for an object in __init__.   You could do this as
> > class member variables, but often I want to set them in __init__
> > anyways from constructor arguments, so "defining" them in __init__
> > means I only type them once, not twice.
>
> > One random idea is to for Python 3000, make the equivalent of
> > __slots__ the default, *but* instead gather
> > the set of attributes from all member variables set in __init__.  For
> > example, if I write:
>
> > class Foo(object):
> >   def __init__(self, bar=None):
> >     self.__baz = 20
> >     if bar:
> >       self.__bar = bar
> >     else:
> >       self.__bar = time.time()
>
> > f = Foo()
> > f.otherattr = 40  # this would be an error!  Can't add random
> > attributes not defined in __init__
>
> > I would argue that the current Python default of supporting adding
> > random attributes is almost never what you really want.  If you *do*
> > want to set random attributes, you almost certainly want to be using a
> > dictionary or a subclass of one, not an object.  What's nice about the
> > current Python is that you don't need to redundantly type things, and
> > we should preserve that while still allowing more efficient
> > implementation strategies.
>
> > = Limited threading =
>
> > Now, I realize there are a ton of other things the GIL protects other
> > than object dictionaries; with true threading you would have to touch
> > the importer, the garbage collector, verify all the C extension
> > modules, etc.  Obviously non-trivial.  What if as an initial push
> > towards real threading, Python had support for "restricted threads".
> > Essentially, restricted threads would be limited to a subset of the
> > standard library that had been verified for thread safety, would not
> > be able to import new modules, etc.
>
> > Something like this:
>
> > def datasearcher(list, queue):
> >   for item in list:
> >     if item.startswith('foo'):
> >       queue.put(item)
> >   queue.done()
>
> > vals = ['foo', 'bar']
> > queue = queue.Queue()
> > threading.start_restricted_thread(datasearcher, vals, queue)
> > def print_item(item):
> >   print item
> > queue.set_callback(print_item)
>
> > Making up some API above I know, but the point here is "datasearcher"
> > could pretty easily run in a true thread and touch very little of the
> > interpreter; only support for atomic reference counting and a
> > concurrent garbage collector would be needed.
>
> > Thoughts?
>
> > [1]http://submind.verbum.org/hotwire/wiki


More information about the Python-list mailing list