Lockless algorithms in python (Nothing to do with GIL)

Mon Jun 28 20:30:29 EDT 2010

On Mon, 2010-06-28 at 16:46 -0700, Zac Burns wrote:
> In my experience it is far more expensive to allocate a lock in python
> then it is the types that use them. Here are some examples:
> 
> >>> timeit.timeit('Lock()', 'from threading import Lock')
> 1.4449114807669048
> 
> >>> timeit.timeit('dict()')
> 0.2821554294221187
> 
> >>> timeit.timeit('list()')
> 0.17358153222312467
> 
> 
> Granted, I know these types do not allocate a lock - but a similarly
> defined user type would need to allocate a lock to guarantee
> thread-safety.

Sure, but I think you're timing the wrong thing here.  You would only
allocate the lock relatively rarely - it's the overhead of *acquiring*
the lock that's the real problem.

   rfk at durian:~$ python -m timeit -s "from threading import Lock; l = Lock()" "l.acquire(); l.release()"
   1000000 loops, best of 3: 0.378 usec per loop

Compare to one of my favourite dict methods, setdefault:

   rfk at durian:~$ python -m timeit -s "d = dict()" "d.setdefault('hello','world')"
   10000000 loops, best of 3: 0.189 usec per loop

In the same ballpark really.

> I have not done a great deal of research into lockless algorithms
> thusfar, but would it be worthwhile to investigate implementing some
> of them in python to optimize performance critical sections of code?
> 
> Many lockless algorithms that I have looked at thusfar require a
> "Compare and Swap" operation. Does python have an equivalent to this?
> Is python high level enough that it has something better than this or
> it simply doesn't need it?

I've managed to avoid locking in some cases by using dict.setdefault()
as a kind of atomic test-and-set operation.  It's not a full
compare-and-swap but you could implement a simple locking scheme using
it like this:

  class PretendLock:
      def __init__(self):
          self.d = dict()
      def acquire(self):
          myid = threading.currentThread().ident
          while self.d.setdefault("owner",myid) != myid:
              pass
      def release(self):
          del self.d["owner"]

This depends on some peculiarities of the CPython implementation that
aren't guaranteed by the language spec - namely that thread switches
can't happen while executing C code, that dict.setdefault is implemented
in C, and that is can calculate the hash of a string key without calling
back out to python code.

Usually, it's not worth it.  The one time I use this regularly is for
lazy object initialisation, e.g. something like this:

  def get_my_object(key):
      try:
         return objects[key]
      except KeyError:
         return objects.setdefault(key,create_my_object(key))

If two threads happen to try initialising the object at the same time,
only one will wind up in the dict and being used; the other will be
discarded and garbage-collected.

  Cheers,

     Ryan

-- 
Ryan Kelly
http://www.rfk.id.au  |  This message is digitally signed. Please visit
ryan at rfk.id.au        |  http://www.rfk.id.au/ramblings/gpg/ for details

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 198 bytes
Desc: This is a digitally signed message part
URL: <http://mail.python.org/pipermail/python-list/attachments/20100629/423143bf/attachment-0001.sig>