import in threads: crashes & strange exceptions on dual core machines

Klaas mike.klaas at gmail.com
Tue Oct 31 18:29:25 EST 2006


robert wrote:
> Klaas wrote:
> > It seems clear that the import lock does not include fully-executing
> > the module contents.  To fix this, just import cookielib before the
>
> What is the exact meaning of "not include fully-executing" - regarding the examples "import cookielib" ?
> Do you really mean the import statement can return without having executed the cookielib module code fully?
> (As said, a simple deadlock is not at all my problem)

No, I mean that the import lock seems not to be held while the module
contents are being executed (which would be why you are getting a
partially-initialized module in sys.modules).  Perhaps it _is_ held,
but released at various points of the import process.  Someone more
knowledgeable of python internals will have to answer the question of
what _should_ be occurring.
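
One way to sidestep the issue entirely (a minimal sketch, assuming the
cookielib/urllib2 usage discussed in this thread; the worker body is
purely illustrative) is to pay the import cost once in the main thread,
before any worker thread starts:

import cookielib          # module body executed once, up front
import urllib2
import threading

def worker():
    # cookielib is fully initialized before any thread runs; from here
    # on the import machinery only finds it in sys.modules
    jar = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))
    # ... use opener ...

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()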

> thanks. I will probably have to do the costly pre-import of things in main thread and spread locks as I have also no other real idea so far.

Costly?

> Yet this costs the smoothness of app startup and corrupts my belief in Python's capability for "lazy execution on demand".

If you lock your code properly, you can do the import any time you wish.
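
For instance (a sketch only; get_cookielib is an illustrative helper,
not something from your code), keep the import lazy but guard it with
an application-level lock, so only one thread ever executes the module
body and every other thread blocks until it is fully initialized:

import threading

_import_lock = threading.Lock()
_cookielib = None

def get_cookielib():
    global _cookielib
    _import_lock.acquire()
    try:
        if _cookielib is None:
            # only the first caller pays the import cost; later callers
            # get the cached, fully-initialized module
            import cookielib
            _cookielib = cookielib
    finally:
        _import_lock.release()
    return _cookielib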

> I'd like to get a more fundamental understanding of the real problems than just a general "stay away and lock and lock everything without real understanding".

Of course.  But you have so far provided no information in that
regard--not even a stack trace.  If you suspect a bug in python, have
you submitted a bug report at sourceforge?

> * I have no real explanation why the import of a module like cookielib is not thread-safe. And I can in no way explain the real OS-level crashes on dual cores/fast CPUs. Python may throw this and that, Python variable states may be wrong, but how can it crash at the OS level when no extension libs are (hopefully) responsible?

If you are certain (and not just hopeful) that no extension modules are
involved, this points to a bug in python.

> * The Import Lock should be a very hard lock: As soon as any thread imports something, all other threads are guaranteed to be out of any imports. A deadlock is not the problem here.

What do you mean by "should"?  Is this based on your knowledge of
python internals?

> * the things in my code pattern are function-local code except "opener = urlcookie_openers.get(user)" and "urlcookie_openers[user] = opener": simple dictionary accesses, which are atomic to all my knowledge and experience. I think I have thought enough about what could be non-thread-safe. The only questionable things have to do with rare changes of some globals,

It is very easy for dictionary accesses to be thread-unsafe, as they
can call into python-level __hash__ and __eq__ code.  If this happens,
a context switch is possible.  Are you sure this isn't the case?
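
For illustration (UserKey is a hypothetical key type, not something
from your code), any key whose __hash__ or __eq__ is written in Python
turns a "simple" dictionary access into arbitrary bytecode, during
which the interpreter is free to switch threads:

class UserKey(object):
    def __init__(self, name):
        self.name = name
    def __hash__(self):
        return hash(self.name)          # Python-level call: a switch point
    def __eq__(self, other):
        return self.name == other.name  # likewise

urlcookie_openers = {}
opener = urlcookie_openers.get(UserKey("bob"))  # not one atomic step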

> but this has nothing at all to do with the severe problems here and could only affect e.g. a wrong url2_proxy or a double/unnecessary re-creation of an opener, which is uncritical in my app.

Your code contains the following pattern, which can cause any number of
application errors, depending on the app:

a = getA()
if a is None:
   <lots of code>
   setA()

If duplicating the creation of an opener isn't a problem, why not just
create one for a user to begin with?
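
If it has to stay lazy, a sketch of closing that window (make_opener is
a hypothetical factory standing in for your opener-building code) is to
hold a single lock across both the check and the insert:

import threading

openers_lock = threading.Lock()
urlcookie_openers = {}

def get_opener(user):
    openers_lock.acquire()
    try:
        opener = urlcookie_openers.get(user)
        if opener is None:
            # the check and the insert happen under the same lock, so
            # two threads can never both decide to build an opener
            opener = make_opener(user)
            urlcookie_openers[user] = opener
    finally:
        openers_lock.release()
    return opener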

> I'm still puzzled and suspect there is a major problem in Python, maybe in win32ui or - no idea ... ?

Python does a relatively decent job of maintaining thread safety for
its most basic operations, but this is no substitute for caring about
it in your own application.  It is only true in the most basic cases
that a single line of code corresponds to a single opcode, and
verifying that such code is correct is even more difficult than when
using explicit locking.  The advantages just aren't worth it: an
uncontended lock costs barely a microsecond:

$ python -m timeit -s "import thread; t=thread.allocate_lock()" \
      "t.acquire(); t.release()"
1000000 loops, best of 3: 1.34 usec per loop

Note that this is actually less expensive than the handful of python
code that dummy_threading runs:

$ python -m timeit -s "import dummy_threading; t = dummy_threading.Lock()" \
      "t.acquire(); t.release()"
100000 loops, best of 3: 2.05 usec per loop

Note that this _doesn't_ mean that you should "lock everything without
real understanding", but in my experience there is very little
meaningful python code that the GIL locks adequately.
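
Even a statement as innocent-looking as an in-place increment expands
to several opcodes, and a thread switch between them loses updates, GIL
or no GIL:

import dis

def bump(counts, user):
    counts[user] += 1    # one line of source, several opcodes

dis.dis(bump)
# the disassembly shows a subscript load, an INPLACE_ADD and a
# subscript store; a switch between the load and the store lets two
# threads read the same old value, and one increment is lost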

As for your crashes, those should be investigated.  But without any
real hints, I don't see that happening.  If you can't reproduce them,
it seems unlikely that anyone else will be able to.

-Mike



