[Web-SIG] Threading and client-side support

Mon Oct 27 13:47:49 EST 2003

On Monday, October 27, 2003, at 08:45 AM, John J Lee wrote:
> [...]
>> urlopen_lock = threading.Lock()
>> def urlopen(url, data=None):
> [...]
>
> OK, thanks, that's basically as my vague understanding had it, but I had
> the impression that there were all kinds of flavours of thread-safety,
> guaranteeing various subtly different things?  I guess I've got some
> reading to do...

Different parts of a system may be threadsafe while others are not.
For instance, DB-API has threadsafety "levels", which are just a way of
indicating which parts of the system are threadsafe: level 0 means
nothing is threadsafe, level 1 means connections aren't threadsafe so
each thread has to use its own connection, and higher levels mean that
objects deeper in the system become threadsafe.  The equivalent of
level 0 is bad, because you have to serialize all operations for the
entire process.  Level 1 isn't so bad (it's what most DB-API drivers
provide); it just means you have to create a new
handler/connection/whatever object for each thread, and you have to be
very explicit about that requirement.  If object creation is expensive
you have to do pooling, which is an incentive to make object creation
cheap.
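A minimal sketch of the level-1 pattern, "one connection per thread", in modern Python 3. It uses the standard `sqlite3` module and `threading.local`; the helper names `get_connection` and `demo` are hypothetical and not part of DB-API. The demo holds every thread at a barrier until all connection ids are recorded, so the ids it compares belong to objects that are alive at the same time.

```python
import sqlite3
import threading

# Hypothetical helper (not part of DB-API): each thread lazily gets its own
# connection, which is what a driver with threadsafety == 1 requires.
_local = threading.local()

def get_connection(path=":memory:"):
    """Return the calling thread's connection, creating it on first use."""
    conn = getattr(_local, "conn", None)
    if conn is None:
        conn = sqlite3.connect(path)
        _local.conn = conn
    return conn

def demo(n_threads=4):
    """Start n_threads workers; return their connection ids and same-thread checks."""
    ids, same = [], []
    barrier = threading.Barrier(n_threads)

    def worker():
        conn = get_connection()
        same.append(conn is get_connection())  # same thread -> same object
        conn.execute("SELECT 1")               # only the owning thread uses it
        ids.append(id(conn))
        barrier.wait()  # keep all connections alive until every id is recorded

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ids, same
```

If connections were expensive to create, the same `get_connection` entry point could instead hand out objects from a pool (for example a `queue.Queue` of connections), which is the pooling trade-off described above.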

> Some thinking out loud in case anybody cares to help clear up my current
> confusion:
>
> Hmm, urllib2 doesn't do what your example does, but I suppose
> OpenerDirectors don't currently have any state that could get lost in a
> race condition in that particular case.  That would change with cookie
> handling.

I'm not sure about urllib2 in particular, but anything you initialize
at the module level doesn't have to be protected, since it runs once,
at import time.  So in ClientCookie, if you didn't lazily create the
opener, it wouldn't be a problem.  Likewise, if creating the object
twice is harmless, there's no problem: needlessly recreating an object
because of a very specific race condition doesn't hurt anything.  But
if one of the two objects got lost while someone still held a reference
to it (so it wasn't *completely* lost), that would be a problem, and
probably a very hard one to debug if you ever encountered it.
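A minimal sketch of lazy creation done safely, in Python 3 terms, where urllib2 became `urllib.request`. The names `get_opener`, `_opener` and `_opener_lock` are hypothetical, not part of urllib. The check and the assignment happen under one lock, so two threads can never each build an opener and leave one of them holding an orphaned copy.

```python
import threading
import urllib.request

# Hypothetical module-level cache for a shared opener.
_opener = None
_opener_lock = threading.Lock()

def get_opener():
    """Return the shared OpenerDirector, building it on first use."""
    global _opener
    with _opener_lock:
        # Checking and assigning under the same lock means only one
        # thread ever builds the opener; everyone else reuses it.
        if _opener is None:
            _opener = urllib.request.build_opener()
        return _opener
```

The alternative is simply `_opener = urllib.request.build_opener()` at module level, which needs no lock at all. Laziness is only worth the lock if building the object is expensive or has side effects.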

> Am I going to have a hard time spotting all the places where I need locks?
> I can't see any place where I'd need locks other than in CookieJar.
> I suppose I need to lock all access to all CookieJar methods, so that
> neither reading nor writing state can happen whenever CookieJar state is
> changing?  I suppose I'd also need to just label the .cookies attribute as
> non-threadsafe (or get rid of it, or add a __getattr__ to allow locking it
> -- yuck).  Can I justify saying that some of this is the application's
> problem?  For example, perhaps the .filename attribute of CookieJar
> could mess things up if altered by one thread while another thread was
> reading it in order to open a file?  Is it the application's own stupid
> fault if it fails to lock access to that attribute in cases where that
> might happen, or is it CookieJar's problem?
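One way to do "lock all access to all methods" is a single re-entrant lock taken by every public method, with the raw attribute replaced by a property that returns a snapshot. The sketch below uses a toy `LockedJar` class; it is hypothetical and is neither ClientCookie's CookieJar nor the standard library's `http.cookiejar.CookieJar`.

```python
import threading

class LockedJar:
    """Toy cookie store (hypothetical): every public method holds one lock."""

    def __init__(self):
        self._lock = threading.RLock()  # re-entrant, so methods may call each other
        self._cookies = {}              # (domain, name) -> value

    def set_cookie(self, domain, name, value):
        with self._lock:
            self._cookies[(domain, name)] = value

    def get_cookie(self, domain, name, default=None):
        with self._lock:
            return self._cookies.get((domain, name), default)

    def clear(self, domain=None):
        with self._lock:
            if domain is None:
                self._cookies.clear()
            else:
                for key in [k for k in self._cookies if k[0] == domain]:
                    del self._cookies[key]

    @property
    def cookies(self):
        # Hand out a copy instead of the live dict, so callers can iterate
        # it while other threads keep modifying the jar.
        with self._lock:
            return dict(self._cookies)
```

A `.filename`-style attribute could be treated the same way: read it once under the lock into a local variable, then open the file using that local value.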

You can't be sure what concurrency expectations the application has.
But in general reads don't have to be protected, unless someone is
reading multiple things and expecting consistency between those reads.
If you read value A, then another thread changes the related value B,
then you read B and it doesn't fit with A, that's a threading issue for
a read.  Andrew pointed out a possible example of this with cookies and
expiration.
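A minimal sketch of the "related values" case: a hypothetical `CookieRecord` whose value and expiry must always match. The writer updates both under a lock, and the reader takes both under the same lock with `snapshot()`. Reading `record.value` and then `record.expires` as two separate, unlocked reads could observe a value from one update paired with the expiry from another. The checker `count_mismatches` is also hypothetical.

```python
import threading

class CookieRecord:
    """Hypothetical value/expiry pair that must stay consistent."""

    def __init__(self):
        self._lock = threading.Lock()
        self.value = "v0"
        self.expires = 0

    def update(self, value, expires):
        with self._lock:          # write both fields as one step
            self.value = value
            self.expires = expires

    def snapshot(self):
        with self._lock:          # read both fields as one step
            return self.value, self.expires

def count_mismatches(record, rounds=10000):
    """Read the pair `rounds` times while a writer thread keeps updating it."""
    stop = threading.Event()

    def writer():
        i = 0
        while not stop.is_set():
            i += 1
            record.update("v%d" % i, i)   # invariant: value == "v" + str(expires)

    t = threading.Thread(target=writer)
    t.start()
    bad = 0
    try:
        for _ in range(rounds):
            value, expires = record.snapshot()
            if value != "v%d" % expires:
                bad += 1
    finally:
        stop.set()
        t.join()
    return bad
```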

--
Ian Bicking | ianb at colorstudy.com | http://blog.ianbicking.org