Queue qsize = unreliable?

Dave Brueck dave at pythonapocrypha.com
Mon Aug 9 13:56:20 EDT 2004


Ames Andreas (MPA/DF) wrote:

> Dave Brueck wrote:
>>But is it blocking in any way that you (the caller) will likely
>>notice? (probably not - the mutex is held for a relatively short
>>time)
> 
> How do I notice a blocking function?
> 
> 1) As a programmer:  I have to read the docs and unfortunately all too
>    often the code.

What I meant was that, in practice, the lock acquired inside the 
module's implementation doesn't really affect your code (or, in most 
cases, its performance).

> 2) As a "caller":  I'm wasting time when blocking without a reason.

Not really because (1) the code would block only when more than one 
thread is accessing the lock at the same time, (2) there is a fairly 
small chance that more than one thread will be accessing that particular 
lock at the same time (especially in the scenarios you described 
earlier), and (3) even when there is contention for the lock it will be 
over a very brief period of time.
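
If you want a feel for the actual cost, here's a rough micro-benchmark 
of my own (a sketch only - numbers will vary by machine, it's just meant 
to show the scale) comparing Queue operations, which take the mutex, 
against bare list operations, which don't:

import time
import Queue

N = 100000

q = Queue.Queue()
t0 = time.time()
for i in xrange(N):
    q.put(i)        # acquires and releases the mutex
    q.get()         # ditto
print "Queue: %.3f sec for %d put/get pairs" % (time.time() - t0, N)

lst = []
t0 = time.time()
for i in xrange(N):
    lst.append(i)   # no explicit lock involved
    lst.pop()
print "list : %.3f sec for %d append/pop pairs" % (time.time() - t0, N)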

>>>With CPython's assertions about atomicity of certain list
>>>operations it's possible to implement a lock free queue that is
>>>even usable for the single producer or single consumer case.
>>
>>If you have the single producer and/or single consumer case, you
>>don't need a Queue at all - just use a built-in list (or in 2.4 a
>>collections.deque object).
> 
> I think I didn't make my point clear enough.  The most important word
> in the above paragraph was "usable", which means more to me than just
> thread-safe.  I'd be interested in a usable (i. e. non-polling)
> implementation of a queue without any locks.

What problem has the current implementation caused in your application? 
Or are you more concerned about what problem it may cause?

In any case, why don't you just do something like:

import Queue

class MyQ(Queue.Queue):
    def NonBlockingIsEmpty(self):
        # Peeks at the underlying list directly - never touches the mutex
        return len(self.queue) == 0
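
Under CPython, len() on the underlying list is covered by the 
interpreter lock, so the check itself never blocks - though of course 
the answer can be stale by the time you act on it. Usage would look 
like (a sketch):

q = MyQ()
q.put('job')
if not q.NonBlockingIsEmpty():
    print q.get()   # fine in a single-consumer setup like yours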

> I just included the "else" branch to make clear that only one of
> empty's two possible outcomes is reliable.  It makes much more sense
> and it is much closer to my specific application to write:
> 
> while not q.empty():
>     doSomethingToTheWorkersOutput()
>
> When leaving this loop I'll go back to sleep on select.

As I said earlier: if this is your use case you don't need a queue 
object - just use a built-in list:

while workList:
    doSomethingToTheWorkersOutput()
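
Just to sketch what that single-producer/single-consumer pattern can 
look like with the 2.4 deque (a toy example of mine - the producer and 
the doSomethingToTheWorkersOutput stub are placeholders): deque.append() 
and deque.popleft() are atomic under CPython, so neither side ever takes 
an explicit lock:

import collections
import threading
import time

work = collections.deque()

def producer():
    # stand-in for a worker thread handing results back
    for i in xrange(5):
        work.append(i)       # atomic under CPython - no lock needed
        time.sleep(0.01)

def doSomethingToTheWorkersOutput(item):
    print "consumed", item   # stand-in for the real processing

t = threading.Thread(target=producer)
t.start()
t.join()

# the consumer drains the deque without ever blocking on a mutex
while work:
    doSomethingToTheWorkersOutput(work.popleft())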

> As to the "extremely short amount of time (relatively speaking)":  I'm
> still very impressed by Dan Kegel's c10k paper.

Yes, it is a good paper. BTW - are you really using select() as you 
mentioned above, or is it some special implementation of select? 
(Because if it's just an ordinary select then you're not going to be 
able to support that many connections anyway - and even if you could, 
using something else instead of select would give you much greater 
performance improvements than eliminating the Queue lock usage.)

> And following that
> one I'm not so much concerned about the amount of time that a producer
> may hold the lock (that time may very well be ignorable), when the
> single consumer tries to acquire it, but much more about the fact that
> it potentially needs at least *two* context switches per iteration
> until the one thread that feeds all others and thus that all others
> depend upon gets back to work.

So is this actually causing you a problem in practice or is this closer 
to some premature optimization? No offense intended - I ask only because 
it seems very unlikely that this is going to be one of your major 
bottlenecks.

Yes, you could end up with multiple context switches *in the worst 
case*, but you have to also consider (1) how likely the worst case 
scenario is to come up and (2) how that stacks up against other 
bottlenecks in terms of performance.

If you're using the queues to pass around socket-level work (reading and 
writing bytes) then I can tell you right now that you're not going to 
get anywhere near the 10k connection limit with that approach on 
standard CPython - the overhead of the Queue itself is too high - so 
it'd be in your best interest to profile and work on the actual 
bottlenecks first. If you're instead using the queues to pass around 
application-level work ("run this DB query", "send this response to the 
client", etc), then it's highly unlikely that the mutex acquisition in 
Queue.get(False) will have any measurable impact in overall application 
performance because - so again the best bet is to profile to find 
current, real hotspots in the code.
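
For example, with the standard profile module (a sketch - serveForever 
here is just a stand-in for whatever drives your server loop):

import profile
import pstats

def serveForever():
    pass                    # stand-in for your real event loop

profile.run('serveForever()', 'server.prof')
stats = pstats.Stats('server.prof')
stats.sort_stats('cumulative').print_stats(20)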

FWIW, Python _is_ a pretty good platform for developing relatively 
high-performance servers. I've found that, with not too much work, it's 
easy to have a clean implementation and still be on par with (and even 
beat) something like Apache in terms of total throughput, 
connection response times, and so on. BUT, if you want to approach very large
numbers of connections you may want to base your implementation on 
Stackless Python and avoid normal Python threads (and Queues, and...) 
altogether.

-Dave


