On benchmarks, heaps, priority queues

Tim Peters tim.peters at gmail.com
Thu Jan 27 12:40:27 EST 2005


[aaronwmail-usenet at yahoo.com, on
 <http://xsdb.sourceforge.net/bench/pq3.py>
]
> Yes I know in theory the insertion sort approach should be bad for
> large enough values, but the weird thing is that if you mix inserts and
> deletes (with enough deletes) even 1M elements is not a large enough
> value.  Anyway, for 10K or less the insertion sort priority queue
> implementation seems to always be better.
>
> Weird.  -- Aaron Watters

It's the distribution of queue sizes that matters here, not the total
number of elements thrown at the queue.  Your code mixes "self" and
"this" seemingly at random, and the __len__ methods in particular
often don't agree about which is in use.  If you repair that, and
instrument mixBench() to keep track of queue size statistics, you'll
find that even at 1000000, the queue at the top of the loop never
exceeds 30 entries, and has a mean size less than 3.  If you expect
your queues to have only a couple elements on average, then sure, the
simplest thing that could possibly work may be the fastest too.  And
if your queues never exceed 30 entries, then the poor O() behavior of
"interior" list insertion doesn't matter either.

The "n%5<3" part should be true about 60% of the time if n were truly
random, which is a strong bias in mixBench pushing toward keeping the
queue very small.  Can't guess how close n is to being random, but the
random-number generator is suspect (the multiplier is extremely
small).  Suspect it's more the strong bias than the dubious generator
accounting for the almost-always near-trivial queue sizes in
mixBench(), though.

Note that in 2.4, bisect.insort() is also coded in C, so moving to 2.4
gives a boost to all the builtin gimmicks used here.

On my box (with self-vs-this straightened out, and mixBench()
instrumented to track queue size stats), using Python 2.4, PQPython23
"wins" maxBench 1000000 anyway:

BENCHMARKS FOR 1000000

mixBench
    queue min 0 mean 2.517008 max 30
    __main__.PQPython23 on 1000000 elapsed 4.54699993134 got 500001
    queue min 0 mean 2.517008 max 30
    __main__.PQ0 on 1000000 elapsed 4.79699993134 got 500001
    queue min 0 mean 2.517008 max 30
    __main__.PQueue on 1000000 elapsed 6.54699993134 got 500001



More information about the Python-list mailing list