ANN: blist 1.2.0

Thu Jul 22 18:25:07 EDT 2010

On 7/21/2010 6:56 PM, Daniel Stutzbach wrote:
> On Wed, Jul 21, 2010 at 5:27 PM, Terry Reedy <tjreedy at udel.edu
> <mailto:tjreedy at udel.edu>> wrote:
>
>     These tests use random numbers with a constant, relatively high
>     density of 25%, which is favorable to radix sort. What if you do the
>     same test with a constant range of, say, 1000000000 (1 billion) or
>     even a trillion or quadrillion. Or how about sorting a thousand
>     128bit ip6 addresses? Which wins for that?

> blist switches to radix sort only if the keys contain only floats or
> only integers that fit into a C long.  If the integers get too big, it
> reverts to timsort.
>
> When using a sort key, the decorate-sort-undecorate pattern requires
> touching every object once before the sort.  The check for a consistent
> type is done inexpensively during the decorate phase.

And it is pretty cheap even when there is no decorate phase.

> Here's an example result with a density of only 1%, where the radix sort
> is around 4 times as fast:
>
> otto:~/src/blist$ python3.1 -m timeit -s 'import random' -s 'x =
> [random.randrange(10000*100) for i in range(10000)]' 'y = list(x)'
> 'y.sort(key=float)'
> 100 loops, best of 3: 12.4 msec per loop
>
> otto:~/src/blist$ python3.1 -m timeit -s 'from blist import blist' -s
> 'import random' -s 'x = [random.randrange(10000*100) for i in
> range(10000)]' 'y = blist(x)' 'y.sort(key=float)'
> 100 loops, best of 3: 3.4 msec per loop

> And a density of 0.01%:
>
> otto:~/src/blist$ python3.1 -m timeit -s 'import random' -s 'x =
> [random.randrange(10000*10000) for i in range(10000)]' 'y = list(x)'
> 'y.sort(key=float)'
> 100 loops, best of 3: 12 msec per loop
>
> otto:~/src/blist$ python3.1 -m timeit -s 'from blist import blist' -s
> 'import random' -s 'x = [random.randrange(10000*10000) for i in
> range(10000)]' 'y = blist(x)' 'y.sort(key=float)'
> 100 loops, best of 3: 3.52 msec per loop

Looks good so far. I would like to see that repeated all the way down to 
range(10) to make sure people doing millions of small sorts were not 
getting screwed.

>     list.sort is (near) linear for lists that are mostly ordered. I
>     think Tim's writeup about this in in the source. For instance, if
>     one extends a sorted with 1% random additions and resorts, list.sort
>     will skip the sorted part, sort the additions, and merge them back
>     in. Radix sort ignores pre-existing order. So either a lot of
>     comparitive profiling would be needed for auto-selection of sort
>     method, or it should be user selectable.
> I've tested exactly that scenario.  For already-ordered lists, radix
> sort and timsort tie.

Tim tested about 7 list structure scenarios with a range of lengths. If 
you post a patch, I would request that you do the same.

Have you run a patched version against test_sort.py? I believe it mostly 
tests lists of small ints, so radix methods would mostly be switched in.

If it were added and the switching were internal, new test cases would 
be needed to test test timsort.

>>     Does your radix sort meet the stability guarantee of list.sort?
> Yes.

Great. There is, of course, a test for that in the suite.

-- 
Terry Jan Reedy