ANN: blist 1.2.0

Mark Lawrence breamoreboy at yahoo.co.uk
Thu Jul 22 19:02:52 EDT 2010


On 22/07/2010 23:25, Terry Reedy wrote:
> On 7/21/2010 6:56 PM, Daniel Stutzbach wrote:
>> On Wed, Jul 21, 2010 at 5:27 PM, Terry Reedy <tjreedy at udel.edu> wrote:
>>
>> These tests use random numbers with a constant, relatively high
>> density of 25%, which is favorable to radix sort. What if you do the
>> same test with a constant range of, say, 1000000000 (1 billion) or
>> even a trillion or quadrillion? Or how about sorting a thousand
>> 128-bit IPv6 addresses? Which wins for that?
>
>> blist switches to radix sort only if the keys contain only floats or
>> only integers that fit into a C long. If the integers get too big, it
>> reverts to timsort.
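
To make that boundary concrete: Terry's IPv6 case would land on the
timsort side of it, since 128-bit values don't fit in a C long. A rough
sketch of that timing, with plain random 128-bit integers standing in
for real addresses:

import timeit

# A thousand random 128-bit values; these exceed a C long, so per the
# rule above blist would take the timsort path rather than radix sort.
setup = ('import random\n'
         'x = [random.randrange(2 ** 128) for i in range(1000)]')
print(timeit.timeit('y = list(x); y.sort()', setup=setup, number=1000))
# swapping list(x) for blist(x) (with "from blist import blist" added to
# the setup) would give the comparable blist figure
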
>>
>> When using a sort key, the decorate-sort-undecorate pattern requires
>> touching every object once before the sort. The check for a consistent
>> type is done inexpensively during the decorate phase.
>
> And it is pretty cheap even when there is no decorate phase.
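
For anyone curious what that check amounts to, here is a tiny
pure-Python sketch of the rule Daniel describes (blist's real check is
done in C during the decorate phase, so the names here are made up):

import sys

LONG_MAX = sys.maxsize   # stand-in for the platform's C long limit

def radix_sortable(keys):
    # True only if every key is a float, or every key is an int that
    # fits in a C long; anything else falls back to timsort.
    return (all(type(k) is float for k in keys) or
            all(type(k) is int and -LONG_MAX - 1 <= k <= LONG_MAX
                for k in keys))

radix_sortable([1.5, 2.0, 3.25])     # True  -> radix sort
radix_sortable([1, 2, 2 ** 80])      # False -> timsort
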
>
>> Here's an example result with a density of only 1%, where the radix sort
>> is around 4 times as fast:
>>
>> otto:~/src/blist$ python3.1 -m timeit -s 'import random' -s 'x =
>> [random.randrange(10000*100) for i in range(10000)]' 'y = list(x)'
>> 'y.sort(key=float)'
>> 100 loops, best of 3: 12.4 msec per loop
>>
>> otto:~/src/blist$ python3.1 -m timeit -s 'from blist import blist' -s
>> 'import random' -s 'x = [random.randrange(10000*100) for i in
>> range(10000)]' 'y = blist(x)' 'y.sort(key=float)'
>> 100 loops, best of 3: 3.4 msec per loop
>
>> And a density of 0.01%:
>>
>> otto:~/src/blist$ python3.1 -m timeit -s 'import random' -s 'x =
>> [random.randrange(10000*10000) for i in range(10000)]' 'y = list(x)'
>> 'y.sort(key=float)'
>> 100 loops, best of 3: 12 msec per loop
>>
>> otto:~/src/blist$ python3.1 -m timeit -s 'from blist import blist' -s
>> 'import random' -s 'x = [random.randrange(10000*10000) for i in
>> range(10000)]' 'y = blist(x)' 'y.sort(key=float)'
>> 100 loops, best of 3: 3.52 msec per loop
>
> Looks good so far. I would like to see that repeated all the way down to
> range(10) to make sure people doing millions of small sorts were not
> getting screwed.
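
Something like the following would cover that sweep; a quick sketch
that runs the same key=float sort at several sizes down to 10, keeping
the roughly 1% density of the example above (sizes and repeat counts
picked arbitrarily):

import timeit

for n in (10, 100, 1000, 10000):
    setup = ('import random\n'
             'x = [random.randrange(%d) for i in range(%d)]' % (100 * n, n))
    t = timeit.timeit('y = list(x); y.sort(key=float)',
                      setup=setup, number=1000)
    print('n=%-6d %.4f sec per 1000 sorts' % (n, t))
# repeating with "from blist import blist" and y = blist(x) gives the
# numbers to compare against
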
>
>> list.sort is (near) linear for lists that are mostly ordered. I
>> think Tim's writeup about this is in the source. For instance, if
>> one extends a sorted list with 1% random additions and resorts,
>> list.sort will skip the sorted part, sort the additions, and merge
>> them back in. Radix sort ignores pre-existing order. So either a lot
>> of comparative profiling would be needed for auto-selection of sort
>> method, or it should be user-selectable.
>> I've tested exactly that scenario. For already-ordered lists, radix
>> sort and timsort tie.
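
For anyone who wants to reproduce that scenario, a rough sketch of it
(my guess at the setup, not Daniel's actual benchmark): a sorted list of
10000 ints extended with 1% random additions, then re-sorted.

import timeit

# Mostly-ordered input: a sorted list plus 1% random additions.
# timsort exploits the long sorted run; a radix sort ignores it.
setup = ('import random\n'
         'x = sorted(random.randrange(10 ** 6) for i in range(10000))\n'
         'x += [random.randrange(10 ** 6) for i in range(100)]')
print(timeit.timeit('y = list(x); y.sort()', setup=setup, number=1000))
# the same statement with blist(x) would show whether the two still tie
# on this kind of input
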
>
> Tim tested about 7 list structure scenarios with a range of lengths. If
> you post a patch, I would request that you do the same.
>
> Have you run a patched version against test_sort.py? I believe it mostly
> tests lists of small ints, so radix methods would mostly be switched in.
>
> If it were added and the switching were internal, new test cases would
> be needed to test timsort.
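
One way to get such cases, going by Daniel's description above, is to
use keys that don't qualify for the radix path, e.g. integers wider
than a C long or a mix of ints and floats. A hypothetical sketch:

import random
import unittest

class ComparisonSortPathTest(unittest.TestCase):
    # Hypothetical cases that should force the comparison sort even if
    # radix switching is internal, per the "only floats or only
    # C-long-sized ints" rule quoted above.

    def test_big_ints(self):
        data = [random.randrange(2 ** 80) for i in range(1000)]
        y = list(data)        # a patched suite would use blist(data)
        y.sort()
        self.assertEqual(y, sorted(data))

    def test_mixed_ints_and_floats(self):
        data = [random.choice((i, float(i))) for i in range(1000)]
        random.shuffle(data)
        y = list(data)
        y.sort()
        self.assertEqual(y, sorted(data))
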
>
>>> Does your radix sort meet the stability guarantee of list.sort?
>> Yes.
>
> Great. There is, of course, a test for that in the suite.
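
For reference, the property in question is roughly this, a toy check
rather than the actual test_sort.py code: equal keys must keep their
original relative order.

import random

# Many duplicate keys; the second element records the original position.
data = [(random.randrange(10), i) for i in range(10000)]
result = sorted(data, key=lambda pair: pair[0])

# A stable sort never swaps the relative order of equal keys.
for a, b in zip(result, result[1:]):
    assert not (a[0] == b[0] and a[1] > b[1]), "not stable"
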
>

Can I please book front-row tickets for the shoot-out between the 
reigning heavyweight champ Timbot and the challenger, the up-and-coming 
Danbot? :)

Kindest regards.

Mark Lawrence.



