sorting many arrays from one...

Alex Martelli aleax at aleax.it
Tue Jul 9 10:50:35 EDT 2002


Shagshag13 wrote:

> 
> "Duncan Booth" <duncan at NOSPAMrcp.co.uk>
> Alex Martelli <aleax at aleax.it>
> 
> thanks to you two !
> 
> so if i want to do this in a generic function, i should use something like
> :
> 
> def sortManyFromOne(driven, compare = None, *others):
> 
>  aux = zip(driven, *others)
>  if compare is None:
>   aux.sort()
>  else:
>   aux.sort(compare)
>  driven, *others = map(list, zip(*aux))
> 
> but this last line don't work and i don't know how to fix it... unless i
> replace it by :
> 
> return map(list, zip(*aux))

Better -- let your caller do the unpacking as needed.


> is it a good idea if we consider that my "driven" and "others" could
> contain more than 300,000 elements each ? (i'm thinking of using
> array.array instead of list)

Then ABSOLUTELY forget the idea of passing a compare function to aux.sort!!!
This would drag the sorting down to a HORRIBLE crawl.  COnsider (python 
2.2.1, -O, on 1.4 GHz Athlon with DDR RAM to burn):

>>> import random
>>> k3 = range(300000)
>>> random.shuffle(k3)
>>> xx = k3[:]
>>> a=time.clock(); xx(300000).sort(); b=time.clock(); print b-a
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: 'list' object is not callable
>>> a=time.clock(); xx.sort(); b=time.clock(); print b-a
0.81
>>> xx = k3[:]
>>> def f(a,b): return cmp(a,b)
...
>>> a=time.clock(); xx.sort(f); b=time.clock(); print b-a
10.97
>>>

even for this super-simplified case where the sequence to sort is so
clean and the comparison function does nothing more than delegate to
the same cmp that would be used anyway, you pay well more than an
order of magnitude for the privilege of passing a compare function
to the sort method!!!

Don't do it.  Use D-S-U, as AMPLY explained in the searching and
sorting chapter of the Python Cookbook (which I'm told you can
order already, though physically I think it's going to be launched
at OSCON).  I.e., if your calling code needs flexibility, let
that flexibility be obtained by passing a suitably modified version
of the 'driven' array, NOT with a comparison-function for sort.

This is one of the top few crucial performance tips in Python:

1. do NOT put together a big string by concatenating many small
   ones with + or += -- prepare a list and ''.join it when done

2. do NOT do lots of O(N) searches and anywhere-but-the-end
   insertions and deletions in lists -- consider dictionaries

3. do NOT pass a comparison function to .sort for lists of any
   substantial size -- use decorate/sort/undecorate (DSU)

4. there is no point 4 (well, there ARE many other small
   issues, but none as frequent and important as the first 3)


Alex




More information about the Python-list mailing list