What about an EXPLICIT naming scheme for built-ins?

Alex Martelli aleaxit at yahoo.com
Sat Sep 4 05:00:29 EDT 2004


Carlos Ribeiro <carribeiro at gmail.com> wrote:
   ...
> Today, sorted() returns a brand new sorted sequence. So it already
> takes the full memory and time penalty while doing so. However, **if
> sorted() was implemented as a generator**, it would have some
> advantage in special cases:
> 
> -- sorting long sequences to use only a few of the first elements. For
> example, if you need to sort a long list to get the top 10 elements. A
> generator would yield only those elements, saving a lot of time that
> would go into sorting everything else.

Ridiculous!  Since sorted doesn't get TOLD you will only get the first
10 elements, it DOES have to prepare to do everything anyway.  If it
used a heap sort algorithm "just in case" you only needed the first 10,
it would lose big in the much more common cases in which you needed the
whole list.

Maybe sorted should grow an optional keyword parameter saying you will
only need the first N items at most, but if it doesn't, any attempt to
save resources based on that hypothesis are doomed.

If you need to loop on the first few smallest items of a long sequence,
you'll find good solutions in the cookbook (and better ones in the 2nd
printed edition I'm supposed to be coediting rather than surfing
c.l.py...:-), but the sorted builtin is no use for that.

> 
> -- overall responsiveness in applications where sort is frequently
> called. Instead of waiting for the full sort at each sorted() call,
> the running time of the sorting method would be divided between the

Again, you're assuming (implicitly) heapsort and ignoring the huge
performance benefit of Peters' "natural mergesort" as implemented in the
current sorted built-in (and list.sort method).  In all the many cases
in which the whole sorted sequence is wanted, natural mergesort shines,
particularly in the common practical cases in which the input sequence
already has some pieces ordered, while heapsort gets no prize at all.

> calls to the generator. Interactive and multithreaded applications can
> benefit of this approach (I'm not sure if sorted() grabs the GIL or
> not, but if it does, then it's definitely something to look at). (but
> then, again, it may be a poor design choice to use sorted() in these
> scenarios -- a heap or some other similar structure would be better
> suited. But I digress here).

You don't digress at all!  If you don't need to minimize total resources
spent in sorting, DO consider a heap -- heapq is wonderful (in 2.4 most
particularly) and makes heapsort easy as pie.  But if you DO need to
minimize total resources, sorted is quite another ball of wax.

> 
> Modifying the sorting algorithm to work as a generator is not as hard
> as it seems. For example, Quicksort can be easily adapted. It's just a
> matter to yield the results as soon as the "left" partition is sorted.

That would be a horrid choice.  heapsort would perform MUCH better in
this case.  A quicksort with systematic precedence to the left partition
will have some very bad cases (which heapsort doesn't) and be quite hard
to program (heapsort IS a snap thanks to heapq).

> p.s. I opted to send a copy of this answer to the list, I think it's
> good for the discussion. I hope you don't mind.

If you hadn't responded publically I'd never would have got a chance to
object, so of course I don't mind!-)


Alex



More information about the Python-list mailing list