What about an EXPLICIT naming scheme for built-ins?

Alex Martelli aleaxit at yahoo.com
Mon Sep 6 04:14:03 EDT 2004


Tim Peters <tim.peters at gmail.com> wrote:
   ...
> >> nsmallest() functions to heapq for 2.4.  Because heapq implements
> >> a min-heap, only nlargest() is really "natural" for this module, but
> >> nsmallest() does what it can ...
> 
> [Alex Martelli]
> > Hmmm, am I reading this wrong...?  getting the min N times seems
> > easy, wouldn't that be nsmallest...?
> 
> You're reading it right, but it's a bit paradoxical.  Suppose you want
>    ...

Ah, VERY instructive, thanks!  One more thing that should become a
Cookbook recipe...
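
The trick, spelled out: to pick the N largest you keep a min-heap of
the N best items seen so far, because the heap's root is the weakest
of the current winners -- exactly the item any newcomer must beat.  A
from-memory sketch (not heapq's actual implementation):

import heapq

def nlargest_sketch(n, iterable):
    # Min-heap of the n largest items seen so far; heap[0] is the
    # smallest of the current winners, i.e. the one to beat.
    heap = []
    for item in iterable:
        if len(heap) < n:
            heapq.heappush(heap, item)
        elif item > heap[0]:
            heapq.heapreplace(heap, item)  # drop old minimum, add item
    heap.sort(reverse=True)  # report largest-first, like heapq.nlargest
    return heap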

> > heapq _should_ have reversed= like list.sort, to make a maxheap
> > almost as easy as a minheap (and key= too of course), but that's
> > another fight^H^H^H^H^H debate...
> 
> At that point I'd regret not making these things classes.  They were
> meant to be simple.  Maybe fancier stuff can go in the new-in-2.4
> collections module (which, again thanks to Raymond, has a new C-coded
> deque type with constant-time insert and remove at both ends
> regardless of access pattern (no "bad cases") -- and, e.g.,
> Queue.Queue uses collections.deque as its container type now).
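
For anybody who hasn't met the new type yet, its both-ends behaviour
in a nutshell (a tiny example, using only documented deque methods):

from collections import deque  # new in Python 2.4

d = deque(["b", "c", "d"])
d.appendleft("a")  # O(1) at the left end
d.append("e")      # O(1) at the right end too
d.popleft()        # -> 'a'; a list's pop(0) would cost O(n)
d.pop()            # -> 'e'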

Yep, a collections.heap type would definitely appear to be warranted
(as no doubt would others), but I fear it may be a bit late to
shoehorn it into 2.4, what with the first beta being mere weeks away,
sigh.
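
Just to make the wish concrete, here's roughly what I'd imagine such
a class looking like -- a hypothetical sketch with list.sort-style
key= and reverse= arguments, emphatically NOT anything that exists in
the stdlib today:

import heapq, itertools

class _ReverseKey(object):
    # Inverts comparisons, so heapq's min-heap machinery gives
    # max-heap behaviour without negating (maybe non-numeric) keys.
    __slots__ = ("value",)
    def __init__(self, value):
        self.value = value
    def __lt__(self, other):
        return other.value < self.value

class Heap(object):
    # Hypothetical collections.heap-style class; illustration only.
    def __init__(self, iterable=(), key=None, reverse=False):
        keyfunc = key if key is not None else (lambda item: item)
        if reverse:
            plain = keyfunc
            keyfunc = lambda item: _ReverseKey(plain(item))
        self._key = keyfunc
        self._count = itertools.count()  # tie-breaker: items never compared
        self._heap = [(keyfunc(x), next(self._count), x) for x in iterable]
        heapq.heapify(self._heap)
    def push(self, item):
        heapq.heappush(self._heap, (self._key(item), next(self._count), item))
    def pop(self):
        return heapq.heappop(self._heap)[-1]
    def __len__(self):
        return len(self._heap)

E.g., Heap(words, key=len, reverse=True).pop() would yield a longest
word first.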


> > And what about the heapsorts...?  It probably depends on what you
> > mean by 'usually', but...:
   ...
> > on my old-ish iBook with Python 2.4 alpha 3 I see...:
> > 
> > $ python ~/cb/timeit.py -s'import s' 's.top3_heapq(s.x1)'
> > 1000 loops, best of 3: 656 usec per loop
> > $ python ~/cb/timeit.py -s'import s' 's.top3_sort(s.x1)'
> > 100 loops, best of 3: 4e+03 usec per loop
> 
> The heapq methods are practical (as you just demonstrated).  The

Yep (even the "unnatural" nsmallest...).
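
(The module s being timed isn't shown above; the two functions were
essentially as follows -- a reconstruction from memory, with a
made-up stand-in for x1:)

import heapq, random

x1 = list(range(10000))  # stand-in test data, not the original x1
random.shuffle(x1)

def top3_heapq(seq):
    # One pass with a 3-item heap: O(n log 3), hence the big win above.
    return heapq.nlargest(3, seq)

def top3_sort(seq):
    # Sorts a full copy just to keep three items: O(n log n).
    return sorted(seq, reverse=True)[:3]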

> recursive-generator mergesorts really aren't.  Besides just the

Right, and I didn't mean to claim they were.

> overhead of Python, they've got the overhead of creating ~= len(list)
> generators, suspending and resuming them incessantly.  By the time the
> list is big enough to overcome all those time overheads, chances are
> you'll start running out of RAM.
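
For concreteness, the kind of recursive-generator mergesort Tim means
looks roughly like this (a generic from-memory sketch, not the actual
recipe under discussion):

_END = object()

def merge_gen(a, b):
    # Two-way merge as a generator: every yield suspends it and every
    # next() resumes it -- the incessant traffic Tim is pointing at.
    a, b = iter(a), iter(b)
    x, y = next(a, _END), next(b, _END)
    while x is not _END and y is not _END:
        if y < x:
            yield y
            y = next(b, _END)
        else:
            yield x
            x = next(a, _END)
    while x is not _END:
        yield x
        x = next(a, _END)
    while y is not _END:
        yield y
        y = next(b, _END)

def mergesort_gen(seq):
    # Builds ~len(seq) suspended generators for a full sort.
    if len(seq) <= 1:
        return iter(seq)
    mid = len(seq) // 2
    return merge_gen(mergesort_gen(seq[:mid]), mergesort_gen(seq[mid:]))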

I've used a mergesort with generators once, back in 2.2 times, on a
gigabyte-plus binary file, at a time when I couldn't possibly have fit
it in RAM -- a pretty naive N-tape mergesort (where the "tapes" were
actually temporary disk files, of course), in which each "tape" got
sorted as a Python list before being written out to disk on the first
pass.  No doubt a very different scenario from the one being discussed
here...
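
Schematically, that kind of external sort looks like the following --
invented for illustration (line-oriented for simplicity, and using
today's heapq.merge for the merge pass, which didn't exist back in
2.2):

import heapq, tempfile

def external_sort(infile, outfile, chunk_bytes=64 * 1024 * 1024):
    tapes = []
    while True:
        chunk = infile.readlines(chunk_bytes)  # ~chunk_bytes of lines
        if not chunk:
            break
        chunk.sort()  # first pass: each "tape" is sorted in RAM...
        tape = tempfile.TemporaryFile("w+")
        tape.writelines(chunk)  # ...then spilled to a temp file
        tape.seek(0)
        tapes.append(tape)
    # Merge pass: lazily interleave the sorted tapes, holding only
    # one line per tape in memory at any time.
    outfile.writelines(heapq.merge(*tapes))
    for tape in tapes:
        tape.close()

Called as, say, external_sort(open("big.txt"), open("sorted.txt", "w")).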


Alex


