What about an EXPLICIT naming scheme for built-ins?

Sat Sep 4 16:35:09 EDT 2004

Ok, I feel we're almost done with this one. I'll try to summarize and
reply to a lot of comments in a single message - it's the best I can do
now.

This thread started with a proposal for an explicit naming scheme for
builtins. It ended up being pretty much a discussion about sorted()
and reversed(): why they are named like that, and what options are
available to change it. As it evolved, it turned into a discussion
about the merits of a sorting generator as a solution for some
particular scenarios. What follows here is my analysis of each part of
the problem. However, for the impatient and faint of heart I will
offer my conclusions in advance. If you want to read my reasoning,
you are welcome to read on.

* I believe that the current naming (sorted() and reversed()) is not
good enough, and can be improved;

* Naming the current builtins sorted() and ireversed() would make
things more consistent, while reserving a few options for future
implementation;

* A sorting generator can be useful in some scenarios. A cookbook
solution (in pure Python, using the heapq library as a backend) is a
good proof of concept and may be enough for most needs.

--------------------------------------------------------------

1. (Re)Naming sorted() and reversed()
-------------------------------------

I'm not totally satisfied with the current choice of names,
consistency-wise. The differences between sorted() and reversed() are
not immediately evident from their names alone. In my opinion:

-- sorted() and reversed() should work similarly, returning lists.
**This is not to say that a builtin to return a reversed list would be
useful**. Clearly, returning an iterator over the reversed list is the
best and most useful choice -- it is only the name that is not good
enough.

-- I've previously proposed naming the iterated versions xsorted() and
xreversed(), but that was a bad idea -- the correct choice would be
isorted() and ireversed(), in keeping with the itertools naming scheme.
However, **this does not automatically mean that an isorted() builtin
would be useful** (more on this later).

In short, I believe we have a (small) problem, and naming the current
builtins sorted() and ireversed() would solve it.
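
To make the inconsistency concrete, here is a small sketch of how the
two builtins behave under their current names (the variable names are
only for illustration):

```python
data = [3, 1, 2]

s = sorted(data)      # a brand new list
r = reversed(data)    # an iterator over data, not a list

s_is_list = isinstance(s, list)   # True
r_is_list = isinstance(r, list)   # False

# An iterator can be consumed only once; a list can be reused.
first_pass = list(r)    # [2, 1, 3]
second_pass = list(r)   # [] -- the iterator is already exhausted

print(s, s_is_list)
print(first_pass, second_pass, r_is_list)
```

Nothing in the names sorted() and reversed() hints at this difference.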

2. Deciding whether isorted() and reversed() would make sense
-------------------------------------------------------------

Implicit in the analysis in item (1) is the assumption that there
are four possible variations regarding sort and reverse: iterated and
list-returning versions of each. Of those, two are already accepted as
builtins, based on the actual real-world need to solve common
situations:

-- sorted() returns a sorted list
-- ireversed() returns an iterator

(please note that I'm following my own proposed naming scheme)

Two variations are not currently implemented as builtins:

-- isorted() -- would return an iterator
-- reversed() -- would return a reversed list

Although all variations are possible, it does not mean that they are
useful or desirable. The missing variations can be easily written in
terms of the existing ones. Currently, sorted() returns a list that
can be iterated over. However, if only a part of the sorted list is
needed, then sorted() incurs a penalty -- it always sorts the entire
list (more on this later). As for reversed(), a simple idiom can be
used to return a new, reversed list:

reverse_list = [x for x in reversed(mylist)]
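
For comparison, here is a short sketch of a few equivalent ways to get
a new reversed list, using the builtin under its current name
reversed(). The variable names are made up for this example; all three
forms leave the original list untouched:

```python
mylist = [1, 2, 3]

by_comprehension = [x for x in reversed(mylist)]  # the idiom above
by_constructor = list(reversed(mylist))           # same result, shorter
by_slice = mylist[::-1]                           # extended slice, lists only

# mylist.reverse() would instead reverse the list in place and
# return None, so it is not a substitute when a new list is wanted.
print(by_comprehension, by_constructor, by_slice, mylist)
```
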

To put the question bluntly, **is there any reason to implement either
isorted() or reversed()**? The arguments (pro and against) are as
follows:

PRO:
-- it's possible, and relatively easy to implement
-- completeness
-- consistency
-- native implementation would perform better in some cases
-- the existing idioms may not be immediately clear to the novice (see
example above)

AGAINST:
-- just because it's possible does not mean that it's a good idea
-- more bloat in the standard library
-- more builtins to pollute the namespace
-- more things for a novice to learn
-- quoting Alex Martelli: practicality beats purity

Given the arguments, I agree that it's better to leave things as they
are (but changing the name of reversed() to ireversed(), as proposed in
(1)).

3. The case for isorted()
-------------------------

Over the last few posts of the thread the topic degenerated into a
discussion about the relative merits of isorted() -- the generator
version of sorted(). Two questions came up:

Q1: Is it possible to implement a sorting generator?

To summarize what was said,

-- a sorting algorithm *can* be adapted to work as a generator, **BUT**
-- the current sorting algorithm used internally in Python is not
adequate for this particular hack, **AND**
-- this particular problem can be easily solved using heaps (which are
now part of the standard library, in the heapq module).

Q2: Is it useful in the real world?

A sorting generator is useful IF:

-- you know you aren't going to need the entire sorted list;
-- just a few elements will do.

A particular situation is where you don't know in advance exactly how
many elements you need; in this case, a sorting generator is probably
the best approach.

My gut feeling is that such a builtin would be useful, and in fact,
could help to simplify existing code that uses a much more complicated
and verbose approach. But then, I don't have Guido's track record when
it comes to instinctive decisions on language design.

My best bet, now, is writing a cookbook solution to implement a
"pseudo-sort-generator" using the heapq library. It's a good proof of
concept, and may help to illuminate the question with a practical
tool.
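
A minimal sketch of what such a recipe might look like follows. The
name isorted() is only the hypothetical name used in this message, not
an existing builtin, and the sample data is made up. Unlike sorted(),
this sketch takes no key or comparison argument and is not stable:

```python
import heapq
from itertools import islice, takewhile

def isorted(iterable):
    """Yield the items of iterable in ascending order, one at a time.

    Hypothetical helper, not a builtin. Building the heap costs O(n);
    each item pulled afterwards costs O(log n), so stopping early
    avoids paying for a full sort.
    """
    heap = list(iterable)   # copy, so the caller's data is untouched
    heapq.heapify(heap)
    while heap:
        yield heapq.heappop(heap)

data = [7, 2, 9, 4, 1, 8]

# Known number of items: take the three smallest.
first_three = list(islice(isorted(data), 3))

# Unknown number of items: keep going while a condition holds.
below_five = list(takewhile(lambda x: x < 5, isorted(data)))

print(first_three)   # [1, 2, 4]
print(below_five)    # [1, 2, 4]
```

If the whole sequence ends up being consumed, a plain sorted() call is
probably the better choice; the generator only pays off when iteration
stops early.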

p.s.:

[Alex Martelli]
> If you hadn't responded publically I'd never would have 
> got a chance to object, so of course I don't mind!-)

It's always a pleasure to have you participating, even after you have
effectively killed most of my arguments :-) I just feel honored ;-)

-- 
Carlos Ribeiro
blog: http://pythonnotes.blogspot.com
mail: carribeiro at gmail.com