[Python-ideas] Fast sum() for non-numbers

Stephen J. Turnbull stephen at xemacs.org
Fri Jul 19 06:12:14 CEST 2013


Sergey writes:

 > >> Do you like having many broken tools?
 > > 
 > > And please stop this.  sum() is not broken, any more than a
 > > screwdriver is broken just because it is rather inefficient when used
 > > to pound in nails.
 > 
 > No, that would be the case if I was using sum for something that it's
 > not intended to do, [...]

I'll say it one last time: this kind of answer does not help your case
at all.  I assure you "proof by repeated assertion" doesn't work here.
The "intent" of sum() is clearly documented: it computes the sum of an
iterable of numbers.  From the library reference for 2.6 (it hasn't
changed up to 3.4.0a, except to refer to itertools.chain() and remove
the reference to reduce()):

 sum(iterable[, start])

    Sums start and the items of an iterable from left to right and
    returns the total. start defaults to 0. The iterable's items are
    normally numbers, and are not allowed to be strings. The fast,
    correct way to concatenate a sequence of strings is by calling
    ''.join(sequence). Note that sum(range(n), m) is equivalent to
    reduce(operator.add, range(n), m). To add floating point values
    with extended precision, see math.fsum().

True, it does not deny that sum() *could* be used for certain non-
numbers, but *intent* is clear: sum() adds up a sequence of numbers.
Generalizing it to efficiently handle concatenation of sequences is an
enhancement, not a bugfix.
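
For concreteness, here is a small sketch of the behavior that entry
describes.  It is my own illustration, not text from the library
reference; the values of n and m and the sample lists are arbitrary,
and reduce is imported from functools so that it also runs on
Python 3.

```python
import math
import operator
from functools import reduce

n, m = 5, 10  # arbitrary illustrative values

# sum() adds start and the items from left to right; start defaults to 0.
total = sum(range(n), m)

# The equivalence stated in the documentation.
assert total == reduce(operator.add, range(n), m)

# Strings are rejected outright with a TypeError ...
try:
    sum(["a", "b"], "")
except TypeError:
    rejected = True
else:
    rejected = False

# ... and ''.join() is the documented way to concatenate them.
joined = "".join(["a", "b"])

# math.fsum() tracks intermediate partial sums to avoid precision loss.
precise = math.fsum([0.1] * 10)
```

Note that strings are the only type sum() goes out of its way to
refuse; other types that define __add__ are accepted, which is the
"could be used" case mentioned above.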

Your argument is simply that we *could* use sum() for anything that
provides the __add__ method, and with __iadd__ it can be efficient in
time and space in many cases.  You point to the fact that some
programmers do use sum() inefficiently, and suggest that we remove
this pitfall by making sum() efficient for as many cases as possible.
Such enhancements are certainly of interest to python-dev, but that's
not sufficient.
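
To illustrate the distinction, here is a rough sketch.  It is not the
patch under discussion; concat_inplace is a hypothetical helper I made
up for the example, and the sample data is arbitrary.  It contrasts
sum() over lists, which goes through __add__ and builds a new list at
every step, with an accumulator that uses +=, and with
itertools.chain, which the current documentation points to.

```python
from itertools import chain

chunks = [[1, 2], [3], [4, 5, 6]]  # arbitrary sample data

# sum() applies +, i.e. list.__add__, so each step allocates a new list
# and copies everything accumulated so far.
via_sum = sum(chunks, [])


def concat_inplace(iterable, start):
    """Hypothetical helper: accumulate with += instead of +.

    start is copied first so the caller's object is never mutated.
    """
    result = list(start)
    for item in iterable:
        result += item  # list.__iadd__ extends result in place
    return result


via_iadd = concat_inplace(chunks, [])

# The idiom the documentation refers to for chaining iterables.
via_chain = list(chain.from_iterable(chunks))
```

All three produce the same list; only the amount of copying differs.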

The counterclaim that matters is that "sum" is a *bad name* for
functions that aggregate iterables, unless the type of the elements is
(or can be coerced to) numerical.  It follows that use of that name
makes programs hard to read, and it should be deprecated in favor of
readable idioms.[1]

Until you successfully address that counterclaim, you are going to
fail to persuade enough of the people who matter.



Footnotes: 
[1]  Note that backward compatibility, not weakness of the "bad name"
argument, is why we compromise by deprecating in words rather than
making it impossible to use sum on "wrong" types.


