[Python-ideas] Another attempt at a sum() alternative: the concatenation protocol

Oscar Benjamin oscar.j.benjamin at gmail.com
Tue Jul 16 12:21:14 CEST 2013


On 16 July 2013 07:50, Nick Coghlan <ncoghlan at gmail.com> wrote:
> I haven't been following the sum() threads fully, but something Ron
> suggested gave me an idea for a concatenation API and protocol. I
> think we may also be able to use a keyword-only argument to solve the
> old string.join vs str.join problem in a more intuitive way.

The sum() threads have highlighted one and only one problem, which is
that people are often using (or at least suggesting) sum() to
concatenate sequences even though its performance for this is
quadratic. The stdlib already has a solution: itertools.chain. No one
in the sum threads has raised any issue with using chain (or
chain.from_iterable) except to argue that it is not widely used.
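To make the quadratic cost concrete, here is a rough Python model of how sum(lists, []) builds its result: each step evaluates result + x, which creates a new list and copies every element accumulated so far. The helper copies_made_by_sum is hypothetical and only counts those copies; it is an illustration, not CPython's C implementation.

```python
from itertools import chain


def copies_made_by_sum(lists):
    """Count element copies made by concatenating the way sum(lists, []) does."""
    result = []
    copies = 0
    for x in lists:
        # result + x builds a brand-new list, copying everything so far.
        result = result + x
        copies += len(result)
    return copies


lists = [[i] for i in range(1000)]

# Both spellings produce the same list...
assert sum(lists, []) == list(chain.from_iterable(lists))

# ...but the sum()-style loop copies 1 + 2 + ... + 1000 elements,
# whereas chain.from_iterable yields each of the 1000 elements once.
print(copies_made_by_sum(lists))  # 500500
```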

If people are using sum() to concatenate lists, this should be taken
not as evidence that a new solution needs to be found but as evidence
that chain is not sufficiently well known. The obvious remedy is not
to implement a new protocol but to make the existing solution better
known, i.e. move chain.from_iterable to builtins and rename it (the
obvious choice being concat).
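A minimal sketch of what that suggestion amounts to, assuming the builtin would simply be the existing chain.from_iterable under the name concat (concat is the proposed name here, not an existing builtin):

```python
from itertools import chain

# Hypothetical builtin: the existing function exposed under a new name.
concat = chain.from_iterable

# Works for any iterable of iterables; the caller picks the result type.
print(tuple(concat([(1, 2), (3,), (4, 5)])))      # (1, 2, 3, 4, 5)
print(list(concat(range(n) for n in range(4))))  # [0, 0, 1, 0, 1, 2]
```

Because the result is a lazy iterator, each element is copied once, into whatever container the caller chooses to build.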

>     def concat(start, iterable, *, interleave=None):
>         try:
>             build = start.__concat__
>         except AttributeError:
>             result = start
>             if interleave is None:
>                 for x in iterable:
>                     result += x
>             else:
>                 for x in iterable:
>                     result += interleave
>                     result += x
>         else:
>             result = build(iterable, interleave=interleave)

That doesn't seem like a very nice signature, e.g.:

    concat(lines[0], lines[1:], interleave='\n')

is not as good as

    '\n'.join(lines)

It's worse with an iterator:

    it = iter(iterable)
    try:
        start = next(it)
    except StopIteration:
        result = ''
    else:
        result = concat(start, it, interleave=sep)

Or have I misunderstood?
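For reference, a runnable version of the quoted concat() sketch, so the two spellings above can be compared directly. Assumptions: the final return statement is added (the quoted code builds result but never returns it), and Text is a hypothetical str subclass showing how a type might opt in via the proposed __concat__ method; neither is an existing API.

```python
def concat(start, iterable, *, interleave=None):
    """Runnable sketch of the quoted concat() proposal (return added)."""
    try:
        build = start.__concat__
    except AttributeError:
        # No protocol method: fall back to repeated +=.
        # For a mutable start (e.g. a list) this extends start in place.
        result = start
        if interleave is None:
            for x in iterable:
                result += x
        else:
            for x in iterable:
                result += interleave
                result += x
    else:
        # The type supplies its own builder via the proposed hook.
        result = build(iterable, interleave=interleave)
    return result


class Text(str):
    """Hypothetical str subclass that opts in to the proposed protocol."""

    def __concat__(self, iterable, *, interleave=None):
        sep = '' if interleave is None else interleave
        # A single join call instead of repeated +=.
        return Text(sep.join([self, *iterable]))


lines = ['spam', 'eggs', 'ham']
# Both spellings compared above give the same result.
assert concat(lines[0], lines[1:], interleave='\n') == '\n'.join(lines)
assert concat(Text(lines[0]), lines[1:], interleave='\n') == '\n'.join(lines)
```

Unlike '\n'.join(lines), this signature needs a first element to start from, which is why an empty iterator has to be special-cased as in the iterator example.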

> If implementing this as a third party API you'd use a tool like
> functools.singledispatch (which has a backport available on PyPI)
> rather than defining a new protocol. Registering implementations for
> the immutable builtin types like str, bytes and tuple would then allow
> those to be handled efficiently, just as if they provided appropriate
> __concat__ implementations.
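A sketch of the third-party approach described in the quote, assuming Python 3.5+ (functools.singledispatch is in the stdlib from 3.4; the [start, *iterable] syntax needs 3.5). The name and signature follow the quoted proposal; the registered implementations are illustrative, not an existing library. singledispatch picks an implementation based on the type of the first argument, here start.

```python
from functools import singledispatch
from itertools import chain


@singledispatch
def concat(start, iterable, *, interleave=None):
    # Generic fallback: repeated +=, as in the quoted proposal.
    # For a mutable start such as a list this extends start in place.
    result = start
    for x in iterable:
        if interleave is not None:
            result += interleave
        result += x
    return result


@concat.register(str)
@concat.register(bytes)
def _(start, iterable, *, interleave=None):
    # Immutable string types: build the result with a single join call.
    sep = start[:0] if interleave is None else interleave  # '' or b''
    return sep.join([start, *iterable])


@concat.register(tuple)
def _(start, iterable, *, interleave=None):
    # Collect the pieces, then build the tuple once from a chain.
    parts = [start]
    for x in iterable:
        if interleave is not None:
            parts.append(interleave)
        parts.append(x)
    return tuple(chain.from_iterable(parts))


print(concat('a', ['b', 'c'], interleave=', '))        # a, b, c
print(concat(b'x', [b'y', b'z']))                      # b'xyz'
print(concat((1,), [(2, 3), (4,)], interleave=(0,)))  # (1, 0, 2, 3, 0, 4)
```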

Since tuple, list and many other non-string sequences all expose the
iterator protocol and can be built from iterators, chain already
solves the problem for them in an easily extensible way. String-type
sequences have different constructor signatures, so they use join
methods instead. There's no point in special-casing chain (as happens
in sum) to check for str/bytes/etc., since passing a chain to those
constructors clearly doesn't do what you want:

>>> from itertools import chain
>>> str(chain(['123', '456']))
'<itertools.chain object at 0x7f44ddda88d0>'
>>> bytes(chain(['123', '456']))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object cannot be interpreted as an integer
>>> bytearray(chain(['123', '456']))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: an integer is required

> A simple "use sum for numbers, concat for containers" approach is
> simpler and clearer than trying to coerce sum into being fast for both
> when its assumptions are thoroughly grounded in manipulating numbers
> rather than containers.

Use sum for numbers, join for strings, and chain for other sequences
(even though the equivalent operation can be invoked with + or += in
all cases).
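The rule of thumb above can be checked with a small sketch (the sample data is made up): in each case the recommended spelling agrees with a plain left fold using +, the operation that works for all three kinds of data.

```python
import operator
from functools import reduce
from itertools import chain

numbers = [1, 2, 3]
strings = ['ab', 'cd', 'ef']
lists = [[1], [2, 3], [4]]

# Recommended spelling for each kind of data, paired with its input.
cases = [
    (numbers, sum(numbers)),                     # sum for numbers
    (strings, ''.join(strings)),                 # join for strings
    (lists, list(chain.from_iterable(lists))),   # chain for other sequences
]

for items, preferred in cases:
    # The same result can be spelled as a fold with +.
    assert reduce(operator.add, items) == preferred
```

The fold only shows the operations are equivalent; for strings and lists it copies the growing result at every step, which is exactly the cost the dedicated spellings avoid. (sum() itself refuses a str start value and points you to ''.join.)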


Oscar

