[Python-ideas] Start argument for itertools.accumulate() [Was: Proposal: A Reduce-Map Comprehension and a "last" builtin]

Nick Coghlan ncoghlan at gmail.com
Sun Apr 8 00:14:52 EDT 2018


On 8 April 2018 at 13:17, Tim Peters <tim.peters at gmail.com> wrote:
> [Nick Coghlan <ncoghlan at gmail.com>]
>> So I now think that having "start" as a parameter to one but not the
>> other, counts as a genuine API discrepancy.
>
> Genuine but minor ;-)

Agreed :)

>> Providing start to accumulate would then mean the same thing as
>> providing it to sum(): it would change the basis point for the first
>> addition operation, but it wouldn't change the *number* of cumulative
>> sums produced.
>
> That makes no sense to me.  `sum()` with a `start` argument always
> returns a single result, even if the iterable is empty.
>
>>>> sum([], 42)
> 42

Right, but if itertools.accumulate() had the semantics of starting
with a sum() over an empty iterable, then it would always start with
an initial zero.
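To make that hypothetical concrete, here is a minimal sketch. The function name is made up for illustration. It takes a sum() over every prefix of the input, including the empty one, and that empty prefix is what produces the leading zero:

```python
def prefix_sums_including_empty(iterable):
    """Hypothetical: one sum() per prefix, *including* the empty prefix,
    so the first value is always sum([]) == 0."""
    data = list(iterable)
    for stop in range(0, len(data) + 1):
        yield sum(data[:stop])

print(list(prefix_sums_including_empty([])))            # [0]
print(list(prefix_sums_including_empty([1, 2, 3, 4])))  # [0, 1, 3, 6, 10]
```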

It doesn't - it starts with "0+first_item", so the length of the
output iterator matches the number of items in the input iterable:

    >>> list(accumulate([]))
    []
    >>> list(accumulate([1, 2, 3, 4]))
    [1, 3, 6, 10]

That matches the output you'd get from a naive O(n^2) implementation
of cumulative sums:

    def cumulative_sums(iterable):
        data = list(iterable)
        # One partial sum per non-empty prefix: data[:1], ..., data[:n]
        for stop in range(1, len(data) + 1):
            yield sum(data[:stop])
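A minimal sketch to check that claim compares a standalone copy of the naive version against the real itertools.accumulate() on a few inputs. Here the copy is called naive_cumulative_sums, which is a hypothetical name:

```python
from itertools import accumulate

def naive_cumulative_sums(iterable):
    """O(n^2) reference: one sum() per non-empty prefix of the input."""
    data = list(iterable)
    for stop in range(1, len(data) + 1):
        yield sum(data[:stop])

# Compare against the O(n) itertools.accumulate(), including an empty input.
for sample in ([], [5], [1, 2, 3, 4], range(10)):
    assert list(naive_cumulative_sums(sample)) == list(accumulate(sample))

print(list(naive_cumulative_sums([1, 2, 3, 4])))  # [1, 3, 6, 10]
```

In both cases the output length equals the input length.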

So if the new parameter were to be called start, then I'd expect the
semantics to be equivalent to:

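    def cumulative_sums(iterable, start=0):
        data = list(iterable)
        # start only shifts the base of each sum(); the count is unchanged
        for stop in range(1, len(data) + 1):
            yield sum(data[:stop], start)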

rather than the version Raymond posted at the top of the thread (where
setting start explicitly also implicitly increases the number of items
produced).
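The following sketch puts the two readings of start side by side using two hypothetical O(n) helpers. Neither name exists in itertools. The second helper is only a guess at the prepend-style behaviour described for Raymond's version, not his posted code:

```python
def accumulate_sum_like(iterable, start=0):
    """Hypothetical: start only shifts the base of the first addition,
    so the output has exactly one item per input item (like sum())."""
    total = start
    for element in iterable:
        total = total + element
        yield total

def accumulate_prepend(iterable, start=0):
    """Hypothetical: start is yielded first, so the output has one
    more item than the input."""
    total = start
    yield total
    for element in iterable:
        total = total + element
        yield total

print(list(accumulate_sum_like([1, 2, 3, 4], start=10)))  # [11, 13, 16, 20]
print(list(accumulate_prepend([1, 2, 3, 4], start=10)))   # [10, 11, 13, 16, 20]
print(list(accumulate_sum_like([], start=10)))            # []
print(list(accumulate_prepend([], start=10)))             # [10]
```

Only the second helper changes the number of items produced, and that is the discrepancy at issue.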

That concern mostly goes away if the new parameter is deliberately
called something *other than* "start" (e.g. "prepend=value", or
"first=value"), but it could also be addressed by offering a dedicated
"yield_start" toggle, such that the revised semantics were:

    import operator

    def accumulate(iterable, func=operator.add, start=0, yield_start=False):
        it = iter(iterable)
        total = start
        if yield_start:
            yield total
        for element in it:
            total = func(total, element)
            yield total

That approach would have two advantages: the default value of "start"
would be much easier to document (it would just be zero, the same as
for sum()), and only the length of the input iterable and
"yield_start" would affect how many partial sums were produced.

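Assuming the start/yield_start semantics sketched above, the existing itertools.accumulate() can emulate the same behaviour. The approach is to chain start onto the front of the input and slice it back off unless yield_start is set. This is a sketch with a hypothetical helper name, not a proposed implementation:

```python
import operator
from itertools import accumulate, chain, islice

def accumulate_with_start(iterable, func=operator.add, start=0, yield_start=False):
    """Hypothetical helper: emulates the proposed start/yield_start
    semantics on top of the existing itertools.accumulate()."""
    # Put start in front of the input so it seeds the running total.
    totals = accumulate(chain([start], iterable), func)
    # Drop that seed value unless the caller asked to see it.
    return totals if yield_start else islice(totals, 1, None)

print(list(accumulate_with_start([1, 2, 3, 4])))                      # [1, 3, 6, 10]
print(list(accumulate_with_start([1, 2, 3, 4], start=10)))            # [11, 13, 16, 20]
print(list(accumulate_with_start([1, 2, 3, 4], yield_start=True)))    # [0, 1, 3, 6, 10]
print(list(accumulate_with_start([2, 3, 4], operator.mul, start=1)))  # [2, 6, 24]

# The number of outputs depends only on the input length and yield_start.
for data in ([], [1, 2, 3, 4]):
    for flag in (False, True):
        out = list(accumulate_with_start(data, start=10, yield_start=flag))
        assert len(out) == len(data) + flag
```

The same caveat applies here as to the proposal itself. A non-additive func such as operator.mul needs an explicit start (e.g. 1), because the zero default would make every partial product 0.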
Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

