[Python-ideas] collections.Counter should implement __mul__, __rmul__

Tim Peters tim.peters at gmail.com
Wed Apr 18 16:55:01 EDT 2018


[Tim]
>> Counter supports a wonderfully weird mix of methods driven by use
>> cases, not by ideology.
>>
>>      + (binary)
>>      - (binary)
>>      |
>>      &
>>
>> have semantics driven by viewing a Counter as a multiset
>> implementation.  That's why they discard values <= 0.  They
>> correspond, respectively, to "the standard" multiset operations of sum
>> (disjoint union), difference, union, and intersection.

[Serhiy Storchaka <storchaka at gmail.com>]
> This explains only why binary "-" discards non-positive values and "&"
> discards keys that are only in one Counter. Multisets contain only positive
> counts.

As I said later, if Raymond had it to do over again, I'd suggest that
only "-" special-case values <= 0.  We have what we have now.  Perhaps
he had other use cases in mind too - I don't know about that.
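
For concreteness, here is a small sketch of what the four multiset operators do today with non-positive values, next to subtract(), which does not filter anything. The sample counts are made up for illustration; only documented Counter behavior is used.

```python
from collections import Counter

# Sample counts chosen for illustration, including zero and negative values.
x = Counter(a=3, b=-2, c=0)
y = Counter(a=1, b=1, d=-5)

# The four multiset-style operators keep only results that are > 0.
total = x + y          # only a=4 survives
difference = x - y     # a=2, plus d=5 because 0 - (-5) is positive
union = x | y          # per-key max: a=3, b=1
intersection = x & y   # per-key min: a=1

# subtract() is not a multiset operation and keeps every result.
kept = Counter(x)
kept.subtract(y)
```

After subtract(), kept still holds b=-3 and c=0, which none of the binary operators would ever return.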


>> Nothing else in Counter is trying to cater to the multiset view, but
>> to other use cases.  And that's why "*" and "/" should do what
>> everyone _expects_ them to do ;-)  There are no analogous multiset
>> operations to justify them caring at all what the values are.

> Doesn't everyone expect that x*2 == x + x?

As shown in earlier messages, it's already the case that, e.g., "x -
y" isn't always the same as "x + -y" for multisets now.  It's already
too late to stress about satisfying "obvious" formal identities ;-)
Again, Counter isn't driven by ideology, but by use cases, and it
caters to all kinds of use cases now.
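
A minimal example of that broken identity, using two made-up single-key Counters and only the existing binary and unary operators:

```python
from collections import Counter

x = Counter(a=1)
y = Counter(a=2)

# Binary minus: 1 - 2 = -1, which is discarded, leaving an empty Counter.
direct = x - y

# Unary minus keeps only keys whose counts were negative, so -y is empty.
negated = -y

# Adding an empty Counter leaves x's positive count in place.
via_negation = x + negated
```

Here direct is empty while via_negation still has a=1, so "x - y" and "x + -y" disagree.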


> Isn't this the definition of multiplication?

In some algebraic structures, yes.  Same as, e.g., "x - y" can be
"defined by" "x + -y".


> And when we have a multiplication, it can be generalized to division.

In some algebraic structures, yes.

>> But there's no good reason for "*" or "/" to care at all.  They
>> don't make sense for multisets.

> I disagree. "+" and "*" are defined for sequences, and these operations can
> be defined for multisets in terms of sequences of their elements.

Ya, but you're just making that up because it suits your current
argument.  The mathematical definition of multisets says nothing at
all about "sequences".  I used "the standard" earlier as shorthand for
"use Google to find a standard account"; e.g., here:

    http://planetmath.org/operationsonmultisets

Counter implements all and only the multiset operations spelled out
there (or in any number of other standard accounts).


>>  After, e.g.,
>>
>>      c /= sum(c.values())
>>
>> it's sane to expect that the new sum(c.values()) is close to 1
>> regardless of the numeric types or signs of the original values.
>> Indeed, normalizing values so that their sum _is_ close to 1 is a
>> primary use case motivating the current change.

> If there are negative values, then their sum can be very small, and the
> relative error of the sum can be large.

So?

> Dividing by it can result in values with large magnitude, significantly larger
> than 1, and large errors.

Likewise:  so what?  There's no reason to assume that the values
aren't, e.g., fractions.Fraction instances, where arithmetic is exact.  If
they're floats, then _of course_ all kinds of numeric surprises are
possible.  But unless you want to claim that float surprises go away
if values <= 0 are thrown away, it's just irrelevant to the case you
_were_ arguing.

Do you seriously want to argue that

    c /= sum(c.values())

should include negative values in the sum, but then throw away keys
with quotients <= 0 when the division is performed?  That's pretty
much incomprehensible.


> What is the use case for dividing a Counter with negative values by the sum
> of its values?

Avoiding the incomprehensible behavior noted just above - for which
I'd like to see a use case too ;-)

But, seriously, no, I don't have a good use for that.  It _follows_
from the obvious implementation Peter gave in the thread's first
message, which is in fact obvious to just about everyone else so far.
I can't count _against_ it that

    c /= sum(c.values())
    assert sum(c.values()) == 1

would succeed if the values support exact arithmetic and the original
sum isn't 0.
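
Counter has no "/" operator as of this message, so the sketch below uses two hypothetical helper functions to stand in for it. divide_counter is my guess at the "obvious" elementwise behavior, not Peter's actual code; divide_discarding is the alternative that throws away quotients <= 0. The sample values are invented.

```python
from collections import Counter
from fractions import Fraction

def divide_counter(c, divisor):
    """Hypothetical helper: divide every value by divisor, keeping all keys."""
    return Counter({key: value / divisor for key, value in c.items()})

def divide_discarding(c, divisor):
    """Hypothetical variant: same division, but drop quotients <= 0."""
    return Counter({key: value / divisor
                    for key, value in c.items()
                    if value / divisor > 0})

# Exact rational values, one of them negative; the sum is 4.
c = Counter({"x": Fraction(3), "y": Fraction(-1), "z": Fraction(2)})
total = sum(c.values())

normalized = divide_counter(c, total)     # 3/4, -1/4, 1/2
discarded = divide_discarding(c, total)   # 3/4, 1/2 ("y" is dropped)
```

With exact arithmetic the first result sums to exactly 1.  The second sums to 5/4, because the negative value was counted in the divisor and then discarded from the result.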

