[Python-ideas] Fwd: grouping / dict of lists

Fri Jul 6 15:26:41 EDT 2018

On Thu, Jul 5, 2018 at 1:23 AM, Chris Barker via Python-ideas
<python-ideas at python.org> wrote:
> On Wed, Jul 4, 2018 at 6:34 AM, David Mertz <mertz at gnosis.cx> wrote:
>
>>
>> You've misunderstood part of the discussion. There are two different
>> signatures being discussed/proposed for a grouping() function.
>>
>> The one you show we might call grouping_michael(). The alternate API we
>> might call grouping_chris(). These two calls will produce the same result
>> (the first output you show)
>>
>>   grouping_michael(words, keyfunc=len)
>>   grouping_chris((len(word), word) for word in words)
>>
>> I happen to prefer grouping_michael(), but recognize they each make
>> slightly different things obvious.
>
>
> I started out thinking grouping_chris was the obvious and natural thing to do,
> but this discussion has made it clear that grouping_michael is more natural
> for some kinds of data.
>
> And in some cases it really comes down to taste. After all, who's to say
> which of these is "better"?
>
> map(func, iterable)
>
> or
>
> (expression for item in iterable)
>
> given that map existed in Python when comprehensions were added, I tend to
> see the latter as more "Pythonic" but that's just me.
>
>
> So I'm currently lobbying for both :-)
>
> The default is an iterable of (key, value) pairs, but the user can specify a
> key function if they want to do it that way.
>
> While a bit of a schizophrenic API, it makes sense (to me), because
> grouping_michael isn't useful with a default key function anyway.
>
> The other enhancement I suggest is that an (optional) value function be
> added, as there are use cases where that would be really helpful.
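
A rough sketch of the combined API described above, purely for illustration. The name grouping() and the parameter names key and value are placeholders, not an agreed signature, and applying the value function to the item (or to the second element of each pair when no key function is given) is an assumption.

```python
from collections import defaultdict

def grouping(iterable, key=None, value=None):
    """Group items into a dict of lists (illustrative sketch only)."""
    groups = defaultdict(list)
    for item in iterable:
        if key is None:
            k, v = item             # default: items are (key, value) pairs
        else:
            k, v = key(item), item  # key function supplied by the caller
        if value is not None:
            v = value(v)            # optional value function (assumed semantics)
        groups[k].append(v)
    return dict(groups)

words = ["a", "bb", "cc", "d"]
by_key_func = grouping(words, key=len)               # grouping_michael style
by_pairs = grouping((len(w), w) for w in words)      # grouping_chris style
print(by_key_func)  # {1: ['a', 'd'], 2: ['bb', 'cc']}
print(by_pairs)     # {1: ['a', 'd'], 2: ['bb', 'cc']}
```

Both calls build the same dict; the difference is only whether the key computation lives in a function argument or in the comprehension.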

I have used this kind of function in several different projects over the
years, and I have rewritten it many times as needed.

I added several options, such as:
- key function
- value function
- "ignore": Skip values with these keys.
- "postprocess": Apply a function to each group after completion.
- Pass in the container to store in. For example, create an
OrderedDict and pass it in. It may already hold items.
- Specify the container for each group.
- Specify how to add to the container for each group.
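
A minimal sketch of what a function with these options might look like. All names here (group_with_options, ignore, postprocess, container, group_factory, add) are hypothetical; the original code is not shown in this message, so this is only one plausible reading of the list.

```python
from collections import OrderedDict

def group_with_options(items, key, value=None, ignore=(), postprocess=None,
                       container=None, group_factory=list, add=list.append):
    """Group items by key(item), with the optional knobs listed above."""
    groups = {} if container is None else container  # may already hold items
    for item in items:
        k = key(item)
        if k in ignore:                  # "ignore": skip values with these keys
            continue
        v = item if value is None else value(item)
        if k not in groups:
            groups[k] = group_factory()  # container used for each group
        add(groups[k], v)                # how to add to that container
    if postprocess is not None:
        for k in list(groups):           # applied to each group once complete
            groups[k] = postprocess(groups[k])
    return groups

words = ["apple", "bob", "cat", "kiwi", "fig"]
result = group_with_options(
    words, key=len, ignore={4},
    container=OrderedDict(), group_factory=set, add=set.add,
    postprocess=sorted,
)
print(list(result.items()))  # [(5, ['apple']), (3, ['bob', 'cat', 'fig'])]
```

Note that in this sketch postprocess also touches any groups that were already in the container that was passed in, which may or may not be what one wants.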

Then I cut it down to two optional parameters:
- key function. If not provided, the iterable is considered to have
key-value pairs.
- The storage container.

Finally, I removed the key function, and only took pairs and an
optional container. However, I don't remember why I removed the key
function. It may be that I was writing throwaway lambdas, and I
decided I might as well just write the transformation into the
comprehension. I think a key function is worth having.
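
The final pairs-plus-container form could be sketched roughly as below. group_pairs is a hypothetical name, and using setdefault on a plain dict (or whatever mapping is passed in) is an assumption about how the groups are stored.

```python
from collections import OrderedDict

def group_pairs(pairs, container=None):
    """Group (key, value) pairs into lists, optionally into a given mapping."""
    groups = {} if container is None else container
    for k, v in pairs:
        groups.setdefault(k, []).append(v)
    return groups

# The key computation is written directly into the generator expression
# instead of being passed as a throwaway lambda.
existing = OrderedDict([(2, ["zz"])])
result = group_pairs(((len(w), w) for w in ["a", "bb", "c"]), existing)
print(list(result.items()))  # [(2, ['zz', 'bb']), (1, ['a', 'c'])]
```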

One thing I didn't do is create a class for this container. I first used
a defaultdict, then a defaultdict that I converted to a plain dict before
returning, and finally just a plain dict.

Aside: I also wrote a lazy grouping function, which returns a lazy
container of iterators. Asking for next(grouping[k]) causes the
container to advance through the original iterable until it has
something to add to group k. I have never used it, but it was fun to
try and make it.

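    import collections

    class IterBuckets(dict):
        """Lazy mapping from each key to an iterator over its values."""
        def __init__(self, pairs):
            super().__init__()
            self.pairs = iter(pairs)
        def __missing__(self, key):
            # First access to a key creates its (initially empty) bucket.
            return self.setdefault(key, IterBucket(self.advance))
        def advance(self):
            # Consume one pair from the source and file it in its bucket.
            # Raises StopIteration once the source is exhausted.
            k, v = next(self.pairs)
            self[k].append(v)

    class IterBucket(collections.deque):
        """Queue of buffered values that pulls from the source on demand."""
        def __init__(self, more):
            super().__init__()
            self.more = more
        def __iter__(self):
            return self
        def __next__(self):
            # Keep advancing the shared source until this bucket has a
            # value; StopIteration from more() ends the iteration.
            while not self:
                self.more()
            return self.popleft()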