[Python-ideas] Where should grouping() live

Chris Barker chris.barker at noaa.gov
Mon Jul 9 12:13:56 EDT 2018


On Fri, Jul 6, 2018 at 5:13 PM, Michael Selik <mike at selik.org> wrote:

> On Tue, Jul 3, 2018 at 10:11 PM Chris Barker via Python-ideas <
> python-ideas at python.org> wrote:
>
>> * There are types of data well suited to the key function approach, and
>> other data not so well suited to it. If you want to support the not as well
>> suited use cases, you should have a value function as well and/or take a
>> (key, value) pair.
>>
>> * There are some nice advantages in flexibility to having a Grouping
>> class, rather than simply a function.
>>
>
> The tri-grams example is interesting and shows some clever things you can
> do. The bi-grams example I wrote in my draft PEP could be extended to
> handle tri-grams with just a key-function, no value-function.
>

hmm, I'll take a look -- 'cause I found that I was really limited to only a
certain class of problems without a way to get "custom" values.

Do you mean the "foods" example?

>>> foods = [
...     ('fruit', 'apple'),
...     ('vegetable', 'broccoli'),
...     ('fruit', 'clementine'),
...     ('vegetable', 'daikon')
... ]
>>> groups = grouping(foods, key=lambda pair: pair[0])
>>> {k: [v for _, v in g] for k, g in groups.items()}
{'fruit': ['apple', 'clementine'], 'vegetable': ['broccoli', 'daikon']}


Because that one, I think, makes my point well. To get what you want, you
have to post-process the Grouping with a (somewhat complex) comprehension.
If someone is that adept with comprehensions, and wants to do it that way,
the grouping function isn't really buying them much at all over
setdefault, or defaultdict, or rolling their own.
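
For comparison, here is a minimal sketch of those hand-rolled alternatives on the same foods data. It uses only the standard library; the variable names are just for illustration.

```python
from collections import defaultdict

foods = [
    ('fruit', 'apple'),
    ('vegetable', 'broccoli'),
    ('fruit', 'clementine'),
    ('vegetable', 'daikon'),
]

# Plain dict with setdefault: create the list the first time a key is seen.
by_setdefault = {}
for kind, name in foods:
    by_setdefault.setdefault(kind, []).append(name)

# defaultdict(list): the list for a missing key is created automatically.
by_defaultdict = defaultdict(list)
for kind, name in foods:
    by_defaultdict[kind].append(name)
```

Both loops build the same dict of lists as the comprehension above.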

Contrast this with:

groups = grouping(foods,
                  key=lambda pair: pair[0],
                  value=lambda pair: pair[1])

and you're done.

or:

from operator import itemgetter

groups = grouping(foods,
                  key=itemgetter(0),
                  value=itemgetter(1))


Or even better:

groups = grouping(foods)

:-)
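
To make the three calls above concrete, here is a rough sketch of what such a function might look like. This is an assumption for illustration only, not the draft PEP's implementation: the signature, the handling of a missing key or value function, and the plain dict return type are all hypothetical.

```python
from operator import itemgetter


def grouping(iterable, key=None, value=None):
    """Hypothetical sketch: map each key to the list of its values."""
    groups = {}
    for item in iterable:
        if key is None and value is None:
            # No functions given: treat each item as a (key, value) pair.
            k, v = item
        else:
            # A missing function falls back to the item itself.
            k = key(item) if key is not None else item
            v = value(item) if value is not None else item
        groups.setdefault(k, []).append(v)
    return groups


foods = [
    ('fruit', 'apple'),
    ('vegetable', 'broccoli'),
    ('fruit', 'clementine'),
    ('vegetable', 'daikon'),
]

with_lambdas = grouping(foods,
                        key=lambda pair: pair[0],
                        value=lambda pair: pair[1])
with_itemgetter = grouping(foods, key=itemgetter(0), value=itemgetter(1))
from_pairs = grouping(foods)
key_only = grouping(foods, key=itemgetter(0))
```

In this sketch the first three calls give the same dict of lists, while the key-only call keeps the whole (key, value) tuple as each value, which is what forces the post-processing comprehension.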

> However, because this example is fun it may be distracting from the core
> value of ``grouped`` or ``grouping``.
>

Actually, I think it's the opposite -- it opens up the concept to be more
general purpose -- I guess I'm thinking of this as a "dict with lists as the
values" that has many purposes beyond strictly "groupby". Maybe that's
because I'm a general Python programmer, and not a database guy, but if
something is going to be added to the stdlib, why not add a more general
purpose class?


> I don't think we need a nicer API for complex grouping tasks. As the tasks
> get increasingly sophisticated, any general-purpose API will be less nice
> than something built for that specific task.
>

I guess this is where we disagree -- I think we've found an API that is
general purpose, and cleanly supports multiple tasks.

> Instead, I want the easiest possible interface for making groups for
> every-day use cases. The wide range of situations that ``sorted`` covers
> with just a key-function suggests that ``grouped`` should follow the same
> pattern.
>

not at all -- sorted() is about, well, sorting -- which means rearranging
items. I certainly don't expect it to break up the items for me.

Again, this is a matter of perspective -- if you start with "groupby"
as a concept, then I can see how you see the parallel with sorted -- you
are rearranging the items, but this time into groups.

But if you start with "a dict of lists", then you take a wider perspective:

- It can naturally and easily be used to group things
- It can do other nifty things
- And as a "dict of something", it's natural to think of keys AND values,
and to want a dict-like API -- i.e. pass in (key, value) pairs.
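
As a sketch of the "dict of lists" view, here is one way a dict subclass with a pair-based API could look. The class name matches the discussion, but the add() method and the accumulating update() are assumptions made for illustration, not an agreed design.

```python
class Grouping(dict):
    """Hypothetical sketch of a dict whose values are lists."""

    def __init__(self, pairs=()):
        super().__init__()
        self.update(pairs)

    def add(self, key, value):
        # Append to the key's list, creating the list on first use.
        self.setdefault(key, []).append(value)

    def update(self, pairs):
        # Take an iterable of (key, value) pairs, as dict() does,
        # but accumulate the values instead of overwriting them.
        for key, value in pairs:
            self.add(key, value)


g = Grouping([('fruit', 'apple'), ('vegetable', 'broccoli')])
g.add('fruit', 'clementine')
g.update([('vegetable', 'daikon')])
```

Because it is still a dict, everything else (items(), keys(), lookups) works as usual.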

> I do think that the default, key=None, could be set to handle (key, value)
> pairs.
>

OK, so for my part, if you provide the (key, value) pair API, then you
don't really need a value_func. But as the "pass in a function to process
the data" model IS well suited to some tasks, and some people simply like
the style, why not?

And it creates an asymmetry: if you have a (key, the_item) problem, you can
use either the key function API or the (key, value) API -- but if you have
a (key, value) problem, you can only use the (key, value) API.

> But I'm still reluctant to break the standard of sorted, min, max, and
> groupby.
>

This is the power of Python's keyword parameters -- anyone coming to this
from a perspective of "I expect this to be like sorted, min, max, and
groupby" can simply ignore the value parameter :-)

One more argument :-)

There have been comments about how some of the classes in
collections are maybe not needed -- Counter, in particular. I tend to
agree, but I think the reason Counter is not-that-useful is because it
doesn't do enough -- not that it isn't useful -- it's just such a thin
wrapper around a dict that I hardly see the point.

Example:

In [12]: c = Counter()

In [13]: c['f'] += 1

In [14]: c['g'] = "some random thing"

In [15]: c
Out[15]: Counter({'f': 1, 'g': 'some random thing'})

Is that really that useful? I need to do the counting by hand, and can
easily use the regular dict interface to make a mess of it.

It has a handy constructor, but that's about it.
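
For reference, a short sketch of that constructor in use, followed by the same kind of stray assignment as above. The variable name is just for illustration.

```python
from collections import Counter

# The constructor counts the items of any iterable in one call.
letters = Counter("abracadabra")

# Nothing stops a non-count value from being stored afterwards,
# since Counter is a dict subclass.
letters['z'] = "some random thing"
```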

Anyway, I think we've got this nailed down to a handful of options /
decisions:

1) a dict subclass vs a function that constructs a dict-of-lists

 - I think a dict subclass offers some real value -- but it comes down a
bit to goals: Do we want a general purpose special dict? or a function to
perform the "usual" groupby operation?

2) Do we have a value function keyword parameter?

  - I think this adds real value without taking anything away from the
convenience of the simpler key only API

3) Do we accept an iterable of (key, value) pairs if no key function is
provided?

  - I think yes, also because why not? a default of the identity function
for key and value is pretty useless.
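
To see why identity defaults would be of little use, here is a small sketch of what they would produce. The function name is hypothetical and exists only for this illustration.

```python
def group_with_identity(iterable):
    """Hypothetical: group with the identity function as key and value."""
    groups = {}
    for item in iterable:
        # Each item becomes its own key and is stored under itself.
        groups.setdefault(item, []).append(item)
    return groups


result = group_with_identity(['a', 'b', 'a'])
```

Every key just maps to repeated copies of itself, so treating the items as (key, value) pairs seems like a more useful default.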

So it comes down to what the community thinks.

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov