[Python-ideas] Allow a group by operation for dict comprehension

pylang pylang3 at gmail.com
Thu Jun 28 21:02:06 EDT 2018


There are a few tools that can accomplish these map-reduce/transformation
tasks.
See Options A, B, C below.

# Given
    >>> import itertools as it
    >>> import collections as ct

    >>> import more_itertools as mit


    >>> student_school_list = [
    ...     ("Albert", "Prospectus"), ("Max", "Smallville"),
    ...     ("Nikola", "Shockley"), ("Maire", "Excelsior"),
    ...     ("Neils", "Smallville"), ("Ernest", "Tabbicage"),
    ...     ("Michael", "Shockley"), ("Stephen", "Prospectus"),
    ... ]


    >>> kfunc = lambda x: x[1]
    >>> vfunc = lambda x: x[0]
    >>> sorted_iterable = sorted(student_school_list, key=kfunc)


# Example (see OP)
    >>> student_by_school = ct.defaultdict(list)
    >>> for student, school in student_school_list:
    ...    student_by_school[school].append(student)
    >>> student_by_school
    defaultdict(list,
                {'Prospectus': ['Albert', 'Stephen'],
                 'Smallville': ['Max', 'Neils'],
                 'Shockley': ['Nikola', 'Michael'],
                 'Excelsior': ['Maire'],
                 'Tabbicage': ['Ernest']})

---

# Options

# A: itertools.groupby
    >>> {k: [x[0] for x in v]
    ...  for k, v in it.groupby(sorted_iterable, key=kfunc)}
    {'Excelsior': ['Maire'],
     'Prospectus': ['Albert', 'Stephen'],
     'Shockley': ['Nikola', 'Michael'],
     'Smallville': ['Max', 'Neils'],
     'Tabbicage': ['Ernest']}
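
Why Option A needs the sort: `itertools.groupby` only groups consecutive items with equal keys, so on unsorted input the same key can show up more than once and the dict comprehension keeps only the last group. A minimal sketch using a shortened copy of the list above; `group_names` is a name made up for this illustration:

```python
import itertools as it

# Shortened, deliberately unsorted copy of the student/school pairs.
pairs = [
    ("Albert", "Prospectus"), ("Max", "Smallville"),
    ("Nikola", "Shockley"), ("Neils", "Smallville"),
    ("Stephen", "Prospectus"),
]
kfunc = lambda x: x[1]


def group_names(iterable):
    # Hypothetical helper: the Option A comprehension wrapped in a function.
    return {k: [x[0] for x in v] for k, v in it.groupby(iterable, key=kfunc)}


# Without sorting, later runs of a key overwrite earlier ones.
unsorted_result = group_names(pairs)
# With sorting, each key forms exactly one run, so nothing is lost.
sorted_result = group_names(sorted(pairs, key=kfunc))

print(unsorted_result)
print(sorted_result)
```

In the unsorted case 'Albert' and 'Max' are silently dropped, which is the usual surprise with `groupby`.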

# B: more_itertools.groupby_transform
    >>> {k: list(v)
    ...  for k, v in mit.groupby_transform(sorted_iterable, keyfunc=kfunc,
    ...                                    valuefunc=vfunc)}
    {'Excelsior': ['Maire'],
     'Prospectus': ['Albert', 'Stephen'],
     'Shockley': ['Nikola', 'Michael'],
     'Smallville': ['Max', 'Neils'],
     'Tabbicage': ['Ernest']}

# C: more_itertools.map_reduce
    >>> mit.map_reduce(student_school_list, keyfunc=kfunc, valuefunc=vfunc)
    defaultdict(None,
                {'Prospectus': ['Albert', 'Stephen'],
                 'Smallville': ['Max', 'Neils'],
                 'Shockley': ['Nikola', 'Michael'],
                 'Excelsior': ['Maire'],
                 'Tabbicage': ['Ernest']})

---

# Summary

- Option A: standard library, sorted iterable, some manual value
  transformations (via list comprehension)
- Option B: third-party tool, sorted iterable, accepts a value
  transformation function
- Option C: third-party tool, any iterable, accepts transformation
  function(s)

I have grown to like `itertools.groupby`, but I understand it can be odd at
first. Perhaps something like the `map_reduce` tool (or approach) may help?
It's simple, does not require a sorted iterable as in A and B, and you have
control over how you want your keys, values and aggregated/reduced values to
be (see docs for more details).
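
For anyone who wants the approach without the third-party dependency, here is a rough standard-library sketch of the same idea. `group_by` is a hypothetical name, not an existing function, and it is not the `more_itertools` implementation; it only mirrors the keyfunc/valuefunc/reducefunc split and returns a plain dict:

```python
from collections import defaultdict


def group_by(iterable, keyfunc, valuefunc=None, reducefunc=None):
    """Hypothetical helper: group items by key, optionally transforming
    each value and reducing each group. No sorting is required."""
    groups = defaultdict(list)
    for item in iterable:
        value = item if valuefunc is None else valuefunc(item)
        groups[keyfunc(item)].append(value)
    if reducefunc is not None:
        # Collapse each list of values into a single aggregated value.
        return {k: reducefunc(v) for k, v in groups.items()}
    return dict(groups)


student_school_list = [
    ("Albert", "Prospectus"), ("Max", "Smallville"),
    ("Nikola", "Shockley"), ("Maire", "Excelsior"),
    ("Neils", "Smallville"), ("Ernest", "Tabbicage"),
    ("Michael", "Shockley"), ("Stephen", "Prospectus"),
]

# Names grouped by school (same grouping as the OP's example).
by_school = group_by(student_school_list,
                     keyfunc=lambda x: x[1], valuefunc=lambda x: x[0])
# Number of students per school, using len as the reducer.
counts = group_by(student_school_list, keyfunc=lambda x: x[1], reducefunc=len)

print(by_school)
print(counts)
```

Keys come out in first-seen order rather than sorted order, since nothing here sorts the input.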


# Documentation

- Option A:
https://docs.python.org/3/library/itertools.html#itertools.groupby
- Option B:
https://more-itertools.readthedocs.io/en/stable/api.html#more_itertools.groupby_transform
- Option C:
https://more-itertools.readthedocs.io/en/stable/api.html#more_itertools.map_reduce

On Thu, Jun 28, 2018 at 8:37 PM, Chris Barker - NOAA Federal via
Python-ideas <python-ideas at python.org> wrote:

> > On Jun 28, 2018, at 5:30 PM, Chris Barker - NOAA Federal
> > <chris.barker at noaa.gov> wrote:
> >
> > So maybe a solution is an accumulator special case of defaultdict: it
> > uses a list by default and appends by default.
> >
> > Almost like counter...
>
> Which, of course, is pretty much what your proposal is.
>
> Which makes me think — a new classmethod on the builtin dict is a
> pretty heavy lift compared to a new type of dict in the collections
> module.
>
> -CHB