[Python-ideas] Allow a group by operation for dict comprehension
David Mertz
mertz at gnosis.cx
Thu Jun 28 16:34:30 EDT 2018
I agree with these recommendations. There are excellent 3rd party tools
that do what you want. This is way too much to try to shoehorn into a
comprehension.
I'd add one more option. You want something that behaves like SQL. Right in
the standard library is sqlite3, and you can create an in-memory DB to hold
the data you expect to group.
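Here is a minimal sketch of the in-memory sqlite3 approach. It uses the (student, school) data from further down this thread. The table name `enrollment` is made up for illustration. SQLite's `group_concat` does not promise any particular order of names within a group. Splitting on commas also assumes no name contains a comma. Treat this as a starting point, not a drop-in replacement for the dict-based code.

```python
import sqlite3

# Example data in the shape used later in this thread: (student, school) pairs.
student_school_list = [
    ("Fred", "SchoolA"),
    ("Bob", "SchoolB"),
    ("Mary", "SchoolA"),
    ("Jane", "SchoolB"),
    ("Nancy", "SchoolC"),
]

# ":memory:" creates a throwaway database that lives only in this process.
conn = sqlite3.connect(":memory:")
# "enrollment" is a hypothetical table name chosen for this sketch.
conn.execute("CREATE TABLE enrollment (student TEXT, school TEXT)")
conn.executemany("INSERT INTO enrollment VALUES (?, ?)", student_school_list)

# GROUP BY collapses the rows per school; group_concat joins the names into
# one comma-separated string (order within a group is not guaranteed).
rows = conn.execute(
    "SELECT school, group_concat(student, ',') FROM enrollment "
    "GROUP BY school ORDER BY school"
).fetchall()
conn.close()

# Turn the joined strings back into Python lists.
student_by_school = {school: names.split(",") for school, names in rows}
print(student_by_school)
```

The SQL route becomes more attractive once you also want aggregates such as counts or filters over the groups. For a plain list-of-lists result, the dict-based approaches elsewhere in this thread are simpler.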
On Thu, Jun 28, 2018, 3:48 PM Wes Turner <wes.turner at gmail.com> wrote:
> PyToolz, Pandas, Dask .groupby()
>
> toolz.itertoolz.groupby does this succinctly without any
> new/magical/surprising syntax.
>
> https://toolz.readthedocs.io/en/latest/api.html#toolz.itertoolz.groupby
>
> From https://github.com/pytoolz/toolz/blob/master/toolz/itertoolz.py :
>
> """
> def groupby(key, seq):
>     """ Group a collection by a key function
>
>     >>> names = ['Alice', 'Bob', 'Charlie', 'Dan', 'Edith', 'Frank']
>     >>> groupby(len, names)  # doctest: +SKIP
>     {3: ['Bob', 'Dan'], 5: ['Alice', 'Edith', 'Frank'], 7: ['Charlie']}
>
>     >>> iseven = lambda x: x % 2 == 0
>     >>> groupby(iseven, [1, 2, 3, 4, 5, 6, 7, 8])  # doctest: +SKIP
>     {False: [1, 3, 5, 7], True: [2, 4, 6, 8]}
>
>     Non-callable keys imply grouping on a member.
>
>     >>> groupby('gender', [{'name': 'Alice', 'gender': 'F'},
>     ...                    {'name': 'Bob', 'gender': 'M'},
>     ...                    {'name': 'Charlie', 'gender': 'M'}]) # doctest:+SKIP
>     {'F': [{'gender': 'F', 'name': 'Alice'}],
>      'M': [{'gender': 'M', 'name': 'Bob'},
>            {'gender': 'M', 'name': 'Charlie'}]}
>
>     See Also:
>         countby
>     """
>     if not callable(key):
>         key = getter(key)
>     d = collections.defaultdict(lambda: [].append)
>     for item in seq:
>         d[key(item)](item)
>     rv = {}
>     for k, v in iteritems(d):
>         rv[k] = v.__self__
>     return rv
> """
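For comparison, here is a stdlib-only sketch in the same spirit as the toolz function quoted above. It is not toolz's code. `group_by` is a hypothetical name, and it uses plain lists instead of the bound-`append` trick. `operator.itemgetter` stands in for toolz's internal `getter` helper, so this sketch handles only a single non-callable key.

```python
import collections
import operator


def group_by(key, seq):
    """Group the items of seq into lists keyed by key(item).

    Hypothetical stdlib-only helper: a non-callable key is treated as an
    index or mapping key to look up on each item via operator.itemgetter.
    """
    if not callable(key):
        key = operator.itemgetter(key)
    groups = collections.defaultdict(list)
    for item in seq:
        groups[key(item)].append(item)
    # Return a plain dict so that later lookups of missing keys raise
    # KeyError instead of silently creating empty groups.
    return dict(groups)


names = ['Alice', 'Bob', 'Charlie', 'Dan', 'Edith', 'Frank']
print(group_by(len, names))
```

Like the toolz version, this is a single O(n) pass and keeps items in their original order within each group.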
>
> If you're willing to install Pandas (and NumPy, and ...), there's
> pandas.DataFrame.groupby:
>
> https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.groupby.html
> https://github.com/pandas-dev/pandas/blob/v0.23.1/pandas/core/generic.py#L6586-L6659
>
> Dask has a different groupby implementation:
>
> https://gist.github.com/darribas/41940dfe7bf4f987eeaa#file-pandas_dask_test-ipynb
> https://dask.pydata.org/en/latest/dataframe-api.html#dask.dataframe.DataFrame.groupby
>
>
> On Thursday, June 28, 2018, Chris Barker via Python-ideas <
> python-ideas at python.org> wrote:
>
>> On Thu, Jun 28, 2018 at 8:25 AM, Nicolas Rolin <nicolas.rolin at tiime.fr>
>> wrote:
>>>
>>> I use list and dict comprehension a lot, and a problem I often have is
>>> to do the equivalent of a group_by operation (to use sql terminology).
>>>
>>
>> I don't know from SQL, so "group by" doesn't mean anything to me, but
>> this:
>>
>>> For example if I have a list of tuples (student, school) and I want to
>>> have the list of students by school the only option I'm left with is to
>>> write
>>>
>>> from collections import defaultdict
>>>
>>> student_by_school = defaultdict(list)
>>> for student, school in student_school_list:
>>>     student_by_school[school].append(student)
>>>
>>
>> seems to me that the issue here is that there is no way to have a
>> "defaultdict comprehension"
>>
>> I can't think of a syntactically clean way to make that possible, though.
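One way to get close to a "defaultdict comprehension" without new syntax is a small helper that consumes a generator expression of (key, value) pairs. `grouped` below is a hypothetical helper, not an existing stdlib function. With it, the call site reads almost like a comprehension.

```python
from collections import defaultdict


def grouped(pairs):
    """Collect (key, value) pairs into a dict of key -> list of values.

    Hypothetical helper: passing it a generator expression makes the call
    site read much like a comprehension.
    """
    result = defaultdict(list)
    for key, value in pairs:
        result[key].append(value)
    return dict(result)


student_school_list = [
    ('Fred', 'SchoolA'),
    ('Bob', 'SchoolB'),
    ('Mary', 'SchoolA'),
    ('Jane', 'SchoolB'),
    ('Nancy', 'SchoolC'),
]

# Reads like "{school: [students...] for student, school in ...}".
student_by_school = grouped((school, student) for student, school in student_school_list)
print(student_by_school)
```

This is still the same defaultdict loop as the original example, just moved into a reusable function.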
>>
>> Could itertools.groupby help here? It seems to work, but boy! it's ugly:
>>
>> In [45]: student_school_list
>> Out[45]:
>> [('Fred', 'SchoolA'),
>>  ('Bob', 'SchoolB'),
>>  ('Mary', 'SchoolA'),
>>  ('Jane', 'SchoolB'),
>>  ('Nancy', 'SchoolC')]
>>
>> In [46]: {a: [t[0] for t in b] for a, b in groupby(sorted(student_school_list,
>>     ...: key=lambda t: t[1]), key=lambda t: t[1])}
>> Out[46]: {'SchoolA': ['Fred', 'Mary'], 'SchoolB': ['Bob', 'Jane'],
>> 'SchoolC': ['Nancy']}
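The same itertools.groupby technique reads better when spread over several lines with `operator.itemgetter` as the key. This sketch uses the same algorithm, not a different one. groupby only merges adjacent items with equal keys, so the sort is still required. That makes it O(n log n), whereas the defaultdict loop is O(n).

```python
from itertools import groupby
from operator import itemgetter

student_school_list = [
    ('Fred', 'SchoolA'),
    ('Bob', 'SchoolB'),
    ('Mary', 'SchoolA'),
    ('Jane', 'SchoolB'),
    ('Nancy', 'SchoolC'),
]

# Key function: the school field of each (student, school) tuple.
by_school = itemgetter(1)

# groupby only merges *adjacent* items with equal keys, so sort by the same
# key first; sorted() is stable, so students keep their original order.
student_by_school = {
    school: [student for student, _ in pairs]
    for school, pairs in groupby(sorted(student_school_list, key=by_school), key=by_school)
}
print(student_by_school)
# {'SchoolA': ['Fred', 'Mary'], 'SchoolB': ['Bob', 'Jane'], 'SchoolC': ['Nancy']}
```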
>>
>>
>> -CHB
>>
>>
>> --
>>
>> Christopher Barker, Ph.D.
>> Oceanographer
>>
>> Emergency Response Division
>> NOAA/NOS/OR&R (206) 526-6959 voice
>> 7600 Sand Point Way NE (206) 526-6329 fax
>> Seattle, WA 98115 (206) 526-6317 main reception
>>
>> Chris.Barker at noaa.gov
>>