[Tutor] Lists of duplicates
Peter Otten
__peter__ at web.de
Thu Mar 9 03:55:56 EST 2017
Alan Gauld via Tutor wrote:
> On 08/03/17 19:56, Sri Kavi wrote:
>
>> It’s about making a function that returns a list of lists, with each list
>> being all of the elements that are the same as another element in the
>> original list.
>
> This is one of those problems where there is probably no
> definitive "correct" answer just different compromises.
>
> My first reaction was to sort the initial list then iterate
> over it creating a new sublist for each change of value.
> But, it only works for homogenous lists. For mixed types
> I'd probably opt for a dictionary:
>
> def f(lst):
> res = {}
> for item in lst:
> res.setdefault(item,[]).append(item)
> return list(res.values())
>
> But I've no idea how that performs compared to your methods.
Performance is probably comparable to the Counter() solution.
Digression:
There is one important difference that I see as an advantage of Alan's
approach -- it keeps the actual values (below I use a defaultdict instead of
the setdefault() method):
>>> from collections import defaultdict
>>> def group(items):
... groups = defaultdict(list)
... for item in items:
... groups[item].append(item)
... return groups
...
>>> group([1, 2, 3, True, 2.0, 3.0])
defaultdict(<class 'list'>, {1: [1, True], 2: [2, 2.0], 3: [3, 3.0]})
Let's see if we can get output that is a bit more readable:
>>> def show(dd):
... for k, v in sorted(dd.items()): print(k, "-->", v)
...
>>> show(group([1, 2, 3, True, 2.0, 3.0]))
1 --> [1, True]
2 --> [2, 2.0]
3 --> [3, 3.0]
Now it is easy to go one step further and adapt the code to group data by an
arbitrary property of item:
>>> def groupby(items, key=None):
... if key is None:
... pairs = ((v, v) for v in items)
... else:
... pairs = ((key(v), v) for v in items)
... groups = defaultdict(list)
... for k, v in pairs:
... groups[k].append(v)
... return groups
...
>>> show(groupby(["a", "bb", "BB", "CC", "ccc", "bb"]))
BB --> ['BB']
CC --> ['CC']
a --> ['a']
bb --> ['bb', 'bb']
ccc --> ['ccc']
>>> show(groupby(["a", "bb", "BB", "CC", "ccc", "bb"], key=str.lower))
a --> ['a']
bb --> ['bb', 'BB', 'bb']
cc --> ['CC']
ccc --> ['ccc']
>>> show(groupby(["a", "bb", "BB", "CC", "ccc", "bb"], key=len))
1 --> ['a']
2 --> ['bb', 'BB', 'CC', 'bb']
3 --> ['ccc']
>>> show(groupby(["a", "bb", "BB", "CC", "ccc", "bb"], key=lambda s:
s[:1].lower()))
a --> ['a']
b --> ['bb', 'BB', 'bb']
c --> ['CC', 'ccc']
Efficient, flexible -- and you may even sometimes use it for real-world
problems.
PS: There is a very similar function, itertools.groupby(), in the standard
library. In return for the restriction that only adjacent values can be in
the same group it works lazily and uses very little memory.
More information about the Tutor
mailing list