determining which value is the first to appear five times in a list?

Terry Reedy tjreedy at udel.edu
Sat Feb 6 17:40:53 EST 2010


On 2/6/2010 3:25 PM, Wolodja Wentland wrote:
> On Sat, Feb 06, 2010 at 14:42 -0500, Terry Reedy wrote:
>> On 2/6/2010 2:09 PM, Wolodja Wentland wrote:
>
>>> I think you can use the itertools.groupby(L, lambda el: el[1]) to group
>>> elements in your *sorted* list L by the value el[1] (i.e. the
>>> identifier) and then iterate through these groups until you find the
>>> desired number of instances grouped by the same identifier.
>
>> This will generally not return the same result. It depends on
>> whether OP wants *any* item appearing at least 5 times or whether
>> the order is significant and the OP literally wants the first.
>
> Order is preserved by itertools.groupby - Have a look:

groupby does, but the sorting that has to precede it does not.
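
A small sketch of the difference, in Python 3 syntax. The data, the helper name first_group_after_sorting, and the threshold of 2 are all made up for illustration; none of them come from the OP's code.

```python
from itertools import groupby

# Hypothetical data: (position, identifier) pairs in their original order.
# 'b' gets its second occurrence at position 3, 'a' only at position 4.
instances = [(1, 'b'), (2, 'a'), (3, 'b'), (4, 'a')]


def first_group_after_sorting(items, k):
    """Sort by identifier, then return the first identifier with >= k items."""
    by_identifier = sorted(items, key=lambda el: el[1])
    for identifier, group in groupby(by_identifier, key=lambda el: el[1]):
        if len(list(group)) >= k:
            return identifier
    return None


# Sorting moves both 'a' items ahead of the 'b' items, so 'a' is reported
# even though 'b' was the first to appear twice in the original list.
print(first_group_after_sorting(instances, 2))  # prints a
```

So sort-then-group answers "which identifier appears at least k times", not "which one gets there first".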
>
>>>> from itertools import groupby
>>>> instances = [(1, 'b'), (2, 'b'), (3, 'a'), (4, 'c'), (5, 'c'), (6, 'c'), (7, 'b'), (8, 'b')]
>>>> grouped_by_identifier = groupby(instances, lambda el: el[1])
>>>> grouped_by_identifier = ((identifier, list(group)) for identifier, group in grouped_by_identifier)
>>>> k_instances = (group for identifier, group in grouped_by_identifier if len(group) == 2)
>>>> for group in k_instances:
> ...     print group
> ...
> [(1, 'b'), (2, 'b')]
> [(7, 'b'), (8, 'b')]
>
> So the first element yielded by the k_instances generator will be the
> first group of elements from the original list whose identifier appears
> exactly k times in a row.
>
>> Sorting the entire list may also take a *lot* longer.
> Than what?

Than linearly scanning for the first item to appear five times, as in my
corrected version of the original code.
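
For readers without the earlier message at hand, a linear scan along these lines might look as follows. This is only a sketch in Python 3 syntax, not the corrected code referred to above; the function name first_to_appear, its k and key parameters, and the sample data are assumptions made for the example.

```python
from collections import defaultdict


def first_to_appear(items, k=5, key=lambda el: el[1]):
    """Return the first identifier whose count reaches k, or None."""
    counts = defaultdict(int)
    for item in items:
        identifier = key(item)
        counts[identifier] += 1
        if counts[identifier] == k:
            # Stop at the k-th occurrence; the rest is never examined.
            return identifier
    return None


# Hypothetical data: (position, identifier) pairs, positions starting at 1.
instances = list(enumerate("abcabcabbabbc", start=1))

print(first_to_appear(instances))  # prints b (fifth 'b' is at position 11)
```

The scan makes a single pass and can stop early, whereas sorting has to process the whole list (O(n log n) comparisons) before any grouping can start.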

Terry Jan Reedy



