min max from tuples in list

Chris Angelico rosuav at gmail.com
Thu Dec 12 03:18:41 EST 2013


On Thu, Dec 12, 2013 at 6:25 PM, Robert Voigtländer
<r.voigtlaender at gmail.com> wrote:
> I need to find a -performant- way to transform this into a list with tuples (a[0],[a[0][1]min],[a[0][1]max]).
>
> Hard to explain what I mean .. [0] of the first three tuples is 52. [1] is 193,193 and 192.
> What I need as result for these three tuples is: (52,192,193).
>
> For the next five tuples it is (51,188,193).
>
>
> Extra challenges:
> - This list is sorted. For performance reasons I would like to keep it unsorted.
> - There may be tuples where min=max.
> - There may be tuples where [0] only exists once. So min is automatically max

Yep, I see what you mean! Apart from the first of the challenges,
which is ambiguous: do you mean you'd rather be able to work with it
unsorted, or is that a typo, "keep it sorted"?

This is a common task of aggregation. Your list is of (key, value)
tuples, and you want to do some per-key statistics. Here are three
variants on the code:

# Fastest version, depends on the keys being already grouped
# and the values sorted in descending order within each group
# (as in your example). It actually returns the last and
# first of each group, not the smallest and largest.
def min_max_1(lst):
    prev_key = None
    for key, value in lst:
        if key != prev_key:
            if prev_key is not None: yield prev_key, key_min, key_max
            prev_key = key
            key_max = value  # first value of the new group
        key_min = value  # last value seen so far in this group
    if prev_key is not None: yield prev_key, key_min, key_max

# This version depends on the keys being grouped, but
# not on the values being sorted within the groups.
def min_max_2(lst):
    prev_key = None
    for key, value in lst:
        if key != prev_key:
            if prev_key is not None: yield prev_key, key_min, key_max
            prev_key = key
            key_min = key_max = value
        else:
            key_min = min(key_min, value)
            key_max = max(key_max, value)
    if prev_key is not None: yield prev_key, key_min, key_max

# Slowest version, does not depend on either the keys
# or the values being sorted. Will iterate over the entire
# list before producing any results. Returns tuples in
# dictionary order, which may be arbitrary, unlike the
# others (which will retain the input order).
def min_max_3(lst):
    data = {}
    for key, value in lst:
        if key not in data:
            data[key] = [value, value]  # a list, so it can be updated in place
        else:
            data[key][0] = min(data[key][0], value)
            data[key][1] = max(data[key][1], value)
    for key, minmax in data.items():
        yield key, minmax[0], minmax[1]

Each of these is a generator that yields (key, min, max) tuples. The
third one needs the most memory and execution time; the others simply
take the input as it comes. None of them actually requires that the
input be a list - any iterable will do.
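
For comparison, here is a minimal sketch of the grouped case written with
itertools.groupby from the standard library. It is an alternative
technique, not one of the three variants above. It assumes equal keys are
adjacent but makes no assumption about value order. The name
min_max_grouped and most of the sample data are made up for illustration;
only the three (52, ...) tuples come from the question. It feeds the
function a one-shot iterator rather than a list.

```python
from itertools import groupby
from operator import itemgetter

def min_max_grouped(pairs):
    """Yield (key, min, max) for each run of equal keys in pairs.

    Assumes equal keys are adjacent; values may be in any order within a run.
    """
    for key, group in groupby(pairs, key=itemgetter(0)):
        # Collect one group's values at a time, never the whole input.
        values = [value for _, value in group]
        yield key, min(values), max(values)

# Illustrative data: only the (52, ...) tuples come from the question;
# the (51, ...) and (50, ...) values are invented for this example.
sample = [(52, 193), (52, 193), (52, 192),
          (51, 193), (51, 190), (51, 188),
          (50, 185)]

# A one-shot iterator works just as well as a list.
result = list(min_max_grouped(iter(sample)))
print(result)
# [(52, 192, 193), (51, 188, 193), (50, 185, 185)]
```

This holds one group's values in memory at a time, so it sits between the
streaming variants and the dict-based one in memory use.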

ChrisA


