min max from tuples in list

Steven D'Aprano steve+comp.lang.python at pearwood.info
Thu Dec 12 06:44:32 EST 2013


On Wed, 11 Dec 2013 23:25:53 -0800, Robert Voigtländer wrote:

> Hi,
> 
> I have a list like this:
> 
> a = [(52, 193), (52, 193), (52, 192), ...
> 
> 
> I need to find a -performant- way to transform this into a list with
> tuples (a[0],[a[0][1]min],[a[0][1]max]).

I'm afraid I don't know what you mean by "performant". It doesn't appear 
to be an English word, so far as I can tell. Do you mean efficient?

 
> Hard to explaint what I mean .. [0] of the first three tuples is 52. [1]
> is 193,193 and 192. What I need as result for these three tuples is:
> (52,192,193).
> 
> For the next five tuples it is (51,188,193).
> 
> 
> Extra challenges:
> - This list is sorted. For performance reasons I would like to keep it
> unsorted. 

Normally when people talk about "performance", they mean to *maximise* 
it, not minimise it. If you can guarantee that your data is already pre-
sorted, then the resulting algorithm is likely to be much more efficient 
than one which doesn't require sorted data.

In any case, sorting in Python is amazingly fast. You may be pleasantly 
surprised that a version that sorts your data, while nominally 
O(N log N), may be much faster than an O(N) solution that doesn't require 
sorted data. If I were a betting man, I'd be willing to wager a shiny new 
dollar[1] that sorting works out faster for reasonable sized sets of data.

One general piece of advice for speeding up Python code: the more work 
you can delegate to Python built-ins, like sort, min, max and so forth, 
the faster your code is likely to be.


> - There may be tuples where min=max.
> - There my be tupples where [0] only exists once. So mix is
> automatically max

Neither of these should be a problem.


The first approach I'd try would be:

- sort the list, which will group the tuples by their first item;

- conveniently, it will also ensure that the first tuple in each group 
will have the minimum second item, and the last tuple of that group the 
maximum value;

- use itertools.groupby to extract each group;

- grab the first tuple from the group, and take both its items;

- grab the last tuple from the group (if any), and take its second item;

- yield those three items.


Something like this should work, I expect:

from collections import deque
from itertools import groupby
from operator import itemgetter

def collect(data):
    data = sorted(data)
    groups = groupby(data, itemgetter(0))
    d = deque([], maxlen=1)
    for key, subiter in groups:
        smallest = largest = next(subiter)[1]
        d.extend(subiter)
        try:
            largest = d.pop()[1]
        except IndexError:
            pass
        yield (key, smallest, largest)
    


And in use:

py> data = [(1, 4), (5, 3), (2, 7), (2, 3), (1, 3), (2, 5), 
...         (5, 3), (4, 9)]
py> list(collect(data))
[(1, 3, 4), (2, 3, 7), (4, 9, 9), (5, 3, 3)]




[1] Betting man or not, nobody ever accused me of being a spend-thrift.



-- 
Steven



More information about the Python-list mailing list