Feature request: sorting a list slice

George Sakkis george.sakkis at gmail.com
Thu May 18 18:24:04 EDT 2006


Fredrik Lundh wrote:

> George Sakkis wrote:
>
>> It would be useful if list.sort() accepted two more optional
>> parameters
>
>
> useful for what?  what's the use case ?
>
> </F>
>

Although there is a use case (see below), I don't think it's strictly
necessary in this case. Arguments to add it:
- Don't Repeat Yourself.
- It doesn't introduce new syntax, builtin or standard library
function; only two optional keywords in an existing method.
- These keywords are already used in several list and string methods
(index,find,..) and their meaning is instantly obvious.
- Increased (even if small) efficiency. Yes, I know that in practice,
and especially in apps that involve I/O, network, DB connections and so
on, it's unlikely to be the bottleneck but this isn't a general
argument for being sloppy. The fact that having a "for i in
xrange(100000):pass" at the top of every file costs me only a few msec
is not a reason to keep it there.

IMO the only reason not to add it would be if it makes sort's
implementation much more complicate that it already is. I don't know
Timsort's details, but I doubt that this is the case; please correct me
if I am wrong.

And here's (a cut-down version of) the use case: a recursive
generalization of groupby. On each recursive call, the argument list is
sorted and regrouped. If sort() accepted a (start,end) range I could
pass this over instead of creating a slice of the input each time. It's
not hard to see that there can be an exponential number of calls wrt to
the input's length, so cutting down the average cost of each call may
be worthwhile for medium-largish sized inputs (of course it's still
exponential asymptotically, so the constant factor does not make much
difference for really large inputs).


import itertools as it

def groupBy(iterable, *keys):
    return _groupBy(list(iterable), keys, len(keys))

def _groupBy(lst, keys, depth):
    if depth == 0:
        return lst
    key = keys[-depth]
    # sort group before subgrouping
    lst.sort(key=key)
    # group by key and then recursively do the same for each subgroup
    return [(k, _groupBy(list(subgroup),keys,depth-1))
            for k,subgroup in it.groupby(lst,key)]


#==== example ========================================

class Struct(object):
    def __init__(self, args):
        self.args = args
    def __getitem__(self,index):
        return self.args[index]

data = map(Struct, [
    ('a', 2, True),
    ('b', 1, False),
    ('a', 1, True),
    ('a', 1, False),
    ('b', 2, True),
    ('b', 2, True),
    ('a', 2, True),
])

from pprint import pprint
from operator import itemgetter
pprint(groupBy(data, itemgetter(2), itemgetter(0), itemgetter(1)))


#======= output =================================

[(False,
  [('a', [(1, [<__main__.Struct object at 0xb7dc128c>])]),
   ('b', [(1, [<__main__.Struct object at 0xb7dc124c>])])]),
 (True,
  [('a',
    [(1, [<__main__.Struct object at 0xb7dc126c>]),
     (2,
      [<__main__.Struct object at 0xb7dc122c>,
       <__main__.Struct object at 0xb7dc12ec>])]),
   ('b',
    [(2,
      [<__main__.Struct object at 0xb7dc12ac>,
       <__main__.Struct object at 0xb7dc12cc>])])])]


George




More information about the Python-list mailing list