[perl-python] generic equivalence partition

Fri Feb 25 15:19:40 EST 2005

David Eppstein <eppstein at ics.uci.edu> writes:

> In article <1109245733.261643.219420 at f14g2000cwb.googlegroups.com>,
>  "Xah Lee" <xah at xahlee.org> wrote:
>
>> parti(aList, equalFunc)
>> 
>> given a list aList of n elements, we want to return a list that is a
>> range of numbers from 1 to n, partition by the predicate function of
>> equivalence equalFunc. (a predicate function is a function that
>> takes two arguments, and returns either True or False.)
>
> In Python it is much more natural to use ranges from 0 to n-1.
> In the worst case, this is going to have to take quadratic time 
> (consider an equalFunc that always returns false) so we might as well do 
> something really simple rather than trying to be clever.

As you say, with the spec as it stands, you can't do better than
quadratic time (although it's O(n*m) where m is the number of
partitions, rather than O(n^2)).

You can do a lot better if you can use a "key" function, rather than
an "equivalence" function, much as list.sort has a "key" argument, and
itertools.groupby (which is pretty close in function to this
partitioning problem) uses a key argument.

In fact, I'd have difficulty thinking of an example where I'd want a
partition function as specified, in Python. In Perl, it makes a lot of
sense, as Perl's array indexing operations lend themselves to slinging
round lists of indices like this. But in Python, I'd be far more
likely to use list.sort followed by itertools.groupby - sort is stable
(so doesn't alter the relative order within equivalence classes), and
groupby then picks out the equivalence classes:

>>> elements = [['x', 'x', 'x', '1'],
...             ['x', 'x', 'x', '2'],
...             ['x', 'x', 'x', '2'],
...             ['x', 'x', 'x', '2'],
...             ['x', 'x', 'x', '3'],
...             ['x', 'x', 'x', '4'],
...             ['x', 'x', 'x', '5'],
...             ['x', 'x', 'x', '5']]

>>> # No need to sort here, as the elements are already sorted!

>>> from pprint import pprint
>>> pprint([(k, list(v)) for k, v in groupby(elements, itemgetter(3))])
[('1', [['x', 'x', 'x', '1']]),
 ('2', [['x', 'x', 'x', '2'], ['x', 'x', 'x', '2'], ['x', 'x', 'x', '2']]),
 ('3', [['x', 'x', 'x', '3']]),
 ('4', [['x', 'x', 'x', '4']]),
 ('5', [['x', 'x', 'x', '5'], ['x', 'x', 'x', '5']])]

If you avoid the sort, the whole thing is highly memory efficient, as
well, because by using iterators, we don't ever take a copy of the
original list.

Having cleverly redefined the question so that it fits the answer I
wanted to give, I'll shut up now :-)

Paul.
-- 
To attain knowledge, add things every day; to attain wisdom, remove
things every day. -- Lao-Tse