[Tutor] sorting into lists

Danny Yoo dyoo@hkn.eecs.berkeley.edu
Fri, 20 Sep 2002 22:07:37 -0700 (PDT)


On Fri, 20 Sep 2002, Tim Wilson wrote:

> > Is this grouping identical elements?  Do we have to worry about what
> > happens when the list isn't in order?  More information on the problem
> > might allow us to write more appropriate programs.
>
> Danny,
>
> Thanks for the quick response. You were right to suspect that my real
> problem is not quite as straight-forward as the example I provided.
> Actually, I've got a list of objects that are returned from a SQL query
> in Zope. Each of those objects contains a DateTime field and is already
> sorted on that field by the SQL query. (In other words, I can assume
> that the original list is sorted.)
>
> The query results correspond to events in a larger database of events.
> My goal is to produce an "Upcoming events" table on my Web site with the
> events grouped by day. For example, "Today," "Tomorrow," "Mon. 9/23,"
> etc. The reason I need the list of lists is because it's the most
> obvious way I can think of to iterate over the events and print the date
> at the beginning of each inner loop. (Think nested for loops.)


Hmmm... Let's assume, for the moment, that our list of events has a method
named "date()".  (If they don't, perhaps we can add it in as a method.)
Then we can adjust the dictionary approach we used earlier so that it
groups the events by date --- we can use the date as keys into the
dictionary:

###
def formGroups(events):
    groups = {}
    for event in events:
        groups.setdefault(event.date(), []).append(event)
    return groups.values()
###

If you notice, this is almost exactly like the approach with the integers.


In fact, the only difference here is the way that we consider element
items "equivalent".  In the first case, we assumed they were in the same
group if they were the same number.  In the second, they're in the same
group if they have the same date.


We can apply this generalization if we modify formGroups() to take in a
definition of sameness as a second parameter!

###
def identity(x): return x

def formGroups(elements, signature_function=identity):
    groups = {}
    for item in elements:
        groups.setdefault(signature_function(item),
                          []).append(item)
    return groups.values()
###


This version of formGroups() is just slightly more complex now, but with
the advantage that we can reuse it to form many different types of groups:

###
>>> formGroups([1, 1, 2, 3, 2, 42])
[[1, 1], [2, 2], [3], [42]]
>>> def parity(x):
...     if x % 2 == 0: return "even"
...     else: return "odd"
...
>>> formGroups([1, 1, 2, 3, 2, 42], parity)
[[2, 2, 42], [1, 1, 3]]
###


Hmmm.... wait a sec...

###
>>> import lovin_stemmer
>>>
>>>     ## see http://hkn.eecs.berkeley.edu/~dyoo/python/lovin_stemmer
>>>     ## for this module.  It's still a bit experimental.
>>>
>>> stem = lovin_stemmer.stem
>>> words = [x.strip() for x in open("/usr/share/dict/words").readlines()]
>>> word_groups = formGroups(words, stem)
>>> word_groups[0]
['fawn', 'fawned', 'fawning', 'fawns']
>>> word_groups[1]
['orthogonal', 'orthogonality', 'orthogonally']
>>> len(word_groups)
21364
###

Very cool!  *grin*  I like this formGroups() function.



> Does that help or are you sorry you asked? :-)

Are you kidding?  *grin* I'm glad you asked; this was a fun problem.


I hope this helps!