[Tutor] sorting objects on two attributes
Eric Abrahamsen
eric at abrahamsen.com
Tue Mar 4 03:50:25 CET 2008
Well I expected to learn a thing or two, but I didn't expect not to
understand the suggestions at all! :) Thanks to everyone who
responded, and sorry for the typo (it was meant to be object_id
throughout, not content_type).
So far Michael's solution works and is most comprehensible to me – ie
I stand a fighting chance of figuring it out.
I didn't realize you could make direct use of SortedDicts in django,
so that's worth a try, too.
Itertools.groupby is totally impenetrable to me, but works as well! Is
there any consensus about which might go faster? I will go right now
and google until my brain is full.
Thanks again,
Eric
On Mar 4, 2008, at 12:56 AM, Michael H. Goldwasser wrote:
>
>
> Hello Eric,
>
> Your basic outlook is fine, but you can do it much more efficiently
> with a single sort. Here's the way I'd approach the task
> (untested):
>
> # --------------------------------------------------
> # first compute the latest date for each id group; uses O(n) time
> newest = {}
> for q in queryset:
> id,date = q.object_id, q.submit_date
> if id not in newest or date > newest[id]:
> newest[id] = date
>
> # now sort based on the following decorator as a key
> data.sort(reverse=True, key=lambda x: (newest[x.object_id],
> x.object_id, x.submit_date))
> # --------------------------------------------------
>
> In essence, you compute the max date within each group, but your
> approach (i.e., building explicit sublists and then repeatedly
> calling max on those sublists) is far more time-consuming than the
> above dictionary based approach.
>
> Note well that using a tuple as a decorator key is more efficient
> than calling sort separately for each subgroup. The
> lexicographical order of the following
>
> (newest[x.object_id], x.object_id, x.submit_date)
>
> should produce the order that you desire. The reason for the middle
> entry is to ensure that items groups by object_id in the case that
> two different groups achieve the same maximum date. It wasn't clear
> to me whether you wanted elements within groups from oldest to
> newest or newest to oldest. I believe that the code I give produces
> the ordering that you intend, but you may adjust the sign of the
> decorate elements if necessary.
>
> With regard,
> Michael
>
> +-----------------------------------------------
> | Michael Goldwasser
> | Associate Professor
> | Dept. Mathematics and Computer Science
> | Saint Louis University
> | 220 North Grand Blvd.
> | St. Louis, MO 63103-2007
>
>
> On Monday March 3, 2008, Eric Abrahamsen wrote:
>
>> I have a grisly little sorting problem to which I've hacked
>> together a
>> solution, but I'm hoping someone here might have a better
>> suggestion.
>>
>> I have a list of objects, each of which has two attributes,
>> object_id
>> and submit_date. What I want is to sort them by content_type,
>> then by
>> submit_date within content_type, and then sort each content_type
>> block
>> according to which block has the newest object by submit_date.
>> (This
>> sequence of sorting might not be optimal, I'm not sure). I'm
>> actually
>> creating a list of recent comments on blog entries for a python-
>> based
>> web framework, and want to arrange the comments according to blog
>> entry (content_type), by submit_date within that entry, with the
>> entries with the newest comments showing up on top.
>>
>> I don't believe a single cmp function fed to list.sort() can do
>> this,
>> because you can't know how two objects should be compared until you
>> know all the values for all the objects. I'd be happy to be proven
>> wrong here.
>>
>> After some false starts with dictionaries, here's what I've got.
>> Queryset is the original list of comments (I'm doing this in
>> django),
>> and it returns a list of lists, though I might flatten it
>> afterwards.
>> It works, but it's ghastly unreadable and if there were a more
>> graceful solution I'd feel better about life in general:
>>
>>
>> def make_com_list(queryset):
>> ids = set([com.object_id for com in queryset])
>> xlist = [[com for com in queryset if com.object_id == i] for
>> i in
>> ids]
>> for ls in xlist:
>> ls.sort(key=lambda x: x.submit_date)
>> xlist.sort(key=lambda x: max([com.submit_date for com in x]),
>> reverse=True)
>> return xlist
>>
>> I'd appreciate any hints!
>>
>> Thanks,
>> Eric
>>
>
More information about the Tutor
mailing list