how to find the longst element list of lists

Steven D'Aprano steve at REMOVE.THIS.cybersource.com.au
Mon Jan 8 05:29:30 EST 2007


On Sun, 07 Jan 2007 20:55:19 -0500, Dan Sommers wrote:

> On Sun, 07 Jan 2007 22:23:22 +0100,
> "Michael M." <michael at mustun.ch> wrote:
> 
>> How to find the longst element list of lists?
>> I think, there should be an easier way then this:
> 
>>   s1 = ["q", "e", "d"]
>>   s2 = ["a", "b"]
>>   s3 = ["a", "b", "c", "d"]
> 
> [ snip ]
> 
> One more thing to think about:  if your list of lists grows (i.e., if
> you end up with thousands of lists instead of just three), then sorting
> may not be the way to go.  Assuming that list_of_lists is your list of
> lists, then something like this:
> 
>     longest_list, longest_length = list_of_lists[ 0 ], len( longest_list )
>     for a_list in list_of_lists[ 1 : ]:
>         a_length = len( a_list )
>         if a_length > longest_length:
>             longest_list, longest_length = a_list, a_length
> 
> will run faster than sorting the list just to pick off one element (O(n)
> vs. O(n log n) for all of you Big-Oh notation fans out there; you know
> who you are!).

But your O(n) code is running in relatively slow Python, while the sort
method, while O(n log n), is some of the fastest, most highly optimized C
code out there. Unless your list is truly gigantic, chances are the sort
version will win.

Here's my timing code:


import timeit

def getlongest1(list_of_lists):
    longest_list = list_of_lists[ 0 ]
    longest_length = len( longest_list )
    for a_list in list_of_lists[ 1 : ]:
        a_length = len( a_list )
        if a_length > longest_length:
            longest_list, longest_length = a_list, a_length
    return longest_list

def getlongest2(list_of_lists):
    list_of_lists.sort(key=len)
    return list_of_lists[0]

def make_list_of_lists(length):
    return [[None]*i for i in xrange(length)]

t1 = timeit.Timer("getlongest1(L)", "from __main__ import getlongest1, L")
t2 = timeit.Timer("getlongest2(L)", "from __main__ import getlongest2, L")



Note that my test list_of_lists grows very big quite fast, like O(n**2).
Assuming Python pointers are eight bytes, a mere length=10000 will
require over 760MB just for the pointers. More realistic data may allow
more extensive testing.

And here are my timing results:

>>> L = make_list_of_lists(1)
>>> print t1.timeit(1000), t2.timeit(1000)
0.00209903717041 0.00367403030396

>>> L = make_list_of_lists(10)
>>> print t1.timeit(1000), t2.timeit(1000)
0.00871086120605 0.00775289535522

>>> L = make_list_of_lists(100)
>>> print t1.timeit(1000), t2.timeit(1000)
0.121382951736 0.0518100261688

>>> L = make_list_of_lists(1000)
>>> print t1.timeit(1000), t2.timeit(1000)
0.809508085251 0.508343935013

>>> L = make_list_of_lists(10000)
>>> print t1.timeit(100), t2.timeit(100)
0.906499147415 0.732254981995

>>> L = make_list_of_lists(20000)
>>> print t1.timeit(100), t2.timeit(100)
1.83560800552 1.58732700348

For a list of 1 item, sorting is 1.8 times SLOWER; 
For a list of 10 items, sorting is 1.1 times FASTER;
For 100 items, sorting is 2.3 times faster;
For 1000 items, sorting is 1.6 times faster;
For 10,000 items, sorting is 1.2 times faster;
For 20,000 items, sorting is 1.1 times faster.


The precise results depend on the version of Python you're running, the
amount of memory you have, other processes running, and the details of
what's in the list you are trying to sort. But as my test shows, sort has
some overhead that makes it a trivial amount slower for sufficiently small
lists, but for everything else you're highly unlikely to beat it. 



-- 
Steven.




More information about the Python-list mailing list