how to find the longst element list of lists

Peter Otten __peter__ at web.de
Mon Jan 8 07:55:40 EST 2007


Steven D'Aprano wrote:

> On Sun, 07 Jan 2007 20:55:19 -0500, Dan Sommers wrote:
> 
>> On Sun, 07 Jan 2007 22:23:22 +0100,
>> "Michael M." <michael at mustun.ch> wrote:
>> 
>>> How to find the longst element list of lists?
>>> I think, there should be an easier way then this:
>> 
>>>   s1 = ["q", "e", "d"]
>>>   s2 = ["a", "b"]
>>>   s3 = ["a", "b", "c", "d"]
>> 
>> [ snip ]
>> 
>> One more thing to think about:  if your list of lists grows (i.e., if
>> you end up with thousands of lists instead of just three), then sorting
>> may not be the way to go.  Assuming that list_of_lists is your list of
>> lists, then something like this:
>> 
>>     longest_list, longest_length = list_of_lists[ 0 ], len( longest_list
>>     ) for a_list in list_of_lists[ 1 : ]:
>>         a_length = len( a_list )
>>         if a_length > longest_length:
>>             longest_list, longest_length = a_list, a_length
>> 
>> will run faster than sorting the list just to pick off one element (O(n)
>> vs. O(n log n) for all of you Big-Oh notation fans out there; you know
>> who you are!).
> 
> But your O(n) code is running in relatively slow Python, while the sort
> method, while O(n log n), is some of the fastest, most highly optimized C
> code out there. Unless your list is truly gigantic, chances are the sort
> version will win.
> 
> Here's my timing code:
> 
> 
> import timeit
> 
> def getlongest1(list_of_lists):
>     longest_list = list_of_lists[ 0 ]
>     longest_length = len( longest_list )
>     for a_list in list_of_lists[ 1 : ]:
>         a_length = len( a_list )
>         if a_length > longest_length:
>             longest_list, longest_length = a_list, a_length
>     return longest_list
> 
> def getlongest2(list_of_lists):
>     list_of_lists.sort(key=len)
>     return list_of_lists[0]
> 
> def make_list_of_lists(length):
>     return [[None]*i for i in xrange(length)]
> 
> t1 = timeit.Timer("getlongest1(L)", "from __main__ import getlongest1, L")
> t2 = timeit.Timer("getlongest2(L)", "from __main__ import getlongest2, L")
> 
> 
> 
> Note that my test list_of_lists grows very big quite fast, like O(n**2).
> Assuming Python pointers are eight bytes, a mere length=10000 will
> require over 760MB just for the pointers. More realistic data may allow
> more extensive testing.
> 
> And here are my timing results:
> 
>>>> L = make_list_of_lists(1)
>>>> print t1.timeit(1000), t2.timeit(1000)
> 0.00209903717041 0.00367403030396
> 
>>>> L = make_list_of_lists(10)
>>>> print t1.timeit(1000), t2.timeit(1000)
> 0.00871086120605 0.00775289535522
> 
>>>> L = make_list_of_lists(100)
>>>> print t1.timeit(1000), t2.timeit(1000)
> 0.121382951736 0.0518100261688
> 
>>>> L = make_list_of_lists(1000)
>>>> print t1.timeit(1000), t2.timeit(1000)
> 0.809508085251 0.508343935013
> 
>>>> L = make_list_of_lists(10000)
>>>> print t1.timeit(100), t2.timeit(100)
> 0.906499147415 0.732254981995
> 
>>>> L = make_list_of_lists(20000)
>>>> print t1.timeit(100), t2.timeit(100)
> 1.83560800552 1.58732700348
> 
> For a list of 1 item, sorting is 1.8 times SLOWER;
> For a list of 10 items, sorting is 1.1 times FASTER;
> For 100 items, sorting is 2.3 times faster;
> For 1000 items, sorting is 1.6 times faster;
> For 10,000 items, sorting is 1.2 times faster;
> For 20,000 items, sorting is 1.1 times faster.
> 
> 
> The precise results depend on the version of Python you're running, the
> amount of memory you have, other processes running, and the details of
> what's in the list you are trying to sort. But as my test shows, sort has
> some overhead that makes it a trivial amount slower for sufficiently small
> lists, but for everything else you're highly unlikely to beat it.

Try again with tN.timeit(1) and a second list that is random.shuffle()d and
copied to L before each measurement. list.sort() treats already sorted
lists specially.

Peter




More information about the Python-list mailing list