How to find the longest element in a list of lists
Peter Otten
__peter__ at web.de
Mon Jan 8 07:55:40 EST 2007
Steven D'Aprano wrote:
> On Sun, 07 Jan 2007 20:55:19 -0500, Dan Sommers wrote:
>
>> On Sun, 07 Jan 2007 22:23:22 +0100,
>> "Michael M." <michael at mustun.ch> wrote:
>>
>>> How to find the longest element in a list of lists?
>>> I think there should be an easier way than this:
>>
>>> s1 = ["q", "e", "d"]
>>> s2 = ["a", "b"]
>>> s3 = ["a", "b", "c", "d"]
>>
>> [ snip ]
>>
>> One more thing to think about: if your list of lists grows (i.e., if
>> you end up with thousands of lists instead of just three), then sorting
>> may not be the way to go. Assuming that list_of_lists is your list of
>> lists, then something like this:
>>
>> longest_list = list_of_lists[ 0 ]
>> longest_length = len( longest_list )
>> for a_list in list_of_lists[ 1 : ]:
>>     a_length = len( a_list )
>>     if a_length > longest_length:
>>         longest_list, longest_length = a_list, a_length
>>
>> will run faster than sorting the list just to pick off one element (O(n)
>> vs. O(n log n) for all of you Big-Oh notation fans out there; you know
>> who you are!).
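[A sketch, not part of the original thread: the built-in max() performs the same single O(n) pass, but the loop runs in C rather than in Python bytecode, so it combines the linear scan above with sort-like speed.]

```python
# The same O(n) "keep the longest seen so far" scan, done by the
# built-in max() with len as the comparison key.
s1 = ["q", "e", "d"]
s2 = ["a", "b"]
s3 = ["a", "b", "c", "d"]

longest = max([s1, s2, s3], key=len)
print(longest)  # ['a', 'b', 'c', 'd']
```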
>
> But your O(n) code is running in relatively slow Python, while the sort
> method, while O(n log n), is some of the fastest, most highly optimized C
> code out there. Unless your list is truly gigantic, chances are the sort
> version will win.
>
> Here's my timing code:
>
>
> import timeit
>
> def getlongest1(list_of_lists):
>     longest_list = list_of_lists[ 0 ]
>     longest_length = len( longest_list )
>     for a_list in list_of_lists[ 1 : ]:
>         a_length = len( a_list )
>         if a_length > longest_length:
>             longest_list, longest_length = a_list, a_length
>     return longest_list
>
> def getlongest2(list_of_lists):
>     list_of_lists.sort(key=len)
>     return list_of_lists[-1]    # key=len sorts ascending, so the longest is last
>
> def make_list_of_lists(length):
>     return [[None]*i for i in xrange(length)]
>
> t1 = timeit.Timer("getlongest1(L)", "from __main__ import getlongest1, L")
> t2 = timeit.Timer("getlongest2(L)", "from __main__ import getlongest2, L")
>
>
>
> Note that my test list_of_lists grows very big quite fast, like O(n**2).
> Assuming Python pointers are eight bytes, a mere length=10000 means about
> fifty million pointers, or roughly 400MB for the pointers alone. More
> realistic data may allow more extensive testing.
>
> And here are my timing results:
>
>>>> L = make_list_of_lists(1)
>>>> print t1.timeit(1000), t2.timeit(1000)
> 0.00209903717041 0.00367403030396
>
>>>> L = make_list_of_lists(10)
>>>> print t1.timeit(1000), t2.timeit(1000)
> 0.00871086120605 0.00775289535522
>
>>>> L = make_list_of_lists(100)
>>>> print t1.timeit(1000), t2.timeit(1000)
> 0.121382951736 0.0518100261688
>
>>>> L = make_list_of_lists(1000)
>>>> print t1.timeit(1000), t2.timeit(1000)
> 0.809508085251 0.508343935013
>
>>>> L = make_list_of_lists(10000)
>>>> print t1.timeit(100), t2.timeit(100)
> 0.906499147415 0.732254981995
>
>>>> L = make_list_of_lists(20000)
>>>> print t1.timeit(100), t2.timeit(100)
> 1.83560800552 1.58732700348
>
> For a list of 1 item, sorting is 1.8 times SLOWER;
> For a list of 10 items, sorting is 1.1 times FASTER;
> For 100 items, sorting is 2.3 times faster;
> For 1000 items, sorting is 1.6 times faster;
> For 10,000 items, sorting is 1.2 times faster;
> For 20,000 items, sorting is 1.1 times faster.
>
>
> The precise results depend on the version of Python you're running, the
> amount of memory you have, other processes running, and the details of
> what's in the list you are trying to sort. But as my test shows, sort has
> some overhead that makes it a trivial amount slower for sufficiently small
> lists, but for everything else you're highly unlikely to beat it.
Try again with tN.timeit(1) and a second list that is random.shuffle()d and
copied to L before each measurement: list.sort() treats already-sorted lists
specially, so after getlongest2's first call sorts L in place, every later
timeit iteration measures sort's best case.
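[A sketch of the re-timing Peter suggests; the function and variable names are illustrative, not from the thread. Each measurement copies a shuffled master list and times a single call, so list.sort()'s fast path for already-sorted input cannot skew the comparison.]

```python
import random
import timeit

def getlongest_scan(list_of_lists):
    # Linear scan: keep the longest list seen so far.
    longest = list_of_lists[0]
    for a_list in list_of_lists[1:]:
        if len(a_list) > len(longest):
            longest = a_list
    return longest

def getlongest_sort(list_of_lists):
    # Sort ascending by length; the longest list ends up last.
    list_of_lists.sort(key=len)
    return list_of_lists[-1]

master = [[None] * i for i in range(1000)]
random.shuffle(master)

def timed(func, runs=100):
    # Rebuild L from the shuffled master before every single-run timing,
    # so func never sees a list it (or a prior run) has already sorted.
    total = 0.0
    for _ in range(runs):
        L = master[:]
        random.shuffle(L)
        total += timeit.timeit(lambda: func(L), number=1)
    return total

print(timed(getlongest_scan), timed(getlongest_sort))
```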
Peter