[Tutor] lists+sort

Steven D'Aprano steve at pearwood.info
Tue Jan 5 06:03:53 EST 2016


On Mon, Jan 04, 2016 at 12:34:57PM -0500, Pooja Bhalode wrote:
> Hi, I wanted to check if this program can be used to merge the lists
> together and sort them. This seems to work, but i wanted to check if there
> are drawbacks in writing it in this manner.
> 
> My solution:
> 
> def linear_merge(list1, list2):
>     for num in list2:
>         list1.append(num)
>     list1.sort()
>     return list1

One small change:

def linear_merge(list1, list2):
    list1.extend(list2)
    list1.sort()
    return list1

is easier to write and faster. For use in an actual program, this method 
is perfectly acceptible, although this version modifies the first list 
in place rather than make a copy. If you want a copy:

def linear_merge(list1, list2):
    result = list1[:]
    result.extend(list2)
    result.sort()
    return result


Or more compact, but not quite as efficient:

def linear_merge(list1, list2):
    return sorted(list1 + list2)

You should play around with these and see if you can understand how they 
differ.

For a practice exercise, none of the above might be acceptible. Usually, 
when people talk about "merging" lists, they mean to avoid calling sort. 
Sorting, on average, takes time proportional to N*log N, where N is the 
number of items. So if you have 1000 items, sorting will take time 
proportional to 1000*(log 1000), or 3000. But "merging" should take time 
proportional to just N, or 1000. So in theory, merging may be faster.

So for practice exercises, it is often expected that you do not just 
add the two lists together and sort as we did above.

Let's look at their code:

> def linear_merge(list1, list2):
>   result = []
>   # Look at the two lists so long as both are non-empty.
>   # Take whichever element [0] is smaller.
>   while len(list1) and len(list2):
>     if list1[0] < list2[0]:
>       result.append(list1.pop(0))
>     else:
>       result.append(list2.pop(0))
>   # Now tack on what's left
>   result.extend(list1)
>   result.extend(list2)
>   return result

As you can see, there is no call to sort. (This does assume that each 
list is itself sorted, but that is normal for this type of question.) So 
the function starts with a new, empty, list and then it keeps looking at 
the two lists. It looks at the first item from each list, picks the 
smaller, and moves it to the new list. When one or the other list has 
run out of items, the function then does a quick copy of the rest of the 
items from the other.


However, while this might technically be a merge, it is actually *very* 
inefficient. It will actually take time proportional to N**2, due to the 
list.pop calls. Each time you pop an item from the *beginning* of a 
list, all the other items have to be moved back one space. So if you 
have 10 items in the list, and remove the first one, the remaining 9 
have to move; then you remove the first, now the remaining 8 have to 
move; and so on. If you don't understand this, don't worry about it too 
much, the important thing is that the answer given will perform very 
slowly for large lists with millions of items. Here's a better version:


def merge(lista, listb):
    # Assumes that both lista and listb are already sorted.
    new = []
    i = 0  # Index of an item in lista
    j = 0  # Index of an item in listb
    while i < len(lista) and j < len(listb):
        if lista[i] < listb[j]:
            new.append(lista[i])
            i += 1
        else:
            new.append(listb[j])
            j += 1
    new.extend(lista[i:])
    new.extend(listb[j:])
    return new



-- 
Steve


More information about the Tutor mailing list