Best way to insert sorted in a list

Chris Torek nospam at torek.net
Fri Jun 17 17:48:24 EDT 2011


In article <c63e771c-8968-4d7a-9c69-b7fa6ff34e09 at 35g2000prp.googlegroups.com>
SherjilOzair  <sherjilozair at gmail.com> wrote:
>There are basically two ways to go about this.
>One is, to append the new value, and then sort the list.
>Another is to traverse the list, and insert the new value at the
>appropriate position.
>
>The second one's complexity is O(N), while the first one's is O(N *
>log N).

This is not quite right; see below.

>Still, the second one works much better, because C code is being used
>instead of pythons.
>
>Still, being a programmer, using the first way (a.insert(x);
>a.sort()), does not feel right.
>
>What has the community to say about this ? What is the best (fastest)
>way to insert sorted in a list ?

In this case, the "best" way is most likely "don't do that at all".

First, we should note that a python list() data structure is actually
an array.  Thus, you can locate the correct insertion point pretty
fast, by using a binary or (better but not as generally applicable)
interpolative search to find the proper insertion point.

Having found that point, though, there is still the expense of
the insertion, which requires making some room in the array-that-
makes-the-list (I will use the name "a" as you did above):

    position = locate_place_for_insert(a, the_item)
        # The above is O(log n) for binary search,
        # O(log log n) for interpolative search, where
        # n is len(a).

    a.insert(position, the_item)
        # This is still O(log n), alas.

Appending to the list is much faster, and if you are going to
dump a set of new items in, you can do that with:

    # wrong way:
    # for item in large_list:
    #    a.append(item)
    # right way, but fundamentally still the same cost (constant
    # factor is much smaller due to built-in append())
    a.append(large_list)

If len(large_list) is m, this is O(m).  Inserting each item in
the "right place" would be O(m log (n + m)).  But we still
have to sort:

    a.sort()

This is O(log (n + m)), hence likely better than repeatedly inserting
in the correct place.

Depending on your data and other needs, though, it might be best
to use a red-black tree, an AVL tree, or a skip list.  You might
also investigate radix sort, radix trees, and ternary search trees
(again depending on your data).
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)      http://web.torek.net/torek/index.html



More information about the Python-list mailing list