Best way to insert sorted in a list

Chris Torek nospam at torek.net
Fri Jun 17 20:46:44 EDT 2011


In article <itgi3801bra at news2.newsguy.com> I wrote, in part:
>>Appending to the list is much faster, and if you are going to
>>dump a set of new items in, you can do that with: [...]

In article <mailman.96.1308348643.1164.python-list at python.org>
Ethan Furman <ethan at stoneleaf.us> wrote:

>>     a.append(large_list)
>         ^- should be a.extend(large_list)

Er, right.  Posted in haste (had to get out the door).  I also
wrote:

>> If len(large_list) is m, this is O(m).  Inserting each item in
>> the "right place" would be O(m log (n + m)).  But we still
>> have to sort:
>>
>>     a.sort()

In article <mailman.98.1308353648.1164.python-list at python.org>,
Ian Kelly  <ian.g.kelly at gmail.com> wrote:
>> This is O(log (n + m)), hence likely better than repeatedly inserting
>> in the correct place.

>Surely you mean O((n + m) log (n + m)).

Er, "maybe"?  (It depends on the relative values of m and n, and
the underlying sort algorithm to some extent. Some algorithms are
better at inserting a relatively small number of items into a
mostly-sorted large list.  As I recall, Shell sort does well with
this.)  But generally, yes.  See "posted in haste" above. :-)

There are a lot of other options, such as sorting just the list of
"items to be inserted", which lets you do a single merge pass:

    # UNTESTED
    def merge_sorted(it1, it2, must_copy = True):
        """
        Merge two sorted lists/iterators it1 and it2.
        Roughly equivalent to sorted(list(it2) + list(it2)),
        except for attempts to be space-efficient.

        You can provide must_copy = False if the two iterators
        are already lists and can be destroyed for the purpose
        of creating the result.
        """

        # If it1 and it1 are deque objects, we don't need to
        # reverse them, as popping from the front is efficient.
        # If they are plain lists, popping from the end is
        # required.  If they are iterators or tuples we need
        # to make a list version anyway.  So:
        if must_copy:
            it1 = list(it1)
            it2 = list(it2)

        # Reverse sorted lists (it1 and it2 are definitely
        # lists now) so that we can pop off the end.
        it1.reverse()
        it2.reverse()

        # Now accumulate final sorted list.  Basically, this is:
        # take first (now last) item from each list, and put whichever
        # one is smaller into the result.  When either list runs
        # out, tack on the entire remaining list (whichever one is
        # non-empty -- if both are empty, the two extend ops are
        # no-ops, so we can just add both lists).
        #
        # Note that we have to re-reverse them to get
        # them back into forward order before extending.
        result = []
        while it1 and it2:
            # Note: I don't know if it might be faster
            # to .pop() each item and .append() the one we
            # did not want to pop after all.  This is just
            # an example, after all.
            last1 = it1[-1]
            last2 = it2[-1]
            if last2 < last1:
                result.append(last2)
                it2.pop()
            else:
                result.append(last1)
                it1.pop()
        it1.reverse()
        it2.reverse()
        result.extend(it1)
        result.extend(it2)
        return result

So, now if "a" is the original (sorted) list and "b" is the not-yet-
sorted list of things to add:

    a = merge_sorted(a, sorted(b), must_copy = False)

will work, provided you are not required to do the merge "in place".
Use the usual slicing trick if that is necessary:

    a[:] = merge_sorted(a, sorted(b), must_copy = False)

If list b is already sorted, leave out the sorted() step.  If list
b is not sorted and is particularly long, use b.sort() to sort in
place, rather than making a sorted copy.
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)      http://web.torek.net/torek/index.html



More information about the Python-list mailing list