timing puzzle

Neil Cerutti horpner at yahoo.com
Fri Nov 16 14:18:10 EST 2007


On 2007-11-16, Robin Becker <robin at reportlab.com> wrote:
> I'm trying to get my head round a timing puzzle I find whilst
> optimizing A Kuchling's version of the Knuth line splitting
> algorithm from the oedipus project. The puzzle is as follows in
> highly abstract form (here active_nodes is a list of objects
> kept in a sorted order, but there are no special methods on the
> objects for comparison etc)
>
>     for i in xrange(m):
>         .......
>         for A in active_nodes[:]:
>            ......
>            if cond:
>                active_nodes.remove(A)
>            ......
>         .....
>
> is slow my test takes 2.7 seconds; profiling reveals we're
> calling A.remove thousands of times and it's taking a lot of
> time. There appear to be no other usages of active_nodes in the
> for A loop.
>
> On a hunch I changed the above to
>
>      removed = []
>      aremoved = removed.append
>      for i in xrange(m):
>         .......
>         for iA,A in enumerate(active_nodes):
>            ......
>            if cond:
>                aremoved(iA)
>            ......
>         if removed:

'if removed' is reduntant, as the loop body below will not be
executed if removed is empty.

>             for iA in reversed(removed):
>                 del active_nodes[iA]
>                 del removed[:]
>         ......
>
> this latter is 5 times faster, but I can't really see why. 

You are no longer making m copies of active_nodes.

> My only guess is that appending to the removed list is
> relatively easy compared with deletes and the reverse order
> delete is somehow moving less data about. I think in the
> original Knuth the deletion would have been from some real kind
> of list and O(1) whereas here deletions are O(n/2) on average.
> On the other hand we have to keep this list in sorted order.
> What data structure should I be using?

When you have to make many deletions from the middle of a
sequence, you would normally choose a linked list. Python doesn't
provide much support for linked lists, unfortunately.

Instead, filter your list. It looks like you can't use filter
directly, so just do it manually.

   for i in xrange(m):
       .......
       saved_nodes = []
       for A in active_nodes[:]:
          ......
          if not cond:
              saved_nodes.append(A)

          ......
       active_nodes = saved_nodes
       .....

-- 
Neil Cerutti
Sermon Outline: I. Delineate your fear II. Disown your fear III. Displace your
rear --Church Bulletin Blooper



More information about the Python-list mailing list