Why is heapify linear?

Mon Nov 1 20:44:26 EST 2004

>>>>> "Scott" == Scott David Daniels <Scott.Daniels at Acm.Org> writes:

    Scott> I am sorry, but in the Python 2.4 description of "heapify",
    Scott> I find the description of "Transform list x into a heap,
    Scott> in-place, in linear time," unbelievable.  I understand the
    Scott> hand-wave that makes dictionary building linear (though I
    Scott> have a hard time with even that).  Could somebody tell me
    Scott> what twist of logic makes "heapify" linear, except in the
    Scott> sense that linear is coming to mean "very fast?"

I'll try to illustrate the idea without any complex maths, but instead
with real ugly ascii art.

The basic building block of "heapify" is a simple procedure which I'd
call "heapify_node".  It looks at a node that is not a leaf, and among
that non-leaf node and its two children, which is largest.  Then it
swap that largest node to the top, so that the node becomes the
largest among the three---satisfy the "heap property".  Clearly,
constant time.

To heapify the whole tree, we will heapify it from the bottom (so
"bottom up").  Heapifying a node at the "bottom" (i.e., next to a
leaf) is trivial; heapifying a node that is not bottom might cause the
heap property of nodes below to be violated.  In that case we will
have to heapify the node below as well, which might cause a node one
more level below to violate heap property.  So to heapify a node at
level n, we might need to call heapify_node (height-1) times.

Now we show that a whole run of heapify of a tree with n=2^k-1 nodes
will never call heapify_node more than n times.  We do it by finding a
way to charge the calls to heapify_node so that each node is never
charged more than once.  Note that we simplify things by always
considering full binary trees (otherwise, the claim has to be a bit
more complicated).

Let's consider the bottom layer.  To heapify a node with two children,
we need one call to heapify_node.  It is charged to left leaf.  So we
have this:

After heapify     O
                 / \
                X   O

where O shows a node that is not charged yet, and X show a node which
is already charged.  Once all the next-to-bottom nodes have been
heapified, we start to heapify the next-to-next-to-bottom nodes.
Before heapifying a node there, we have

Before heapify     O
                 _/ \_
                O     O
               / \   / \
              X   O X   O

To heapify this node, we might need two calls to heapify_node.  We
charge these two calls to the two uncharged nodes of the left subtree.
So we get:

After heapify      O
                 _/ \_
                X     O
               / \   / \
              X   X X   O

So none of the nodes is charged more than once, and we still have some
left for the next level, where before heapify the picture is:

Before heapify     O
              ____/ \____
             O           O     
           _/ \_       _/ \_   
          X     O     X     O  
         / \   / \   / \   / \ 
        X   X X   O X   X X   O

Heapifying at this level requires at most 3 calls to heapify_node,
which are charged again to the left branch.  So after that we get

After heapify      O
              ____/ \____
             X           O
           _/ \_       _/ \_
          X     X     X     O
         / \   / \   / \   / \
        X   X X   X X   X X   O

We note a pattern: the path towards the right-most leaf is never
charged.  When heapifying a level, one of the branches will always
have enough uncharged nodes to pay for the "expensive" heapify at the
top, while the other branch will still be uncharged to keep the
pattern.  So this pattern is maintained until the end of the heapify
procedure, making the number of steps to be at most n-k = 2^k - k - 1.
This is definitely linear to n.

Regards,
Isaac.