Termination and type systems

Mon Jun 26 18:23:25 EDT 2006

Dirk Thierbach wrote:
> David Hopwood <david.nospam.hopwood at blueyonder.co.uk> wrote:
>>Marshall wrote:
>>>David Hopwood wrote:
> 
>>>>A type system that required an annotation on all subprograms that
>>>>do not provably terminate, OTOH, would not impact expressiveness
>>>>at all, and would be very useful.
> 
>>>Interesting. I have always imagined doing this by allowing an
>>>annotation on all subprograms that *do* provably terminate. 
> 
> Maybe the paper "Linear types and non-size-increasing polynomial time
> computation" by Martin Hofmann is interesting in this respect.

<http://citeseer.ist.psu.edu/hofmann98linear.html>

> From the abstract:
> 
>   We propose a linear type system with recursion operators for inductive
>   datatypes which ensures that all definable functions are polynomial
>   time computable. The system improves upon previous such systems in
>   that recursive definitions can be arbitrarily nested; in particular,
>   no predicativity or modality restrictions are made.
> 
> It does not only ensure termination, but termination in polynomial time,
> so you can use those types to carry information about this as well.

That's interesting, but linear typing imposes some quite severe
restrictions on programming style. From the example of 'h' on page 2,
it's clear that the reason for the linearity restriction is just to
ensure polynomial-time termination, and is not needed if we only want
to prove termination.

>>If the annotation marks not-provably-terminating subprograms, then it
>>calls attention to those subprograms. This is what we want, since it is
>>less safe/correct to use a nonterminating subprogram than a terminating
>>one (in some contexts).
> 
> That would be certainly nice, but it may be almost impossible to do in
> practice. It's already hard enough to guarantee termination with the
> extra information present in the type annotation. If this information
> is not present, then the language has to be probably restricted so
> severely to ensure termination that it is more or less useless.

I think you're overestimating the difficulty of the problem. The fact
that *in general* it can be arbitrarily hard to prove termination, can
obscure the fact that for large parts of a typical program, proving
termination is usually trivial.

I just did a manual termination analysis for a 11,000-line multithreaded
C program; it's part of a control system for some printer hardware. This
program has:

 - 212 functions that trivially terminate. By this I mean that the function
   only contains loops that have easily recognized integer loop variants,
   and only calls other functions that are known to trivially terminate,
   without use of recursion. A compiler could easily prove these functions
   terminating without need for any annotations.

 - 8 functions that implement non-terminating event loops.

 - 23 functions that intentionally block. All of these should terminate
   (because of timeouts, or because another thread should do something that
   releases the block), but this cannot be easily proven with the kind of
   analysis we're considering here.

 - 3 functions that read from a stdio stream into a fixed-length buffer. This
   is guaranteed to terminate when reading a normal file, but not when reading
   from a pipe. In fact the code never reads from a pipe, but that is not
   locally provable.

 - 2 functions that iterate over a linked list of network adaptors returned
   by the Win32 GetAdaptorsInfo API. (This structure could be recognized as
   finite if we were calling a standard library abstraction over the OS API,
   but we are relying on Windows not to return a cyclic list.)

 - 1 function that iterates over files in a directory. (Ditto.)

 - 2 functions that iterate over a queue, implemented as a linked list. The queue
   is finite, but this is not locally provable. Possibly an artifact of the lack
   of abstraction from using C, where the loop is not easily recognizable as an
   iteration over a finite data structure.

 - 1 function with a loop that terminates for a non-trivial reason that involves
   some integer arithmetic, and is probably not expressible in a type system.
   The code aready had a comment informally justifying why the loop terminates,
   and noting a constraint needed to avoid an arithmetic overflow.
   (Bignum arithmetic would avoid the need for that constraint.)

 - 2 functions implementing a simple parsing loop, which terminates because the
   position of the current token is guaranteed to advance -- but this probably
   would not be provable given how the loop is currently written. A straightforward
   change (to make the 'next token' function return the number of characters to
   advance, asserted to be > 0, instead of a pointer to the next token) would make
   the code slightly clearer, and trivially terminating.

So for this program, out of 254 functions:

 - 83.5% are trivially terminating already.

 - 12.2% would need annotations saying that they are intended to block or not
   to terminate. This is largely an artifact -- we could remove the need for
   *all* of these annotations by adopting an event-loop concurrency model, and
   only attempting to prove termination of functions within a "turn" (one
   iteration of each top-level loop).

 - 2.4% would need annotations saying that termination is too hard to prove --
   but only because they interact with the operating system.
   This is obviously not a type system problem; fixing it requires careful
   design of the language's standard library.

 - 0.8% would need annotations only in C; in a language with standard libraries
   that provide data structures that are guaranteed to be finite, they probably
   would not.

 - 1.2% would need changes to the code, or are genuinely too difficult to prove
   to be terminating using a type system.

This is obviously not a large program (although I wouldn't want to do a manual
analysis like this for a much larger one), and it may be atypical in that it has
almost no use of recursion (or maybe that *is* typical for C programs). It would
be interesting to do the same kind of analysis for a functional program, or one
that does a more "algorithmically intensive" task.

-- 
David Hopwood <david.nospam.hopwood at blueyonder.co.uk>