PEP 3107 and stronger typing (note: probably a newbie question)

Sat Jul 14 03:02:30 EDT 2007

Ben Finney <bignose+hates-spam at benfinney.id.au> writes:
> This seems to make the dangerous assumption that the programmer has
> the correct program in mind, and needs only to transfer it correctly
> to the computer.

Well, I hope the programmer can at least state some clear
specficiations that a correct program should implement, e.g. it should
not freeze or crash.

> I would warrant that anyone who understands exactly how a program
> should work before writing it, and makes no design mistakes before
> coming up with a program that works, is not designing a program of any
> interest.

I'm not sure what the relevance of this is.  Programmers generally
have an idea of what it is that they're trying to write.  They then
have to make a bunch of design and implementation decisions as they go
along.  Soemtimes those decisions are subtly incorrect, and the more
errors the computer can identify automatically, the fewer errors are
likely to remain in the program.  

> Certainly. Which is why the trend continues toward developing programs
> such that mistakes of all kinds cause early, obvious failure -- where
> they have a much better chance of being caught and fixed *before*
> innocent hands get ahold of them.

Here is what Edward Lee wrote about developing a multi-threaded Java
app (http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.pdf):

    A part of the Ptolemy Project experiment was to see whether
    effective software engineering practices could be developed for an
    academic research setting. We developed a process that included a
    code maturity rating system (with four levels, red, yellow, green,
    and blue), design reviews, code reviews, nightly builds,
    regression tests, and automated code coverage metrics [43]. The
    portion of the kernel that ensured a consistent view of the
    program structure was written in early 2000, design reviewed to
    yellow, and code reviewed to green. The reviewers included
    concurrency experts, not just inexperienced graduate students
    (Christopher Hylands (now Brooks), Bart Kienhuis, John Reekie, and
    myself were all reviewers). We wrote regression tests that
    achieved 100 percent code coverage. The nightly build and
    regression tests ran on a two processor SMP machine, which
    exhibited different thread behavior than the development machines,
    which all had a single processor.  The Ptolemy II system itself
    began to be widely used, and every use of the system exercised
    this code. No problems were observed until the code deadlocked on
    April 26, 2004, four years later.

    It is certainly true that our relatively rigorous software
    engineering practice identified and fixed many concurrency
    bugs. But the fact that a problem as serious as a deadlock that
    locked up the system could go undetected for four years despite
    this practice is alarming. How many more such problems remain? How
    long do we need test before we can be sure to have discovered all
    such problems? Regrettably, I have to conclude that testing may
    never reveal all the problems in nontrivial Multithreaded code.

Now consider that the code you test today on a 2-core processor may be
running on a 100-core processor in 4 years, making any concurrency
errors that much worse.  You can't test on the 100-core cpu today
because it doesn't exist yet, but your code still has to run on it
correctly once it's out there.  Heck, if you're writing in CPython (no
parallelism at all, due to the GIL) you might be in for a surprise as
soon as real parallelism arrives on even 2 cores.  It gets worse,
maybe you're writing operating system or language runtime code
(e.g. parallel garbage collector for Java).  That means there will be
hostile code on the same machine, written by diabolical maniacs who
know every detail of your implementation, and who deliberately do
stuff (e.g. write code to hit specific cache lines or cause page
faults at fiendishly precisely chosen moments) specifically to cause
weird timing conditions and trigger bugs.  Testing is not enough to
ensure that such attacks won't succeed.

I remember an old programming book (might even have been Knuth vol. 3)
recommending using the Shell sorting algorithm (a very simple exchange
sort that runs in about O(n**1.4) time) instead of an O(n log n)
algorithm because Shell's algorithm (being simpler) was less likely to
end up with implementation errors.  Today I think an experienced
programmer needing to code a sort routine wouldn't hesitate to code a
good algorithm, because our languages and coding/testing methods are
better than they were in the Fortran and assembly language era when
that book was written.  But there are still more complex problems
(like shared memory multiprocessing) that we're advised to stay away
from because they're so fraught with hazards.  We instead use
approaches that are simpler but take perforamnce penalties.  (As a
non-concurrency example, we might implement geometric search through
some kind of list scanning algorithm instead of using more efficient
but complicated k-d trees).  With higher reliability languages we'll
hopefully be in better shape to tackle those complex problems instead
of retreating from them, without expecting to spend eternity hunting
down mysterious bugs.