status of Programming by Contract (PEP 316)?

Paul Rubin
Fri Aug 31 05:46:44 EDT 2007


aleax at mac.com (Alex Martelli) writes:
> Yeah, good question indeed, and I'm asking myself that -- somebody who
> posts to this group in order to attack the reliability of the language
> the group is about (and appears to be supremely ignorant about its use
> in air-traffic control and for high-reliability mission-critical
> applications such as Google's "Production Systems" software) might well
> be considered not worth responding to.  OTOH, you _did_ irritate me
> enough that I feel happier for venting in response;-)

Hi Alex, I'm a little confused: does Production Systems mean stuff
like the Google search engine, which (as you described further up in
your message) achieves its reliability at least partly by massive
redundancy and failover when something breaks?  In that case why is it
so important that the software be highly reliable?  Is a software
fault really worse than a hardware fault, especially if it's
permissible to sometimes let a transaction (like a search query) go
uncompleted (e.g. by displaying a "try again later" message)?  If you
get 1 billion queries in a month and a half dozen of them don't
complete (e.g. they give empty or incorrect results when there are
some good hits they should display) but the server is never actually
down, can you still claim 100% uptime?
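
To put rough numbers on that question, here is a small sketch that keeps
"server uptime" and "fraction of queries answered correctly" as two
separate measurements.  The function names and the 30-day month are my
own assumptions for illustration, not anything Google has published.

```python
def server_uptime(total_seconds, down_seconds):
    """Fraction of wall-clock time the service was reachable."""
    return (total_seconds - down_seconds) / total_seconds


def request_success_rate(total_requests, failed_requests):
    """Fraction of requests that completed with a correct result."""
    return (total_requests - failed_requests) / total_requests


# Figures from the paragraph above: 1 billion queries, half a dozen failures,
# and a server that was never actually down (30-day month assumed).
uptime = server_uptime(total_seconds=30 * 24 * 3600, down_seconds=0)
success = request_success_rate(total_requests=1_000_000_000, failed_requests=6)

print(f"uptime:          {uptime:.1%}")
print(f"request success: {success:.7%}")
```

The first figure comes out at exactly 100% while the second falls just
short of it, so whether "100% uptime" is a fair claim depends on which of
the two you are measuring.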

There's a philosophy in Erlang described as "let it crash",
i.e. programmers are told NOT to program defensively, such as by
checking inputs for validity.  Instead they should just rely on the
fault tolerance and process restart stuff to get things going again if
their process fails.  Similarly if the Google search software hits
some fatal condition once in a while, maybe it's enough to just treat
it as a crashed box and let the failover mechanisms handle the
problem.  Of course then there's a second level system to manage the
restarts that has to be very reliable, but it doesn't have to deal
with much weird concocted input the way that a public-facing internet
application has to.
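
As a rough illustration of the "let it crash" idea, here is a minimal
Python sketch.  It is only an analogy: real Erlang supervisors restart
separate processes, while this runs everything in one process, and the
names handle_query, supervise and max_restarts are hypothetical ones I
made up for the example.

```python
def handle_query(query):
    """Worker: no defensive input checks; bad input simply raises."""
    terms = query.split()
    return {"first_term": terms[0], "count": len(terms)}


def supervise(queries, max_restarts=3):
    """Second-level system: treats any worker failure as a crash,
    reports "try again later" for that query, and carries on."""
    results = []
    restarts = 0
    for query in queries:
        try:
            results.append(handle_query(query))
        except Exception:
            restarts += 1
            if restarts > max_restarts:
                # Too many crashes: give up and escalate the last error.
                raise
            results.append("try again later")
    return results, restarts


# None and "" are the "weird concocted input" cases; both crash the worker.
results, restarts = supervise(["python contracts", None, "", "erlang"])
print(results)
print("restarts:", restarts)
```

The worker stays simple because all of the recovery logic lives in the
supervisor, which only ever sees "worked" or "crashed" and never the
weird input itself.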

Therefore I think Russ's point stands: we're talking about a
different sort of reliability in these highly redundant systems than
in the systems Russ is describing.


