Does Python really follow its philosophy of "Readability counts"?

Mark Wooding mdw at distorted.org.uk
Thu Jan 22 10:10:04 EST 2009


Paul Rubin <http://phr.cx@NOSPAM.invalid> writes:

> Also, the application area matters.  There is a difference between
> programming for one's own enjoyment or to do a personal task, and
> writing programs whose reliability or lack of it can affect other
> people's lives.  I've never done any safety-critical programming but I
> do a fair amount of security-oriented Internet programming.

I do quite a lot of that too.  But I don't think it's necessary to have
the kinds of static guarantees that a statically-typed language provides
in order to write programs which are robust against attacks.

Many actual attacks exploit the low-level nature and lack of safety of C
(and related languages): array (e.g., buffer) overflows, integer
overflows, etc.  A language implementation can foil these attacks in one
of two (obvious) ways.  Firstly, by making them provably impossible --
which would lay proof obligations on the programmer to show that he
never writes beyond the bounds of an array, or that arithmetic results
are always within the prescribed bounds.  (This doesn't seem practical
for most programmers.)  Secondly, by introducing runtime checks which
cause the program to fail safely, either by signalling an exception or
simply terminating, when these bad things happen.  In the case of array
overflows, many `safe' languages implement these runtime checks, and
they now seem to be accepted as a good idea.  The case of arithmetic
errors seems less universal: Python and Lisp promote gracefully to
unbounded integers when the machine's limits are exceeded; Java and C#
silently give incorrect results[1].
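
By way of illustration, a minimal sketch in Python (the specific
values are only illustrative):

    # Array bounds are checked at runtime: this fails safely with an
    # IndexError instead of reading adjacent memory as C might.
    buf = [0] * 16
    try:
        buf[20]
    except IndexError as e:
        print('caught: %r' % e)

    # Integers promote past the machine word size instead of wrapping:
    n = 2 ** 31 - 1                  # largest 32-bit signed value
    print(n + 1)                     # 2147483648, not -2147483648

    # What a 32-bit two's-complement wrap (the Java/C# behaviour)
    # would have produced, simulated by masking and sign-extending:
    w = (n + 1) & 0xffffffff
    print(w - 2**32 if w >= 2**31 else w)    # -2147483648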

Anyway, Python is exempt from these problems (assuming, at any rate,
that the implementation is solid; but we've got to start somewhere).

There's a more subtle strain of logical errors which can also be
exploited.  It's possible that type errors lead to exploitable
weaknesses.  I don't know of an example offhand, but it seems
conceivable that a C program has a bug where an object of one type is
passed to a function expecting an object of a different type (maybe due
to variadic argument handling, use of `void *', or a superfluous
typecast); the contents of this object cause the function to misbehave
in a manner convenient to the adversary.  In Python, objects have types,
and primitive operations verify that they are operating on objects of
suitable types, signalling errors as necessary; but higher level
functions may simply assume (`duck typing') that the object conforms to
a given protocol, expecting a failure if this assumption turns out to
be false.  It does seem possible that an adversary might arrange for a
different object to be passed in, which seems to obey the same protocol
but in fact misinterprets the messages.  (For example, the function
expects a cleaning object, and invokes ob.polish(cup) to make the cup
shiny; in fact, the object is a nationality detector, and the call
merely reports whether the cup is Polish; the function proceeds with a
dirty cup!)
Static type systems can mitigate these sorts of `ugly duckling' attacks
somewhat, but they can't prevent them entirely.  The object in
question may in fact implement the protocol in question (implement the
interface, in Java, or be an instance of an appropriate type-class in
Haskell) but do so in an unexpected manner.
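
To make the pun concrete, here's a toy rendering of the
cleaner-versus-nationality-detector confusion in Python (the class and
method names are invented for illustration):

    class Cleaner:
        def polish(self, cup):
            cup.shiny = True         # actually cleans the cup

    class NationalityDetector:
        def polish(self, cup):
            # Same method name, entirely different meaning: reports
            # whether the cup is Polish and never touches it.
            return cup.country == 'Poland'

    class Cup:
        def __init__(self, country):
            self.country = country
            self.shiny = False

    def clean_up(ob, cup):
        # Duck typing: we simply assume `ob' is a cleaner.  Both
        # objects answer ob.polish(cup) without raising, so nothing
        # signals the confusion -- the cup just stays dirty.
        ob.polish(cup)
        return cup.shiny

    print(clean_up(Cleaner(), Cup('Poland')))              # True
    print(clean_up(NationalityDetector(), Cup('Poland')))  # False!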

And beyond these kinds of type vulnerabilities are other mistakes which
are very unlikely to be caught by even a sophisticated type system;
e.g., a function for accepting input to a random number generator, which
actually ignores the caller's data!
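
A sketch of that last case, using a hypothetical SeededRNG wrapper
(invented for illustration) whose add_entropy method would satisfy any
type checker while doing exactly the wrong thing:

    import random

    class SeededRNG:
        def __init__(self):
            self._rng = random.Random(0)     # fixed, predictable seed

        def add_entropy(self, data):
            # Bug: the caller's entropy is accepted and silently
            # discarded.  The signature (bytes in, nothing out) is all
            # a type system checks, and it checks out fine.
            pass                     # should be: self._rng.seed(data)

        def next_byte(self):
            return self._rng.randrange(256)

    rng = SeededRNG()
    rng.add_entropy(b'supposedly unpredictable secret')
    print(rng.next_byte())           # the same value on every run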

[1] Here, I don't mean to suggest that the truncating behaviour of Java
    or C# arithmetic can't be used intentionally.  Rather, I mean that,
    in the absence of such an intention, arithmetic in these languages
    simply yields results which are inconsistent with the usual rules of
    integer arithmetic.

> Finally, your type-A / type-B comparison works best regarding programs
> written by one programmer or by a few programmers who communicate
> closely.

Possibly; but I think that larger groups can cooperate reasonably within
a particular style.

> I'm working on a Python program in conjunction with a bunch of people
> in widely dispersed time zones, so communication isn't so fluid, and
> when something changes it's not always easy to notice the change or
> understand the reason and deal with it.

I'll agree that dynamic languages like Python require a degree of
discipline to use effectively (despite the stereotype of type B as
ill-disciplined hackers), and that includes communicating effectively
with other developers about changes which might affect them.  Statically
typed languages provide a safety-net, but not always a complete one.
One might argue that the static-typing safety-net can lead to
complacency -- a risk compensation effect.  (I don't have any evidence
for this so I'm speculating rather than arguing.  I'd be interested to
know whether there's any research on the subject, though.)

Even so, I don't think I'd recommend Python for a nontrivial project to
be implemented by a team of low-to-average-competence programmers.
That's not a criticism of Python: I simply don't believe in
one-size-fits-all solutions.  I'd rather write in Python; I'd probably
recommend that the above team use C#.  (Of course, I'd rather have one
or two highly skilled programmers and use Python, than the low-to-
average team; but industry does like its horde-of-monkeys approach.)

> There have been quite a few times when some hassle would have been
> avoided by the static interfaces mandated in less dynamic languages.
> Whether the hassle saved would have been outweighed by the extra
> verbosity is not known.

This is another question for which it'd be nice to have answers.  But,
alas, we're unlikely to get them unless dynamic typing returns to
academic
favour.

> I've found Haskell's type system to work pretty well for the
> not-so-fancy things I've tried so far.  It takes some study to
> understand, but it's very uniform and beautiful.

It can be very effective, but I think I have a dynamically-typed
brain -- I keep on running into situations where I need more and more
exotic type-system features in order to do things the way I naturally
want to.  It's easier to give up and use Lisp...

> I'm having more trouble controlling resource consumption of programs
> that are otherwise semantically correct, a well known drawback of lazy
> evaluation.  

I always had difficulty curbing my (natural?) inclination towards tail-
recursion, which leads to a lot of wasted space in a normal-order
language.

> The purpose of unsafePerformIO is interfacing with C programs and
> importing them into Haskell as pure functions when appropriate.

Yes, that's what I was doing.  (I was trying to build a crypto interface
using a C library for the underlying primitives, but it became too
unwieldy and I gave up.)  In particular, I used unsafePerformIO to
provide a functional view of a hash function.

> The ML family avoids some of Haskell's problems, but is generally less
> advanced and moribund.

Standard ML seems dead in the water; OCaml, though, looks like it's got
some momentum behind it, but isn't going in the same direction.

> Pretty soon I think we will start seeing practical successor languages
> that put the best ideas together.

Perhaps...

-- [mdw]


