does lack of type declarations make Python unsafe?

Alex Martelli aleax at aleax.it
Thu Jun 19 05:56:37 EDT 2003


<posted & mailed>

David Abrahams wrote:
   ...
>> Sometimes X might perhaps be of the form "a is of type B", but
>> that's really a very rare and specific case.  Much more often it
>> will be "container c is non-empty", "sequence d is sorted", "either
>> x<y or pred(a[z]) for some x>=z>=y", and so on, and so forth.
> 
> Those assertions are *chock full* of type information:

Rather, they IMPLY what you choose to call "type information" (which
has little to do with the typesystem of, say, C++ or Java), as well
as other information (which isn't sensibly mappable to "type" in any
nontrivial way).  E.g.:

>       the type of c is a container

Rather, "c is a container".  Choosing to express "c is a container"
as "the type of c is the type of a container" is just the kind of
useless, redundant boilerplate that negatively impacts productivity.

And in neither C++ nor Java can you express "c is a container", either
directly or indirectly, except in comments -- the type-systems of
either language (the two most-widespread languages today that use
compiletime typing) just don't match the concepts that you're trying
to smuggle in as "type information".  In _Haskell_ you might collect
the characteristics that define a type's containerhood as an appropriate
typeclass (*NOT* type!); in Java you're simply out of luck; in C++ I
guess you (you personally, as a template-wizard, but not 99%+ of C++'s
users;-) might possibly define a template that "by convention" (not in
a language-supported way as in Haskell) lets you assert containerhood --
"informally", albeit reasonably effectively (better than a comment...).

In Python one generally identifies (just as informally) a container as
"an object which has a length" (using "length", perhaps a suboptimal
choice of wording, to mean "number of items currently contained") and
simultaneously expresses both 'c is a container' and 'that container is
not empty' by

    assert len(c)

This will be checked at runtime rather than at compile-time -- but
the concept that there's something utterly precious in being able to
check "function len can be meaningfully called with c as an argument"
prematurely with respect to "the result of the call will be non-0" is
pretty bogus.  Python checks both at once and raises TypeError or
AssertionError if either half of the assertion is invalid, period.
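Spelled out as a tiny sketch (the `process` function is made up for
illustration), the single `assert len(c)` really does police both halves
of the claim at once:

```python
def process(c):
    # "c is a container, and that container is non-empty" -- one check:
    assert len(c)

# An object with no notion of length fails the "is a container" half:
try:
    process(42)
except TypeError:
    outcome_for_int = "TypeError"

# An empty container fails the "non-empty" half:
try:
    process([])
except AssertionError:
    outcome_for_empty = "AssertionError"

process([1, 2, 3])  # a non-empty container sails through silently
```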

There may be worth in making this notion of "protocol" more formal
(desperately trying to avoid nouns which have been hijacked for many
vaguely related meanings in the past, such as "type", "interface",
"category" -- C++ uses "concept" similarly, I believe, in the generic
parts of the language and library).  I think there is -- PEP 246 being
a (rather defect-laden) early attempt, PyProtocols a rather nicer and
newer one (with enhancements wrt PEP 246 but broadly consonant with it).
But IMHO that worth is connected to the concept of *adapting* an
object to a protocol, far more than to assurances about an object
"already conforming" to a protocol without adaptation (the concept
of conformance can even be modeled as a limit case of that of
adaptation, though that's admittedly silly hairsplitting;-).


> Type declarations don't have to identify concrete types; they can
> identify concepts, constraints, and relationships.

In some theoretical world, perhaps.  In practice, again, that's
bogus.  You can't meaningfully have a type declaration for "an
odd integer not divisible by 3" in either C++ or Java.  The vague
theoretical possibility that a be-all, end-all future language
would let you capture "constraints and relationships" this way is
being used strictly as a soft-soap for justifying the use of
languages whose actual typesystems are enormously different, and
strongly focused on implementation issues.  I'd be quite willing
to fight against the hypothetical "perfect static typesystem" as
"nearly as useless" if a working prototype of it was presented,
but it's really a waste of time.  Let's talk about the reality of
C++ and Java instead, shall we?  The theoretical problem with
types embedding all sorts of "constraints and relationships" is
that as soon as you try to meaningfully operate on them your
compiler's possibilities to ensure compiletime type safety tend
to disappear (which is why languages with a strong theoretical
basis in type theory eschew that route).  Given type ZT as "odd
integer not divisible by 3", in theory you might infer that the
sum of ZT's can't possibly be a ZT, the product must be, but what
about, say, the division-with-truncation?  And who's going to
draw all the deductions about the various arithmetic operation
restricted to ZT down from Z, etc, so that when the forbidden
prime factors in ZT are 2 and 3 you can draw certain deductions
but when they're 3 and 5 instead you can't any more (the sum of
two of THOSE might still be valid, etc, etc)?
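The point is easy to make concrete with a runtime check (a sketch --
`is_zt` is of course a made-up name for the made-up "type"):

```python
def is_zt(n):
    # "odd integer not divisible by 3" -- trivially checkable at
    # runtime, but not expressible as a static type in C++ or Java:
    return n % 2 == 1 and n % 3 != 0

a, b = 5, 7
assert is_zt(a) and is_zt(b)
# the sum of two odd numbers is even, so it can't possibly be a ZT...
assert not is_zt(a + b)
# ...while the product stays odd and keeps avoiding the factor 3:
assert is_zt(a * b)
# division-with-truncation, though, preserves no such guarantee:
assert is_zt(35 // 5)        # 7 -- sometimes the quotient is a ZT...
assert not is_zt(25 // 7)    # 3 -- ...and sometimes it isn't
```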


>> Type declarations would have extraordinary "explanatory power" if
>> and only if "a is of type B" was extraordinarily more important than
>> the other kinds of assertions, and it just isn't -- even though, by
>> squinting just right, you may end up seeing it that way by a sort of
>> "Stockholm Syndrome" applied to the constraints your compiler forces
>> upon you.
> 
> Suggesting that I've grown to love my shackles is a little bit
> insulting.  I have done significant programming in Python where I
> didn't have static typing; I've gotten over my initial reactions to
> the lack of static checks and grown comfortable with the language.
> Purely dynamic typing works fine for a while.  I have seen real
> problems develop in my code that static type checking would have
> prevented.

And (e.g.) Robert Martin has not.  Now, one can explain this in
many ways -- perhaps Uncle Bob and you are using different approaches
to developing your programs, and his (test-driven development, these
days) avoids the "real problems" that develop in your code; or perhaps
he's unable to see what you're able to see so clearly.  Any hypothesis
with explanatory power about this cannot fail to be offensive to either
you or him, if one wants to take it as insulting.  Since my experience
matches Uncle Bob's quite closely, and my reflections and musings on
the matter end up supporting this side of the argument just as well as
my practice does, I think I can be justified in expressing my opinion
on the subject.  Human beings are extremely good at rationalizing their
experiences (and their cognitive dissonances): I have experienced this
personally, too.  If it's insulting to say that somebody's exhibiting
a typically human trait about it, then I can give a long list of cases
which would be insulting against myself.  For example, as a fervid Pascal
user I was extremely keen on range-types, declaring all over the place
variables that were "integer from -3 to 17 inclusive" and the like, and
considered this a great boon of the typesystem.  It took quite a while
to realize that these were not in fact statically checkable constraints
except in the most trivial of cases, and that moving to (e.g.) C or C++
and losing the ability to express this "exquisite informational power"
didn't give me any real practical problem and in fact saved me the time
wasted in trying to pinpoint each variable's range in the first place.
A few "assert(x>=-3)" *where it MATTERED* worked far, far better than
some silly type declaration way up there at the start about "x will
never be less than -3" -- the documentation was more accessible to the
reader, the occurrence of checking more explicit and thus obvious.
Etc, etc.
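For instance, a sketch of the "assert where it MATTERED" style (the
`lookup` function and its table are hypothetical, just for illustration):

```python
# Instead of declaring "x is an integer from -3 to 17" up front,
# Pascal-style, check the constraint at the one point where it matters:
def lookup(table, x):
    assert x >= -3, "x is used as an offset into table, which starts at -3"
    return table[x + 3]

offsets = list(range(21))       # slots for indices -3 .. 17
assert lookup(offsets, -3) == 0
assert lookup(offsets, 17) == 20
```

The check sits right next to the code that relies on it, so the reader
sees both at once -- no hunting back to a declaration "way up there".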


> It's not an illusion that static types help (a lot) with certain
> things.  The type information is at least half of what you've written
> in each of those assertions.  I use runtime assertions, too, though
> often I use type invariants to constrain the state of things --
> because it makes reasoning about my code *much* easier.  An
> ultra-simple case: it's great to be able to use an unsigned type and
> not have to think about asserting x >= 0 everywhere.

I would be ecstatically glad to see a nice fight, no holds barred,
between a proponent of this latest sub-thesis and a designer of
statically typed languages which do NOT support the concept of
"unsigned type".  In Java, "unsigned" means "without a signature"
(thus, a risky applet;-).  But I think that the issue may be taken
as a good example of the above, wider thesis.  *IF* having an "x
that's always >= 0" *WAS* indeed such a precious concept, "makes
reasoning about your code *much* easier", etc etc, then *why* has
nobody ever widened that concept to *FLOATING POINT* types for x?
Where's the "unsigned float" in the next C++ release...?

In practice, as I'm sure you know, unsigned types in C/C++ are
tricky indeed (that's why Java removed them -- they deemed them
too tricky to use reliably by ordinary programmers).  E.g. cfr
http://sandbox.mc.edu/~bennet/cs220/c_code/uns_c.html and the like.
They were born from *IMPLEMENTATION* considerations, historically,
which only apply to integral types.  The "makes reasoning easier"
justification is a post-facto rationalization, belied by the
glaring absence of the same facility where implementation would
not gain, as in floating-point numbers.
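The trickiness is easy to exhibit even from Python, by simulating C's
modular unsigned arithmetic (a sketch, assuming 32-bit unsigned ints):

```python
# C/C++ unsigned arithmetic is defined modulo 2**32 (for a 32-bit type);
# simulate unsigned subtraction:
def u32_sub(a, b):
    return (a - b) % 2**32

# 3u - 5u doesn't give -2: it wraps around to a huge positive value --
# exactly the trap that the "x >= 0 for free" argument glosses over:
print(u32_sub(3, 5))   # 4294967294
print(u32_sub(5, 3))   # 2
```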


>> Types are about implementation
> 
> No, they're about a relationship between an interface and semantics.
> Most people leave out the semantics part when thinking about them,
> though.

In practice, the typesystem of languages such as C++ and Java is
mostly about implementation.  "The semantics" is simply left out
of these languages' typesystems -- relegated to comments (which
might just as well be there in Python or any other language and
aren't going to be compiler-checked anyway).  The famous exception
to this is Eiffel with its notion of contract -- but notice that
contracts are checked, if at all, *AT RUNTIME* -- thus, claiming
they're part of COMPILE-TIME notions of typing is a falsification.

The notion of "type" is kept fuzzy (deliberately or not) so that
when arguing about typing one can freely swing between incompatible
aspects, claiming compile-time checkability on one side while
handwaving about semantics (contracts) that are run-time checks,
claiming "modern notions of types" while in fact talking about
languages such as C++ and Java which support mostly implementation
oriented type issues.


>> and one should "program to an interface, not to an implementation"
>> -- therefore, "a is of type B" is rarely what one SHOULD be focusing
>> on.
> 
> In a modern type system, "a is of type B" mostly expresses an
> interface for a and says nothing about implementation per se.  It does

Is C++'s type system modern?  How does "a is an int" or "a is unsigned"
``say nothing about implementation per se''??!!  "a is of some type
that supports interface X" is very often a RUNTIME notion (expressed
in Java by a "(X)a" cast, in C++ by dynamic_cast), not necessarily a
statically checkable one.  The part that's sure to be statically checked
(because it lets the compiler generate faster code!-) is the *actual*
typing -- the implementation-related part thereof.

> say something about the effects of using that interface, though that
> part is harder to formalize.

Eiffel formalizes it -- with RUNTIME checks, of course:-).  Languages
that don't formalize it in any way cannot sensibly claim to have that
notion in their typesystems, whether the latter are claimed to be
"modern" or not.


> However, with a nod to "practicality beats purity:"
> 
>      People don't usually think in these abstract terms about most of
>      their code, and rigorously documenting code in terms of interface
>      requirements is really difficult, so most people never do it.

This could also be expressed as: rigorously documenting code (in just 
about any terms) has pretty low productivity-returns, compared to less
rigorous and formal documentation that is runtime-checkable (contracts
in Eiffel, assertions in Java/C++/Python).  Quite sensibly, people focus
on programming practices that DO have good productivity returns.

>      It's a poor investment anyway because *most* (not all) Python
>      code is never used generically: common interfaces for polymorphic
>      behavior are generally captured in base classes and a great deal
>      of code is just operating on concrete types anyway.

"Common interfaces for polymorphic behavior" are *almost never*
"captured in base classes" -- "file-like objects" are a typical
example of this, or, if you want a more focused one, consider
"iterable objects".

>      The result is that there are usually no expression of interface
>      requirements at all in a function's interface/docs, where in the
>      *vast* majority of cases a simple (non-generic) type declaration
>      would've done the trick.  [Without the expression of interface
>      requirements, the possibility to use the function generically is
>      lost, for all intents and purposes]

It seems that the quality of documentation for the functions I've
been using in Python is substantially higher than that for those
which YOU have been using.  The lack of *formal*, rigorous docs is
generally not a serious problem; functions that e.g. take a filelike
object argument ARE generally kind enough to mention that fact (as
the filelike interface is so fat, it IS typically underdocumented
what fraction of it is actually in use -- but I don't see any simple
nongeneric type declaration that would help fix this at all).


> So, while I buy "program to an interface" in theory, in practice it
> is only appropriate in a small fraction of code.

On the contrary, I think it's the need to fix specific types that
might be appropriate only very rarely.

>> Of course, some languages blur the important distinction
>> between a type and a typeclass (or, a class and an interface, in
>> Java terms -- C++ just doesn't distinguish them by different
>> concepts
> 
> Nor does most of the type theory I've seen.

Does Haskell count?  typeclasses vs types?

>> so, if you think in C++, _seeing_ the crucial distinction may be
>> hard;-).
> 
> I know what typeclasses and variants are all about.

I'm sure you do, as a general issue.  But the language you're using
to think and work about a specific problem still colors how easy or
hard it is to conceptualize in a certain way wrt another.


>> "e provides such-and-such an interface" IS more often interesting, but,
>> except in Eiffel, the language-supplied concept of "interface" is too
>> weak for the interest to be sustained -- it's little more than the sheer
>> "signature" that you can generally infer easily.  E.g.:
>>
>> my procedure receiving argument x
>>
>>     assert "x satisfies an interface that provides a method Foo which
>>             is callable without arguments"
>>
>>     x.Foo()
>>
>> the ``assert'' (which might just as well be spelled "x satisfies
>> interface Fooable", or, in languages unable to distinguish "being
>> of a type" from "satisfying an interface", "x points to a Fooable")
>> is ridiculously redundant, the worse sort of boilerplate.
> 
> Only if you think that only syntax (and not semantics) count.  It's
> not just important that you can "Foo()" x, but that Fooing it means
> what you think it does.

But, to repeat, "the language-supplied concept of "interface" is too
weak" to ensure this, except perhaps by contracts (runtime checks) in
Eiffel.  So, the static type checking does *NOT* help at all here.

What the compiler can check for you is ONLY that x provides a
method Foo callable without arguments -- the "signature" part of
things, which Python checks at runtime instead.  Whether Foo writes
useful information to disk or sends a letter of insults to your
cousin is way beyond the compiler's ability to determine;-).
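A sketch of just how little the "signature" half pins down (both
classes hypothetical, of course):

```python
# Indistinguishable to any signature check: each provides a method Foo
# callable without arguments -- the semantics differ rather drastically.
class Saver:
    def Foo(self):
        return "wrote useful information to disk"

class Insulter:
    def Foo(self):
        return "sent a letter of insults to your cousin"

def myprocedure(x):
    return x.Foo()   # Python checks the "signature" here, at runtime

assert myprocedure(Saver()) != myprocedure(Insulter())
```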


>> Many, _many_ type declarations are just like that, particularly if
>> one follows a nice programming style of many short
>> functions/methods.  At least in C++ you may often express the
>> equivalent of
>>
>> "just call x.Foo()!"
>>
>> as
>>
>> template <typename T>
>> void myprocedure(T& x)
>> {
>>     x.Foo();
>> }
> 
> In most cases it's evil to do this without a rigorous concept
> definition (type constraint) for T in the documentation.  Pretty much
> all principled template code (other than special cases like the lambda
> library which are really just for forwarding syntax) does this, and
> it's generally acknowledged as a weakness in C++ that there's no way
> to express the type constraints in code.
> 
>> where you're basically having to spend a substantial amount of
>> "semantics-free boilerplate" to tell the compiler and the reader
>> "x is of some type T"
> 
> Where are you claiming the expression of the type of x is in the code
> above?  I don't see it.

the "(T& x)" part says "x is a reference to the type which we'll call T
here", i.e. "x is of some type T".

The concept that it's better to add comments to explain what is going
on is quite commendable -- but just as applicable in any language, of
course, and just-as-of-course the compiler cannot check them anyway (as
you put it recently, they can go out of date).  There's no distinction
here between statically-typed and dynamically-typed languages, and thus
it's distracting and irrelevant to a comparison of the two categories.

As my main thesis is that the constraints that really matter can hardly
ever be expressed as statically checkable type constraints, the "generally
acknowledged weakness of C++" is (if a weakness at all) common to all
alternatives.  E.g., the Java alternative where a Fooable interface is
defined and method myprocedure receives a "Fooable x" argument is just
as semantics-free -- comments and/or other forms of doc will be around,
sure, but they're not "compile-time statically checkable" anyway, and
may just as well be around in any language.


>> (surprise, surprise; I'm sure this has huge explanatory power,
>> doesn't it -- otherwise the assumption would have been that x was of
>> no type at all...?)
> 
> This kind of sneering only makes me doubt the strength of your
> argument even more.  I know you're a smart guy; I ask you to treat my
> position with the same respect with which I treat yours.

If you're familiar at all with my writing style, you know I always
jest -- generally "with a straight face" a la Buster Keaton -- when
I come upon jestworthy sub-issues (and not always only then).  If you
can only debate in a sombre tone, then I can but suggest we drop this
(not a big loss, to be sure -- these debates have been held a zillion
times and have never convinced anybody of anything -- and besides, I'm
about to leave for a month's worth of business trips -- pypy sprint,
Europython, OSCON, ... -- so this debate can't continue anyway).


>> while letting them both shrewdly infer that type T, whatever it
>> might be, had better provide a method Foo that is callable without
>> arguments (e.g. just the same as the Python case, natch).
> 
> Only if you consider the implementation of myprocedure to be its
> documentation.

Documentation is comments, docstrings, and other venues yet, which
are not statically compile-time checkable and are pretty irrelevant
to the debate about the worth of such checks, quite obviously.  The
"meat", the part that doesn't "go out of date", is what can be
"checked" at compile- or run-time, and is language-relevant.


>> You do get the "error diagnostics 2 seconds earlier" (while compiling
>> in preparation to running unit-tests, rather than while actually
>> running the unit-tests) if and when you somewhere erroneously call
>> myprocedure with an argument that *doesn't* provide the method Foo
>> with the required signature.  But, how can it surprise you if Robert
>> Martin claims (and you've quoted me quoting him as if I was the
>> original source of the assertion, in earlier posts)
> 
> Hey, sorry, I just let Gnus do its job.  If the quote attributions
> were messed up then someone messed them up before me.

Nope, you (not Gnus) just cut out the attribution of the quote, which
was on a separate line from the snippet you picked out of the quote.


>> that this just isn't an issue...?
> 
> It doesn't surprise me in the least that some people in the Python
> community claim that their way is unambiguously superior.  It's been

Is "Uncle Bob" Robert Martin "in the Python community"?  His books
all use C++ and Java, I believe, as does the great mass of his
articles, and I believe the journal for which he was the editor was
titled "C++ Report", not "Python Report", wasn't it?

So, I surmise that (probably subconsciously) you're "relegating"
Robert Martin to the role of a "person in the Python community" just
so you can AVOID "being surprised" at his claims (which are about
dynamic typing in general -- he mentions Ruby and I think Smalltalk
as well as Python) -- perhaps for the same reason you cut out the
little detail that I was quoting him...?-)

> going on for years.  I wanted to believe that, too.  My experience
> contradicts that idea, unfortunately.

Have you given TDD a chance?  Beck's new book on TDD by example is
excellent -- and Beck is another person I would hardly class as "in
the Python community" (he's quite scathing in criticizing explicit
"self" -- rather a lithmus test in this matter;-).  Martin's own
latest book about agile development is more general but may be more
directly applicable to C++ (Beck uses Java and Python in his book's
examples -- not that it matters all that much, but still...).

 
>> If the compilation takes 3 seconds, then getting the error
>> diagnostics 2 seconds earlier is still a loss of time, not a gain,
>> compared to just running the tests w/o any compilation;-)...
> 
> Comprehensive test suites can't always run in a few seconds (the same
> applies to compilations, but I digress).  In a lot of the work I've

Say 3 minutes and 2 minutes, then -- as long as the ratio stays
the same the unit of measure doesn't matter:-)

> done, testing takes substantially longer, unavoidably.  A great deal
> of this work is exactly the sort of thing I like to use Python for, in
> fact (but not because of the lack of type declarations).  If
> compilation is reasonably fast and I have been reasonably
> conscientious about my type invariants, though, I *can* detect many
> errors with a static type system.

But compilation need not be fast, as you just "digressed" yourself;-).
And unit-tests need not be slow -- surely not a subset of unit-tests
whose job is only catching obvious bloomers (I'm probably a heretic
within the "test-driven community" by liking the idea of several
'layers' of unit-tests -- so far I've not gone beyond two -- but,
it's now ME who's digressing;-).

> But more importantly, I can come back to my code months later and
> still figure out what's going on, or work with someone else's code
> without losing my way.  Isn't that why we're all using Python instead
> of Perl?

Having the information that "x is an int" stated out like that
rather than (say) implicitly where it matters is hardly crucial
to "figuring out what's going on".  And having the compiler
statically test that fact, rather than testing it by a small core
of unit-tests that can also check out several other crucial aspects
not expressible in static-typing terms, is even less important.


>> I do, at some level, want a language where I CAN (*not* MUST) make
>> assertions about what I know to be true at certain points:
>> 1. to help the reader in a way that won't go out of date (the assert
>>    statement does that pretty well in most cases)
>> 2. to get the compiler to do extra checks & debugging for me (ditto)
>> 3. to let the compiler in optimizing mode deduce/infer whatever it
>>    wants from the assertions and optimize accordingly (and assert is
>>    no use here, at least as currently present in C, C++, Python)
> 
> Those are all the same things I want, and for the same reasons.  What
> are we arguing about again?

A. about CAN vs MUST -- I'd rather not have these possibilities at
   all, than be FORCED to use them even where I judge their impact
   on my productivity would be negative (forcing me to waste my
   time, and the code reader's attention, on reams of boilerplate)

B. about the importance of checks occurring at compile-time vs run-time,
   which I think is minuscule and NOT worth distorting the language
   in any way [note that my point 2 would be perfectly well satisfied
   if what the compiler did in most cases was inserting error-checking
   code to be executed at runtime, cfr again Eiffel contracts]

C. about whether "x belongs to type A" is a sensible way to express
   most important assertions for purposes of 1/2/3 -- I claim it isn't,
   you claim it is -- partly based on subtle confusions about what "belong
   to type" MEANS (in type-theory vs programming-practice e.g. in C++)

as well as debating-style (sombre vs jestful), Robert Martin's role
as a member of the "Python community", whether it's reasonable at all
to cut out a quote's attribution, and sundry minor tangential points.


>> But even if and when I get such a language I strongly doubt most of
>> my assertions will be similar to "type declarations" anyway...
> 
> Oh, there it is.  Well, if the language has a weak notion of type,
> then you're probably right.

I claim no language can possibly have a notion of "statically checkable
compile-time type" that would make me "probably wrong" in this respect.
Perhaps a small number of somewhat-trivial issues, conceptually only
runtime checkable, might give rise to earlier diagnostics thanks to
some constant-propagation and a very smart compiler indeed, e.g., one
able to deduce that:

    x = 3
    y = x*k
    z = y/3
    assert z > k

will always inevitably cause the assertion to fail.  I think (but
cannot be sure) that the importance of such "optimization of checks"
is rather minor -- that most interesting checks used and useful in
practical programming don't yield any substantial dividend from such
optimization attempts.  And I certainly don't want to use a language
distorted to facilitate such optimizations at the cost of any of
the important dimensions, all connected to programmer productivity...


Alex




