What's the best way to minimize the need of run time checks?

Chris Angelico rosuav at gmail.com
Sun Aug 28 01:29:48 EDT 2016


On Sun, Aug 28, 2016 at 2:30 PM, Steve D'Aprano
<steve+python at pearwood.info> wrote:
> But the author of this piece ignores that standard distinction and invents
> his own non-standard one: to him, classes are merely different
> representations of the same data. E.g. his example of complex numbers,
> shown as Cartesian (x, y) values or polar (r, θ) values. These aren't two
> different "kinds of things", but merely two different ways of representing
> the same entity.
>
> That's not a good way to think about (say) Python lists and Python bools.
> Lists and bools are in no way the same kind of entity (except in the most
> general category of "they're both objects").
>
> It's not even a very good way of thinking about complex numbers.

It might be a good way of thinking about points on a Cartesian plane,
though. Rectangular and polar coordinates truly are just different
ways of expressing the same information. (How well 2D coordinates map
to complex numbers is a separate question.)
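
A minimal Python sketch of the "same information, two representations" point, using the standard cmath module. The sample point (3, 4) and the variable names are only illustrative assumptions:

```python
import cmath

# Rectangular form: x = 3, y = 4 (an arbitrary sample point).
z = complex(3.0, 4.0)

# Polar form of the same point: modulus r and phase angle theta (radians).
r, theta = cmath.polar(z)

# Converting back recovers the rectangular form, up to floating-point rounding.
back = cmath.rect(r, theta)
same_point = cmath.isclose(z, back)
```

Nothing is gained or lost in either direction apart from rounding, which is what you would expect if the two forms carry the same information.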

> In static typing, I somehow associate the name "x" with a tag that
> says "this may only be used with ints". Perhaps I have to declare it first,
> like in C or Pascal, or perhaps the compiler can infer the type, like in
> Haskell, but either way, "x" is now forever tagged as an int, so that the
> compiler can flag errors like:
>
> x = 1
> # ... code can run here
> x.upper()
>
> The compiler knows that ints don't have a method "upper" and can flag this
> as an error. In such static languages, it is invariably an error to try to
> change the type of the variable (unless it has been tagged as "Anything"
> or "Duck Typed").

So far, I completely agree with you; whether you declare "x takes
integers only" or the compiler infers "x has been assigned to point to
an integer" or any other form of it, attempting to call .upper() on
the integer 1 is an error.
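
For comparison, a small sketch of how the same mistake surfaces in Python, where the check happens only when the line actually runs. The name error_message is just an illustrative choice:

```python
x = 1
# ... code can run here

try:
    x.upper()  # ints have no "upper" method
except AttributeError as exc:
    # The failure is reported at run time, when this line is reached.
    error_message = str(exc)
```

Either way it is the same error; the only difference is when it gets reported.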

> x = 1
> x = "hello"  # a type error, at compile time
>
>
> But in dynamic typing, the type information isn't associated with the
> name "x", but with the value 1 currently assigned to it. Change the
> assignment, and the type changes. As a consequence, it is necessary to move
> the type checks from compile time to runtime, but that's not the
> fundamental difference between the two.

This is where I'm less sure. Sometimes a variable's type should be
broader than just one concrete type - for instance, a variable might
hold 1 over here, and 1.5 over there, and thus is storing either "int
or float" or "any number". If you have a complex hierarchy of types,
how do you know that this variable should be allowed to hold anything
up to a certain level in the hierarchy, and no further?

If what the compiler's doing is identifying what *is* assigned, then
it's easy. You've given it an int over here and a float over there,
and that's legal; from that point on, the compiler knows that this
contains either an int or a float. (Let's assume it can't know for
sure which, e.g. it has "if (cond) x=1; else x=1.5" where the condition
can't be known till run-time.) But for your example of x="hello" to be
a compilation error, it has to either assume that the first object
given determines the type completely, or be told what types are
permitted.
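
A rough Python sketch of the "be told what types are permitted" option, using type hints. The names Number and choose are hypothetical, and the comment about the rejected assignment assumes an external static checker such as mypy; the interpreter itself does not enforce annotations:

```python
from typing import Union

# Hypothetical alias: the variable may hold an int or a float, nothing else.
Number = Union[int, float]


def choose(cond: bool) -> Number:
    # Which branch runs is only known at run time.
    return 1 if cond else 1.5


x: Number = choose(True)   # holds an int here
x = choose(False)          # and a float here; both fit the declared type

# x = "hello"  # a static checker such as mypy should reject this line,
#              # but plain CPython would happily execute it
```

Without the annotation, a checker has to pick one of the two strategies described above: fix the type from the first assignment, or widen it as more assignments are seen.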

So, for example, I could make a declaration in Pike that says:

int|float x = 1;

and then x can have either an integer (which, like in Python, is a
bignum) or a float (IEEE 64-bit, again like Python), but not a string.
I could equally say:

string(8bit)|int(0..) x = 12345;

which would allow x to store a byte-string (an eight-bit string, as
opposed to a Unicode string which stores text) or a non-negative
integer. A type inference system that can't handle variables like this
is limited; but if it _can_ handle something like this, how can it
flag an error at compile time? It'd just infer a more complicated
type.
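
A loose Python approximation of that second declaration, as a sketch only. Python's type hints can say "bytes or int" but, as far as I know, cannot express "non-negative" directly, so that part becomes a run-time check. The names BytesOrCount and check_value are hypothetical:

```python
from typing import Union

# Hypothetical alias: a byte string or an integer.
BytesOrCount = Union[bytes, int]


def check_value(value: object) -> BytesOrCount:
    """Return value unchanged if it is bytes or a non-negative int."""
    if isinstance(value, bytes):
        return value
    # bool is a subclass of int; it is rejected here by choice.
    if isinstance(value, int) and not isinstance(value, bool):
        if value < 0:
            raise ValueError("integer must be non-negative")
        return value
    raise TypeError("expected bytes or a non-negative int")


x = check_value(12345)    # accepted
y = check_value(b"abc")   # accepted
```

Here the range restriction is exactly the kind of check that moves from compile time to run time.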

How is this resolved in type-inferring languages? (Genuine question,
not rhetorical. I haven't used type-inferring languages in this way.)

> As further evidence that the author has missed the forest for all the trees,
> consider languages which actually do have only a single type:
>
> - in assembly language, everything is just bytes or words;
>
> - in Forth, similarly, everything is just a 16-bit or 32-bit word;
>
> - in Hypertalk, every value is stored internally as a string;
>
> to say nothing of more esoteric languages like Ook, Whitespace and
> BrainF*ck.

Turing Tarpits (Ook, Brain*, etc) tend to be like assembly language,
treating everything as cells (assembly language might call those cells
either "bytes" or "words"). REXX is like Hypertalk - everything truly
is a string. Shell scripting languages generally treat everything as
strings, too (although bash also has arrays). I can't imagine any
untyped language using anything other than bytes/words or strings, but
I'm sure someone's done it somewhere.

ChrisA


