What's the best way to minimize the need of run time checks?

Sun Aug 28 04:33:40 EDT 2016

On Sunday 28 August 2016 15:29, Chris Angelico wrote:

> On Sun, Aug 28, 2016 at 2:30 PM, Steve D'Aprano
> <steve+python at pearwood.info> wrote:
>> But the author of this piece ignores that standard distinction and invents
>> his own non-standard one: to him, classes are merely different
>> representations of the same data. E.g. his example of complex numbers,
>> shown as Cartesian (x, y) values or polar (r, θ) values. These aren't two
>> different "kinds of things", but merely two different ways of representing
>> the same entity.
>>
>> That's not a good way to think about (say) Python lists and Python bools.
>> Lists and bools are in no way the same kind of entity (except in the most
>> general category of "they're both objects").
>>
>> It's not even a very good way of thinking about complex numbers.
> 
> It might be a good way of thinking about points on a Cartesian plane,
> though. Rectangular and polar coordinates truly are just different
> ways of expressing the same information. 

That's exactly my point, and that's why you shouldn't implement them as 
different classes. If you do, that's a limitation of your code and/or the 
language.

(There may be *implementation-specific* reasons why you are forced to, for 
example to avoid rounding errors due to finite precision: say, my polar number 
(1, 45°) may not evaluate as *exactly* (sqrt(2)/2, sqrt(2)/2) in Cartesian 
coordinates due to rounding. But that's a case of a leaky abstraction.)
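To make that concrete with Python's own complex type (a sketch, not the only way to do it): a single class stores the Cartesian form, and cmath.polar() and cmath.rect() convert to and from the polar form on demand. Polar and Cartesian are therefore two views of one value rather than two classes. The comparisons at the end show the rounding leak: only approximate equality is guaranteed.

```python
import cmath
import math

# One class for both representations: complex stores the Cartesian form,
# and cmath.rect(r, phi) builds a value from polar coordinates (phi in radians).
z = cmath.rect(1, math.radians(45))

# The polar view is computed from the same object when asked for.
r, phi = cmath.polar(z)

# The exact Cartesian answer is (sqrt(2)/2, sqrt(2)/2), but finite precision
# means exact equality is not guaranteed; approximate equality is.
half_root2 = math.sqrt(2) / 2
print(z.real == half_root2, z.imag == half_root2)  # platform-dependent
print(math.isclose(z.real, half_root2), math.isclose(z.imag, half_root2))  # True True
print(math.isclose(r, 1.0), math.isclose(phi, math.pi / 4))  # True True
```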

> (How well 2D coordinates map
> to complex numbers is a separate question.)

Mathematically speaking, they map together perfectly well. 
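A short Python sketch of that mapping, treating the point (x, y) as the complex number x + yj: distance is the modulus of a difference, translation is addition, and rotation about the origin is multiplication by a unit complex number. The specific points are arbitrary.

```python
import cmath
import math

# Treat the 2D point (x, y) as the complex number x + yj.
p = complex(3, 4)
origin = complex(0, 0)

# Distance between two points is the modulus of their difference.
distance = abs(p - origin)  # 5.0

# Translating by (dx, dy) is ordinary addition.
moved = p + complex(1, -1)  # (4+3j)

# Rotating about the origin is multiplication by a unit complex number;
# here by 90 degrees, giving approximately (-4+3j) after rounding.
rotated = p * cmath.rect(1, math.pi / 2)

print(distance, moved, rotated)
```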

[...]
> This is where I'm less sure. Sometimes a variable's type should be
> broader than just one concrete type - for instance, a variable might
> hold 1 over here, and 1.5 over there, and thus is storing either "int
> or float" or "any number". If you have a complex hierarchy of types,
> how do you know that this variable should be allowed to hold anything
> up to a certain level in the hierarchy, and no further?

This depends on the sophistication of the type system and support (or lack of 
support) for polymorphism:

https://en.wikipedia.org/wiki/Polymorphism_%28computer_science%29

Type punning is a way to subvert or bypass the type system, and is generally 
considered bad but necessary:

https://en.wikipedia.org/wiki/Type_punning
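Python has no unions or pointer casts, but the struct module shows what punning means: the same four bytes are written as one type and read back as another. (In C this is typically done with a union or memcpy.) float_bits is a hypothetical helper name; the "<f" format packs IEEE 754 single precision.

```python
import struct

def float_bits(x: float) -> int:
    """Reinterpret the single-precision bytes of x as an unsigned integer."""
    # Pack x as a little-endian 32-bit float, then unpack the same 4 bytes
    # as a little-endian 32-bit unsigned int: the bits are unchanged, only
    # the type used to read them differs.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits

print(hex(float_bits(1.0)))   # 0x3f800000
print(hex(float_bits(-2.0)))  # 0xc0000000
```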

In general, primitive type systems like the one used by Pascal (and C?) deal 
poorly, or not at all, with the scenario you describe. The built-in functions 
can often hard-code support for multiple numeric types, automatically promoting 
one type to another as necessary, but the same effect is almost impossible to 
achieve in your own functions in (say) standard Pascal.

More sophisticated type-checkers deal much better with polymorphism.
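One example of a checker coping with the int-or-float case from the quoted question: PEP 484 lets type-checkers such as mypy accept an int wherever a float is annotated, and Python's arithmetic promotes the int at run time. A minimal sketch; hypotenuse is just an illustrative function.

```python
import math

def hypotenuse(a: float, b: float) -> float:
    """Length of the hypotenuse of a right triangle with legs a and b."""
    # PEP 484's numeric promotion lets a static checker accept an int
    # wherever a float is annotated, so both calls below type-check.
    # At run time, math.sqrt returns a float whatever the input type.
    return math.sqrt(a * a + b * b)

print(hypotenuse(3, 4))      # 5.0
print(hypotenuse(1.5, 2.0))  # 2.5
```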

But there's a trade-off: the more kinds of things a value or variable might be, 
the less certain you are of what is allowed ahead of time. That's why 
dynamically typed languages traditionally skipped *all* ahead-of-time type 
checking and instead relied entirely on runtime type errors, while traditional 
compilers restrict what you can do as the trade-off for catching more errors 
ahead of time.

(That's where dynamic typing's reputation for flexibility comes from: you never 
need to fight the compiler to do something you know will be okay, like passing 
an int to a function that expects a float.)

I might be able to tell the compiler that x is Union[int, str] (a number, or a 
string) but that limits the ability of the compiler to tell what is and what 
isn't safe. If I declare that x is either an int or a str, what can we say 
about x.upper()? Is it safe? If x happens to be an int at the time we call 
x.upper(), will the language raise a runtime exception or will it blindly try 
to execute some arbitrary chunk of memory as the upper() method?
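Python gives one concrete answer to the x.upper() question. A sketch, with shout as a made-up function: a checker such as mypy rejects the body because int has no upper() method, and if the code is run anyway, CPython raises AttributeError rather than executing arbitrary memory.

```python
from typing import Union

def shout(x: Union[int, str]) -> str:
    # A static checker such as mypy rejects this line: x might be an int,
    # and int has no upper() method.
    return x.upper()

print(shout("hello"))  # HELLO

try:
    shout(42)
except AttributeError as exc:
    # At run time the interpreter checks the attribute lookup and raises.
    print("runtime error:", exc)
```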

This is why static and dynamic typing are slowly converging: statically typed 
languages are gaining dynamic features, such as C++'s virtual methods 
(dispatched through vtables):

https://en.wikipedia.org/wiki/Virtual_method_table

while dynamically typed languages are gaining smarter compilers capable of 
some "best effort" compile-time type-checking, or at least allowing external 
type-checkers and linters to do so.
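Python takes the second route: annotations (PEP 484) are recorded by the interpreter but not enforced, leaving the checking to external tools such as mypy. A small sketch, with double as an illustrative function:

```python
import typing

def double(x: int) -> int:
    return x * 2

# CPython does not enforce the annotations: this runs and returns "abab".
print(double("ab"))

# The annotations are still recorded, so external checkers can read them.
print(typing.get_type_hints(double))  # {'x': <class 'int'>, 'return': <class 'int'>}
```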

The bottom line is that if a human reader can read the source code and deduce 
that x.upper() is safe because in this branch of the code, x must be a string 
rather than an int, then *in principle* a type-checker could do the same, 
possibly better than a human or possibly worse, depending on the intelligence 
of the type-checker and on the code being checked.
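For instance, with the Union[int, str] declaration from above, checkers such as mypy narrow the type inside an isinstance() branch, which is the same deduction a human reader makes. A sketch, with describe as a made-up function:

```python
from typing import Union

def describe(x: Union[int, str]) -> str:
    if isinstance(x, str):
        # In this branch a checker such as mypy narrows x to str,
        # so x.upper() is accepted.
        return x.upper()
    # Here x can only be an int, so int arithmetic is accepted.
    return str(x + 1)

print(describe("abc"))  # ABC
print(describe(41))     # 42
```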

A good enough type-checker can even help you find infinite loops:

http://perl.plover.com/yak/typing/notes.html

> If what the compiler's doing is identifying what *is* assigned, then
> it's easy. You've given it an int over here and a float over there,
> and that's legal; from that point on, the compiler knows that this
> contains either an int or a float. (Let's assume it can't know for
> sure which, eg it has "if (cond) x=1; else x=1.5" where the condition
> can't be known till run-time.) But for your example of x="hello" to be
> a compilation error, it has to either assume that the first object
> given determines the type completely, or be told what types are
> permitted.

I'm not aware of any 

[about Pike]
> string(8bit)|int x(0..) = 12345;
> 
> which would allow x to store a byte-string (an eight-bit string, as
> opposed to a Unicode string which stores text) or a non-negative
> integer. 

Okay. What happens when you say:

if random() < 0.5:
    x = 1234
else:
    x = "surprise!"

y = 3*(x + 1)
z = x.find("p")  # or however Pike does string functions/methods

What is y? What is z?

There are solutions to this conundrum. One is weak typing: 3*("surprise!" + 1) 
evaluates as 3*(0 + 1) or just 3, while (1234).find("p") coerces 1234 to the 
string "1234".
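A rough emulation of that weak typing in Python, purely for illustration: numify and weak_add are hypothetical helpers that follow approximately Perl's rule for strings in numeric context (use the leading number, else 0). Real weakly typed languages differ in the details.

```python
import re

def numify(value):
    """Hypothetical Perl-like coercion: use a leading number, else 0."""
    if isinstance(value, (int, float)):
        return value
    # Match optional whitespace and sign, then a leading integer or decimal.
    match = re.match(r"\s*[+-]?(\d+(\.\d*)?|\.\d+)", str(value))
    return float(match.group()) if match else 0

def weak_add(a, b):
    # Coerce both operands to numbers instead of raising TypeError.
    return numify(a) + numify(b)

print(3 * weak_add("surprise!", 1))  # 3
print(weak_add("12abc", 1))          # 13.0
# The other half: coerce 1234 to the string "1234" before searching it.
print(str(1234).find("p"))           # -1
```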

Another is runtime exceptions.
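Python takes this route. A sketch wrapping both expressions from the example in a hypothetical evaluate() function: whichever branch ran, one of the two lines fails, with TypeError for the string and AttributeError for the int.

```python
import random

def evaluate(x):
    """Run both expressions from the example and report what happens."""
    try:
        y = 3 * (x + 1)   # TypeError if x is a str: can't add str and int
        z = x.find("p")   # AttributeError if x is an int: no find() method
    except (TypeError, AttributeError) as exc:
        return type(exc).__name__
    return (y, z)

print(evaluate(1234))         # AttributeError
print(evaluate("surprise!"))  # TypeError

# The original example: which exception occurs depends on a coin flip.
x = 1234 if random.random() < 0.5 else "surprise!"
print(evaluate(x))
```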

A third is "don't do that; if you do, you can deal with the segmentation 
fault".

A fourth would be a type-checker smart enough to recognise that only one of 
those two assignments can be valid, that the other must be illegal, and to flag 
the whole thing. That's what a human would do; I don't know if any type systems 
are that sophisticated.

-- 
Steve