What's the best way to minimize the need of run time checks?

Sun Aug 28 22:43:32 EDT 2016

On Sun, 28 Aug 2016 07:28 pm, Chris Angelico wrote:

> On Sun, Aug 28, 2016 at 6:33 PM, Steven D'Aprano
> <steve+comp.lang.python at pearwood.info> wrote:
>> On Sunday 28 August 2016 15:29, Chris Angelico wrote:
>>> It might be a good way of thinking about points on a Cartesian plane,
>>> though. Rectangular and polar coordinates truly are just different
>>> ways of expressing the same information.
>>
>> That's exactly my point, and that's why you shouldn't implement them as
>> different classes. If you do, that's a limitation of your code and/or the
>> language.
[... snip digression over possible leaky abstractions due to floating point
rounding ...]

> import cmath
>
> class Complex(complex):
>     @property
>     def r(self):
>         return abs(self)
>     @property
>     def phi(self):
>         return cmath.phase(self)
> 
> One value, two ways of looking at it. One type. One class.

I can't tell if you're saying this to agree with me or to disagree with me.

>> I might be able to tell the compiler that x is Union[int, str] (a number,
>> or a string) but that limits the ability of the compiler to tell what is
>> and what isn't safe. If I declare that x is either an int or a str, what
>> can we say about x.upper()? Is it safe? If x happens to be an int at the
>> time we call x.upper(), will the language raise a runtime exception or
>> will it blindly try to execute some arbitrary chunk of memory as the
>> upper() method?
> 
> That's fine if you *tell* the compiler this.

I trust you don't actually mean that it is fine for a high-level
(non-assembly) language to blindly execute some arbitrary memory address.

> My question came from 
> your statement that a type *inference* system can detect errors of
> assignment - not method/operator usage.

Ignore the question of inference versus declaration. At least for simple
cases, there's no real difference between the C-like declaration and
assignment:

int x = 5;

and the assignment:

x = 5;

It doesn't require super-human, or even human, intelligence to infer that if
x is assigned the value 5, x must be an int. So a *simple* inference engine
doesn't need to be very sophisticated.

Your question seems to be, what happens if you follow that with an
assignment to a different type?

x = 5
some_code(x)
x = "hello world"

Will the type-checker consider that an error ("you're assigning a str to an
int") or will it infer that x is the Union[int, str]?

Surely that depends on the type-checker! I don't think there's any hard and
fast rule about that, but as far as I know, all statically typed languages
consider that an error. Remember, that's the definition of *static typing*:
the variable carries the type, not the value. So x is an int, and "hello
world" isn't an int, so this MUST be an error.
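
To make the distinction concrete, here is a small sketch in plain Python,
where it is the value, not the name, that carries the type. No type-checker
is involved; it only shows what the interpreter itself does at run time.

```python
x = 5
first_type = type(x)      # the int object carries its own type
x = "hello world"         # rebinding the name is always legal at run time
second_type = type(x)     # the same name now refers to a str

print(first_type.__name__, second_type.__name__)
```

A static checker that treats x as an int would object to the second
assignment, even though the interpreter does not.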

In dynamically typed languages... I don't know. I think that (again) I would
expect that *by default* the checker should treat this as an invalid
assignment. The whole point of a static type-checker is to bring some
simulacrum of static typing to a dynamic language, so if you're going to
enthusiastically infer union types every time you see an unexpected type
assignment, it sort of defeats the purpose...

"x was declared int, and you then call function foo() which is declared to
return an int, but sometimes returns a str... oh, they must have meant a
union, so that's okay..."

> A dynamic type inference 
> system could easily cope with this:
> 
> x = 5
> y = x**2
> x = "five"
> z = x.upper()
> 
> and correctly deduce that, at the time of y's assignment,
> exponentiation of x was legal, and at the time of z's, uppercasing
> was. 

Could it? Do you have an example of a language or type-checker that can do
that? This isn't a rhetorical question.

My understanding is that all the standard algorithms for checking types are
based on the principle that a variable only has a single type in any one
scope. Global x and local x may be different, but once you explicitly or
implicitly set x to an int, then you can't set it to a str. So I'm not sure
that existing compile-time type-checkers could deal with that, even in a
dynamic language. At best they might say "oh well, x is obviously
duck-typed, so don't bother trying to check it".

At least that's my understanding -- perhaps I'm wrong.

I daresay that what you want is *possible*. If the human reader can do it,
then an automated checker should be able to. The question is not "is this
possible?" but "has anyone done it yet?".
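
As a rough illustration that this is at least possible in principle, here is
a toy flow-sensitive checker. It is not how any existing tool works: the
function name check() and its rules are invented for this sketch, and it
only understands constant assignments and method calls on plain names.

```python
import ast

def check(source):
    """Toy checker: track the type each name holds, statement by statement."""
    env = {}       # name -> type the name holds at this point in the code
    errors = []
    for stmt in ast.parse(source).body:
        if not (isinstance(stmt, ast.Assign)
                and isinstance(stmt.targets[0], ast.Name)):
            continue   # this sketch ignores everything except "name = ..."
        target = stmt.targets[0].id
        value = stmt.value
        if isinstance(value, ast.Constant):
            # "x = 5" or "x = 'five'": the name now holds that type
            env[target] = type(value.value)
            continue
        if (isinstance(value, ast.Call)
                and isinstance(value.func, ast.Attribute)
                and isinstance(value.func.value, ast.Name)):
            # "z = x.upper()": look up the method on x's *current* type
            owner = env.get(value.func.value.id)
            method = value.func.attr
            if owner is not None and not hasattr(owner, method):
                errors.append(
                    (stmt.lineno,
                     f"{owner.__name__} has no attribute {method!r}"))
        # The result type is unknown to this sketch, so forget the target.
        env.pop(target, None)
    return errors

good = check('x = 5\ny = x**2\nx = "five"\nz = x.upper()\n')
bad = check('x = 5\nz = x.upper()\n')
print(good)
print(bad)
```

The first program passes because x is a str by the time upper() is called;
the second is flagged because x is still an int at that point.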

> But you said that the type system could flag the third line as an 
> error, saying "hey, I'm expecting this to be integers only". Here's
> what you said:
> 
>> x = 1
>> x = "hello"  # a type error, at compile time
>
> If I were doing type inference, with my limited knowledge of the
> field, I would do one of two things:
> 
> 1) Infer that x holds only integers (or maybe broaden it to
> "numbers"), and then raise an error on the second line; this basically
> restricts the type system to be union-free

No, it only means that the system won't *infer* type unions just from
assignment. And that's probably what we want: the whole point is to catch
type errors. Consider this case:

x = 1  # type: int
# later...
x = spam()
do_something_with(x)  # that expects an int

The type-checker should check that spam() will always return an int. If
there are any circumstances where spam() will actually return (say) a str,
the checker should report an error rather than just infer that x could be
either an int or a str. Otherwise you could never catch type errors in
assignments.
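
Here is a runnable sketch of that situation. The bodies of spam() and
do_something_with() are made up for illustration, and the annotations use
the later variable-annotation syntax rather than type comments. A checker
such as mypy is expected to reject the first reassignment, while the
interpreter only notices the problem when a str actually turns up.

```python
from typing import Union

def spam(flag: bool) -> Union[int, str]:
    # Hypothetical function: usually returns an int, occasionally a str.
    return "eggs" if flag else 42

def do_something_with(n: int) -> int:
    return n + 1

x: int = 1
x = spam(False)   # a checker such as mypy is expected to reject this line
ok = do_something_with(x)

x = spam(True)    # same declared return type, but a str at run time
try:
    do_something_with(x)
    failed = False
except TypeError:
    failed = True

print(ok, failed)
```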

There are other ways that you can infer a union type. The most obvious is to
declare it, but MyPy can infer that x is Union[int, str] following:

if isinstance(x, int) or isinstance(x, str):
    ...

http://mypy-lang.blogspot.com.au/2016/07/mypy-043-released.html
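
A small sketch of that kind of narrowing follows. The function double() is
invented for this example; the assumption is that a checker which narrows on
isinstance() treats x as Union[int, str] inside the if block and so accepts
the multiplication, which is valid for both types.

```python
from typing import Union

def double(x: object) -> Union[int, str]:
    if isinstance(x, int) or isinstance(x, str):
        # Inside this block x is known to be an int or a str,
        # and "* 2" is defined for both.
        return x * 2
    raise TypeError("expected an int or a str")

print(double(21), double("ab"))
```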

But it really depends on how smart the type inference is. A simple-minded
one might be no smarter than Pascal-style type-checking, except you don't
need the initial var declaration. The MyPy one is pretty smart.

> 2) Infer that x holds Union[int, str] in PEP 484 notation, or
> int|string in Pike notation, and permit it to carry either type.

Unless the type-checker was *really* smart, that would probably defeat the
purpose.

> Which way is it? Do you get errors, as per your example, and thus are
> never allowed to have union types? And if so, what happens with
> compatible types (notably, int and float)?

Who says they're compatible?

py> (23).bit_length()
5
py> (23.0).bit_length()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'float' object has no attribute 'bit_length'

Some languages (Erlang?) are so strict that they forbid mixed arithmetic and
require you to explicitly coerce one type to another. That's not a bad
idea... consider adding an int n to a float x: can that ever raise?
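
In Python, at least, it can. Mixed arithmetic converts the int to a float
first, and an int that is too large to be represented as a float makes that
conversion fail. A minimal sketch, with values chosen only for illustration:

```python
small = 1 + 1.5     # ordinary mixed arithmetic works as expected

n = 10 ** 400       # far larger than the largest finite float (about 1.8e308)
x = 1.5
try:
    total = n + x   # the int must be converted to a float first
    raised = False
except OverflowError:
    raised = True

print(small, raised)
```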

-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.