What's the best way to minimize the need of run time checks?

Sun Aug 28 03:31:28 EDT 2016

Chris Angelico writes:

> On Sun, Aug 28, 2016 at 4:13 PM, Jussi Piitulainen wrote:
>>> This is where I'm less sure. Sometimes a variable's type should be
>>> broader than just one concrete type - for instance, a variable might
>>> hold 1 over here, and 1.5 over there, and thus is storing either
>>> "int or float" or "any number". If you have a complex hierarchy of
>>> types, how do you know that this variable should be allowed to hold
>>> anything up to a certain level in the hierarchy, and no further?
>>
>> It's not just literal values that give potential type information in
>> a dynamically typed language. Another source is functions that the
>> compiler knows, and this information propagates back and forth in the
>> analysis of the control flow.
>>
>> For example, below the compiler might infer that x must be a number
>> but not a complex number, then generate one type check (which it
>> might be able to prove redundant) and calls to specialized versions
>> of ceiling and floor.
>>
>>     d = ceiling(x) - floor(x)
>>
>> Also known is that the results of the calls are numbers and the
>> difference of numbers is a number, so d gets assigned a number.
>> Perhaps ceiling and floor in the language always return an int. Then
>> d is known to be an int. And so on.
>
> Right, and I understand this concept. Consider this code:
>
> x = 5;
> ...
> if (some_condition)
>     x = "five";
> else
>     x = [0, 0, 0, 0, 0];
>
> (adjust syntax to whatever language you like)
>
> Does this mean that the type of x is int|string|list, or will this be
> an error? Assuming the condition can't be known until run time (eg it
> involves user input), there's no way for a static analyzer to
> differentiate between this code and the form that Steven put forward:

I'm thinking of a dynamically typed language, where the type of x is
simply the type of the value it currently holds. So:

Before the conditional, x is known to be 5 (an int).

After the conditional, x is known to be "five" or [0,0,0,0,0] (a string
or a list of int; not an int).

If the next statement is to return -x, *that* is an error, because -x
does not make sense after either branch of the conditional.

If the next statement is to return x.swapcase(), the compiler can
replace the conditional with

   if (some_condition)
      return "FIVE" # assuming local x
   else
      raise Objection("list don't have no .swapcase() method")

In no case would I say that a mere assignment to a variable is a type
error in a dynamically typed language.
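
To make the cases above concrete, here is a minimal Python sketch of what
happens at run time. The names rebind and attempt are made up for the
illustration. Python itself performs none of the compile-time reasoning
described above; the sketch only shows which operations succeed on each
branch.

```python
import operator

def rebind(some_condition):
    x = 5                       # x is an int here
    if some_condition:
        x = "five"              # plain rebinding, no error
    else:
        x = [0, 0, 0, 0, 0]     # likewise
    return x                    # a str or a list, never an int

def attempt(op, value):
    """Apply op to value; report the exception name if it fails."""
    try:
        return op(value)
    except (TypeError, AttributeError) as exc:
        return type(exc).__name__

for cond in (True, False):
    x = rebind(cond)
    print(cond, attempt(operator.neg, x), attempt(lambda v: v.swapcase(), x))
```

Negation fails with TypeError on both branches, while swapcase() gives
"FIVE" on one branch and fails with AttributeError on the other. The
assignments themselves never fail.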

>> x = 1
>> x = "hello"  # a type error, at compile time
>
> Simple type inference would either see this as meaning that x is
> int|string, or possibly it'd say "x is an int up to that second line,
> and a string thereafter" (which is basically like dynamic typing but
> statically checked - it's the value, not the variable, that has a
> type, and checks like x.upper() would take note of that). But if it
> flags it as an error, that would basically mean that the type system
> is (probably deliberately) simplistic and restrictive, requiring that
> x be EITHER an integer variable OR a string variable, and not both.

I'd say that the compiler of a dynamically typed language has different
information about the type of (the value of) x after the first statement
and after the second statement.

If that is considered an error instead, as the comment says, then the
language is statically typed (the type pertains to the variable).
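
As a rough sketch of "different information after each statement", the toy
analysis below walks a straight-line program and records the type of the
literal bound to a name at each assignment. It assumes Python 3.8 or later
(where literals parse as ast.Constant), handles only simple assignments of
literal constants, and the function name is hypothetical. It is not how
any real compiler is organized.

```python
import ast

def types_after_each_assignment(source):
    """Return (line, name, type name) for each `name = literal` statement."""
    facts = []
    for node in ast.parse(source).body:
        if (isinstance(node, ast.Assign)
                and len(node.targets) == 1
                and isinstance(node.targets[0], ast.Name)
                and isinstance(node.value, ast.Constant)):
            # The type belongs to the value; the name just refers to it.
            facts.append((node.lineno,
                          node.targets[0].id,
                          type(node.value.value).__name__))
    return facts

facts = types_after_each_assignment('x = 1\nx = "hello"\n')
print(facts)
```

The result has one entry per statement: x refers to an int after line 1
and to a str after line 2. A dynamically typed reading treats these as two
facts about two values; a statically typed reading would reject the second
line because the two types attached to x disagree.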

> Which is a perfectly viable stance, but I'm just not sure if it's (a)
> what is done, or (b) ideal. Particularly since it'd end up requiring
> some annoying rules, like "integers and floats are compatible, but
> nothing else, including user-defined types" or "integers and floats
> are fundamentally different things, and if you want your variable ever
> to contain a float, you have to always use 1.0 instead of just 1",
> neither of which I like.

I suppose statically typed type-inferencing languages do that, but I
don't have much experience with them. They may not be happy until they
can infer a concrete implementation type for every variable, and there
may be some awkward corners then.