What is Expressiveness in a Computer Language

Fri Jun 23 03:42:25 EDT 2006

Marshall wrote:
>>The short answer is that I'm most directly referring to "the types in
>>the programmer's head".
> 
> 
> In the database theory world, we speak of three levels: conceptual,
> logical, physical. In a dbms, these might roughly be compared to
> business entities described in a requirements doc, (or in the
> business owners' heads), the (logical) schema encoded in the
> dbms, and the b-tree indicies the dbms uses for performance.
> 
> So when you say "latent types", I think "conceptual types."

That sounds plausible, but of course at some point we need to pick a 
term and attempt to define it.  What I'm attempting to do with "latent 
types" is to point out and emphasize their relationship to static types, 
which do have a very clear, formal definition.  Despite some people's 
skepticism, that definition gives us a lot of useful stuff that can be 
applied to what I'm calling latent types.

> The thing about this is, both the Lisp and the Haskell programmers
> are using conceptual types (as best I can tell.) 

Well, a big difference is that the Haskell programmers have language 
implementations that are clever enough to tell them, statically, a very 
precise type for every term in their program.  Lisp programmers usually 
don't have that luxury.  And in the Haskell case, the conceptual type 
and the static type match very closely.

There can be differences, e.g. a programmer might have knowledge about 
the input to the program or some programmed constraint that Haskell 
isn't capable of inferring, that allows them to figure out a more 
precise type than Haskell can.  (Whether that type can be expressed in 
Haskell's type system is a separate question.)

In that case, you could say that the conceptual type is different than 
the inferred static type.  But most of the time, the human is reasoning 
about pretty much the same types as the static types that Haskell 
infers.  Things would get a bit confusing otherwise.

> And also, the
> conceptual/latent types are not actually a property of the
> program; 

That's not exactly true in the Haskell case (or other languages with 
static type inference), assuming you accept the static/conceptual 
equivalence I've drawn above.

Although static types are often not explicitly written in the program in 
such languages, they are unambiguously defined and automatically 
inferrable.  They are a static property of the program, even if in many 
cases those properties are only implicit with respect to the source 
code.  You can ask the language to tell you what the type of any given 
term is.

> they are a property of the programmer's mental
> model of the program.

That's more accurate.  In languages with type inference, the programmer 
still has to figure out what the implicit types are (or ask the language 
to tell her).

You won't get any argument from me that this figuring out of implicit 
types in a Haskell program is quite similar to what a Lisp, Scheme, or 
Python programmer does.  That's one of the places the connection between 
static types and what I call latent types is the strongest.

(However, I think I just heard a sound as though a million type 
theorists howled in unison.[*])

[*] most obscure inadvertent pun ever.

> It seems we have languages:
> with or without static analysis
> with or without runtime type information (RTTI or "tags")
> with or without (runtime) safety
> with or without explicit type annotations
> with or without type inference
> 
> Wow. And I don't think that's a complete list, either.

Yup.

> I would be happy to abandon "strong/weak" as terminology
> because I can't pin those terms down. (It's not clear what
> they would add anyway.)

I wasn't following the discussion earlier, but I agree that strong/weak 
don't have strong and unambiguous definitions.

> Uh, oh, a new term, "manifest." Should I worry about that?

Well, people have used the term "manifest type" to refer to a type 
that's explicitly apparent in the source, but I wouldn't worry about it. 
  I just used the term to imply that at some point, the idea of "latent 
type" has to be converted to something less latent.  Once you explicitly 
identify a type, it's no longer latent to the entity doing the identifying.

>>But if we ask Javascript what it thinks the type of timestwo is, by
>>evaluating "typeof timestwo", it returns "function".  That's because the
>>value bound to timestwo has a tag associated with it which says, in
>>effect, "this value is a function".
> 
> 
> Well, darn. It strikes me that that's just a decision the language
> designers
> made, *not* to record complete RTTI. 

No, there's more to it.  There's no way for a dynamically-typed language 
to figure out that something like the timestwo function I gave has the 
type "number -> number" without doing type inference, by examining the 
source of the function, at which point it pretty much crosses the line 
into being statically typed.

> (Is it going to be claimed that
> there is an *advantage* to having only incomplete RTTI? It is a
> serious question.)

More than an advantage, it's difficult to do it any other way.  Tags are 
associated with values.  Types in the type theory sense are associated 
with terms in a program.  All sorts of values can flow through a given 
term, which means that types can get complicated (even in a nice clean 
statically typed language).  The only way to reason about types in that 
sense is statically - associating tags with values at runtime doesn't 
get you there.

This is the sense in which the static type folk object to the term 
"dynamic type" - because the tags/RTTI are not "types" in the type 
theory sense.

Latent types as I'm describing them are intended to more closely 
correspond to static types, in terms of the way in which they apply to 
terms in a program.

> Hmmm. Another place where the static type isn't the same thing as
> the runtime type occurs in languages with subtyping.

Yes.

> Question: if a language *does* record complete RTTI, and the
> languages does *not* have subtyping, could we then say that
> the runtime type information *is* the same as the static type?

You'd still have a problem with function types, as I mentioned above, 
as well as expressions like the conditional one I gave:

   (flag ? 5 : "foo")

The problem is that as the program is running, all it can ever normally 
do is tag a value with the tags obtained from the path the program 
followed on that run.  So RTTI isn't, by itself, going to determine 
anything similar to a static type for such a term, such as "string | 
number".

One way to think about "runtime" types is as being equivalent to certain 
kinds of leaves on a statically-typed program syntax tree.  In an 
expression like "x = 3", an inferring compiler might determine that 3 is 
of type "integer".  The tagged values which RTTI uses operate at this 
level: the level of individual values.

However, as soon as you start dealing with other kinds of nodes in the 
syntax tree -- nodes that don't represent literal values, or compound 
nodes (that have children) -- the possibility arises that the type of 
the overall expression will be more complex than that of a single value. 
  At that point, RTTI can't do what static types do.

Even in the simple "x = 3" case, a hypothetical inferring compiler might 
notice that elsewhere in the same function, x is treated as a floating 
point number, perhaps via an expression like "x = x / 2.0".   According 
to the rules of its type system, our language might determine that x has 
type "float", as opposed to "number" or "integer".  It might then either 
treat the original "3" as a float, or supply a conversion when assigning 
it to "x".  (Of course, some languages might just give an error and 
force you to be more explicit, but bear with me for the example - it's 
after 3:30am again.)

Compare this to the dynamically-typed language: it sees "3" and, 
depending on the language, might decide it's a "number" or perhaps an 
"integer", and tag it as such.  Either way, x ends up referring to that 
tagged value.  So what type is x?  All you can really say, in the RTTI 
case, is that x is a number or an integer, depending on the tag the 
value has.  There's no way it can figure out that x should be a float at 
that point.

Of course, further down when it runs into the code "x = x / 2.0", x 
might end up holding a value that's tagged as a float.  But that only 
tells you the value of x at some point in time, it doesn't help you with 
the static type of x, i.e. a type that either encompasses all possible 
values x could have during its lifetime, or alternatively determines how 
values assigned to x should be treated (e.g. cast to float).

BTW, it's worth noting at this point, since it's implied by the last 
paragraph, that a static type is only an approximation to the values 
that a term can have during the execution of a program.  Static types 
can (often!) be too conservative, describing possible values that a 
particular term couldn't actually ever have.  This gives another hint as 
to why RTTI can't be equivalent to static types.  It's only ever dealing 
with the concrete values right in front of it, it can't see the bigger 
static picture.

Anton