What is Expressiveness in a Computer Language

Sun Jun 25 03:38:53 EDT 2006

David Hopwood wrote:
> But since the relevant feature that the languages in question possess is
> dynamic tagging, it is more precise and accurate to use that term to
> describe them.

So you're proposing to call them dynamically-tagged languages?

> Also, dynamic tagging is only a minor help in this respect, as evidenced
> by the fact that explicit tag tests are quite rarely used by most programs,
> if I'm not mistaken. 

It sounds as though you're not considering the language implementations 
themselves, where tag tests occur all the time - potentially on every 
operation.  That's how "type errors" get detected.  This is what I'm 
referring to when I say that dynamic tags support latent types.

Tags are absolutely crucial for that purpose: without them, you have a 
language similar to untyped lambda calculus, where "latent type errors" 
can be very difficult to debug, since execution can continue past them 
and produce completely uninterpretable results.
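
As a rough sketch of the difference (the tagged-value representation and 
the names num, str, add and addUnchecked are invented for illustration, 
not taken from any real implementation), compare a primitive that tests 
tags on every call with one that doesn't:

```javascript
// Hypothetical tagged values: every runtime value carries a tag.
const num = (n) => ({ tag: "number", value: n });
const str = (s) => ({ tag: "string", value: s });

// A primitive as an implementation might define it: the tag test runs
// on every call, whether or not the program ever tests a tag itself.
function add(a, b) {
  if (a.tag !== "number" || b.tag !== "number") {
    throw new TypeError("add: expected numbers, got " + a.tag + " and " + b.tag);
  }
  return num(a.value + b.value);
}

// The same primitive with the tag test removed.
function addUnchecked(a, b) {
  return { tag: "number", value: a.value + b.value };
}

try {
  add(num(1), str("x"));            // latent type error detected here
} catch (e) {
  console.log(e.message);
}

// No error: execution continues with a "number" whose payload is "1x".
console.log(addUnchecked(num(1), str("x")));
```

The unchecked version is the situation I'm describing: nothing fails at 
the point of the mistake, and whatever goes wrong shows up later, 
somewhere else.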

> IMHO, the support does not go far enough for it to be
> considered a defining characteristic of these languages.

Since tag checking is an implicit feature of language implementations 
and the language semantics, it certainly qualifies as a defining 
characteristic.

> When tag tests are used implicitly by other language features such as
> pattern matching and dynamic dispatch, they are used for purposes that are
> equally applicable to statically typed and non-(statically-typed) languages.

A fully statically-typed language doesn't have to do tag checks to 
detect static type errors.

Latently-typed languages do tag checks to detect latent type errors.

You can take the preceding two sentences as a summary definition for 
"latently-typed language", which will come in handy below.

>>>or that languages
>>>that use dynamic tagging are "latently typed". This simply is not a
>>>property of the language (as you've already conceded).
>>
>>Right.  I see at least two issues here: one is that as a matter of
>>shorthand, compressing "language which supports latent typing" to
>>"latently-typed language" ought to be fine, as long as the term's
>>meaning is understood.
> 
> 
> If, for the sake of argument, "language which supports latent typing" is
> to be compressed to "latently-typed language", then statically typed
> languages must be considered also latently typed.

See definition above.  The phrase "language which supports latent 
typing" wasn't intended to be a complete definition.

> After all, statically typed languages support expression and
> verification of the "types in the programmer's head" at least as well
> as non-(statically-typed) languages do. In particular, most recent
> statically typed OO languages use dynamic tagging and are memory safe.
> And they support comments ;-)

But they don't use tag checks to validate their static types.

When statically-typed languages *do* use tags, in cases where the static 
type system isn't sufficient to avoid them, then indeed, those parts of 
the program use latent types, in the exact same sense as more fully 
latently-typed languages do.  There's no conflict here: it's simply the 
case that most statically-typed languages aren't fully statically typed.

> This is not, quite obviously, what most people mean when they say
> that a particular *language* is "latently typed". They almost always
> mean that the language is dynamically tagged, *not* statically typed,
> and memory safe. That is how this term is used in R5RS, for example.

The R5RS definition is compatible with what I've just described, because 
the parts of a statically-typed language that would be considered 
latently-typed are precisely those which rely on dynamic tags.

>>But beyond that, there's an issue here about the definition of "the
>>language".  When programming in a latently-typed language, a lot of
>>action goes on outside the language - reasoning about static properties
>>of programs that are not captured by the semantics of the language.
> 
> 
> This is true of programming in any language.

Right, but when you compare a statically-typed language to an untyped 
language at the formal level, a great deal more static reasoning goes on 
outside the language in the untyped case.

What I'm saying is that it makes no sense, in most realistic contexts, 
to think of untyped languages as being just that: languages in which the 
type of every term is simply a tagged value, as though no static 
knowledge about that value exists.  The formal model requires that you 
do this, but programmers can't function if that's all the static 
information they have.  This isn't true in the case of a fully 
statically-typed language.

>>This means that there's a sense in which the language that the
>>programmer programs in is not the same language that has a formal
>>semantic definition.  As I mentioned in another post, programmers are
>>essentially mentally programming in a richer language - a language which
>>has informal (static) types - but the code they write down elides this
>>type information, or else puts it in comments.
> 
> 
> If you consider stuff that might be in the programmer's head as part
> of the program, where do you stop? When I maintain a program written
> by someone I've never met, I have no idea what was in that programmer's
> head. All I have is comments, which may be (and frequently are,
> unfortunately) inaccurate.

You have to make the same kind of inferences that the original 
programmer made.  E.g. when you see a function that takes some values, 
manipulates them using numeric operators, and returns a value, you have 
to either figure out or trust the comments that the function accepts 
numbers (or integers, floats etc.) and returns a number.

This is not some mystical psychological quality: it's something that 
absolutely must be done in order to be able to reason about a program at 
all.
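
A small illustration of that kind of inference (midpoint is a made-up 
function, and the comment is the sort of thing a maintainer might 
reconstruct, not something the language checks):

```javascript
// Hypothetical function, written with no type information at all.
// Reading the body: `-` and `/` are numeric operators, so the latent
// type a maintainer recovers is roughly (number, number) -> number.
function midpoint(lo, hi) {
  return lo + (hi - lo) / 2;
}

console.log(midpoint(2, 6));
```

Nothing in the language records that conclusion; it lives in the comment, 
or in the reader's reasoning.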

> (Actually, it's worse than that -- if I come back to a program 5 years
> later, I probably have little idea what was in my head at the time I
> wrote it.)

But that's simply the reality - you have to reconstruct: recover the 
latent types, IOW.  There's no way around that!

Just to be concrete, I'll resurrect my little JavaScript example:

   function timestwo(x) { return x*2 }

Assume I wrote this and distributed it in source form as a library, and 
now you have to maintain it.  How do you know what values to call it 
with, or what kind of values it returns?  What stops you from calling it 
with a string?

Note that according to the formal semantics of an untyped language, the 
type of that function is "value -> value".
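
To see what "value -> value" permits in practice, here is the same 
function called three ways (the calls are my own illustration, not part 
of any library):

```javascript
// Same function as in the example above.
function timestwo(x) { return x * 2 }

const a = timestwo(21);     // number in, number out
const b = timestwo("21");   // numeric string is silently coerced to a number
const c = timestwo("abc");  // non-numeric string: result is NaN, no error raised

console.log(a, b, c);       // prints: 42 42 NaN
```

Nothing stops the string calls.  The only thing telling you they're 
probably mistakes is the latent type you've inferred for the function.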

The phrase "in the programmer's head" was supposed to help communicate 
the concept.  Don't get hung up on it.

>>We have to accept, then, that the formal semantic definitions of
>>dynamically-checked languages are incomplete in some important ways.
>>Referring to those semantic definitions as "the language", as though
>>that's all there is to the language in a broader sense, is misleading.
> 
> 
> Bah, humbug. The language is just the language.

If you want to have this discussion precisely, you have to do better 
than that.  There is an enormous difference between the formal semantics 
of an untyped language, and the (same) language which programmers work 
with and think of as "dynamically typed".

Use whatever terms you like to characterize this distinction, but you 
can't deny that the distinction exists, and that it's quite a big, 
important one.

>>In this context, the term "latently-typed language" refers to the
>>language that a programmer experiences, not to the subset of that
>>language which is all that we're typically able to formally define.
> 
> 
> I'm with Marshall -- this is way too mystical for me.

I said more in my reply to Marshall.  Perhaps I'm not communicating 
well.  At worst, what I'm talking about is informal.  It's easy to prove 
that you can't reason about a program if you don't know what types of 
values a function accepts or returns - and an untyped formal model tells 
you nothing about either.

Anton