What is Expressiveness in a Computer Language

Fri Jun 23 22:25:36 EDT 2006

I thought about this in the context of reading Anton's latest post to 
me, but I'm just throwing out an idea.  It's certainly too fuzzy in my 
mind at the moment to consider it in any way developed.  I'm sure it's 
just as problematic, if not more so, as any existing accounts of dynamic 
types.  Here it is, anyway, just because it's a different way of seeing 
things.

(I'll first point out here, because it's about to become significant, 
that contrary to what's been said several times, static type systems 
assign types to expressions rather than variables.)

First of all, it's been pointed out several times that static typing 
assumes a static program, and that it becomes difficult to describe if 
the program itself is being modified as it is evaluated.  The 
interesting thing that occurred to me earlier today is that for 
languages that are reducible to the lambda calculus at least, ALL 
programs are from some perspective self-modifying.  Specifically, no 
single expression is ever evaluated more than once.  When a statement is 
evaluated several times, that simply means that there is actually a more 
complex statement that, when it is evaluated, writes a new statement, 
which consists of a copy of itself, plus something else.  Thus it makes 
little sense, from that perspective, to worry about the possibility that 
an expression might have a different type each time it's evaluated, and 
therefore violate static typing.  It is never evaluated more than once.

What if I were to describe a dynamically typed system as one in which 
expressions have types just as they do in a static type system, but in 
which programs are seen as self-modifying?  Rather than limiting forms 
of recursion for well-typed programs to, say, the standard fixed-point 
operator and then assigning some special-purpose rule for it, we simply 
admit that the program is being modified as it is run.  Type inference 
(and therefore checking) would need to be performed incrementally.  When 
they do, the results of evaluating everything up to this point will have 
changed the type environment, providing more information than was 
previously known.  The type environment for assigning types to an 
expression could still be treated statically at the time that the 
expression first comes into existence; but it would be different then 
from what it was at the beginning of the program.

Of course, this is only a formal description of the meaning, and not a 
description of any kind of a reasonable implementation.  However, this 
initially appears, on the surface, to take us some distance toward 
unifying dynamic types and static types.

At least two interesting correlations just sort of "fall out".

First, Marshall has observed (and I agree) that static and dynamic types 
form not different ways of looking at verifying correctness, but rather 
different forms of programming in the first place.  It appears to be 
encouraging, then, that this perspective classifies static and dynamic 
types based on how we define the program rather than how we define the 
type checking.

Second, type theory is sometimes concerned with the distinction between 
structural and nominal types and various hybrids.  We've concerned 
ourself a good bit in this thread with exactly what kinds of "tags" or 
other runtime type notations are acceptable for defining a dynamically 
typed language.  In this view, Smalltalk (or some simplification thereof 
wherein doesNotUnderstand is considered an error handling mechanism 
rather than a first-class part of the language semantics) would 
represent the dynamic version of structural types, whereas type 
"tagging" systems with a user-declared finite set of type tags would be 
the dynamic equivalent to nominal types.  Just like in type theory, 
dynamic type systems can also have hybrids between structural and 
nominal types.

Obviously, that has a lot of problems... some of the same ones that I've 
been complaining about.  However, it has some interesting 
characteristics as well.  If an expression is assumed to be different 
each time it is encountered, then type the type of "one" expression is 
easy to evaluate in the initial environment in which it occurs (for some 
well-defined order of evaluation).

Now, to Anton's response, with admittedly rather liberal skimming 
(sorry!) to avoid my staying up all night thinking about this.

Anton wrote:
> It's obviously going to be difficult to formally pin down something that 
> is fundamentally informal.  It's fundamentally informal because if 
> reasoning about the static properties of terms in DT languages were 
> formalized, it would essentially be something like a formal type system.

We may be talking past each other, then, but I'm not particularly 
interested in the informal aspects of what makes up a type.  I agree 
that formalizing the notion appears quite difficult, especially if my 
conjecture above were to become part of it.

> However, there are some pretty concrete things we can look at.  One of 
> them, which as I've mentioned elsewhere is part of what led me to my 
> position, is to look at what a soft type inferencer does.  It takes a 
> program in a dynamically typed language, and infers a static type scheme 
> for it (obviously, it also defines an appropriate type system for the 
> language.)  This has been done in both Scheme and Erlang, for example.

> How do you account for such a feat in the absence of something like 
> latent types?

There are undoubtedly any number of typing relations and systems of 
rules which can be assigned to any program in a dynamically typed 
language.  I think the more interesting question is how these type 
inference systems choose between the limitless possibilities in a way 
that tends to be acceptable.  I suspect that they do it by assuming 
certain kinds of type systems, and then finding them.  I doubt they are 
going to find a type system in some language until the basics of that 
system have been fed to the software as something to look for.

I'd be delighted to discover that I'm wrong.  If nothing else, it would 
mean that applying these programs to significant amounts of Scheme and 
Erlang software would become an invaluable tool for doing research in 
type theory.

> > Undoubtedly, some programmers sometimes perform reasoning about their 
> > programs which could also be performed by a static type system.  
> 
> I think that's a severe understatement.

Yes, I agree.  I did not intend the connotations that ended up there.  
In fact, I'd say that something like 90% (and yes, I just grabbed that 
number out of thin air) of reasoning performed by programmers about 
their programs could be performed by a static type system.

> BTW, I notice you didn't answer any of the significant questions which I 
> posed to Vesa.  So let me pose one directly to you: how should I rewrite 
> the first sentence in the preceding paragraph to avoid appealing to an 
> admittedly informal notion of type?

I'll try to answer your question.  My answer is that I wouldn't try.  I 
would, instead, merely point out that the notion of type is informal, 
and that it is not necessarily even meaningful.  The two possibilities 
that concern me are that either reasoning with types in this informal 
sense is just a word, which gets arbitrarily applied to some things but 
not others such that distinctions are made that have no real value, and 
that any sufficient definition of type-based reasoning is going to be 
broad enough to cover most reasoning that programmers ever do, once we 
start applying enough rigor to try to distinguish between what counts as 
a type and what doesn't.  The fact that programmers think about types is 
not disputed; my concern is that the types we think about may not 
meaningfully exist.

> Note, also, that I'm using the word 
> in the sense of static properties, not in the sense of tagged values.

In the static sense of types, as I just wrote to Chris Uppal, I can then 
definitively assert at least the second of the two possibilities above.  
Every possible property of program execution is a type once the problem 
of how to statically check it has been solved.

> For now, I'll stand on what I've written above.  When I see if or how 
> that doesn't convince you, I can go further.

To be clear, I am certainly convinced that people talk and think about 
types and also about other things regarding their programs which they do 
not call types.  I am concerned only with how accurate it is to make 
that distinction, on something other than psychological grounds.  If we 
are addressing different problems, then so be it.  If however, you 
believe there is a type/non-type distinction that can be built on some 
logical foundation, then I'd love to hash it out.

> > It is, nevertheless, quite appropriate to call the language "untyped" if 
> > I am considering static type systems.  
> 
> I agree that in the narrow context of considering to what extent a 
> dynamically typed language has a formal static type system, you can call 
> it untyped.

OK.

> However, what that essentially says is that formal type 
> theory doesn't have the tools to deal with that language, and you can't 
> go much further than that.  As long as that's what you mean by untyped, 
> I'm OK with it.

I can see that in a sense... yes, if a specific program has certain 
properties, then it may be possible to extend the proof-generator (i.e., 
the type checker) to be able to prove those properties.  However, I 
remain inconvinced that the same is true of the LANGUAGE.  Comments in 
this thread indicate to me that even if I could write a type-checker 
that checks correctness in a general-purpose programming language, and 
that language was Scheme, there would be Scheme programmers who would 
prefer not to use my Scheme+Types language because it would pester them 
until they got everything perfect before they could run their code, and 
Scheme would remain a separate dynamically typed (i.e., from a type 
theory perspective, untyped) language despite my having achieved all of 
the wildest dreams of the field of type theory.  In that sense, I don't 
consider this a limitation of type theory.

(Two notes: first, I don't actually believe that this is possible, since 
the descriptions people would give for "correct" are nowhere near so 
formal as the type theory that would try to check them; and second, I am 
not so sure that I'd want to use this new language either, for anything 
except applications in which bugs are truly life-threatening.)

> Think of it like this: the more ambitious a language's type system is, 
> the fewer uncaptured static properties remain in the code of programs in 
> that language.  However, there are plenty of languages with rather weak 
> static type systems.  In those cases, code has more static properties 
> that aren't captured by the type system.

Yes, I agree with this.

> I'm pointing out that in many 
> of these cases, those properties resemble types, to the point that it 
> can make sense to think of them and reason about them as such, applying 
> the same sort of reasoning that an automated type inferencer applies.

It often makes sense to apply the same sort of reasoning as would be 
involved in static type inference.  I also agree that there are 
properties called types, which are frequently the same properties caught 
by commonly occurring type systems, and which enough developers already 
think of under the word "type" to make it worth trying to change that 
EXCEPT when there is an explicit discussion going on that touches on 
type theory.  In that latter case, though, we need to be very careful 
about defining some group of things as not-types across the scope of all 
possible type systems.  There are no such things a priori, and even when 
you take into account the common requirement that a type system is 
tractable, the set of things that are not-type because it's provably 
impossible to check them in polynomial time is smaller than the 
intuitive sense of that word.

Essentially, this use of "type" becomes problematic when it is used to 
draw conclusions about the scope, nature, purpose, etc. of static types.

This answers a good number of the following concerns.

> > I see it as quite reasonable when there's an effort by several 
> > participants in this thread to either imply or say outright that static 
> > type systems and dynamic type systems are variations of something 
> > generally called a "type system", and given that static type systems are 
> > quite formally defined, that we'd want to see a formal definition for a 
> > dynamic type system before accepting the proposition that they are of a 
> > kind with each other.  
> 
> A complete formal definition of what I'm talking about may be impossible 
> in principle, because if you could completely formally define it, you'd 
> have a static type system.

That may not actually be the case.  Lots of things may be formally 
defined that would nevertheless be different from a static type system.  
For example, if someone did provide a rigorous definition of some 
category of type errors that could be recognized by some defined process 
as such, then the set of language features that detect and react to 
these errors could be defined as a type system.  Chris Uppal just posted 
such a thing in a different part of this thread, and I'm unsure at this 
point whether it works or not.  The problem with earlier attempts to 
define type errors in this thread is that they either tend to include 
all possible errors, or are distinguished by some nebulously defined 
condition such as being commonly thought of in some way by some set of 
human beings.

> If that makes you throw up your hands, then all you're saying is that 
> you're unwilling to deal with a very real phenomenon that has obvious 
> connections to type theory, examples of which I've given above.  That's 
> your choice, but at the same time, you have to give up exclusive claim 
> to any variation of the word "type".

I really think it's been a while since I claimed any such thing... and 
even at the beginning, I only meant to challenge that use of type in 
comparing static and dynamic type systems.

I am not so sure, though, about these obvious connections to type 
theory.  Those connections may be, for example, some combination of the 
following rather uninteresting phenomena:

1. Type theory got the word "type" from somewhere a century ago, and the 
word has continued to be used in its original sense as well, and it just 
happens that despite the divergence of these uses, there remains some 
common ground between the two different meanings.

or, 2. Many programmers approach type theory from the perspective of 
having some intuitive definition of a type in mind, and they tend to mix 
it in with a few of the ideas of type theory in their minds, and then go 
off and use the word in other contexts.

> Terms are used in a context, and it's perfectly reasonable to call 
> something a "latent type" or even a "dynamic type" in a certain context
> and point out connections between those terms and their cousins (or if 
> you insist, their completely unrelated namesakes) static types.

Yes, I agree.  I'm yet to be convinced that they are completely 
unrelated namesakes.  I am actually trying to reconstruct such a thing.  
The problem is that, knowing how "type" is used in the static sense, I 
am forced to reject attempts that share only basic superficial 
similarities and yet miss the main point.

> > I believe that's unacceptable for several reasons, but the most 
> > significant of them is this.  It's not reasonable to ask anyone to 
> > accept that static type systems gain their essential "type system-ness" 
> > from the idea of having well-defined semantics.  
> 
> The definition of static type system is not in question.  However, 
> realistically, as I pointed out above, you have to acknowledge that type 
> systems exist in, and are inextricably connected to, a larger, less 
> formal context.

No, but it certainly seems that the reason behind calling them a "type 
system" is being placed very much in doubt.  Simply saying static type 
system is horribly wrong, and I do it only to avoid inventing words 
wholesale... but I greatly regret the implication that it means "a type 
system, which happens to be static."

In some senses, this is akin to describing a boat as "a floating 
platform".  That may be somewhat accurate as a rough definition for 
someone who is familiar with platforms but not with boats.  It's a 
terrible failure, though, in that it implies that being a platform is 
the important bit; where actually the important bit is that it floats!  
In fact, it may turn out that boats have very little to do with common 
platforms... for example, while some boats (barges) have many of the 
same properties that platforms do, there are whole other classes of 
boats, which are considerably more powerful than barges, that end up 
looking very little like a platform at all.  If you press hard enough, I 
can be forced to admit that there is some kind of connection between 
platforms and floating platforms, because after all one has to stand on 
something while using a boat, right?

Well, the important part of the phrase "static type" isn't "type" but 
"static."  Without even having to undertake the dead-end task of 
deciding who's got the right to the word "type", let's assume that I 
should really be saying some other word, like boat.  However, that word 
doesn't really exist, so I am stuck with "static type".  I just have to 
be very careful about explaining that what I mean by type is completely 
different from what others mean by type.  There is some overlap, which 
potentially arises from one of the uninteresting sources I've listed 
above, or maybe... just maybe... arises from some actual similarity that 
we haven't identified yet.  But we still haven't identified it.

This is very much the situation I see ourselves in.  I very much want to 
agree with everyone's right to use the word "type", but then people keep 
saying things that seem to me to be clear misunderstandings that the 
important part is the "static", not the "type".  When qualification is 
not needed

Fundamentally, the important aspect of types, in the static sense, is 
that they are used in a formal, syntactic, tractable method for proving 
the absence of program behaviors.  Even if it turns out that, in 
practice, all type systems contain mechanisms for solving a certain 
rigorously definable class of problems, that will not be part of the 
definition of types, when the word is used in the static sense.  So we 
can search for a more formal similarity, or we can agree that it's 
incidentally true that they sometimes solve the same problems but also 
agree that it's pointless and misleading to ever talk about a "type 
system" without clarifying the usage either explicitly or by context.  
It remains incorrect to relax the degree of formality and talk about 
static types in an intuitive and undefinable way, because than they lose 
the entire property of being types at all.  At that point, it's best to 
drop "static", which is not true anyway, and switch over to the other 
set of terminology where it makes sense to call them types.

> There's a sense in which one can say that yes, the informal types I'm 
> referring to have an interesting degree of overlap with static types; 
> and also that static types do, loosely speaking, have the "purpose" of 
> formalizing the informal properties I'm talking about.

Static types have the property of proving ANY properties of a program 
that may be proven.  Certainly, research is pragmatically restricted to 
specific problems that are (1) useful and (2) solvable.  Aside from 
that, I can't imagine any competent language designer building a type 
system, but rejecting some useful and solvable feature for the sole 
reason that he or she doesn't think of it as solving a type error.

> But I hardly see why such an informal characterization should bother 
> you.  It doesn't affect the definition of static type.

I hope I've explained how it does affect the definition of a static 
type.

> This is based on the assumption that all we're talking about is 
> "well-defined semantics".

Indeed.  My statement there will not apply if we find some formal 
definition of a dynamic type system that doesn't reduce to well-defined 
semantics once you remove the intuitive bits from it.

-- 
Chris Smith - Lead Software Developer / Technical Trainer
MindIQ Corporation