What is Expressiveness in a Computer Language

Chris F Clark cfc at shell01.TheWorld.com
Sun Jun 25 17:03:53 EDT 2006


Chris F Clark <cfc at shell01.TheWorld.com> (I) wrote:
> Do you reject that there could be something more general than what a 
> type theorist discusses?  Or do you reject calling such things a type?

Chris Smith <cdsmith at twu.net> wrote:

> I think that the correspondence is partly in the wrong direction to 
> describe it that way.
...
> What I can't agree to is that what you propose is actually more general.  
> It is more general in some ways, and more limited in others.  As such, 
> the best you can say is that it is analogous to formal types in some ways, 
> but certainly not that it's a generalization of them.

Yes, I see your point.

Me:
> I'm particularly interested if something unsound (and perhaps 
> ambiguous) could be called a type system.

Chris Smith:

> Yes, although this is sort of a pedantic question in the first place, I 
> believe that an unsound type system would still generally be called a 
> type system in formal type theory.  However, I have to qualify that with 
> a clarification of what is meant by unsound.  We mean that it DOES 
> assign types to expressions, but that those types are either not 
> sufficient to prove what we want about program behavior, or they are not 
> consistent, so that a program that was well typed may reduce to a poorly 
> typed program during its execution.  Obviously, such type systems don't 
> prove anything at all, but they probably still count as type systems.

Yes, and you see mine.
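
To make that notion of unsoundness concrete, here is a small Python
sketch (my own illustration, not an example from Chris's post).  The
annotations do assign a type to every expression, but nothing enforces
them, so a "well typed" program reduces to a poorly typed one as it
runs:

    def head(xs: list) -> int:
        return xs[0]                  # claimed to yield an int...

    values = ["forty-two"]            # ...but this list actually holds a str

    n = head(values)                  # "well typed" as far as the annotations go
    print(n + 1)                      # reduces to str + int -> TypeError at run time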

These informal systems, which may not prove what they claim to prove,
are my concept of a "type system".  That's because at some point,
there is a person reasoning informally about the program.  (There may
also be people reasoning formally about the program, but I am going to
neglect them.)  It is to the people who are reasoning informally about
the program that we wish to communicate that there is a type error.
That is, we want someone who is dealing with the program informally to
realize that there is an error, that this error is somehow related to
some input not being kept within proper bounds (inside the domain
set), and that as a result the program will compute an unexpected and
incorrect result.
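
As a hedged illustration of what I mean (a made-up example, with names
of my own choosing), the informal "type" of the argument below is "a
month number from 1 to 12"--a domain set that nothing in the code
declares, but exactly the thing a person reasoning informally about
the program checks:

    DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

    def days_in(month):
        # formally, the argument is just an integer; informally, its
        # "type" is an integer in the domain set 1..12
        return DAYS_IN_MONTH[month - 1]

    print(days_in(2))    # 28, as expected
    print(days_in(0))    # still an integer, so no formal complaint, yet
                         # the input is outside the domain set and the
                         # program quietly computes an unexpected and
                         # incorrect result (31)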

I stressed the informal aspect because the intended client of the
program is almost universally *NOT* someone thinking rigorously about
some mathematical model, but someone dealing with some "real world"
phenomena which they wish to model and of which they may not have a
completely formed, concrete representation.  Even the author of the
program is not likely to have a completely formed rigorous model in
mind, only some approximation to such a model.

I also stress the informality, because beyond a certain nearly trivial
level of complexity, people are not capable of dealing with truly
formal systems.  As a compiler person, I often have to read sections
of language standards to try to discern what the standard is requiring
the compiler to do.  Many language standards attempt to describe some
formal model.  However, most are so imprecise and ambiguous when they
attempt to do so that the result is only slightly better than not
trying at all.  Only when one understands the standard in the context
of typical practice can one begin to fathom what it is trying to say
and why its authors wrote it the way they did.

A typical example of this is the grammar given for most programming
languages: it is expressed in some BNF variant.  However, most of
these BNF descriptions omit important details and gloss over certain
inconsistencies.  Moreover, most are totally unsuitable for direct,
formal translation (e.g. feeding them to an LL or LR parser generator,
or even an Earley or CYK one).  Yet, when considered in the context of
typical implementations, the errors in the formal specification can
easily be overlooked and resolved to produce what the standard's
authors intended.

For example, a few years ago I wrote a Verilog parser by
transliterating the grammar in the standard (IEEE 1364-1995) into the
notation used by my parser generator, Yacc++.  It took a couple of
days to do so.  The resulting parser essentially worked on the first
try.  However, the reason it did so is that I had a lot of outside
knowledge when I was making the transliteration.  There were places
where I knew which parts of the grammar were lexical and which were
syntactic, and so I cut the grammar correctly into those two parts,
despite there being nothing in the standard which described that
dichotomy.  There were other parts, like the embedded preprocessor,
that I knew were borrowed from C, so I could use the way a C compiler
resolves those issues to guide resolving the ambiguities in the
Verilog standard.  There were still other parts where I could tell the
standard was simply written wrong--that is, where the BNF as written
did not specify something which would make sense--for example, the
endif of an if-then-else statement was bound to the else branch, so
that one would write an if-then without an endif but an if-then-else
with one--and I knew that was not the correct syntax.
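
To illustrate the shape of that mistake (this is a made-up toy rule,
not the actual text of IEEE 1364-1995), consider a grammar that binds
the closing keyword to the else branch, and a parser written
faithfully from it:

    # Toy rule, written the way the erroneous BNF effectively read:
    #   if_stmt : 'if' ID 'then' ID
    #           | 'if' ID 'then' ID 'else' ID 'endif'

    def parse_if(tokens):
        """Parse the toy rule exactly as written; return leftover tokens."""
        def expect(kw):
            if not tokens or tokens[0] != kw:
                raise SyntaxError("expected %r, got %r" % (kw, tokens[:1]))
            tokens.pop(0)

        expect('if');   tokens.pop(0)       # condition
        expect('then'); tokens.pop(0)       # then-branch
        if tokens and tokens[0] == 'else':
            expect('else'); tokens.pop(0)   # else-branch
            expect('endif')                 # the only place an endif may appear
        return tokens

    print(parse_if('if c then s else t endif'.split()))  # [] -- accepted
    print(parse_if('if c then s'.split()))               # [] -- no endif allowed here
    print(parse_if('if c then s endif'.split()))         # ['endif'] left over: an error

A parser built faithfully from such a rule reproduces the mistake; it
takes outside knowledge of what the language is supposed to look like
to notice and correct it.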

The point of the Verilog example is that the very knowledgeable people
writing the Verilog standard made mistakes in their formal
specification, and in fact these mistakes made it past several layers
of review and were formally ratified.  Given the correct outside
knowledge, these mistakes are relatively trivial to ignore and
correct.  However, from a completely formal perspective, the standard
is simply incorrect.

More importantly, chip designers use knowledge of the standard to
guide the way they write chip descriptions.  Thus, at some level these
formal errors creep into chip designs.  Fortunately, the chip
designers also look past the inconsistencies and "program" (design) in
a language which looks very much like standard Verilog, but actually
says what they mean.

Moreover, the grammar is not the only place where the standard fails
to formalize Verilog properly.  There are several parts of the type
model which suffer from similar problems.  Again, we have what the
standard specifies, and what is a useful (and expected) description of
the target model.  There are very useful, informal models of how the
Verilog language works and what types are involved.  There is also an
attempt in the standard to convey some of these aspects formally.
Occasionally, the formal description is right and illuminating; at
other times it is opaque; and at still other times it is inconsistent,
saying contradictory things in different parts of the standard.

Some of the more egregious errors were corrected in the 2001 revision.
However, at the same time, new features were added which were (as far
as I can tell) just as inconsistently specified.  And this is not a
case of incompetence or unprofessionalism.  The Verilog standard is
not alone in these problems--it's just my most recent experience, and
thus the one I am most familiar with at the moment.

I would expect the Python, Scheme, Haskell, and ML standards to be
more precise and less inconsistent, because of their communities.
However, I would be surprised if they were entirely unambiguous and
error-free.  In general, I think most non-trivial attempts at
formalization flounder on our inability to deal with truly formal
systems beyond a modest level of complexity.

Therefore, I do not want to exclude from type systems things which
are informal and unsound.  They are simply artifacts of human
creation.

-Chris



