What is a type error?

Wed Jun 21 15:02:19 EDT 2006

Chris Uppal wrote:
> David Hopwood wrote:
> 
>> When people talk about "types" being associated with values in a "latently typed"
>> or "dynamically typed" language, they really mean *tag*, not type.
> 
> I don't think that's true.  Maybe /some/ people do confuse the two, but I am
> certainly a counter-example ;-)
> 
> The tag (if any) is part of the runtime machinery (or, if not, then I don't
> understand what you mean by the word), and while that is certainly a reasonable
> approximation to the type of the object/value, it is only an approximation,
> and -- what's more -- is only an approximation to the type as yielded by one
> specific (albeit abstract, maybe even hypothetical) type system.

Yes. I should perhaps have mentioned that people sometimes mean "protocol"
rather than "tag" or "type" (a protocol being the set of messages that an object
can respond to, roughly speaking).

> If I send #someMessage to a proxy object which has not had its referent set
> (and assuming the default value, presumably some variant of nil, does not
> understand #someMessage), then that's just as much a type error as sending
> #someMessage to a variable holding a nil value.

It's an error, certainly. People usually call it a type error. But does that
terminology actually make sense?

Typical programming languages have many kinds of semantic error that can occur
at run-time: null references, array index out of bounds, assertion failures,
failed casts, "message not understood", ArrayStoreExceptions in Java,
arithmetic overflow, divide by zero, etc.

Conventionally, some of these errors are called "type errors" and some are
not. But there seems to be little rhyme or reason to this categorization, as
far as I can see. If in a particular language, both array index bounds errors
and "message not understood" can occur at run-time, then there's no objective
reason to call one a type error and the other not. Both *could* potentially
be caught by a type-based analysis in some cases, but *neither* is caught
by such an analysis in that language.
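
To make the comparison concrete, here is a minimal sketch in Python. Python is
chosen only for illustration and is not one of the languages under discussion;
the helper name `classify` is hypothetical. An out-of-range index and a
"message not understood" both surface as ordinary run-time exceptions, and
nothing in the mechanism singles one of them out as a "type" error:

```python
def classify(thunk):
    """Run thunk and return the name of the exception it raises, or None."""
    try:
        thunk()
    except Exception as exc:
        return type(exc).__name__
    return None


items = [1, 2, 3]

# Array index out of bounds: only detected when the access is executed.
index_error = classify(lambda: items[10])

# "Message not understood": None has no method called some_message.
message_error = classify(lambda: None.some_message())

print(index_error, message_error)
```

Both failures are reported the same way, by raising an exception at the point
of use; which of them gets labelled a "type error" is a matter of convention.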

A more consistent terminology would reserve "type error" for errors that
occur when a typechecking/inference algorithm fails, or when an explicit
type coercion or typecheck fails.

According to this view, the only instances where a run-time error should be
called a "type error" are:

 - a failed cast, or no match for any branch of a 'typecase' construct.
   Here the construct that fails is a coercion of a value to a specific type,
   or a check that it conforms to that type, and so the term "type error"
   makes sense.

 - cases where a typechecking/inference algorithm fails at run-time (e.g.
   in a language with staged compilation, or dynamic loading with link-time
   typechecking).
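
The first case can be sketched as follows, again in Python and only as an
illustration. `checked_cast` and `typecase` are hypothetical helpers, not
library functions, and Python's `isinstance` tests the run-time class, which
stands in here for the type named in a cast. The point is that the construct
that fails is itself an explicit check against a named type:

```python
def checked_cast(value, expected_type):
    """Return value unchanged if it is an instance of expected_type, else fail."""
    if not isinstance(value, expected_type):
        raise TypeError(
            f"cannot cast {type(value).__name__} to {expected_type.__name__}"
        )
    return value


def typecase(value):
    """A hand-rolled 'typecase': dispatch on what kind of value was passed."""
    if isinstance(value, int):
        return "integer"
    if isinstance(value, str):
        return "string"
    # No branch matched: this is the failure described in the first bullet.
    raise TypeError(f"no typecase branch matches {type(value).__name__}")


ok = checked_cast(5, int)      # succeeds, returns 5
label = typecase("hello")      # "string"
```

Calling `checked_cast("x", int)` or `typecase(3.5)` fails inside the check
itself, which is why calling that failure a "type error" makes sense.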

In other cases, just say "run-time error".

> If I then assign the referent
> of the proxy to some object which does understand #someMessage, then it is not
> a type error to send #someMessage to the proxy.  So the type has changed, but
> nothing in the tag system of the language implementation has changed.

In the terminology I'm suggesting, the object has no type in this language
(assuming we're talking about a Smalltalk-like language without any type system
extensions). So there is no type error, and no inconsistency.

Objects in this language do have protocols, so this situation can be described
as a change to the object's protocol, which changes whether a given message
causes a protocol error.
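
A rough sketch of the proxy scenario, in Python rather than Smalltalk and with
hypothetical names throughout (`Proxy`, `Greeter`, `understands`,
`some_message`). The proxy's class never changes, but the set of messages it
can respond to does change once the referent is assigned:

```python
class Proxy:
    """Forwards every message it does not define itself to its referent."""

    def __init__(self):
        self.referent = None  # not yet set; None understands very little

    def __getattr__(self, name):
        # Only called when normal lookup fails, so 'referent' is found directly.
        return getattr(self.referent, name)


class Greeter:
    def some_message(self):
        return "understood"


def understands(obj, selector):
    """Protocol check: can obj respond to selector right now?"""
    return callable(getattr(obj, selector, None))


proxy = Proxy()
before = understands(proxy, "some_message")  # referent unset
proxy.referent = Greeter()
after = understands(proxy, "some_message")   # referent set
same_class = type(proxy) is Proxy            # the class (tag) is unchanged
```

Here `before` and `after` differ while `same_class` stays true, which is the
sense in which the protocol has changed although the tag has not.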

-- 
David Hopwood <david.nospam.hopwood at blueyonder.co.uk>