What is a type error?

Fri Jul 14 13:21:43 EDT 2006

Andreas Rossberg wrote:
> OK, this is interesting. I don't know Hermes, is this sort of like a 
> dynamically checked equivalent of linear or uniqueness typing?

I'm not sure what linear or uniqueness typing is. It's typestate, and if 
I remember correctly the papers I read 10 years ago, the folks at 
TJWatson that invented Hermes also invented the concept of typestate. 
They at least claim to have coined the term.

It's essentially a dataflow analysis that lets the compiler enforce the 
same sort of rule as "don't read variables that may not yet have been 
assigned to", except that you could also annotate that a variable changes 
back to the "uninitialized" state after it has already been initialized.
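
For readers who have not seen typestate, Rust's move checking is probably the closest widely available analogue. The sketch below is Rust, not Hermes; `Message`, `send`, and `demo` are made-up names, and Rust's ownership rules stand in for the typestate annotations described above.

```rust
// Illustrative only: hypothetical names, with Rust's move checking
// standing in for Hermes typestate.
#[derive(Debug, PartialEq)]
struct Message {
    body: String,
}

// Taking `msg` by value plays the role of annotating the parameter as
// "uninitialized upon return": the caller's variable is consumed.
fn send(msg: Message) -> usize {
    msg.body.len()
}

fn demo() -> (usize, Message) {
    let mut msg = Message { body: String::from("hello") };
    let sent = send(msg);
    // `msg` is now back in the uninitialized state. Uncommenting the next
    // line makes the compiler reject the program:
    // let n = msg.body.len();

    // Assigning to it again makes it initialized, and usable, once more.
    msg = Message { body: String::from("again") };
    (sent, msg)
}
```

The point is that "initialized" is a property the compiler tracks per variable along each path through the code, rather than something checked at run time.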

> Mh, but if I understand correctly, this seems to require performing a 
> deep copy - which is well-known to be problematic, and particularly 
> breaks all kinds of abstractions.

Um, no, because there are no aliases. There's only one name for any 
given value, so there are no "deep copy" problems. A deep copy and a 
shallow copy are the same thing. And there are types of values you can 
assign but not copy, such as callmessages (which are problematic to copy 
for the same reason a stack frame would be problematic to copy).

I believe, internally, that there were cases where the copy was "deep" 
and cases where it was "shallow", depending on the surrounding code. 
Making a copy of a table and passing it to another process had to be a 
"deep" copy (given that a column could contain another table, for 
example). Making a copy of a table and using it for read-only purposes 
in the same process would likely make a shallow copy of the table. 
Iterating over a table and making changes during the iteration made a 
copy-on-write subtable, then merged it back into the original table when 
the loop was done, since the high-level semantic definition of 
looping over a table is that you iterate over a copy of the table.
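
A rough sketch of that high-level rule, in Rust rather than Hermes: the loop walks a snapshot taken before it starts, so changes made to the table inside the loop are not visited. The function name and the `_seen` rows are invented for illustration, and the eager `clone()` is the naive version of what is described above as a copy-on-write subtable.

```rust
use std::collections::BTreeMap;

// Double every value and add a "<key>_seen" row for each row that
// existed when the loop started. (Hypothetical example, not Hermes.)
fn double_and_mark(table: &mut BTreeMap<String, i32>) {
    // The iteration sees this snapshot, never the rows inserted below.
    let snapshot = table.clone();
    for (key, value) in &snapshot {
        table.insert(key.clone(), value * 2);
        table.insert(format!("{}_seen", key), 1);
    }
}
```

Starting from a table with rows `a` and `b`, the loop body runs exactly twice, even though the table has four rows by the time it finishes.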

The only thing close to aliases are references to some other process's 
input ports (i.e., multiple client-side sockets connected to a 
server-side socket). If you want to share data (such as a file system or 
program library), you put the data in a table in a process, and you hand 
out client-side connections to the process. Mostly, you'd define such 
connections as accepting a data value (for the file contents) with the 
parameter being undefined upon return from the call, and the file name 
as being read-only, for example. If you wanted to store the file, you 
could just pass a pointer to its data (in the implementation). If you 
wanted a copy of it, you would either copy it and pass the pointer, or 
you'd pass the pointer with a flag indicating it's copy-on-write, or you 
could pass the pointer and have the caller copy it at some point before 
returning, depending on what the caller did with it. The semantics were 
high-level with the intent to allow the compiler lots of leeway in 
implementation, not unlike SQL.
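
The sharing pattern can be approximated in Rust with a thread that owns the table and channel senders acting as the client-side connections. This is a sketch under assumptions, not how Hermes was implemented: `Request`, `spawn_file_store`, and `store_then_fetch` are hypothetical names, and a thread with an `mpsc` channel stands in for a Hermes process and its input port.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Sender};
use std::thread;

// Requests a client can send to the (hypothetical) file-store process.
enum Request {
    // `contents` is moved into the message, so the sender's variable is
    // unusable after the call, much like "undefined upon return".
    Store { name: String, contents: Vec<u8> },
    // The reply channel carries back a copy of the stored data, if any.
    Fetch { name: String, reply: Sender<Option<Vec<u8>>> },
}

// Start the thread that owns the table; return a client-side connection.
fn spawn_file_store() -> Sender<Request> {
    let (tx, rx) = channel::<Request>();
    thread::spawn(move || {
        let mut files: HashMap<String, Vec<u8>> = HashMap::new();
        // The loop ends once every client connection has been dropped.
        for request in rx {
            match request {
                Request::Store { name, contents } => {
                    files.insert(name, contents);
                }
                Request::Fetch { name, reply } => {
                    // Ignore the error if the client has already gone away.
                    let _ = reply.send(files.get(&name).cloned());
                }
            }
        }
    });
    tx
}

fn store_then_fetch() -> Option<Vec<u8>> {
    let store = spawn_file_store();
    let other_client = store.clone(); // a second connection to the same port

    let contents = b"hello".to_vec();
    store
        .send(Request::Store { name: "greeting".to_string(), contents })
        .unwrap();
    // `contents` has been moved; using it here would not compile.

    let (reply_tx, reply_rx) = channel();
    other_client
        .send(Request::Fetch { name: "greeting".to_string(), reply: reply_tx })
        .unwrap();
    reply_rx.recv().unwrap()
}
```

Only the owning thread ever touches `files`, so the only things that behave like aliases are the cloned senders, which matches the description of multiple client-side connections to one input port.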

> Not to mention the issue with 
> uninitialized variables that I would expect occurring all over the place. 

The typestate tracks this, and prevents you from using uninitialized 
variables. If you do a read (say, from a socket) and it throws an "end 
of data" exception, the compiler prevents you from using the buffer you 
just tried but failed to read.
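
The failed-read case can be imitated in Rust by making the buffer exist only on the success path. Again this is only an analogy: `read_record` and `describe` are invented names, and Rust's `Result` stands in for the Hermes exception plus typestate check.

```rust
use std::io::{self, Read};

// Read exactly `len` bytes. The caller gets a buffer only in the `Ok`
// case; after a failure there is no buffer to name, let alone misuse.
fn read_record<R: Read>(source: &mut R, len: usize) -> io::Result<Vec<u8>> {
    let mut buf = vec![0u8; len];
    source.read_exact(&mut buf)?;
    Ok(buf)
}

fn describe<R: Read>(source: &mut R, len: usize) -> String {
    match read_record(source, len) {
        Ok(buf) => format!("read {} bytes", buf.len()),
        // No buffer is in scope in either error arm.
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => String::from("end of data"),
        Err(e) => format!("error: {}", e),
    }
}
```

Calling `describe` on a source that runs out early takes the "end of data" arm, and the compiler gives that arm no way to reach the partially filled buffer.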

Indeed, that's the very point of it all. By doing this, you can run 
"untrusted" code in the same address space as trusted code, and be 
assured that the compiler will prevent the untrusted code from messing 
up the trusted code. The predecessor of Hermes (NIL) was designed to let 
IBM's customers write efficient networking code and emulations and such 
that ran in IBM's routers, without the need for expensive (in 
performance or money) hardware yet with the safety that they couldn't 
screw up IBM's code and hence cause customer service problems.

> So unless I'm misunderstanding something, this feels like trading one 
> evil for an even greater one.

In truth, it was pretty annoying. But more because you wound up having 
to write extensive declarations and compile the declarations before 
compiling the code that implements them and such. That you didn't get to 
use uninitialized variables was a relatively minor thing, especially 
given that many languages nowadays complain about uninitialized 
variables, dead code, etc. But for lots of types of programs, it let you 
do all kinds of things with a good assurance that they'd work safely and 
efficiently. It was really a language for writing operating systems in, 
when you get right down to it.

-- 
   Darren New / San Diego, CA, USA (PST)
     This octopus isn't tasty. Too many
     tentacles, not enough chops.