What is a type error?

Sun Jul 16 12:01:54 EDT 2006

"Marshall" <marshall.spight at gmail.com> wrote:
> In general, I feel that "records" are not the right conceptual
> level to think about.

Unfortunately, they are the right level.  Actually,the right level
might even be lower, the fields within a record, but that's moving
even farther away from the direction you wish to go.  The right level
is the lowest level at which the programmer can see and manipulate
values.  Thus, since a programmer can see and manipulate fields within
records in SQL that is the level we need to concern ourselves with
when talking about aliasing in SQL.  In other words, since a SELECT
(or UPDATE) statement can talk about specific fields of specific
records within the table (not directly, but reliably, which I'll
explain in a moment) then those are the "primitive objects" in SQL
that can be aliased.

What I meant by "not directly, but reliably":

If you have a fixed database, and you do two selects which specify the
same sets of fields to be selected and the same keys to select records
with, one expects the two selects to return the same values. You do
not necessarily expect the records to be returned in the same order,
but you do expect the same set of records to be returned and the
records to have the same values in the same fields.  Further, if you
do an update, you expect certain fields of certain records to change
(and be reflected in subsequent selects).

However, therein lies the rub, if you do a select on some records, and
then an update that changes those records, the records you have from
the select have either changed or show outdated values.  If there is
some way to refer to the records you first selected before the update,
then you have an aliasing problem, but maybe one can't do that.  One
could also have an aliasing problem, if one were allowed to do two
updates simultaneously, so that one update could changed records in
the middle of the other changing the records.  However, I don't know
if that's allowed in the relational algebra either. (I think I finally
see your point, and more on that later.)

> I do not see that they have
> any identity outside of their value. We can uniquely identify
> any particular record via a key, but a table may have more
> than one key, and an update may change the values of one
> key but not another. So it is not possible in general to
> definitely and uniquely assign a mapping from each record
> of a table after an update to each record of the table before
> the update, and if you can't do that, then where
> is the record identity?

I don't know if your point of having two keys within one table amkes a
difference.  If one can uniquely identify a record, then it has
identity.  The fact that there is another mapping where one may not be
able to uniquely identify the record is not necessarily relevant.

> But what you descrbe is certainly *not* possible in the
> relational algebra; alas that SQL doesn't hew closer
> to it. Would you agree? Would you also agree that
> if a language *did* adhere to relation semantics,
> then relation elements would *not* have identity?
> (Under such a language, one could not have two
> equal records, for example.)

Ok, here I think I will explain my understanding of your point.  If we
treat each select statement and each update statements as a separate
program, i.e. we can't save records from one select statement to a
later one (and I think in the relational algebra one cannot), then
each single (individual) select statement or update statement is a
functional program.  That is we can view a single select statement as
doing a fold over some incoming data, but it cannot change the data.
Similarly, we can view an update statement as creating a new copy of
the table with mutations in it, but the update statement can only
refer to the original immutable table that was input.  (Well, that's
atleast how I recall the relational algebra.)  It is the second point,
about the semantics of the update statement which is important.  There
are no aliasing issues because in the relational algebra one cannot
refer to the mutated values in an update statement, every reference in
an update statement is refering to the unmutated input (again, that's
my recollection).  (Note, if my recollection is off, and one can refer
to the mutated records in an update statement, perhaps they can
explain why aliasing is not an issue in it.)

Now for some minor points:

> For example, we might ask in C, if we update a[i], will
> the value of b[j] change? And we can't say, because
> we don't know if there is aliasing. But in Fortran, we
> can say that it won't, because they are definitely
> different variables. 

Unfortunately, your statement about FORTRAN is not true, if a and b
are parameters to a subroutine (or members of a common block, or
equivalenced, or ...), then an update of a[i] might change b[j].

This is in fact, an important point, it is in subroutines (and
subroutine like things), where we have local names for things
(bindings) that are actually names for things which may or may not be
distinct, that aliasing is the primary issue.  I don't recall the
relational algebra having subroutines, and even if it did they would
be unimportant, given that one could not refer within a single update
statement to the records that a subroutine changed (since one cannot
refer in an update statement to any record which has been changed, see
the caveats above).

Joachim Durchholz wrote: > >
> > Indeed.
> > Though the number of parameter passing mechanisms isn't that large
> > anyway. Off the top of my head, I could recite just three (by value, by
> > reference, by name aka lazy), the rest are just combinations with other
> > concepts (in/out/through, most notably) or a mapping to implementation
> > details (by reference vs. "pointer by value" in C++, for example).

Marshall again:
> Fair enough. I could only remember three, but I was sure someone
> else could name more. :-)

There actual are some more, but very rare, for example there was call-
by-text-string, which is sort of like call-by-name, but allows a
parameter to reach into the calling routine and mess with local
variables therein.  Most of the rare ones have really terrible
semantics, and are perhaps best forgotten except to keep future
implementors from choosing them.  For example, call-by-text-string is
really easy to implement in a naive interpreter that reparses each
statement at every invocation, but but it exposes the implementation
properties so blatantly that writing a different interpretor is nearly
impossible.

If I recall correctly, there are also some call-by- methods that take
into account networks and addressing space issues--call-by-reference
doesn't work if one cannot see the thing being referenced.  Of coruse,
one must then ask whether one considers "remote-procedure-call" and/or
message-passing to be the same thing as a local call.

One last nit, Call-by-value is actually call-by-copy-in-copy-out when
generalized.  

There are some principles that one can use to organize the parameter
passing methods.  However, the space is much richer than people
commonly know about (and that is actually a good thing, that most
people aren't aware of the richness, simplicity is good).

-Chris