What is a type error?

Mon Jul 17 12:27:06 EDT 2006

Joachim Durchholz wrote:
> Marshall schrieb:
> > Joachim Durchholz wrote:
> >> Marshall schrieb:
> >>> Good point. Perhaps I should have said "relational algebra +
> >>> variables with assignment." It is interesting to consider
> >>> assignment vs. the more restricted update operators: insert,
> >>> update, delete.
> >> Actually I see it the other way round: assignment is strictly less
> >> powerful than DML since it doesn't allow creating or destroying
> >> variables, while UPDATE does cover assignment to fields.
> >
> > Oh, my.
> >
> > Well, for all table variables T, there exists some pair of
> > values v and v', such that we can transition the value of
> > T from v to v' via assignment, but not by any single
> > insert, update or delete.
>
> I fail to see an example that would support such a claim.

variable T : unary relation of int
T = { 1, 2, 3 };  // initialization
T := { 4, 5 };   // assignment

The above transition of the value of T cannot be
done by any single insert, update or delete.
Two would suffice, however. (In fact, any assignment
can be modeled as a full delete followed by an insert
of the new value.)
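A minimal Python sketch of the "full delete followed by an insert" point, assuming a relation variable is modeled as a holder of an immutable set value (a `frozenset`); the class `RelVar` and its methods are hypothetical names chosen for illustration, not part of any real system:

```python
# Hypothetical model: a relation variable holds an immutable set value,
# and every operation replaces that value with a new one.
class RelVar:
    def __init__(self, value):
        self.value = frozenset(value)

    def assign(self, new_value):
        # Assignment: replace the whole value in one step.
        self.value = frozenset(new_value)

    def insert(self, rows):
        # INSERT: the new value is the union of the old value and the rows.
        self.value = self.value | frozenset(rows)

    def delete(self, pred):
        # DELETE ... WHERE pred: drop every row satisfying pred.
        self.value = frozenset(r for r in self.value if not pred(r))

# T := { 4, 5 } as a single assignment.
t1 = RelVar({1, 2, 3})
t1.assign({4, 5})

# The same transition as a full delete followed by an insert.
t2 = RelVar({1, 2, 3})
t2.delete(lambda r: True)
t2.insert({4, 5})

print(t1.value == t2.value)  # True
```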

> On the other hand, UPDATE can assign any value to any field of any
> record,

Yes.

> so it's doing exactly what an assignment does.

No. The variable is the table, not the records. Relations are not
arrays. Records are not lvalues.

> INSERT/DELETE can
> create resp. destroy records, which is what new and delete operators
> would do.
>
> I must really be missing the point.
>
> > Further, it is my understanding
> > that your claim of row identity *depends* on the restricted
> > nature of DML; if the only mutator operation is assignment,
> > then there is definitely no record identity.
>
> Again, I fail to connect.
>
> I and others have given aliasing examples that use just SELECT and UPDATE.

Sure, but update's semantics are defined in a per-record way,
which is consistent with record identity. Assignment's isn't.
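For concreteness, here is a sketch of the kind of SELECT/UPDATE aliasing being discussed, using Python's built-in `sqlite3` module; the table `emp` and its columns are invented for illustration. The UPDATE is specified per row (each row matching its WHERE clause is modified in place), which is what lets a query through a different WHERE clause observe the change:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [(1, "alice", 100), (2, "bob", 200)])

# Observe the row through one selection (by id) ...
observe = "SELECT salary FROM emp WHERE id = 1"
before = conn.execute(observe).fetchone()[0]

# ... and modify it through a different selection (by name).
# UPDATE is defined per row: each row matching WHERE is changed in place.
conn.execute("UPDATE emp SET salary = salary + 50 WHERE name = 'alice'")
after = conn.execute(observe).fetchone()[0]

print(before, after)  # 100 150
```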

> >> (However, it's usually new+assignment+delete vs. INSERT+UPDATE+DELETE,
> >> at which point there is not much of a difference.)
> >
> > I am not sure what this means. Delete can be expressed in
> > terms of assignment. (As can insert and update.)
>
> INSERT cannot be expressed in terms of assignment. INSERT creates a new
> record; there's no way that assignment in a language like C can create a
> new data structure!
> The same goes for DELETE.

I was intending to discuss a hypothetical relation-based language,
so while I generally agree with your statement about C, I don't see
how it applies.
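In such a hypothetical language, the other direction (insert, update and delete expressed in terms of assignment) can be sketched as computing a new relation value from the old one and assigning it. This is a minimal Python model under that assumption; `RelVar`, `insert`, `delete` and `update` are illustrative names only:

```python
# Hypothetical sketch: a relation variable whose ONLY mutator is assignment.
class RelVar:
    def __init__(self, value):
        self.value = frozenset(value)

    def assign(self, new_value):
        self.value = frozenset(new_value)

def insert(t, rows):
    # T := T union rows
    t.assign(t.value | frozenset(rows))

def delete(t, pred):
    # T := { r in T | not pred(r) }
    t.assign(r for r in t.value if not pred(r))

def update(t, pred, f):
    # T := { f(r) if pred(r) else r | r in T }
    t.assign(f(r) if pred(r) else r for r in t.value)

t = RelVar({1, 2, 3})
insert(t, {10})
delete(t, lambda r: r == 2)
update(t, lambda r: r > 5, lambda r: r + 1)
print(sorted(t.value))  # [1, 3, 11]
```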

> > (Assignment can also be expressed in terms of insert and delete.)
>
> Agreed.
>
> I also realize that this makes it a bit more difficult to nail down the
> nature of identity in a database.

I would propose that variables have identity, and values do not.
In part this is via the supplied definition of identity, in which, when
you change one thing, if something else changes as well, they
share identity. Since one cannot change values, they necessarily
lack identity.
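A small Python sketch of this "change one thing and see whether the other changes" test; the helper `changes_together` is a hypothetical name, and Python lists stand in for variables and tuples for values. Note that it only checks whether the second thing changed at all, not whether it changed in the same way:

```python
import copy

def changes_together(x, y, mutate):
    """Apply `mutate` to x in place; report whether y changed as well."""
    snapshot = copy.deepcopy(y)
    mutate(x)
    return y != snapshot

append_four = lambda v: v.append(4)

a = [1, 2, 3]
b = a                    # a second name for the same variable
r1 = changes_together(a, b, append_four)
print(r1)  # True: a and b share identity

c = [1, 2, 3]
d = [1, 2, 3]            # an equal value held in a separate variable
r2 = changes_together(c, d, append_four)
print(r2)  # False

# A value such as the tuple (1, 2, 3) has no in-place mutator at all,
# so the test cannot even be applied to it.
print(hasattr((1, 2, 3), "append"))  # False
```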

> It's certainly not storage location:
> if you DELETE a record and then INSERT it with the same values, it may
> be allocated somewhere entirely else, and our intuition would say it's
> not "the same" (i.e. not identical).

Well, it would depend on how our intuition had been primed. If it
was via implementation techniques in low level languages, we
might reach a different conclusion than if our intuition was primed
via logical models and relation theory.

> (In a system with OID, it would
> even be impossible to recreate such a record, since it would have a
> different OID. I'm not sure whether this makes OID systems better or
> worse at preserving identity, but that's just a side track.)

OIDs are something of a kludge, and they break set semantics.
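One way to see how OIDs break set semantics, sketched in Python under the assumption of a hypothetical OID scheme where a global counter tags every inserted tuple (`with_oids` is an illustrative name):

```python
import itertools

_next_oid = itertools.count(1)

def with_oids(values):
    # Hypothetical scheme: tag every tuple with a fresh system-generated id.
    return {(next(_next_oid), v) for v in values}

# Without OIDs a relation is a set of values: re-adding an existing
# value changes nothing.
plain = {"alice", "bob"}
plain |= {"alice"}
print(len(plain))   # 2

# With OIDs, "inserting alice again" yields a new, distinct tuple,
# so the relation now holds two records with identical content.
tagged = with_oids({"alice", "bob"})
tagged |= with_oids({"alice"})
print(len(tagged))  # 3
```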

> Since intuition gives me ambivalent results here, I'll go back to my
> semiformal definition (and take the opportunity to make it a bit more
> precise):
> Two path expressions (lvalues, ...) are aliases if and only if the
> referred-to values compare equal, and if they stay equal after applying
> any operation to the referred-to value through either of the path
> expressions.

Alas, this leaves me confused. I don't see how a path expression
(in this case, SELECT ... WHERE) can be an l-value. You cannot
apply imperative operations to the result. (Also I think the use
of equality here is too narrow; it is only necessary to show that
two things both change, not that they change in the same way.)

I was under the impression you agreed that "i+2" was not
a "path expression". If our hypothetical language lacks record
identity, then I would say that any query is simply an expression
that returns a value, as in "i+2."

> In the context of SQL, this means that identity isn't the location where
> the data is stored. It's also not the values stored in the record -
> these may change, including key data. SQL record identity is local, it
> can be defined from one operation to the next, but there is no such
> thing as a global identity that one can memorize and look up years
> later, without looking at the intermediate states of the store.

Yes; however, all of this depends on record identity.

> It's a gross concept, now that I think about it. Well, or at least
> rather alien for us programmers, who are used to taking the address of a
> variable to get a tangible identity that will stay stable over time.

It is certainly alien if one is not used to relation semantics, which
is the default case.

> On the other hand, variable addresses as tangible identities don't hold
> much water anyway.
> Imagine data that's written out to disk at program end, and read back
> in. Further imagine that while the data is read into main memory,
> there's a mechanism that redirects all further reads and writes to the
> file into the read-in copy in memory, i.e. whenever any program changes
> the data, all other programs see the change, too.
> Alternatively, think about software agents that move from machine to
> machine, carrying their data with them. They might be communicating with
> each other, so they need some means of establishing identity
> ("addressing") the memory buffers that they use for communication.

These are exactly the reasons content-based addressing is so important.
Location addressing depends on an address space, and this
concept does not distribute well.
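A toy illustration of content-based addressing in Python, assuming the address of a blob is simply its SHA-256 digest (the `ContentStore` class is hypothetical). Two independent stores, say on two machines, arrive at the same address for the same content without sharing any address space:

```python
import hashlib

class ContentStore:
    """Toy content-addressed store: a blob's key is its SHA-256 hash."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

a, b = ContentStore(), ContentStore()
ka = a.put(b"hello")
kb = b.put(b"hello")
print(ka == kb)  # True: same content, same address, no coordination
```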

> > I don't know what "new" would be in a value-semantics, relational
> > world.
>
> It would be INSERT.
>
> Um, my idea of "value semantics" is associated with immutable values.
> SQL with INSERT/DELETE/UPDATE certainly doesn't match that definition.

Sorry, I was vague. Compare, in OOP, the difference between a value
object and a "regular" object.
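In Python terms the distinction might be sketched as a frozen dataclass (immutable, compared by content) versus an ordinary class (mutable, compared by identity by default); `Point` and `Account` are illustrative names:

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class Point:          # value object: immutable, compared by content
    x: int
    y: int

class Account:        # "regular" object: mutable, compared by identity
    def __init__(self, balance):
        self.balance = balance

print(Point(1, 2) == Point(1, 2))    # True: equal content means equal
print(Account(10) == Account(10))    # False: default __eq__ is identity

p = Point(1, 2)
try:
    p.x = 5
except FrozenInstanceError:
    print("value objects cannot be changed")
```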

> So by my definition, SQL doesn't have value semantics, by your
> definition, it would have value semantics but updates which are enough
> to create aliasing problems, so I'm not sure what point you're making
> here...
>
> >> Filters are just like array indexing: both select a subset of variables
> >> from a collection.
> >
> > I can't agree with this wording. A filter produces a collection
> > value from a collection value. I don't see how variables
> > enter into it.
>
> A collection can consist of values or variables.
>
> And yes, I do think that WHERE is a selection over a bunch of variables
> - you can update records after all, so they are variables! They don't
> have a name, at least none which is guaranteed to be constant over their
> lifetime, but they can be mutated!

We seem to have slipped from the hypothetical relation language
with only assignment back to SQL.

> > One can filter either a collection constant or
> > a collection variable; if one speaks of filtering a collection
> > variable, one is really speaking of filtering the collection value
> > that the variable currently contains; filtering is not an operation
> > on the variable as such, the way the "address of" operator is.
> > Note you can't update the result of a filter.
>
> If that's your definition of a filter, then WHERE is not a filter,
> simple as that.

Fair enough! Can you correct my definition of filter, though?
I am still unaware of the difference.
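To make the definition of filter above concrete, here is a minimal Python sketch in which a filter maps a collection value to a new collection value; the function name `where` is hypothetical and only echoes SQL's keyword. Filtering a variable filters its current value, re-evaluating after an update produces a different value, and the result itself offers nothing to update:

```python
def where(rel, pred):
    # A filter: collection value in, new (immutable) collection value out.
    return frozenset(r for r in rel if pred(r))

is_even = lambda r: r % 2 == 0

t = {1, 2, 3, 4}            # a mutable collection variable
evens = where(t, is_even)   # filters t's *current* value

t.add(6)                    # update the variable afterwards
print(sorted(evens))                # [2, 4]: the earlier result is unaffected
print(sorted(where(t, is_even)))    # [2, 4, 6]: re-evaluation sees the change

# The result is itself a value; there is nothing in it to update.
print(hasattr(evens, "add"))        # False
```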

> >> In SQL, you select a subset of a table, in a
> >> programming language, you select a subset of an array.
> >>
> >> (The SQL selection mechanism is far more flexible in the kinds of
> >> filtering you can apply, while array indexing allows filtering just by
> >> ordinal position. However, the relevant point is that both select things
> >> that can be updated.)
> >
> > When you have been saying "select things that can be updated"
> > I have been assuming you meant that one can derive values
> > from variables, and that some other operation can update that
> > variable, causing the expression, if re-evaluated, to produce
> > a different value.
>
> That's what I meant.
>
> > However the phrase also suggests that
> > you mean that the *result* of the select can *itself* be
> > updated.
>
> The "that" in "things that can be updated" refers to the selected
> things. I'm not sure how this "that" could be interpreted to refer to
> the selection as a whole (is my understanding of English really that bad?)

Your English is extraordinary. I could easily conclude that you
were born in Boston and educated at Harvard, and either have
Germanic ancestry or have simply adopted a Germanic name
out of whimsy. If English is not your native tongue, there is no
way to detect it.

Argh, late for dropping off my daughter at school now. Must run.
Sorry if I was a bit unclear due to being rushed.

Marshall