GC in OO (was Re: Python 1.6 The balanced language)

Alex Martelli aleaxit at yahoo.com
Mon Sep 4 19:13:07 EDT 2000


"Darren New" <dnew at san.rr.com> wrote in message
news:39B3DE12.6BDD2961 at san.rr.com...
> Alex Martelli wrote:
> > > Depends on what you mean by "safe". If by "safe" you mean "according
> > > to specification of the problem", then yes. If by "safe" you mean what
> > > language designers mean, which is "defined behavior according to the
> > > language specification", then no.
> >
> > "safe" as in "will not make the program do things it shouldn't".
>
> That's begging the question. Now you have to define "should".

That's the function of analysis and design: "defining the right `should`"
for each program.  Preconditions, postconditions, invariants.
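
A tiny Python sketch of what I mean (the Account class and its numbers
are made up, of course) -- spelling the "should" out as assertions:

    class Account:
        def __init__(self, balance):
            assert balance >= 0                  # invariant: never a negative balance
            self.balance = balance

        def pay(self, amount):
            assert 0 < amount <= self.balance    # preconditions
            old = self.balance
            self.balance = old - amount
            assert self.balance == old - amount  # postcondition
            assert self.balance >= 0             # invariant still holds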

> My payroll program "shouldn't" pay out a million bucks to the janitor,
> but it certainly will if that's what I typed in the source code.

And that would probably be a more serious defect than a program
crash (for most programs).


> > Whether
> > a program crashes because the language specification is violated, or
> > because a semantic constraint (precondition) is, it's still a crash
>
> Err, no. Throwing an exception is not a crash, any more than running
> "shutdown" crashes a UNIX computer. On the other hand...

You can set things up (depending on the environment) so that a
program survives 'crashes' (ideally, with a rollback of whatever
atomic transactions are in progress, of course); whether you do
that through exceptions, signals, setjmp/longjmp, or yet other
mechanisms is, in good part, an issue of environment.
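
A minimal Python sketch of that setup (the state dict and the step
function are just stand-ins, naturally): catch whatever signals the
'crash', roll back to the checkpoint, survive.

    def run_step(state, step):
        checkpoint = dict(state)      # snapshot before the atomic transaction
        try:
            step(state)               # may 'crash' with any exception
        except Exception:
            state.clear()
            state.update(checkpoint)  # roll back: state is consistent again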

For example, say that you have a significance loss through floating
point underflow.  Will that crash your program?  Should it?  IEEE
has something to say about that in its standards, but few programming
language designers seem to care enough to specify it -- maybe
because they want to keep the option open of running on machines
whose hardware can't implement the IEEE standards fast
enough to make it sensible (many supercomputers used to be like
that, for example); only if a language fully specifies the 'virtual
machine' it runs on can it dare to specify such fine points of
floating-point behaviour (and, of course, such VM specs will simply
make it practically unusable for serious floating-point work on any
machine whose FP behaves significantly differently).

So, you end up (at best) invoking system calls that nicely ask the
FP hardware to interrupt, or not interrupt, on certain conditions;
and, on several kinds of modern hardware, you had better accept
nondeterministic timing of error detection if you want decent
performance (else, you can't fully exploit multiple pipelines).
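
Python itself shows how arbitrary the choice is -- underflow passes
silently, while overflow is turned into a catchable exception (a small
sketch; this is what CPython does on typical hardware):

    import math

    print(math.exp(-1000))      # underflow: quietly returns 0.0, no exception
    try:
        math.exp(1000)          # overflow: the implementation chooses to raise
    except OverflowError as e:
        print("caught:", e)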

So, what happens when (after your program has gone on to
execute anything between, say, 2 and 10 further instructions),
the hardware FP unit finds out -- and tells you through some
interrupt -- that, oh, by the way, some FP precondition was
violated a short while ago, so you had an underflow, overflow,
domain error, whatever?

You can catch this as an (asynchronous) exception in some
environments (e.g., NT's Structured-Exception-Handling setup),
as a 'signal' in some others -- call it as you wish: it's still as
much of a 'crash' as, say, a wild pointer (something which you
can catch in basically the same ways -- 'segmentation violation'
& similar exceptions/signals -- though on present-day hardware
it tends to have less nondeterminism in timing/detection).

If it's imperative that your program stay up and running, you
may keep the process up (in environment-dependent ways,
and not in all environments), but do consider that you will
also need lots of transactional support, the ability to roll back the
current transaction, recover state from the checkpoint before
it, etc.

I reiterate my claim that it makes no significant difference
whether the cause of the crash (whether it's caught or not)
lies in violating a language spec (that isn't enforced before
runtime), or in a precondition which the language did not
specify (e.g. because it's inherent in a certain FP unit, but
not all FP hardware can enforce it, and the language's specs
chose to let the language be run decently on a variety of
hardware).  It seems to me the burden of proof is on you
to show otherwise.  What's the difference, e.g., _what_
asynchronous exception (or signal, etc) I've received (and
maybe caught) -- FP-underflow, or segment-violation, or
something else again...?


> >(and
> > not quite as bad as a program NOT crashing but corrupting persistent
> > data in subtly-wrong ways...).
>
> ... that would be a "crash".

No, it would be a violation of a program's semantic specs.  For
example, paying a million dollars to somebody who is not due
it is a (not-so-subtle) corruption of persistent data -- but the
program keeps running, and the error is hidden until it's found
out by other means.  Crashes are less serious than this, because
they're normally apparent (even if caught, any halfway decent
program will at least keep an audit trail of such emergencies...!).
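
(A sketch of that audit trail, with the standard logging module standing
in for whatever mechanism you actually use; 'guarded' is a made-up name:)

    import logging

    log = logging.getLogger("emergencies")

    def guarded(operation):
        try:
            return operation()
        except Exception:
            log.exception("caught emergency in %r", operation)  # the audit trail
            return None     # survived -- but the event is on record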


> > > In other words, if you change an object to which I'm holding a
> > > reference, I'll still do what you told me to. If you GC an object to
> > > which I'm still holding a reference, you can't define the semantics of
> > > that operation.  It's
> >
> > But neither can I change the semantics of what happens in other
> > cases, e.g. a divide by zero; whether that raises a catchable exception
> > or not depends on the language (and possibly on the implementation,
> > if the language leaves it as implementation-defined).
>
> Well, I'd say that one of the basic tenets of OO programming is that the
> only operations that change the private data of a class are the methods of
> that class, yes?

No, that's only one style of OO programming.  In some OO styles, data
never changes (O'Haskell, for example: it's both OO and pure functional,
so all data are immutable -- let's *please* forget monads for the moment,
as they're quite something else again).  In others, encapsulation is not
enforced.  In particular, it's very common for object-persistence
frameworks (and similar doodads) to be allowed special licence to
persist _and depersist_ data of any object whatsoever -- and while
some setups let classes with special needs participate in the protocol,
it's most common for such classes to have no such participation and
rely on the default mechanics for save/restore (and possibly for
transaction support, including rollback and commit).
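
Python's pickle is a handy small-scale example of that special licence
(Plain here is a made-up class, with no pickle-specific methods at all):
the framework saves and restores the instance's state directly, without
any participation from the class.

    import pickle

    class Plain:
        def __init__(self, x):
            self.x = x

    p = pickle.loads(pickle.dumps(Plain(42)))   # default save/restore mechanics
    print(p.x)   # 42 -- the state went around any accessors the class offers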

Anyway, it's a popular style in OO to have this encapsulation, whether
enforced or by convention: an object 'owns' some data (state) and
that data is only accessed by going through the object.


> Or at least that you need to operate on a reference of an
> instance in order to change the value of the instance, which definition
> allows for things like Python and Java and such.

You need to obtain an accessor from the instance-reference (if
encapsulation is to hold), although, depending on language and
circumstances, you may be allowed to hold that accessor for a
while before using it, to pass it to others who'll use it, etc.
(E.g., a "delegate" in typical Hejlsberg languages such as C#;
a very similar bound method in Python; an iterator over a
collection; &c).
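
In Python the bound method is precisely such an accessor (Counter is
just a made-up example): you obtain it from the instance-reference,
and you may hold it, pass it on, and use it later.

    class Counter:
        def __init__(self):
            self.n = 0
        def bump(self):
            self.n += 1

    c = Counter()
    accessor = c.bump       # a bound method: instance-reference plus operation
    callbacks = [accessor]  # hold it, pass it to others...
    callbacks[0]()          # ...use it later; c.n is now 1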

> Now, if sending a "divide" message to the integer "5" causes it to corrupt
> the "Oracle_system_table_interface" object, I'd say you've got a bit of a
> data hiding problem.

You have a problem, maybe, but it need not be one of data-hiding.  Of
course, if anything is "corrupted", then, by definition of "corruption",
it's a problem.  But consider the unreliable timing of (asynchronous)
floating-point exceptions: if one gets triggered after you have executed
a few more (you don't know how many more...) operations, some
of those operations (which have now left the system in an invalid
overall state) may have been in methods of whatever other instance.
This has little to do with data hiding...


> > Will you therefore argue that a language, to be "object-oriented", must
> > give certain fixed semantics to divide-by-zero...?
>
> Yes. That semantics, however, is allowed to be throwing an exception, or
> "returns a random number", or "exits the program", just as rand() and
> exit() are allowed to do. What it's not allowed to do is "changes private
> memory of other instances" or "modifies control flow in ways disallowed
> by the language specification."

Then, by your (absurd, I surmise) definition, no "object oriented
language" can ever run effectively on an architecture whose floating
point exceptions are asynchronous and non-deterministic -- unless
it forces execution to terminate on any such exception (how could
allowable control flow be specified under such circumstances?).

What about other exceptions yet -- must a language fully specify
behaviour on floating-point underflow, overflow, etc etc, to be
"object oriented" by your definition?  Will it ever run decently on
significant floating-point problems on customers' machines...?

What if somebody (or some accident) functionally removes a
piece of writable disk on which your program is currently paging --
must your very hypothetical language, to gain the coveted title
of "object-oriented", fully specify THAT, too, and NOT allow
any recovery...?  There is really no difference between such
occurrences and other kinds of asynchronous exceptions...

*REAL* languages, fully including any object-oriented ones, are
fortunately designed (most of the time) with some input from
people who know a little about such issues.  As a consequence,
they _explicitly_ allow *UNDEFINED BEHAVIOUR* under these
and similar kinds of circumstances: specific implementations of
the language are allowed to do *WHATEVER IS MOST USEFUL*
in the context of their environment when such-and-such things
happen.  This lets effective, speedy implementations exist _as
much as feasible given hardware, OS, etc_.  Even Java, which
opts for full specification most of the time (cutting itself off
from hardware that won't satisfy that spec), has a bit more
flexibility than you allow -- as it is, after all, a real-world language.


> > Note that this ties back to the issue about whether I can mutate a
> > reference after giving it out -- i.e. whether the receiving class will
> > hold to that reference (and if so whether it will rely on it to stay
> > as it was) or make its own copies.  That's part of the receiving
> > class's semantics; I need to know the contract obligations of the
> > receiving class to perform my part of the bargain.  It's NOT an issue
> > of having to know the _implementation_ of the receiving class,
>
> Well, yes, kind of. It is.

No.  It's about the *CONTRACT*: the SEMANTIC SPECIFICATION of
the receiving class.  Is it ALLOWED to make a copy of the object
I'm passing to it?  Is it REQUIRED to?  This specification will of
course constrain both what I can do with it, and what authors of
code that aims to satisfy that specification will be able to do in
their turn.

Get it?  _Specification_ constrains _implementation possibilities_.
And, specification constrains *how the so-specified class can be
used*.  *NOT* the other way around!  The implementation does
NOT constrain anything.

Specification always constrains both uses and implementation.
The specific bit of specification "is the receiving class allowed,
and/or required, to make its own copy of this object to which
I'm passing it a reference" is absolutely no different from any
other bit out of the zillions that make up classes' contracts.

Say I pass to the receiving class a string, which it can use as
a key on a relational database.  Is the receiving class's
semantics such that it must immediately perform the SELECT
(and cache its results), because the database is allowed to
change afterwards and the class is specified to snapshot it
*NOW*?  Or is it such that the key must be re-used afresh
each and every time a SELECT is needed, because, again, the
DB is allowed to change and the class is specified to use the
most-current data each time some other method is called in
the future?  Or again, is there more latitude in the class's
spec (at the cost of more constraint on its client-code), e.g.
a specification that the DB won't change, so the implementor
of the class can freely choose whether to cache or select
anew each time?
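
A hedged Python sketch of the three contracts (the db object and its
select() method are hypothetical stand-ins for some real database API):

    class SnapshotView:
        "Contract: results reflect the DB as of construction time."
        def __init__(self, db, key):
            self._rows = db.select(key)        # SELECT *NOW*, cache the results
        def rows(self):
            return self._rows

    class LiveView:
        "Contract: results reflect the most-current DB contents."
        def __init__(self, db, key):
            self._db, self._key = db, key
        def rows(self):
            return self._db.select(self._key)  # re-SELECT afresh each time

    # Third variant: the client promises the DB won't change, so the
    # implementor may freely pick either strategy (or switch later).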

The situation is exactly analogous to the one that is troubling
you so much, where what is being passed is a reference to an
object rather than a key into a database.  And in both cases,
it's absolutely consistent with OO tenets, and indeed it is
inevitable, to focus on the specifications of the receiving
class -- what's it allowed to do, what's it required to.


> If I write a class that you pass a bitmap and I
> draw it on the screen (say), I can no longer subclass that class to cache
> that bitmap. I think Kay's point was that without some sort of automatic
> memory control, you have to keep track of who is using which object, which
> means you're leaking information about the instance variables. *You* call
> it fundamental semantics, but it isn't always fundamental semantics.
> Sometimes it's just implementation details.

It's not an implementation detail, whether and when some
portion of a database is allowed to change.  And what changes
are or aren't allowed to in-memory objects is _exactly_
analogous.  If you pass references, or database keys, to
objects which are _allowed_ to keep them and count on them
afterwards, then you must (to keep your part of the contract)
assume that the keeping has taken place and avoid performing
changes that aren't contractually allowed.  And I strongly doubt
Mr Kay could have failed to see this obvious point, which has
nothing to do with 'leaking information'.


> > but one of design-by-contract... and please don't tell me, or B.M.,
> > that dbc is not OO:-).
>
> Given that Eiffel is safe and has GC, this is just a strawman. Unless
> you'd like to show me how to declare "argument A does not get garbage
> collected before you call procedure B" as a postcondition. Or would that
> be a precondition? Probably a class invariant. But you still can't do it,
> as Eiffel doesn't have any sort of temporal mechanisms in the assertions.
> ("old" doesn't count for a number of reasons.)

Nor can Eiffel let me specify "and these tables/views in the database
will not be changed after I've called useThisKey(), not until I've
later called stopUsingTheKey()".  Or any other such issues.  Inevitably,
those parts of the contracts will be in docs and comments.  And so?


> Look at it this way: What does Eiffel do when you run off the end of an
> array? It raises an exception. Perfectly defined, no memory corrupted. The
> fact that some compilers let you bypass this check for performance reasons
> is irrelevant for a number of reasons.

The existence of this check is also irrelevant, because it does not
address "and the array's contents will not change [or: are allowed
to change, but you must snapshot them right now; or: are allowed
to change, and you must use their current values when later
called]".  So, this check will catch some errors for you at runtime
(as will the similar 'check' performed by Python), but it has really
nothing to do with the semantic specification of a class, or whether
a language is, or isn't, object-oriented.
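
The Python 'check' in question, for the record:

    items = [10, 20, 30]
    try:
        items[5]        # running off the end...
    except IndexError:
        print("caught: index out of range")   # ...raises; no memory is corrupted
    # but nothing here says whether items' *contents* may change later --
    # that part of the contract still lives in docs and comments.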

Many things in life are very nice, but having or lacking them is an
orthogonal issue to "being object oriented".  Array-bounds checking
is a good example of a good thing that doesn't determine whether
you're OO or not; garbage-collection plays a VERY similar role.


> If you free the reference, and later attempts to use it cause a
> "ReferenceFreedException" (for example), then I think I'd agree. But that
> isn't how C or C++ (for example) works.

It causes *undefined behaviour*, because of performance
considerations: see above.
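
(As an aside: Python's weakref module -- a later addition, so take this
as a hedged sketch -- behaves much like the "ReferenceFreedException"
idea: once the object is gone, using the reference raises a well-defined
exception instead of causing undefined behaviour.)

    import weakref

    class Thing:
        pass

    t = Thing()
    proxy = weakref.proxy(t)   # a reference that does NOT keep t alive
    del t                      # with CPython's refcounting, t is collected at once
    try:
        proxy.anything
    except ReferenceError:
        print("caught: weakly-referenced object no longer exists")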


> > It's sure nice if all errors are specified to cause some
> > trappable exception rather than an outright crash, but
> > surely that's not a part of "object-orientedness"
>
> See above.  Of course there are lots of parts of OOness.  If everything is
> an object, and the only way to interact with objects is via messages (as
> in Smalltalk, say), then yes, it is impossible to violate the semantics of
> the

Does Smalltalk specify *synchronous* exceptions for every possible
occurrence?  I have my doubts (but don't know enough of modern
Smalltalk to be certain).


> But the *contract* is based on the *implementation*. I believe that you'll

***NOOOOO***!!!!  This is utterly absurd.  Do you REALLY program
that way -- you choose an implementation, then you build your specs
around it?!  *SHUDDER*.

> find *some* cases where the only reason that information is exposed in the
> contract is because of your implementation choice.

A *WELL-DESIGNED* specification will carefully leave things
*EXPLICITLY UNDEFINED* where this is the best compromise
between freedom for the client-code and for the implementer.


> Take, for example, a class that puts a bitmap up on the screen. Assume,
> also, that bitmaps are immutable, but freeable. If my class does not save
> a copy of the bitmap, you can't free it as long as I may need to refresh
> the screen. If my class *does* save a copy, you don't need to worry about
> what you do with your copy. If there's GC, you don't need to worry either
> way. Hence, the lack of GC causes you to have to expose implementation
> details in your contract, contrary to how ADTs work. ;-)

The "immutable" tidbit here is the key: you've rigged the dice by
specifying that there is exactly one mutation possible, 'freeing'.

I know of no language that works that way; functional languages,
with immutable data, invariably have no concept of 'freeing'.

So, say that data ARE mutable, as in almost all of today's languages.

Then, GC buys you absolutely nothing here: you have to specify
whether client-code is, or isn't, allowed to later mutate the bitmap
to some other thing, and will the screen be kept & refreshed OK
in this case.  Is the language 'not OO' if it has _any_ mutable data,
i.e. is O'Haskell almost the ONLY OO language in the world?

If it's compatible with OO to have to specify this when data are
mutable, it surely doesn't suddenly become incompatible with OO
when you decide the only mutation is 'freeing'; just as it would
not if the only mutation was 'rotation', say.  It's exactly the same
kind of issue, after all; whether all mutations are allowed, or
just some specific subset, is quite clearly secondary.
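
A Python sketch of the two possible contracts (the screen classes and
blit() are all made up for illustration); GC or no GC, the spec still
has to say which one you're getting:

    import copy

    def blit(bitmap):            # stand-in for the actual drawing call
        print("drawing", bitmap)

    class SnapshottingScreen:
        "Contract: the bitmap is copied now; mutate yours freely afterwards."
        def draw(self, bitmap):
            self._shown = copy.deepcopy(bitmap)
        def refresh(self):
            blit(self._shown)

    class AliasingScreen:
        """Contract: the bitmap is kept by reference; whatever you mutate
        it into is what the next refresh will show."""
        def draw(self, bitmap):
            self._shown = bitmap
        def refresh(self):
            blit(self._shown)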


> > I don't mind the class-ic approach to OO, but prototype-based OO
> > would also have its advantages, and it seems peculiar to rule it
> > out as being OO at all because it lacks 'some sort of class concept'.
>
> I would say prototypes are "some sort of class concept". I.e., prototypes

The authors of prototype-based languages fight a hard uphill battle
to take classes away, and here comes Darren New and forces them
to have them anyway -- whether they think they have them, or not.
Because, *WHATEVER* mechanism they choose (even if totally,
utterly, completely different from classes), he's determined to
CALL them 'some sort of class' anyway.

By these Humpty-Dumpty-ish approaches, then ANY language could
equally well be said to have classes.  You say that Dartmouth Basic
doesn't?  Ah, but I've decided to call these here line numbers "some
sort of class concept", see -- after all, I _have_ seen a class called
'English 101', so that '101' number IS class-related, innit?

> form the same basis for extensions and inheritance and such that classes
> do. Not in the same way, of course.

_Definitely_ "not in the same way"...


> > Dynamic dispatch, polymorphism, is what I'd call *the* one and
> > only real discriminant of 'OO'.
>
> I think you would need to define this more strictly, or you'll wind up
> with BASIC's "ON GOTO" statement being object-oriented. :-)

It's not polymorphic.  You _can_ of course manually build OO structures
in languages that don't support OO directly (that's what a compiler or
interpreter for any OO language will do under the covers), and that
does not make the language 'oo' -- but it may make your _programs_
oo, even in a non-oo language.

The *language* is OO if and only if it offers SOME kind of language
support for dynamic dispatch (polymorphism) -- it's not, if you have
to build that support yourself.  A powerful enough language may
let you build a _library_ for the purpose (e.g. see the various
CLOS-like libraries for Scheme); then, the language-plus-library
will be OO, though the bare language wouldn't.
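
To make the contrast concrete, a hedged Python sketch: first, dispatch
built by hand (a table keyed on a type tag -- roughly what a compiler
does under the covers), then the same polymorphism using the language's
own support.

    # dispatch built by hand: the *program* is oo-ish, the mechanism isn't
    speak_table = {
        'dog': lambda name: name + " says woof",
        'cat': lambda name: name + " says meow",
    }
    def speak(kind, name):
        return speak_table[kind](name)    # manual dynamic dispatch

    # the same thing, with language support for dynamic dispatch
    class Dog:
        def speak(self):
            return "woof"
    class Cat:
        def speak(self):
            return "meow"
    for pet in (Dog(), Cat()):
        print(pet.speak())                # method chosen at runtime by type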


> Anyway, to slide this a little back on target, I just started learning
> Python, and it really is one of the nicer languages I've seen for doing
> what it does. :-) Now I need to go buy more books and such.

Happy quest -- it's sure more fun than debating what is/isn't OO...


Alex





