OO conventions

Thu Feb 2 09:52:39 EST 2006

Steven D'Aprano <steve at REMOVETHIScyber.com.au> wrote:
> > image = Image( )
> 
> Now you have an "image" object. What is it?
> 
> Answer: it isn't an image at all, not in the plain English sense. (Or if
> it is, it is an arbitrary "default image" picked by the class designer.)

No doubt (presumably some kind of as-yet blank one).

> > image.read( "myfile.jpg" )
> 
> And now, at long last, the image object actually is an image. So why make
> this a two step process? Whatever the Image() initialization does, why
> can't it be done automatically when you read the file?

"Two-step construct" ("2SC") is a reasonably well-known and widely
useful idiom, and it can serve several kinds of purposes, many but not
all of which are tied to persistence.  For example, to some extent it's
used in Entity Enterprise Javabeans (EJBs): with CMP, instead of having
to create a separate object each time you need a new record from the
database, the container can juggle a pool of objects and reuse each of
them to hold different records at different times, so the number of
beans needed is the number of different records you need to have
_simultaneously_.  The "thread pool" concept can be implemented in a
similar way, although this is less common.  Another field where 2SC is
often used is GUIs; in that case the main motivation is "impedence
matching" between a GUI toolkit (often a cross-platform one) and a given
platform's underlying toolkit.

Python offers good opportunities to implement 2SC "under the covers"
thanks to the split between __new__ and __init__: indeed one can get
tricky, since __new__ need not necessarily perform the first step of 2SC
(it might well return a "scrubbed" object from the pool rather than the
new one).  Unpickling may normally use __setstate__ instead of __init__
(after __new__, anyway) -- that's more flexible and often easier to
arrange than going through getinitargs (which doesn't work for newstyle
classes anyway) or getnewargs (which doesn't work for classic ones).

Of course, the way OOP is normally taught, 2SC sounds like a heresy, but
"out in the real world" it does have some advantages (even though
single-step construction remains the normal approach in most cases).

> But if the class has no natural default state, then it makes no sense to
> create an "empty object" with no data, a "non-image image" so to speak.

Hmmm, it might, actually; that's what __new__ normally does for
instances of mutable classes, so that __init__ or __setstate__ or other
different methods yet, as appropriate, can then make the object
"nonempty" in different ways.

> In other words, if you find yourself writing methods like this:
> 
> class Klass:
>     def foo(self):
>         if self.data is None:
>             raise KlassError("Can't foo an uninitialized Klass object.")
>         else:
>             # do something
> 
> then you are just doing pointless make-work to fit a convention that
> doesn't make sense for your class. 

It does appear to be a code smell, yes.  But protocol semantics may
often require such constraints as "it does not make sense to call x.A
unless x.B has been previously called" or viceversa "it's forbidden to
call x.A if x.B has already been called", and such constraints are
generally implemented through something like the above idiom.

Consider a file object, for example: after you call f.close() any other
method call must raise an error.  The simplest, most natural way to
implement this is very similar to what you just coded, using a
self.closed flag -- a case of "two-step destruction", separating
termination ("close") from finalization (destruction proper).

Reusable objects, that can be born "scrubbed", loaded with some data and
used for a while, then "scrubbed" again (with the data getting persisted
off and the object surviving and going into a reuse-pool), typically
need a flag to know if they're scrubbed or not (or, some other data
member may play double duty, e.g. by being None iff an object is
scrubbed).  All normal methods must then raise if called on a scrubbed
object; code for that purpose may be injected by decorators or a custom
metaclass to reduce the boilerplate and code smells.

Note that I'm not defending the OP's contention -- I've seen no reason
in his post making 2SC/2SD desirable.  I'm just addressing the wider
issue... one I probably wouldn't even cover in "OOP 101", but hold for a
later course, e.g. one on the architecture of persistence frameworks.

Alex