classes (was Re: Same again please for OOP)

Sun Dec 24 11:17:09 EST 2000

"Rob Brown-Bayliss" <rob at ZOOstation.cc> wrote in message
news:mailman.977623149.8184.python-list at python.org...
> On 21 Dec 2000 20:49:27 -0800, Richard P. Muller wrote:
> > Can someone recommend a decent introduction to functional programming in
    [snip]
> Like wise,  I am trying to get my head around OOP, but all I can find
> are basic howto make a class in C++

OOP is _much_ more 'central' to Python than FP; thus, a good grasp
of it is proportionally more important.  'Howto' is easy, but you're
right in implying it's not as important as 'using it WELL'.

> I understand how to make classes in both Python and C++.  What I want is

Good; so, the easy part is out of the way.

> help in deciding what would be a good class and what would not,
> preferably in a python context, but a generic context would be wonderful

It can help to think of classes as coming in *two* kinds: one directly
models a type of 'object' that you have identified while thinking about
the domain for which you want to provide solutions ('domain analysis');
the other one is a 'purely utilitarian' device, which just takes advantage
of some OO language mechanism.

The split is not as sharp as this would seem to imply, but this way of
presenting it may still help.  We could call the former kind 'semantic'
or 'problem-space' classes/objects, and the latter 'utility' or 'solution-
space' ones (many classes in real-world programs exhibit both aspects:
_some_ semantic correspondence to domain analysis, _some_ aspects
that are 'purely utilitarian').

Don't think of 'semantic' classes as 'good', and 'utility' ones as 'bad':
they're _both_ 'good' if, and only if, they help you solve your problems
with clarity and ease!  These are general programming issues, only
tangentially related to classes, but well worth emphasizing...:

    Semantic classes will be 'good' if they model those aspects of your
    problem-domain which you need to handle, to the level of detail and
    'faithfulness' which *you actually need* -- a model/program is always
    a _simplification and abstraction_ of the world (the map is not the
    territory)... eschew 'too-close' modeling for its own sake!

    Utility classes will be 'good' if they come in handy, with just the
    right amount of 'mechanism' for your actual needs.  *Do the simplest thing
    that can possibly work*, in each case -- ready to 're-factor' things as
    the program evolves, often in an 'exploratory' way, to keep simplicity
    and elegance as you converge towards solutions.

    Don't put in extra complexity 'just in case', because *you ain't gonna
    need it* often enough that the effort won't pay for itself.  (Unless
    you are releasing a _framework_, or at least a module/library for
    general use, rather than a specific program; in this case, balancing
    generality, flexibility/reusability, and simplicity, can be even more
    of a design challenge:-).

And coming to classes (remember classes? it's a post about classes)...
it's hard to do them justice within the confines of one post, but, let me
start with some general considerations, and an example to clarify.

Classes let you *group*, "under one roof" so to speak, aspects of
_state_ ("data") and/or aspects of _behavior_ ("code").  In any given
case, unless you have some interest in effecting such a grouping, it's
quite unlikely that a class is the best approach for you in that case;
if you do have such interest, a class is likely to be 'right' -- unless
some other type of object (an already existing one, which you can
find in the Python library or some other available module/package)
meets some specific needs even better.

A simple case is when you just want to group some data items and
give each of them a name -- no specific 'behavior', just some state
with associated naming.  Tuples and lists let you do the grouping,
but the 'names' of the several items are just numeric indices -- often,
this is not the clearest, most readable approach.  Dictionaries would
let you 'name' items by supplying string keys -- but the syntax is not
ideal if constant strings are all you're using as names.  A class with
no behavior (methods) can easily be the best, simplest approach here.

For example, suppose our program will need to model CDs.  A CD
object has several attributes -- depending on our purposes, we may
want to model some appropriate subset of them; for example, a
"title", a "price" in Euros, and a sequence of "tracks" (each of which
has its own attributes, say a "title" and a "duration" in seconds).

If we used tuples or lists, we might conventionally agree that the
first item is the title, the second one the price, the third one the
sequence of tracks (each with two items, title then duration).

For example, a Czech CD I bought this summer could be modeled
according to these conventions as (truncating the list of tracks!):
acd = ["Hity Papa Offenbacha", 5.00,
    [["Barkarola", 264], ["Piscu Heleny", 95], ["Pariduv Soud", 247]]]
or similarly with tuples instead of lists.  The problem with this is
that any given task, such as, say, "find the title of the longest
track", becomes not-very-readable and hardly-maintainable...:

def titleLongest(acd):
    tracks = acd[2]
    longest = 0
    for i in range(1,len(tracks)):
        if tracks[i][1] > tracks[longest][1]:
            longest = i
    return tracks[longest][0]

All of those '[2]', '[1]', '[0]' should feel unnerving -- strong hints
that our chosen representation is sub-optimal; and further, our
functions become *strongly* coupled to the physical way we've
chosen to lay out our data!  If we refactor the data representation,
we'll have LOTS of work ahead, redoing all of our functions -- what
an *unpleasant* prospect!
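To make the coupling concrete, here is a small sketch (not from the
original post) that runs the index-based titleLongest on the list
layout above, then on a hypothetical layout where an 'artist' field
(the value "Offenbach" is made up for illustration) has been slipped
in at index 1.  Nothing else changed, yet the function now breaks:

```python
def titleLongest(acd):
    # Index-based version from above: acd[2] is the track list,
    # track[1] is a duration, track[0] is a title.
    tracks = acd[2]
    longest = 0
    for i in range(1, len(tracks)):
        if tracks[i][1] > tracks[longest][1]:
            longest = i
    return tracks[longest][0]

tracks = [["Barkarola", 264], ["Piscu Heleny", 95], ["Pariduv Soud", 247]]
acd = ["Hity Papa Offenbacha", 5.00, tracks]
print(titleLongest(acd))  # Barkarola

# Hypothetical layout change: an 'artist' field inserted at index 1.
acd2 = ["Hity Papa Offenbacha", "Offenbach", 5.00, tracks]
try:
    titleLongest(acd2)
except TypeError as exc:
    # acd2[2] is now the price (a float), so len(tracks) fails.
    print("layout change broke titleLongest:", exc)
```

A layout change that happened to leave [2] pointing at some other
list would be worse still: the function would quietly return a wrong
answer instead of failing.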

A dictionary may be a bit better...:

acd = {'title':"Hity Papa Offenbacha", 'price':5.00, 'tracks':
    [ {'title':"Barkarola", 'duration':264},
      {'title':"Piscu Heleny", 'duration':95},
      {'title':"Pariduv Soud", 'duration':247} ] }

The repetitiveness of the construction is unpleasant, but let's
see if the code using this representation is more readable...:

def titleLongest(acd):
    tracks = acd['tracks']
    longest = 0
    for i in range(1,len(tracks)):
        if tracks[i]['duration'] > tracks[longest]['duration']:
            longest = i
    return tracks[longest]['title']

Well, yes; at least, this code DOES suggest a little bit of what
we're doing.  But, could classes help...?

class CD:
    pass

class Track:
    pass

acd = CD()
acd.title = "Hity Papa Offenbacha"
acd.price = 5.00
acd.tracks = []

atrack = Track()
atrack.title = "Barkarola"
atrack.duration = 264
acd.tracks.append(atrack)

atrack = Track()
atrack.title = "Piscu Heleny"
atrack.duration = 95
acd.tracks.append(atrack)

atrack = Track()
atrack.title = "Pariduv Soud"
atrack.duration = 247
acd.tracks.append(atrack)

The construction is *quite* unpleasantly wordy.  But what about
the using-code?

def titleLongest(acd):
    tracks = acd.tracks
    longest = 0
    for i in range(1,len(tracks)):
        if tracks[i].duration > tracks[longest].duration:
            longest = i
    return tracks[longest].title

Ah, yes, that's better... actually, thanks to the clarity afforded
by the named-attribute syntax, a slightly different approach
suggests itself:

def titleLongest(acd):
    longest = acd.tracks[0]
    for track in acd.tracks[1:]:
        if track.duration > longest.duration:
            longest = track
    return longest.title

By looping on the tracks directly, rather than on an index to
them, we achieve a further worthwhile simplification.  It's true
that this approach *could* be taken even with the first of our
proposed representations, but...:

def titleLongest(acd):
    longest = acd[2][0]
    for track in acd[2][1:]:
        if track[1] > longest[1]:
            longest = track
    return longest[0]

...the lack of names really kills us here!  This is hardly better
than the form we started out with, and _is_ rather opaque.
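As a postscript from a later era: Python 2.5 and later give the
built-in max a key argument, which folds the whole 'keep the best so
far' loop into one call.  A minimal sketch, assuming attribute-style
objects like those above; for brevity it borrows a two-argument Track
constructor instead of the wordy attribute assignments:

```python
class Track:
    def __init__(self, title, duration):
        self.title = title
        self.duration = duration

class CD:
    pass

def titleLongest(acd):
    # max() walks the tracks and keeps the one with the largest duration;
    # among equally long tracks it returns the first one encountered.
    return max(acd.tracks, key=lambda track: track.duration).title

acd = CD()
acd.title = "Hity Papa Offenbacha"
acd.tracks = [Track("Barkarola", 264), Track("Piscu Heleny", 95),
              Track("Pariduv Soud", 247)]
print(titleLongest(acd))  # Barkarola
```

Like the loop versions, this keeps the first of several equally long
tracks, and an empty track list raises an error rather than returning
a title.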

Pity about the wordy initialization code... anything we can do
about *that* one...?

Why, sure!  How do you like, for example:

acd = CD(title="Hity Papa Offenbacha", price=5.00, tracks=Tracks(
    ("Barkarola", 264), ("Piscu Heleny", 95), ("Pariduv Soud", 247) ))

Doesn't this look *just right* -- even better than the first attempt,
which had no naming at all, or the second one, which had names
all over the place, because we have names *right where we want them*...?

So how do we get this beauteous init'ing -- is it painful...?

Naah!  The CD class is too easy for words...:

class CD:
    def __init__(self, **kwds):
        for name,value in kwds.items():
            setattr(self,name,value)

Ain't it pretty?  The __init__ method is automatically called by
Python when we call the class object -- the first argument, which
we always name 'self', is the instance object we're building, then
come the others that were passed in the call to the class-object.

Here, we're using the '**somedictionary' notation to receive
whatever arguments were passed in named (keyword) form, and then
the handy idioms of 'iterate over dictionary keys/values' and
'set an object attribute' to smoothly, seamlessly translate them
into attributes of the CD object.
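A quick sketch of that keyword-collecting __init__ in action (the
'titel' spelling is a deliberate, hypothetical typo):

```python
class CD:
    def __init__(self, **kwds):
        # kwds is a dict holding every keyword argument passed to CD(...)
        for name, value in kwds.items():
            setattr(self, name, value)   # same effect as self.<name> = value

acd = CD(title="Hity Papa Offenbacha", price=5.00)
print(acd.title, acd.price)   # Hity Papa Offenbacha 5.0
print(sorted(vars(acd)))      # ['price', 'title']

oops = CD(titel="Hity Papa Offenbacha")   # misspelled keyword
print(hasattr(oops, 'title'))             # False
```

That last line shows the trade-off of this flexible approach: a
misspelled keyword silently becomes a new attribute instead of raising
an error, which a fixed parameter list would have caught.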

For the Tracks idea, as we don't *want* to make 'a sequence of
tracks' into a class-instance of its own (it's one of those cases
where an existing object type -- here, a list -- matches our needs,
so, let's not multiply entities beyond necessity), we'll just
use a 'factory function' (fancy name for a function which builds
and returns a new object!-).

class Track:
    def __init__(self, title, duration):
        self.title = title
        self.duration = duration

def Tracks(*tracks):
    return [Track(title,duration) for title,duration in tracks]

The list comprehension plucks and unpacks each 2-item tuple
from the sequence of arguments, and hands the results to the
call on the Track class object, which in turn passes them to __init__
to initialize the object's attributes.  That's just one way of putting
things, an equally good alternative being:

def Tracks(*tracks):
    return [Track(*track) for track in tracks]

which may be preferable as it has no coupling to what and how
many arguments Track's __init__ wants -- it just passes them on.
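To see the decoupling at work, suppose (hypothetically) that Track
later grows an optional composer parameter.  In this sketch,
TracksByPairs is a made-up name for the first, two-names version; the
*track version passes the extra field straight through, while the
pair-unpacking one fails:

```python
class Track:
    # Hypothetical extension: Track grows an optional third field.
    def __init__(self, title, duration, composer=None):
        self.title = title
        self.duration = duration
        self.composer = composer

def TracksByPairs(*tracks):
    # First version from the text: hard-wires exactly two fields per track.
    return [Track(title, duration) for title, duration in tracks]

def Tracks(*tracks):
    # Second version: forwards whatever each tuple holds to Track.
    return [Track(*track) for track in tracks]

both = Tracks(("Barkarola", 264), ("Piscu Heleny", 95, "Offenbach"))
print([(t.title, t.composer) for t in both])
# [('Barkarola', None), ('Piscu Heleny', 'Offenbach')]

try:
    TracksByPairs(("Piscu Heleny", 95, "Offenbach"))
except ValueError as exc:
    # Unpacking a 3-item tuple into (title, duration) raises ValueError.
    print("TracksByPairs failed:", exc)
```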

So we're smoothly led from 'grouping state' to 'grouping (some)
behavior' -- at least, behavior connected to initialization.  What
could be another example of 'behavioral grouping'...?

What if we often need to refer to "a CD's duration" -- meaning,
the total of its tracks' durations.  Would using a class help here?

Why, sure!  One possibility is to compute 'total duration' in the
initialization, and it's probably the simplest one:

class CD:
    def __init__(self, **kwds):
        for name,value in kwds.items():
            setattr(self,name,value)
        if hasattr(self,'tracks') and not hasattr(self,'duration'):
            duration = 0
            for track in self.tracks: duration += track.duration
            self.duration = duration

As long as nobody changes the .tracks attribute after the
CD object's initialization, we're fine.
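A short sketch of that caveat (the loop is folded into sum(), which
otherwise computes the same total): the duration is worked out once,
so mutating the list afterwards leaves .duration stale.

```python
class Track:
    def __init__(self, title, duration):
        self.title = title
        self.duration = duration

class CD:
    def __init__(self, **kwds):
        for name, value in kwds.items():
            setattr(self, name, value)
        # Compute the total once, at construction time, unless supplied.
        if hasattr(self, 'tracks') and not hasattr(self, 'duration'):
            self.duration = sum(track.duration for track in self.tracks)

acd = CD(title="Hity Papa Offenbacha", price=5.00,
         tracks=[Track("Barkarola", 264), Track("Piscu Heleny", 95)])
print(acd.duration)   # 359
acd.tracks.append(Track("Pariduv Soud", 247))
print(acd.duration)   # still 359: the stored total is now out of date
```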

What if we're afraid that client-code *might* change the
tracks attribute -- how are we going to keep .duration up
to date then...?

Ah, then we DO need to make .tracks into an object instance
of a special-purpose class... just so we CAN 'hook into' any
modification and report it back to the containing CD object!

I'm not going to show this in detail, as it may get a little bit
involved, but you can look at this sort of thing as a first
example of a 'utility class' -- although it _does_ have a semantic
aspect, a list-of-tracks would normally suffice, but not if we
have special 'utilitarian' needs related to possible changes to
the sequence, or to some items of the sequence.
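The post deliberately leaves this one out.  Purely as a rough sketch
of the idea, and not the author's design, a hypothetical TrackList
wrapper could own the real list, expose only a few list-like
operations, and recompute its owner's duration after each one.  The
CD here takes plain positional parameters for brevity:

```python
class Track:
    def __init__(self, title, duration):
        self.title = title
        self.duration = duration

class TrackList:
    """Hypothetical utility class: a minimal list-like container that
    reports every add/replace/remove back to its owning CD."""
    def __init__(self, owner, tracks=()):
        self._owner = owner
        self._tracks = list(tracks)
        self._changed()

    def _changed(self):
        # Keep the owner's total in step with the current tracks.
        self._owner.duration = sum(t.duration for t in self._tracks)

    def append(self, track):
        self._tracks.append(track)
        self._changed()

    def __setitem__(self, index, track):
        self._tracks[index] = track
        self._changed()

    def __delitem__(self, index):
        del self._tracks[index]
        self._changed()

    def __getitem__(self, index):
        return self._tracks[index]

    def __len__(self):
        return len(self._tracks)

    def __iter__(self):
        return iter(self._tracks)

class CD:
    def __init__(self, title, price, tracks):
        self.title = title
        self.price = price
        # NOTE: rebinding acd.tracks later would bypass the wrapper.
        self.tracks = TrackList(self, tracks)

acd = CD("Hity Papa Offenbacha", 5.00,
         [Track("Barkarola", 264), Track("Piscu Heleny", 95)])
print(acd.duration)   # 359
acd.tracks.append(Track("Pariduv Soud", 247))
print(acd.duration)   # 606
del acd.tracks[0]
print(acd.duration)   # 342
```

This only catches changes made through the wrapper: altering a
Track's own duration in place, or rebinding acd.tracks to a plain
list, would still slip past it.  Closing those gaps is exactly the
'a little bit involved' part.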

Yeah, sigh, doing justice to such considerations *would*
require a book, not just a post.  But the good books I know
about O-O are about statically-typed languages, where it is
all so different... pity, because Python is such a *great*
language in which to learn O-O approaches!  My best advice,
based on books I know, would be 'Structure and Interpretation
of Computer Programs' by Abelson and Sussman, based on
Scheme -- curiously, the same one I suggested for the
purpose of exploring FP (it *is* a great book!-).  Scheme is
not really OO, but then, it *IS* something of a "language
construction kit", and you can build decent OO on top of it
(not as handy as having it built-in, like in Python, but quite
OK to understand its underpinnings!-).  I opine you're NOT
likely to get a great understanding, from a book based on a
statically typed language [particularly one where you never
get to choose between OO and other approaches, such as
Eiffel or Java, where ONLY OO is on offer], about how to best
choose and implement OO in a multi-paradigm dynamically
typed language (such as Python)!

Alex