Unification of Methods and Functions

Mon May 10 19:53:06 EDT 2004

David MacQuigg <dmq at gain.com> wrote in message news:<889t90tdl9o9t25cv5dj6k5rnktuce0jin at 4ax.com>...
> On 8 May 2004 07:07:09 -0700, moughanj at tcd.ie (James Moughan) wrote:
> 
> >David MacQuigg <dmq at gain.com> wrote in message news:<4a9o90pbu122npgf4m2hrgg04g2j0ic6ka at 4ax.com>...
> >> On 7 May 2004 06:31:51 -0700, moughanj at tcd.ie (James Moughan) wrote:
>  <snip>
> >> Also, if you are calling
> >> a function that has an instance variable ( .length ) and no instance
> >> has been set by a prior binding, you would need to set __self__
> >> manually.
> >> __self__ = foo; print FooLen()
> >
> >???!!!??? 
> >
> >This is what I was talking about in my first post, global variables
> >which change depending on where you are in the code... as I understand
> >what you're saying, __self__ will have to be set, then reset when a
> >method is called from within a method and then exits.  And __self__
> >could presumably be changed halfway through a method, too. I'm sorry,
> >I don't see this as being more explicit or simpler.
> 
> The setting of __self__ happens automatically, just like the setting
> of the first argument in a call from an instance.  The user doesn't
> have to worry about it.  

Explicit is better than implicit, especially in getting people to
understand things. :)

> In fact, I can't think of a circumstance
> where the user would need to explicitly set __self__.  Maybe some
> diagnostic code, in which case having available a system variable like
> __self__ is a plus.  You can, without any loss of functionality in a
> normal program, never mention __self__ in an introductory course.  The
> user doesn't need to know what it is called.
> 

I can think of a case where I'd want to do it and couldn't:

map(str.strip, list_of_str)

See what I mean about it breaking the functional side of things?
Without some definition of what's a static method, the interpreter
can't resolve calls like this.  In practice, map and a whole bunch of
functions would have to be modified to set __self__ explicitly.
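For reference, this is what that call looks like in today's Python 3. It works precisely because str.strip, looked up on the class, takes the string as an explicit first argument (list(...) is only needed because map returns an iterator there; the sample strings are made up):

```python
# str.strip, looked up on the class, takes the string itself as its first
# argument, so it can be handed to map() like any other one-argument function.
list_of_str = ["  spam ", "eggs  ", "\tham\n"]

stripped = list(map(str.strip, list_of_str))
print(stripped)  # ['spam', 'eggs', 'ham']

# The explicit first argument is what makes these equivalent:
assert stripped == [s.strip() for s in list_of_str]
assert str.strip("  spam ") == "  spam ".strip()
```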

An implicit self is what leads to the whole
mem_fun(&std::vector<int>::push_back) business in C++ (or whatever the
exact syntax is; it's too unwieldy to bother learning).

> My preference is to give it a name and highlight it with double
> underscores.  To me that makes the discussion more concrete and
> explicit, and builds on concepts already understood.  Don't forget,
> the students already understand global variables at this point in the
> course.  The "magic" of setting a particular global variable to an
> instance is about the same as the magic of inserting that instance as
> a first argument in a function call.  The problem in either syntax is
> not the magic of setting 'self' or '__self__'.
> 
> <snip>
> >> >A method in a class in Python is just like a global function; for a
> >> >global function to operate on an object, it must take it as an
> >> >argument. The prototype syntax would appear to break the above
> >> >example.
> >> 
> >> Global functions have no instance variables, so there is no need for a
> >> special first argument.  A Python method requires a special first
> >> argument (even if it is not used).  
> >
> >But the first argument isn't terribly 'special'; it tells the method
> >what it's working on, just like any other argument.  Its only
> >'special' characteristic is that there's some syntactic sugar to
> >convert foo.getLength() into Foo.getLength(foo).
> 
> The specialness of the first argument isn't much, I agree, but it is
> enough to make the calling sequence different from a normal function
> or a static method.  It is these differences that the new syntax gets
> rid of, thereby enabling the unification of all methods and functions,
> and simplifying the presentation of OOP.  Methods in the new syntax
> are identical to functions (which the students already understand),
> except for the presence of instance variables.
> 
> Instance variables are the one fundamental difference between
> functions and methods, and one that we wish to focus our entire
> attention on in the presentation.  

Except that that distinction doesn't exist in Python, since using an
instance variable is always an explicit access to a member of an
instance (self.length, not just length).  If you are trying to focus
your presentation on something which doesn't exist in Python, then
things are naturally going to be awkward.  I would suggest that it's
not a problem with the language, though. :)
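The foo.getLength() / Foo.getLength(foo) equivalence quoted earlier is easy to check directly. A minimal Python 3 sketch, using the thread's Foo and getLength names with a made-up length of 42; note that the bound method object happens to expose the instance as __self__:

```python
class Foo:
    def __init__(self, length):
        self.length = length       # instance variable, always reached via self

    def getLength(self):
        return self.length


foo = Foo(42)

# Calling the method on the instance is sugar for calling the function stored
# in the class with the instance passed explicitly as the first argument.
assert foo.getLength() == Foo.getLength(foo) == 42

# The bound method object just remembers which instance to pass along.
bound = foo.getLength
assert bound.__self__ is foo
assert bound.__func__ is Foo.__dict__["getLength"]
print(bound())                     # 42
```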

> Any new and unnecessary syntactic
> clutter is a distraction, particularly if the new syntax is used in
> some cases (normal methods) but not others (static methods).
>

If you really want to get something like this accepted, then the
closest thing which *might* have a chance would be to redefine the
calling sequence of some_class.method() so that it doesn't require a
self argument so long as 'method' was defined without one.  I don't
think this would break anyone's code (I may well be wrong, of course),
since such a call currently fails at run time (though IIRC it will
still compile to bytecode).

This makes the distinction between static and normal methods about as
simple as it can be: normal methods are just functions in a class which
operate on an instance.  You can show how it works in a few lines of an
example (though, even with the current syntax, it should only take a
few more).
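For what it's worth, Python 3 later dropped unbound methods, which makes the class-call half of this suggestion the actual behaviour: a function defined in a class without a self parameter can be called through the class. Calling it through an instance still inserts the instance, so staticmethod remains necessary there. A minimal sketch, assuming Python 3 and reusing the thread's Mammal/numMammals names with made-up bodies:

```python
class Mammal:
    numMammals = 0

    def show():                   # no self: just a function stored in the class
        return "Mammals: %d" % Mammal.numMammals

    def describe(self):           # a normal method: receives the instance
        return "a mammal"


# Through the class, show() is an ordinary function, so no self is needed.
print(Mammal.show())              # Mammals: 0

m = Mammal()
print(m.describe())               # a mammal

# Through an instance, the instance is still inserted as the first argument,
# so show() receives one argument too many; staticmethod is still needed here.
try:
    m.show()
except TypeError as exc:
    print("TypeError:", exc)
```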

> >
> >Mammal.show() shows characteristics to do with Mammals, *but not
> >specifically Mammal*.  There really is a difference between a class
> >and its subclasses.
> 
> The Mammal.show() function *is* specific to Mammal.  I think what you
> are saying is that calling Mammal.show() results in a display of
> characteristics of both Mammal and its ancestor Animal.  

No, it's not.  Let me try to be totally clear here: the numMammals
data member contains data not just about Mammal, but also about
instances of its subclasses.  This is the problem.  The fact that
it's accessed through the show method is really just a detail, though
the presence of show in other subclasses compounds the problem.

>That is a
> requirement of the problem we are solving, not a result of bad
> programming.  We want to see *all* the characteristics of Mammal,
> including those it inherited from Animal.
> 

You are not solving a problem; that's the problem. :)  If there were a
real programming task, then it would be much easier to show why your
object model is broken.

> Leave out the call to Animal.show() if you don't want to also see the
> ancestor's data.
> 
> >The general-purpose inventory solution would be a better solution.  It
> >doesn't require repetition, it's hard (impossible?) to break and it's
> >generic, allowing it to be used beyond this single class heirarchy.
> >
> >If the inventory function would be best placed outside a class, why do
> >you think it's a good idea to put something with exactly the same
> >functionality inside your classes?
> 
> The proposed Inventory() function is a general function that *would*
> be appropriate outside a class.  The existing class-specific functions
> like Mammal.show() are unique to each class.  I tried to make that
> clear in a short example by giving each data item a different text
> label.  I've now added some unique data to the example just so we can
> get past this stumbling block.  A real program would have a multi-line
> display for each class, and there would be *no way* you could come up
> with some general function to produce that display for any class.
> 

> >
> >Books are always kind of strange, because a book must have a certain
> >number of pages and cover a certain range of content at a certain
> >technical level.  For the level and range of the ORA Learning books,
> >that is going to mean a bit of padding for a simple language like
> >Python.  If I see Learning Python in a bookshop then I'll take a look,
> >though.
> >
> >Regardless, I stand by what I said before - students generally will
> >not read 70 pages on a single topic, especially when it's a relatively
> >minor part of the course.
> 
> Learning Python, 2nd ed. would be appropriate for a one-semester
> course.  My problem is that I have only a fraction of a semester in a
> circuit-design course.  So I don't cover OOP at all.  I would include
> OOP if I could do it with four more hours.  Currently Python is a
> little over the top.  I don't think it is a problem with Lutz's book.
> He covers what he needs to, and at an appropriate pace.
> 

If you can't take it below 70 pages and you only have 4 hours... maybe
it's not such a great idea to try this?  I can't see your students
benefiting from what you're proposing to do, if you have so little
time.

> >> >Learning to program is about 5% how to do something, and 95% when and
> >> >why you should do it.  You seem to be focusing almost exclusively on
> >> >how, which I suspect is why we're all so upset :) you get that way
> >> >when you have to fix the code which eventually results.
> >> 
> >> The OOP presentations I've seen that focus as much as 50% on *why*
> >> generally leave me bored and frustrated.  I feel like screaming --
> >> Stop talking about car parts and show me some nice code examples.  If
> >> it's useful, I'm motivated.  Good style is a separate issue, also best
> >> taught with good examples (and some bad for contrast).
> >> 
> >
> >I'm not talking about car parts.  I'm talking about explaining
> >modularity, complexity, side-effects, classes as data structures etc.
> 
> These are concepts that design engineers understand very well.  I
> wouldn't spend any time teaching them about modularity, but I would
> point out how different program structures facilitate modular design,
> and how syntax can sometimes restrict your ability to modularize as
> you see fit.  Case in point: The need for static methods to put the
> show() functions where we want them.
> 
> >> >OK: "The whole idea of having these structures in any program is
> >> >wrong."
> >> >
> >> >Firstly, the program uses a class hierarchy as a data structure.  That
> >> >isn't what class hierarchies are designed for, and not how they should
> >> >be used IMO. But it's what any bright student will pick up from the
> >> >example.
> >> 
> >> The classes contain both data and functions.  The data is specific to
> >> each class.  I even show an example of where the two-class first
> >> example forced us to put some data at an inappropriate level, but with
> >> a four class hierarchy, we can put each data item right where it
> >> belongs.
> >> 
> >
> >The data is not specific to the class.  It's specific to the class and
> >its subclasses.  Subclasses should be dependent on the superclass,
> >and generally not the other way around.
> 
> What data are we talking about?  numMammals is specific to Mammal.
> genus is specific to Feline, but *inherited* by instances of a
> subclass like Cat.

The numAnimals etc. data, which is stored in Animal but gets
arbitrarily altered by the actions of subclasses of Animal, is
therefore not specific to Animal; it doesn't represent the state of
the Animal class or of Animal objects, but of a whole bunch of
subclasses of Animal.
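A minimal sketch of the failure described earlier in the thread, assuming Python 3 and a cut-down Animal/Mammal/Cat chain (this is not the actual Animals.py; Mouse is the hypothetical careless addition). Nothing in Mammal or Cat changes, yet the numbers they report are wrong:

```python
class Animal:
    numAnimals = 0

    def __init__(self):
        Animal.numAnimals += 1       # Animal's state is changed by every subclass


class Mammal(Animal):
    numMammals = 0

    def __init__(self):
        Animal.__init__(self)
        Mammal.numMammals += 1


class Cat(Mammal):
    def __init__(self):
        Mammal.__init__(self)


class Mouse(Mammal):                 # hypothetical later addition...
    def __init__(self):
        pass                         # ...that forgets to chain to Mammal.__init__


Cat()
Mouse()
Mouse()
print(Mammal.numMammals, Animal.numAnimals)   # 1 1, although three mammals exist
```

The fault lies in Mouse, but the symptom shows up in Mammal and Animal, which is exactly why it is hard to trace.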

> 
> >> Nothing in the Bovine class can affect anything in a Cat.  Feline and
> >> Bovine are independent branches below Mammal.  Adding a Mouse class
> >> anywhere other than in the chain Cat - Feline - Mammal - Animal cannot
> >> affect Cat.  Could you give a specific example?
> >> 
> >
> >Say someone adds a mouse class but doesn't call the constructor for
> >Mammal.  The data produced by mammal and therefore cat is now
> >incorrect, as instances of mouse are not included in your count.  In a
> >real example, anything might be hanging on that variable - so e.g.
> >someone adds some mouse instances and the program crashes with an
> >array index out of bounds (or whatever the Pythonic equivalent is :) )
> >, or maybe we just get bad user output.  This type of behaviour is
> >damn-near impossible to debug in a complex program, because you didn't
> >change anything which could have caused it.  It's caused by what you
> >didn't do.
> 
> These are normal programming errors that can occur in any program, no
> matter how well structured.  I don't see how the specific structure of
> Animals.py encourages these errors.

Imagine if your structure had been implemented as one of the basic
structures of, say, Java.  That is, some static data in the Object
class stores state for all the subclasses of Object.  Now, someone
coming along and innocently creating a class can break Object,
meaning they may break anything with a dependency on Object, which is
the entire system.  So I write a nice GUI widget and bang!  By some
bizarre twist it breaks my program somewhere else because of an error
in, say, the StringBuffer class.  This is analogous to what you are
implementing here.

While errors are always going to happen, OOP relies on some conventions
to minimize them.  The most vital of these is that it's clear what can
break what.  Generally I should never be able to break a subsystem by
breaking its wrapper; definitely I should never be able to break a
superclass by breaking its subclass; and I *certainly* shouldn't be
able to break a part of the system by changing something unconnected
to it.  The whole of OOP derives, more or less directly, from these
principles.  Expressions like 'A is a part/type of B' derive from this
philosophy, not the other way around.

Your program breaks with this concept.  It allows an event in Cat to
affect data in Mammal and in Animal, which also has knock-on effects
for every other subclass of these.  Therefore it is bad object
oriented programming.

It takes us back to the days before even structured programming, when
no-one ever had any idea what the effects of altering or adding a
piece of code would be.

It is therefore not a good teaching example. :)

> 
> >> I'm not sure what you mean by "side effects" here.  The show()
> >> function at each level is completely independent of the show()
> >> function at another level.
> >
> >But the inventory data isn't independent.  It's affected by classes
> >somewhere else in the hierarchy.  Worse, it's done implicitly.
> 
> The "inventory data" actually consists of independent pieces of data
> from each class. ( numCats is a piece of inventory data from the Cat
> class.)  I'm sorry I just can't follow this.
>

numMammals, OTOH, is not just a piece of data from one class; it's a
piece of data stored in one class, but which stores data about events
in many different classes, all of which are outside its scope.

> >> Chaining them together results in a
> >> sequence of calls, and a sequence of outputs that is exactly what we
> >> want.  The nice thing about separating the total "show" functionality
> >> into parts specific to each class is that when we add a class in the
> >> middle, as I did with Feline, inserted between Mammal and Cat, it is
> >> real easy to change the Cat class to accommodate the insertion.
> >> 
> >> Python has a 'super' function to facilitate this kind of chaining.
> >> Michele Simionato's 'prototype.py' module makes 'super' even easier to
> >> use. Instead of having Cat.show() call Mammal.show() I can now just
> >> say super.show() and it will automatically call the show() function
> >> from whatever class is the current parent.  Then when I add a Feline
> >> class between Mammal and Cat, I don't even need to change the
> >> internals of Cat.
> >
> >That's fine - providing you're not using a class hierarchy to store
> >data.  It's not the act of calling a method in a super-class which is
> >a bad idea, it's the way you are making *the numbers outputted* from
> >cat dependent on actions taken *or not taken* in another class
> >*completely outside cat's scope*.
> 
> Seems like this is the way it has to be if you want to increment the
> counts for Cat and all its ancestors whenever you create a new
> instance of Cat.  Again, I'm not understanding the problem you are
> seeing.  You seem to be saying there should be only methods, not data,
> stored in each class.
> 

That's the way it has to be, if you want to write it like that. 
However there is nothing to say that a given problem must use a
certain class structure.  If you come up with a solution like this
then it's near-guaranteed that there was something badly wrong with
the way you modelled the domain.  Either the program shouldn't need to
know the number of instances of subclasses of Mammal that ever existed,
or else your class structure is wrong.

And, as a general rule, you should think carefully before using classes
to store data; that's typically what objects are for.  I used static
data in programs quite a lot before I realised that it bit me later
on far too often.

> >> In one syntax we need special "static methods" to handle calls where a
> >> specific instance is not available, or not appropriate.  In another
> >> syntax we can do the same thing with one universal function form.
>
> To try and get to the bottom of this, I re-wrote the Animals.py
> example, following what I think are your recommendations on moving the
> static methods to module-level functions.  I did not move the data out
> of the classes, because that makes no sense to me at all.
>

*Sigh*  No, I must say that doesn't help much. :-\

As I said, there is something wrong with the whole idea behind it; the
design needs refactoring, not individual lines of code.

Having said that, I'll try to work through the issues as best I can, on
the basis that it may illustrate what I mean.

OK: start with the basics.  We need running counts of the instances of
each individual element of the hierarchy.

The first thing is that we need to factor out the print statements.
Your back-end data manipulation modules should never have UI elements
in them.  So, whatever form the data manipulation comes in, it should
return the data and leave the display to someone else.
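A minimal sketch of that split, with hypothetical inventory and format_inventory functions and made-up counts. The back end hands back plain (name, count) pairs; only the front end knows about column widths:

```python
def inventory(counts):
    """Back end: return the data as sorted (name, count) pairs; never print."""
    return sorted(counts.items())


def format_inventory(rows):
    """Front end: all presentation decisions live here."""
    return "\n".join("%-8s %3d" % (name, n) for name, n in rows)


rows = inventory({"Cat": 2, "Bovine": 1})
print(format_inventory(rows))
```

The same rows could feed a GUI or a unit test without touching the back end.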

Secondly, we want to keep the data stored in each class local to that
class.  So, Mammal can store the number of Mammals, if that turns out
to be a good solution, but not the number of its subclasses.  OTOH we
could remove the data from the classes altogether.
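One possible way to keep each class's data local, sketched in Python 3.6+ (it relies on __init_subclass__) and separate from the design presented below: every class counts only its own direct instances, and totals for a class and its descendants are computed on demand by walking __subclasses__(). The class names follow the thread; total is a hypothetical helper:

```python
class Animal:
    count = 0                             # direct Animal instances only

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        cls.count = 0                     # each subclass gets its own counter

    def __init__(self):
        type(self).count += 1             # only the concrete class is touched


class Mammal(Animal): pass
class Feline(Mammal): pass
class Cat(Feline): pass
class Bovine(Mammal): pass


def total(cls):
    """Instances of cls plus all of its subclasses, computed on demand."""
    return cls.count + sum(total(sub) for sub in cls.__subclasses__())


Cat()
Cat()
Bovine()
Mammal()
print(total(Animal), total(Mammal), total(Feline))   # 4 4 2
```

A subclass that forgot to chain its constructor here would only leave its own count wrong; Mammal's stored number would still be accurate, although derived totals would miss those instances.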

Thirdly, it would probably be nice if we had the ability to implement
the whole thing in multiple independent systems.  Currently the design
only allows one of "whatever-we're-doing" at a time, which is almost
certainly bad.
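A minimal sketch of independent systems, assuming the counts live in an ordinary object instead of in class attributes. Inventory and create are hypothetical names; routing creation through create() is essentially a small Factory, and two inventories never see each other's counts:

```python
from collections import Counter


class Inventory:
    """Holds the counts in an object, so several inventories can coexist."""

    def __init__(self):
        self.counts = Counter()

    def create(self, cls, *args, **kwargs):
        obj = cls(*args, **kwargs)
        self.counts[cls.__name__] += 1    # record the exact class created
        return obj


class Animal: pass
class Cat(Animal): pass


farm, zoo = Inventory(), Inventory()
farm.create(Cat)
farm.create(Cat)
zoo.create(Animal)
print(farm.counts, zoo.counts)
```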

After a bit of brainstorming this is what I came up with.  It's not a
specific solution to your problem; instead it's a general one.  The
following class may be sub-classed and an entire class hierarchy can
be placed inside it.  It will then automatically wrap the classes so as
to keep track of and count the instances of each class in the
hierarchy, returning the data you want from a method call.

This is done with a standard OO tool, the Decorator pattern, but
ramped up with the awesome power of the Python class system. :)

import inspect

class Collective:
    class base: pass

    def startup(self, coll, root):
        #wrapper class to count instantiations of the wrapped classes
        self.root = root
        class wrapper:
            def __init__(self, name, c):
                self.mycount = 0
                self.c = c
                self.name = name
            def __call__(self, *arg, **kw):
                obj = self.c(*arg, **kw)  #create the instance only once
                self.mycount += 1
                return obj
        self.wrapper = wrapper
        #replace every class derived from root with a wrapper
        #plus build a list of the wrappers
        self.wrap_list = []
        for name, elem in coll.__dict__.items():
            try:
                if issubclass(elem, self.root):
                    tmp = wrapper(name, elem)
                    self.__dict__[name] = tmp
                    self.wrap_list.append(tmp)
            except TypeError: pass  #elem is not a class, so skip it

    #when subclassing, override this
    #call startup with the class name
    #and the root of the class hierarchy
    def __init__(self):
        self.startup(Collective, self.base)

    #here's the stuff to do the counting
    #this could be much faster with marginally more work
    #exercise for the reader... ;)

    def get_counts(self, klass):
        counts = [ (x.c, (self.get_sub_count(x), x.name))
                   for x in self.super_classes(klass) ]
        counts.append( (klass.c, (self.get_sub_count(klass), klass.name)) )
        #order from the root down: a deeper class has a longer MRO
        counts.sort(key=lambda x: len(inspect.getmro(x[0])))
        return [x[-1] for x in counts]

    def get_sub_count(self, klass):
        count = klass.mycount
        for sub in self.sub_classes(klass):
            count += sub.mycount
        return count
    def super_classes(self, klass):
        return [x for x in self.wrap_list if issubclass(klass.c, x.c)
                and x.c is not klass.c]
    def sub_classes(self, klass):
        return [x for x in self.wrap_list if issubclass(x.c, klass.c)
                and x.c is not klass.c]

So we can now do:

class animal_farm(Collective):
    class Animal: pass
    class Mammal(Animal): pass
    class Bovine(Mammal): pass
    class Feline(Mammal): pass
    class Cat(Feline): pass
    def __init__(self):
        self.startup(animal_farm, self.Animal)

a_farm = animal_farm()
cat = a_farm.Cat()
mammal = a_farm.Mammal()
print(a_farm.get_counts(a_farm.Feline))

# prints: [(2, 'Animal'), (2, 'Mammal'), (1, 'Feline')]

The above code is about 50 lines, roughly 10 of them comments.  For a
project of any size, that is a worthwhile investment; I believe it
would take a fairly determined idiot to break the system, and, *most
importantly*, they would be able to trace an effect back to its cause
fairly easily.

Admittedly the solution is on the complicated side, though perhaps
someone with more experience than me could simplify things. 
Unfortunately, a certain amount of complexity is just a reflection of
the fact that your demands strain the OO paradigm right to its limit.
You could possibly implement the same thing in Java with a Factory
pattern, and perhaps the reflection API.

(Of course I'm none too sure I could do that after many years of
hacking Java vs a few weeks of Python!)

> Take a look at http://ece.arizona.edu/~edatools/Python/Exercises/ and
> let me know if Animals_2b.py is what you had in mind.  If not, can you
> edit it to show me what you mean?
>
> -- Dave