Multiple constructors (part 2)

Alex Martelli aleaxit at yahoo.com
Thu Jan 18 06:19:36 EST 2001


"Daniel Klein" <DanielK at jBASE.com> wrote in message
news:D%o96.645$KD3.265469 at typhoon.aracnet.com...
> OK, I've researched and read the posts (and the FAQ) on how to simulate
> multiple constructors. Alex Martelli provided the most robust answer by
> testing the number of *args and executing the appropriate section of code.
> However in another post he says
>
> """
> This 'overloads' the constructors based on how many arguments
> are given -- how elegant (and how Pythonic...!) this is, being
> of course *debatable*.  Overloading on *types* would be less
> elegant *and* less Pythonic, though you could easily extend
> this idea to do it -- I would discourage it even more strongly.
> """

I think _this_ part of my response is what makes it "robust":-).

In other words, the best answer to this sort of query is most
often: yes, you can do it (and here's how), but there are better
approaches (and here they are).  I didn't get _fully_ into the
"here's how" and "here they are" parts, admittedly.


> However, what I need to do is _exactly_ what is being discouraged, that is
> creating 3 constructors both with 2 arguments where the second argument of
> each is a different type. The real kicker is that in one of the
> constructors, I need to check the __class__ of the object to make sure the
> method is receiving the proper object. I have no problem coding this if this
> is the way it has to be but if there are more acceptable (and Pythonic) ways
> to do this, I would appreciate some pointers.

Why do you think you *NEED* to distinguish your processing based on
an argument's type, or class?  More likely, what you want to know
about an argument (to determine different processing in different
cases) is *how it BEHAVES* -- which you can't determine by testing
types or classes; rather, you can use hasattr or try/except to find out.

By focusing on behavior, rather than on type-identity, you make
life easier for the client-code programmer: he or she can then
make polymorphic use of your components with any instance that
implements the needed behavior, which IS all your code needs.
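
For instance (just a sketch -- 'process' and 'munge' are made-up
names, standing in for whatever behaviour your code really needs):

def process(arg):
    # behaviour-check with hasattr: does arg supply the one
    # method this code actually needs?
    if hasattr(arg, 'munge'):
        return arg.munge()
    raise TypeError, "process() needs an object with a .munge method"

def process2(arg):
    # or, "easier to ask forgiveness than permission": just try,
    # and catch the failure if the behaviour is missing
    try:
        munge = arg.munge
    except AttributeError:
        raise TypeError, "process2() needs an object with a .munge method"
    return munge()

Any object supplying a suitable .munge is acceptable here, whatever
its type or class.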

The idea of 'overloading' -- having a callable with a single given
name that maps to multiple internal callables depending on various
conditions -- is also related to polymorphism; the only good reason
to supply a single callable that maps to multiple ones is to let
client-code use that single callable polymorphically if need be.

It's generally a good idea to ALSO expose the multiple callables
directly -- don't make client-code programmers go through strange
contortions to make sure the 'right' overload is invoked in the
end; when their needs are non-polymorphic, let them explicitly
state as much in the simplest possible way.  This does not sit
well with 'constructors' -- which is why *factory functions* tend
to be preferable whenever an application need of some richness
and complexity is involved (factory callables that are not
functions are also OK, but meet a rarer, yet-more-involved need).

In Python (just like in VB, and other languages with the concept
of explicitly-named and default-valued arguments) you have another
stylistic alternative to 'multiple callables': one callable can
be explicitly used for several related purposes by supplying
different named-arguments.  This can easily be overdone (and VB
supplies a LOT of examples of this style being overused!-) but,
used with taste and moderation, it can be very helpful too.
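
A tiny sketch of that style (read_config and its argument names
are made up purely for illustration):

def read_config(filename=None, text=''):
    # one callable, two related purposes, selected by explicitly
    # named, default-valued arguments:
    #     read_config(filename='setup.cfg')  or  read_config(text='a=1')
    if filename is not None:
        text = open(filename).read()
    return text.splitlines()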


Let's try to see one typical example.  We want to expose a class
Munger, whose instances need to be initialized with 'a lot of
data to be munged'; the 'lot of data' could be a file, a string,
or an instance of our own class DataBuffer which provides the
data-access features Munger instances need -- in fact, when we
are given a file or string, we construct a DataBuffer ourselves
and hold that anyway.

The 'overload' style might be:


class Munger:
    def __init__(self, data):
        # dispatch on the argument's type name (or, for class
        # instances, the class name)
        name = type(data).__name__
        if name=='instance':
            name = data.__class__.__name__
        method = getattr(self, '_init_'+name)
        method(data)
    def _init_string(self, data):
        self.data = DataBuffer(data)
    def _init_file(self, data):
        self.data = DataBuffer(data)
    def _init_DataBuffer(self, data):
        self.data = data

Now, this IS intended as a 'bad example', and maybe I've overdone
the badness, but I hope at least it IS clear why doing it this
way would be heavily sub-optimal.  This does not exploit in any
way the polymorphism of *DataBuffer*'s own constructor, *AND*
it seriously inhibits polymorphism capabilities of client-code
(except via such tricks as naming a *class* as, say, 'string'...!).
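
To make that inhibition concrete (a little sketch using the Munger
class just shown -- MyFileLike is, of course, made up):

class MyFileLike:
    # a client's own file-ish class: it supplies behaviour
    # a Munger could perfectly well use...
    def read(self):
        return "whatever data"

try:
    m = Munger(MyFileLike())
except AttributeError:
    # ...but the dispatch goes looking for Munger._init_MyFileLike,
    # finds nothing, and fails
    print "client-code polymorphism blocked"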

It's clearly simpler to frame this as 'a Munger needs to be
passed a DataBuffer, or anything a DataBuffer may be built from':

class Munger:
    def __init__(self, data):
        if not isinstance(data, DataBuffer):
            data = DataBuffer(data)
        self.data = data

at least, we have some simplicity here.  Polymorphism is still
not optimal, though; if client-code wants to *mimic* a data
buffer, it needs to inherit from our DataBuffer class, even
if it's not using any of its implementation, just to satisfy
our isinstance check.  At the very least, one would 'split out'
from DataBuffer the interface and implementation parts:

class IDataBuffer:
    def rewind(self):
        raise TypeError, "must override .rewind method"
    def nextBytes(self, N):
        raise TypeError, "must override .nextBytes method"
    def pushBack(self, bytes):
        raise TypeError, "must override .pushBack method"

etc, with class DataBuffer inheriting from this (and providing
the needed overrides, of course) and the isinstance check
done against IDataBuffer.  Not very Pythonic, but workable
if there are a LOT of DataBuffer methods we need -- checking
for each of them separately may become more trouble than
it's worth.
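
For concreteness, both spellings might look like this -- just a
sketch, reusing the Munger/DataBuffer/IDataBuffer names above:

class Munger:
    def __init__(self, data):
        # interface-check: anything claiming to be an IDataBuffer
        # passes; anything else gets wrapped in a real DataBuffer
        if not isinstance(data, IDataBuffer):
            data = DataBuffer(data)
        self.data = data

# the 'check each method separately' alternative alluded to above
# would need a helper more like:

def looks_like_data_buffer(obj):
    for needed in ('rewind', 'nextBytes', 'pushBack'):
        if not hasattr(obj, needed):
            return 0
    return 1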

DataBuffer's own 'overloading' ("am I being initialized from
a file or from a string?") needs to be handled.  Once again,
it would be *seriously* wrong to code:

class DataBuffer(IDataBuffer):
    def __init__(self, data):
        # same type-name dispatch as in the Munger above
        name = type(data).__name__
        if name=='instance':
            name = data.__class__.__name__
        method = getattr(self, '_init_'+name)
        method(data)
    def _init_string(self, data):
        self.data = data
        self.index = 0
    def _init_file(self, data):
        self.data = data.read()
        self.index = 0
    # etc etc

because it horribly inhibits client-code's polymorphism.
Here, all we need from a 'file object' is a .read method
we can call without arguments to supply our data -- so
why not code that directly...:

class DataBuffer(IDataBuffer):
    def __init__(self, data):
        try: self.data = data.read()
        except AttributeError: self.data=data
        self.index = 0
    # etc etc

this is MUCH simpler, of course.  One may add some
tests at initialization to ensure the resulting data
are usable for our purposes, but it's generally no
big problem if the error (if any) comes at first
usage rather than at initialization.
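
Now anything sporting an argument-less .read works, not just real
files -- a quick sketch (the file name is made up; cStringIO is
just one handy stand-in for 'any .read-supplying object'):

import cStringIO

b1 = DataBuffer("just a plain string")
b2 = DataBuffer(open("some_file.dat", "rb"))           # any real file
b3 = DataBuffer(cStringIO.StringIO("a faked 'file'"))  # any .read-supplier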


An alternative architecture is also worth considering.  DOES
client code REALLY NEED the polymorphism implicit in passing
a Munger constructor either a file(-like) object or a
string(-like) one, with very different implied semantics
regarding how one gets data from said object?  Python's own
libraries give us counterexamples of that -- file-like objects
and string-like ones are generally handled by *separate*
methods; there's no real polymorphism opportunity there!

So...:

class DataBuffer(IDataBuffer):
    def __init__(self, data):
        self.data = data
        self.index = 0
    # etc etc

class Munger:
    def __init__(self, data):
        self.data = data
    # etc etc

def FileMunger(afile):
    return Munger(DataBuffer(afile.read()))

def StringMunger(astring):
    return Munger(DataBuffer(astring))

There, isn't THIS better?  Two non-overloaded factory
functions, maximal simplicity in the constructors proper.

Client-code knows what it IS using to construct the
Munger and doesn't need the polymorphism -- it will
be clearer and more explicit and readable if it calls
FileMunger or StringMunger appropriately, and only
uses Munger's ctor directly for those cases where it
needs to reuse some existing IDataBuffer instance.
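
Typical client-code then reads quite naturally (a sketch; the
file name is, again, made up):

buf = DataBuffer("data already buffered elsewhere")

m1 = StringMunger("some data to munge")
m2 = FileMunger(open("mungeme.dat", "rb"))
m3 = Munger(buf)    # only here do we touch Munger's ctor directly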

If very occasionally a polymorphic use may benefit
the client-code author, we can add a further factory
function for that purpose only:

def AnyMunger(mystery):
    if isinstance(mystery, IDataBuffer):
        return Munger(mystery)
    else:
        try: return FileMunger(mystery)
        except AttributeError: return StringMunger(mystery)

However, one doesn't go around just adding such stuff
*unless its appropriateness is clearly shown by some
specific use-case/scenario* -- "you ain't gonna need it"
is a GREAT design principle:-) [XP rules...!-)].


Now, this IS of course a toy-level example, but I hope
that, precisely because of that, it shows the issues more
clearly -- and perhaps convinces you to rethink your
design in simpler and more usable ways.


Alex
