An evangelist's handbook? (Was: Re: Making a better textbook)

Alex Martelli aleax at aleax.it
Fri Nov 8 17:03:00 EST 2002


Ben Wiedermann wrote:
   ...
> In my experience, the "encapsulation" problem is a significant hurdle.
   ...
> So, let's help folks get over this problem. If we could come up with
> some sound bites to offer potential converts, I think it would be of
> good use. Alex was extremely helpful in this regard, offering up
> tomes of information, as I recall. I will try to dig them out, if I
> have them, but I think we all would much rather hear it from the
> source (hint, hint). I have heard Guido say one of the reasons for

OK, let's see what I had to say about this subject in our correspondence
of about a year ago, then... the quoted questions in the following are
also by you, Ben.


[ start of edited quotes from our correspondence ]

>    Once we show __private, why would we suggest using the single
> underscore convention, as this would be inconsistent w/ the "principle
> of least privilege" we espouse in our other books?

Hmmm, because that principle is incorrect?

Once upon a time, there was the dream of the Waterfall model of software
development.  First, one would do all Analysis (advanced texts split that
into Domain and Requirements phases, a mini-waterfall of its own), thus
reaching perfect understanding of the domain being modeled and the needs
to be met by one's programs in that domain.  Then, one would proceed with
Design, the outline of every solution aspect not dependent upon details
of implementation.  Then, Coding.  And so on.

Maybe that could work for Martians; never having met one, it's hard for
me to say.  30+ years of experience have easily shown it's a disaster
for Earthlings, anyway (what a waste: anybody familiar with the works
of some key thinkers of our grandfathers' generation, such as Ludwig
Wittgenstein, Alfred Korzybski, and George Santayana, would have
known that from the start -- apparently, however, culture is fragmented
enough today that few Methodologists had ever heard of these guys).

Human beings are evolved to work in chaotic environments, superimposing
MODEST, LOCALIZED amounts of control and supervision upon the sheer but
undirected energy of Chaos to yield an overall process flowing more or
less in the desired direction.  Since a little supervision and control
makes things so much better when compared to total disorder and anarchy,
it's a natural fallacy to think that MUCH MORE control and supervision
will make everything just perfect.  Wrong.  When some sort of threshold
is exceeded, the attempts towards tighter control take on a dynamics of
their own and start absorbing unbounded amounts of energy and effort in
a self-perpetuating system.

I don't want to get into politics, but this IS very much what happened
East of the Iron Curtain -- far too much control, indeed so much that
the first serious attempt to release it crumbled the whole system to
not much above feudal/anarchy.  The excesses of control in the West
were fortunately moderate enough that they could, to some extent, be
unwound less explosively.  I'm not arguing for anarchy,
mind you -- Somalia is a living example of what anarchy means; just that,
quite apart from ethical issues, from a strictly engineering, effectiveness
standpoint there is a reasonably flat optimal region, well short of
totalitarian states but much more controlled than 'everyone for himself'.

Back to software development, we've witnessed similar "secular trends".
Here, the "anarchy" side, in a small scale, is played back again and
again in every student's initial approach to programming, and most small
software houses' beginnings.  But we also have plenty of huge projects
run under "the stricter, the better" principles to remind us constantly
of what an utter disaster those principles are.


To be specific.  "Principle of least privilege" assumes you somehow know
what the 'right' privilege for a certain operation SHOULD be.  This in
turn takes it for granted that you completed a detailed and workable
design before starting to assign privileges (presumably, before you
started coding).  And things just don't work that way.

"Least privilege" is indispensable for SECURITY issues.  It's anything but
unusual for security issues to have diametrically-opposed needs to any other
kind: if you have to assume an adversary who is actively trying to break
things to his advantage, your correct mindset is very different than it
would be otherwise.  Assuming adversarial situations where none exist is,
clinically speaking, symptom number one of paranoia.  I'm as keen on 
security as anybody else I know (I run OpenBSD, I refuse to run any 
protocol on the net except SSH and other demonstrably-secure ones, etc, 
etc), but it's best to keep such mindsets WELL SEPARATED from ordinary 
software development needs, to keep one's sanity and productivity.

Didactically speaking, you have to show a beginner that utter chaos will
not work optimally, of course.  But proposing over-tight control as the
viable alternative is not feasible.  The only really workable way to
develop large software projects, just as the only really workable way to
run a large business, is a state of controlled chaos.  There is a broad
range of mixes between chaos and control that can work, depending on the
circumstances (as I said above, the optimal region is reasonably flat --
a good thing too, or we'd NEVER get it right), but the range does NOT
extend (by a LONG shot!) all the way to the "Waterfall Model" on one
side (just as it doesn't extend all the way to "Do what thou wilt" on
the other:-).

There's a didactical school that claims, to counteract the students'
natural disposition to anarchy, they should be taught programming
discipline as strict as possible.  That's like saying that all schooling
should be in a Victorian College atmosphere, with starched collars and
canings from the Prefects, for similar reasons.  The Victorians did
achieve marvels of engineering that way, albeit perhaps at excessive
social and psychological costs.  But we don't do things that way any more
in most fields.  We surely don't in the kind of software development that
makes a difference in a competitive field (if you develop non-competitively,
e.g. for the government or in a university setting, your incentives are
clearly different; personally, after a decade working for university,
IBM Research, etc, I chose the rough and tumble playing field of real
life software development -- more money, AND more fun too:-).


>    What is the benefit of "public by default"?

Minimizing boilerplate.  90% of exposed properties in languages and
object models which don't support that lead to a lot of boilerplate:

    public WhateverType
    getPliffo()
    {
        return m_Pliffo;
    }

or (in COM/C++):

    HRESULT get_Pliffo(WhateverType* pVal) {
        if(!pVal) return E_POINTER;
        *pVal = m_Pliffo;
        return S_OK;
    }

or (in VB):

    Public Property Get Pliffo() As WhateverType
        Let Pliffo = mvarPliffo
    End Property

and so on, and so forth.  There is no benefit whatsoever in all that
boilerplate.  Maybe, if one has been educated to such strict discipline,
there is a "feel-good factor" -- "See, Mum, I'm being a good boy and
only accessing Pliffo via an accessor method!".  But quite apart from
program efficiency (that's not a major issue AT ALL), all that extra,
useless code is a dead-weight dragging your productivity down -- sure,
some silly "Wizard" can originally generate it, but _people_ are going
to have to maintain and test and read and document it forevermore.  
Sheer dead weight.
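For contrast, here is a sketch of the Python spelling of the same exposed
property (reusing the hypothetical Pliffo name from the snippets above) --
it is just an attribute, with zero lines of accessor boilerplate:

```python
class NiceObject:
    def __init__(self, pliffo):
        # just a public attribute -- no getter, no setter, no "Wizard"
        self.pliffo = pliffo

obj = NiceObject(23)
print(obj.pliffo)   # direct access, no getPliffo() needed
```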

Some languages don't give you alternatives: if you originally expose
a "public WhateverType Pliffo" you're sunk -- client code will come to
depend on access idioms such as
    WhateverType myPliffoReference = theNiceObject.Pliffo;
and then you can't turn that publicly exposed data into a call to an
accessor forevermore (without breaking client code, NOT a viable
option in most large software projects).

But that's a language defect!  You don't have to get all the way to
Python to find solutions: e.g., in Eiffel, you can have client code
use that sort of construct AND Pliffo will be just accessed if it's
a data attribute, called if it's a method.  Client code will need
to _recompile_, of course, but then, in Eiffel, you need to 'rebuild
world' every time you sneeze, so that's not considered an issue.

Point is, with this approach you don't HAVE to do "Big Design Up
Front" (http://xp.c2.com/BigDesignUpFront.html for much more).  You
have an attribute that looks like client code might perhaps be
interested in looking at, you just use it.  If while refining the
design it turns out that a call to a getter-method is needed, fine,
you add the getter-method *WITHOUT* breaking any client-code.  When
you do it IF AND WHEN NEEDED, you find out that (depending on various
factors) it's something like 10% of properties that need to be
"synthesized" or otherwise controlled by accessor-methods -- the
others are just fine without any boilerplate.
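In current Python the "add the getter-method IF AND WHEN NEEDED" step is
spelled with the built-in property; a minimal sketch (class and attribute
names are hypothetical, echoing the earlier examples):

```python
class NiceObject:
    def __init__(self):
        # started life as a plain public attribute (self.pliffo = 42);
        # a later design pass turned it into a synthesized property
        self._pliffo = 42

    @property
    def pliffo(self):
        # accessor logic lives here now -- yet client code that reads
        # theNiceObject.pliffo keeps working, completely unchanged
        return self._pliffo

theNiceObject = NiceObject()
print(theNiceObject.pliffo)   # still plain attribute-access syntax
```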

So, OF COURSE, the default in Python is oriented to supporting that
90% or so of cases where nothing special is needed, NOT the remaining
'special' 10%.  This minimizes programmers' effort over the whole
lifecycle -- which is typically interactive, of course, NEVER a
"Waterfall" (you start with some analysis, then begin design and
need to go back to the analysis because design has given you a
better understanding, then back to design, start coding and you find
the design needs to be tweaked, etc, etc -- AND YET you'd better
*release early, release often* if your software artifact is to have
some relevance to your customers' problems in this frantically
changing world of business today...!!!).

See http://www.agilealliance.org/ for example.  I find it particularly
interesting that, while many of the "conspirators" were early enthusiasts
of (what is now called) Agile development in various guises, others (such
as Fowler, Martin, Mellor) were upper-M Methodologists and gradually saw
the light over the last decade or so of experience.  Of course, these
things take forever to percolate down to universities.  But, the pace
is picking up, particularly as movement back and forth between universities
and "real life" software development is not as glacial as it used to be.

Saturday I was a scheduled speaker at Linuxday, presenting the changes in
Python over the last year-plus, and I noticed with interest that well over
half the attendees were involved with both university/research endeavors
AND commercial projects.  I was also struck once again by how many were
using functional programming (Haskell foremost, but O'Caml seems to be
gaining) for the "academic respectability" in their publication-oriented
endeavors, AND pragmatical programming (Python foremost, but Ruby seems to
be gaining) for the "real world productivity" in those projects where they
actually have to deliver AND maintain working software.  I don't necessarily
LIKE all of these trends (I'll take Haskell over any ML, and Python over
Ruby, any day:-) but I do observe them and think them significant.  Some
of these people teach SW development to freshmen as part of their duties,
and in that case it appears that they almost invariably have to teach more
traditional languages (Pascal foremost, but Java seems to be gaining).  I
don't know how much that depends on these being Italian institutions in
particular, of course (I gather Java has already overtaken Pascal as an
introductory language in the US universities, for example).  But that's
nothing new -- back when _I_ taught in universities, I invariably had to
teach Fortran (since I was teaching at the Engineering school -- would
have been Pascal if I taught at CS!-) even though I was using C for real
world programming and Lisp (actually Scheme) for publications (needing
"academic respectability", you see:-).


[ I'm rephrasing, clarifying, and correcting some errors in the following 
example of signature-based polymorphism as enabled by Python's approach to 
encapsulation, compared to our original correspondence ]


def hypot(somepoint):
    import math
    return math.hypot(somepoint.x, somepoint.y)

this client-code only depends on the following subsignature of 'somepoint': 
exposing .x and .y attributes transparently compliant to 'float'.

In practice this means interchangeability of:

class NaivePoint:
    def __init__(self, x=0.0, y=0.0):
        self.x = x
        self.y = y

and

class UnchangeablePoint:
    def __init__(self, x=0.0, y=0.0):
        self.__dict__['x'] = x
        self.__dict__['y'] = y
    def __setattr__(self, *junk): raise TypeError
    def __hash__(self): return hash(self.x)+hash(self.y)
    def __eq__(self, other): return self.x==other.x and self.y==other.y

and many other kinds of 'point'.  The somepoint.x access may be mapped
down to very different mechanisms (it's self.__dict__['x'] in both of
these, but not in many others) but stays *signature-polymorphic* to the
need of client-code 'hypot'.
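To see that interchangeability concretely, here is a self-contained rerun
of the example (both classes repeated so the snippet runs on its own):

```python
import math

def hypot(somepoint):
    # depends only on the subsignature: readable .x and .y floats
    return math.hypot(somepoint.x, somepoint.y)

class NaivePoint:
    def __init__(self, x=0.0, y=0.0):
        self.x = x
        self.y = y

class UnchangeablePoint:
    def __init__(self, x=0.0, y=0.0):
        self.__dict__['x'] = x
        self.__dict__['y'] = y
    def __setattr__(self, *junk):
        raise TypeError("immutable point")
    def __hash__(self):
        return hash(self.x) + hash(self.y)
    def __eq__(self, other):
        return self.x == other.x and self.y == other.y

# hypot only cares about the .x/.y subsignature, so both kinds work:
print(hypot(NaivePoint(3.0, 4.0)))          # 5.0
print(hypot(UnchangeablePoint(3.0, 4.0)))   # 5.0
```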

So the proper procedure is to choose the kinds of 'point' implementations
that meet the design needs *you currently perceive at this stage in your
project*: you don't need to be Nostradamus and foresee how your project
will look in a year, nor to "wear braces AND a belt" and overdesign
everything *just in case* currently-unforeseen needs MIGHT emerge one day.

Rather, you can privilege SIMPLICITY -- that often-underrated, yet most
crucial of all design virtues.  "A designer knows that perfection is reached,
not when there is nothing left to add, but when there is nothing left to
take away" (Saint-Exupéry).  Meet simple design-needs with simple mechanics
(including NONE AT ALL, from the point of view of the Python coder -- 
rather, the simple needed mechanics are inside the Python interpreter, of 
course), rarer and more complex needs with more advanced mechanics.  No 
"impedance mismatch" between the constraints that arise during a design's
development, and the remedies needed to meet them.  Bliss!


The single-underline convention (actually enforced where it NEEDS to
be, such as the Bastion class, "from somemodule import *" where the
module doesn't define an __all__ attribute, etc) is a good example of
"human beings knowing their limit".  If we drop the pretense that a
designer is omniscient, we can see that any determination of a need
or constraint has COSTS -- actual design costs (design time), costs
in terms of missed opportunities, etc.  The parallel is spookily close
to that of asymmetric-information economic exchanges, and I note that
the Nobel Prize in Economics [last] year was awarded exactly for work
in the asymmetric-information and signal-exchanging fields, so it would
seem SOME academics are well aware of the issues.

Sometimes (pretty often) you can determine at reasonable 'cost' that
client code may well need to access attributes X, Y, and Z.  Then, you
expose them (without any underlines) -- and later may choose to wrap
them up into accessors, etc, as above.  Sometimes (not often) you can
determine at reasonable cost that client code has no conceivable need
to peek at attributes foo, bar and baz.  Then, you name them __foo,
__bar, and __baz, and generally don't even provide accessors -- those
are the strictly-implementation-related attributes, not part of the
abstract client-visible state of your class at all.
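A tiny sketch of what naming an attribute __foo actually buys you (the
Point3D class here is hypothetical): Python mangles the name per-class,
so casual client access fails -- though nothing is hermetically sealed:

```python
class Point3D:
    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z   # client-visible state
        self.__cache = None                # implementation detail only

p = Point3D(1.0, 2.0, 3.0)
# __cache is actually stored under the mangled name _Point3D__cache,
# so plain p.__cache raises AttributeError in client code:
print(hasattr(p, '__cache'))          # False
print(hasattr(p, '_Point3D__cache'))  # True
```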

There remains a considerable gray area of attributes that you do not
THINK client-code will need to access, but can't make SURE about without
unreasonable cost.  If you expose an accessor method getEenie that does
some irreversible computation on an internal attribute 'eenie' and
returns the result, for example, it may require unreasonable delays and
cost to ascertain whether it's ever conceivably possible that some client
code may need to access the raw unprocessed value for 'eenie' rather than
always being content with getEenie()'s results.  These are very appropriate
cases to handle with the single-underline convention, meaning: I don't THINK
client-code ever needs to see this, but I can't be SURE -- *at your own
risk* (very strong coupling -- the single-underline DOES mean "internal"!)
you may choose to access this if there's no other way to do your job.
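A sketch of that gray area in code (the Widget class is hypothetical;
'eenie' and getEenie are the names from the paragraph above):

```python
import math

class Widget:
    def __init__(self, eenie):
        # single underline: "I don't THINK client code needs this,
        # but I can't be SURE -- touch it at your own risk"
        self._eenie = eenie

    def getEenie(self):
        # an irreversible computation: the raw value cannot be
        # recovered from the result
        return math.floor(math.sqrt(self._eenie))

w = Widget(10)
print(w.getEenie())   # the processed value most clients want
print(w._eenie)       # the raw value, still reachable if truly needed
```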

If you've ever programmed to a framework designed by other criteria you
know what I mean... there's this class X in the framework that does ALL
you need BUT doesn't expose one key attribute or method 'meenie' -- it
has it, but as "private".  This recurring horror leads to "copy and paste
programming", forking of framework code and the resulting nightmares in
maintenance (it's a recognized "Antipattern" in the book by that title
by Brown, Malveau, McCormick and Mowbray) -- or even weirder tricks such
as "#define private public" before some "#include" in C++...!!!  The
problem was: the framework designer thought he was omniscient.  The
language URGED him to believe in his omniscience, rather than strongly
discouraging such hubris.  Well, he WASN'T omniscient -- surprise,
surprise, he was a human being (maybe we should subcontract such design
work to martians...?).  So there's cognitive dissonance between the
language-induced mindset, and human biological reality.  Reality wins
out each and every time, but not without much anguish in-between.  (At
the risk of skirting politics again: similarly, the Soviet system urged
central planners to believe in their own omniscience, since everything had
to be planned right from the start -- no built-in flexibility in the
system; I recommend Perrow's "Normal Accidents", a masterpiece of
sociology in my view, about the ills of tight-coupling and assumed
omniscience in accident-prone systems, such as nuclear plants and ships).


Note that I'm only arguing one side of the issue because you're not
advocating the OTHER wrong extreme -- total lack of control, weak
typing, haphazard jumbled-together slapdash 'systems' that don't
deserve that name.  You should see me argue FOR control and against
chaos versus the typical Javascripters of this world:-). (As all people long 
used to occupying a reasonable middle position, I'm also quite used to
being shot at by extremists of both sides, of course:-).


[ end of edited quotes from our correspondence ]


Alex



