Is there a "Large Scale Python Software Design" ?

Fri Oct 22 09:13:09 EDT 2004

Peter Hansen <peter at engcorp.com> wrote:
    ...
> And you've reemphasized my point.  "Testing" is not test-driven
> development.  In fact, test-driven development is about *design*,
> not just about testing.  The two are related, but definitely not

Hmmm... the way I see it, it's about how one (or, better!!!, two: pair
programming is a GREAT idea) proceeds to _implement_ a design.  The fact
that the package (or other kind of component) I'm writing will offer a
class Foo with a no-parameters constructor, methods A, B, and C with
parameters thus and thus, etc, has hopefully been determined and agreed
beforehand -- the people who now write that package, and other teams who
write other code using the package, have presumably met and haggled
about it and coded one or more mock-up versions of the package (or used
other lesser ways to clarify the specs), so that code depending on the
package can be tested-and-coded (with the mock-ups) even while the
package itself is being tested and coded...
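
A minimal sketch of such a mock-up, assuming Python's standard unittest
module.  The class Foo and its methods A, B, C come from the text above;
the parameters, the canned return values, and the client function
summarize() are purely hypothetical, standing in for whatever the teams
actually agreed on.

```python
import unittest


class Foo:
    """Mock-up of the agreed interface: no-parameters constructor, methods A, B, C.

    The return values are canned placeholders, not real behaviour.
    """

    def __init__(self):
        self.calls = []              # record of calls, handy in clients' tests

    def A(self, x):
        self.calls.append(("A", x))
        return 0                     # canned answer

    def B(self, x, y):
        self.calls.append(("B", x, y))
        return ""                    # canned answer

    def C(self):
        self.calls.append(("C",))
        return []                    # canned answer


def summarize(foo, x):
    """Hypothetical client code, written by another team against the mock-up."""
    return "A=%d items=%d" % (foo.A(x), len(foo.C()))


class SummarizeTest(unittest.TestCase):
    """The client's own unit test runs today, before the real Foo exists."""

    def test_summarize_calls_A_then_C(self):
        foo = Foo()
        self.assertEqual(summarize(foo, 7), "A=0 items=0")
        self.assertEqual(foo.calls, [("A", 7), ("C",)])
```

When the real package lands, the client's tests should keep passing
against it wherever they depend only on the agreed interface and not on
the canned values.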

I know Kent Beck's book shows a much more 'exploratory' kind of TDD, but
in a large project that would lead to its own deleterious equivalent of
"waterfall": everything must proceed bottom-up because no mid-level
components can be coded until the low-level components are done, and so
forth.  I don't think that's acceptable in this form, in general.

_Design_, in large scale software development, is mainly about sketching
reasonable boundaries between components to allow testing and coding to
proceed with full advantage of the fact that the team of developers is
of substantial size.  Indeed there may be several sub-teams, or even
several full-fledged teams, though the latter situation (over, say,
around 20 developers, assuming colocation... that's the very maximum you
could possibly, sensibly cram into a single team) suddenly begets its
own sociopolitical AND technical problems... I have not been in that
situation with Python, yet, only with Fortran, C, C++.

Of course when both components are nominally done there comes
integration testing time, and the sparks fly;-).  Designing integration
tests ahead of time, at the same time as the mock-ups, _would_ help, but
somehow or other it never really seems to happen (I'd love hearing
real-life experiences from somebody who DID manage to make it happen,
btw; maybe I'd learn how to "facilitate" its happening, too!-).

If and when you're lucky there's some 'customer' (in the
extreme-programming sense) busy writing _acceptance_ tests for the
system, making the user-stories concrete, at the same time -- but good
acceptance tests are NOT the same thing as good integration tests... you
need both kinds (at least if the system is truly large).  Anyway, at
integration-testing time and/or acceptance-testing time, there is
typically at least one iteration where the mock-ups/specs are updated to
take into account what we've learned while implementing the component
and consuming it, and it's back to the pair-programming parts with TDD.

But these fascinating AND crucially important issues are about far wider
concerns than "static type testing" can help with.  "Design by
Contract", where the mock-up includes preconditions and postconditions
and invariants, can be helpful, but such DbC thingies are to be checked
at runtime, anyway (they're great in pinpointing more problems during
integration testing, etc, etc, they don't _substitute_ for testing
though, they simply amplify its effectiveness, which is good enough).
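
As a rough illustration of runtime-checked contracts, here is a sketch
of a decorator that checks a precondition and a postcondition around a
call.  The names contract, ContractError and largest are hypothetical,
invented for this example; this is not the API of any particular DbC
library.

```python
import functools


class ContractError(Exception):
    """Raised when a precondition or postcondition does not hold."""


def contract(pre=None, post=None):
    """Check `pre(*args)` before the call and `post(result, *args)` after it."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if pre is not None and not pre(*args, **kwargs):
                raise ContractError("precondition of %s violated" % func.__name__)
            result = func(*args, **kwargs)
            if post is not None and not post(result, *args, **kwargs):
                raise ContractError("postcondition of %s violated" % func.__name__)
            return result
        return wrapper
    return decorate


@contract(pre=lambda items: len(items) > 0,           # caller's obligation
          post=lambda result, items: result in items)  # implementer's promise
def largest(items):
    """Hypothetical component function: return the largest item."""
    return max(items)
```

A caller that passes an empty list gets a ContractError naming the
violated precondition at the component boundary, rather than some less
informative failure further down, which is the kind of pinpointing that
helps during integration testing.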

TDD may surely help defining the internal structures and algorithms
within a single component, of course, if that's what you mean by design.

But (with decent partitioning) a single component should be _at most_ a
few thousand lines of Python code -- very offhand I'd say no more than
2-3 thousand lines of functional code, as much again of unit tests, and
a generous addition of comments, docstrings, and blank lines, to a total
line count, as "wc *.py" gives it, of no more than 6k, 7k tops.  If it's
bigger, there are problems -- docstrings are trying to become user
reference manuals, comments are summarizing whole books on data
structures and algorithms rather than giving the URLs to them, or, most
likely, there was a mispartitioning and this poor "component" is being
asked to do far too much, way more than one cohesive set of
responsibilities which need to be well-coordinated.  Time to call an
emergency team meeting and repartition a little bit.
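
A rough sketch of how such a budget could be checked mechanically,
assuming one directory per component.  The function names and the
MAX_TOTAL_LINES constant are hypothetical; the classification is
deliberately crude (docstrings are counted together with code, and a
line starting with '#' inside a multi-line string is counted as a
comment).

```python
from pathlib import Path

MAX_TOTAL_LINES = 7000   # the "6k, 7k tops" ceiling suggested in the text


def classify_lines(source):
    """Return (total, blank, comment_only, other) counts for Python source text."""
    total = blank = comment = 0
    for line in source.splitlines():
        total += 1
        stripped = line.strip()
        if not stripped:
            blank += 1
        elif stripped.startswith("#"):
            comment += 1
    return total, blank, comment, total - blank - comment


def component_counts(directory):
    """Sum the counts over every *.py file directly inside one directory."""
    totals = [0, 0, 0, 0]
    for path in sorted(Path(directory).glob("*.py")):
        counts = classify_lines(path.read_text(encoding="utf-8"))
        totals = [t + c for t, c in zip(totals, counts)]
    return tuple(totals)


def over_budget(directory):
    """True if the component's total line count exceeds the ceiling."""
    return component_counts(directory)[0] > MAX_TOTAL_LINES
```

The first number returned by component_counts() corresponds to the total
that "wc -l *.py" would report for files ending in a newline.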

Hmmm, this has relatively little to do with static type checks, but is
extremely relevant to the 'Subject' - indeed, it's one (or two;-) of the
many sets of issues that (IMHO) need to be addressed in a book or course
on large scale software development (not JUST design, mind you: the
process whereby the design is defined, how it's changed during the
development, and how it is implemented by the various components, is at
least as important as the technical characteristics that the design
itself, seen as a finished piece of work, should exhibit...).

> the same thing, and eliminating TDD with a wave of a hand intended
> to poo-poo mere testing is to miss the point.  Once someone has

Absolutely -- I do fully agree with you on this.

> tried TDD, they are unlikely to lump it in with simple "unit testing"
> as it has other properties that aren't obvious on the surface.

It sure beats "retrofitting" unit tests post facto.  But I'm not sure
what properties you have in mind here; care to expand?

> > What is the biggest system you have built with python personally?  I'm
> > happy to be proven wrong, but honestly, the most enthusiastic "testing
> > solves all my problem" people I have seen haven't worked on anything
> > "large" -- and my definition of large agrees with Alex's; over 100
> > kloc, more than a handful of developers.
> 
> The topic of the thread was large projects with _large teams_,
> I thought, so I won't focus on my personal work.  The team I

Yeah, I think the intended meaning was "in which you personally have
taken part" rather than any implication of "single-handedly" -- the
mention of "more than a handful of developers" being key.

BTW, a team with 5-6 colocated developers, plus support personnel for
GUI painting/design (in the graphical sense) and/or webpage/style ditto,
system administration, documentation, acceptance testing, etc, can build
QUITE a large system, if the team dynamics and the skills of the people
involved are right.  So the "more than a handful of developers" doesn't
seem a necessary part of the definition of "large scale software
development".  5-6 full-time developers already require the kind of
coordination and component design (partitioning) that 10-12 will,
there's no big jump there in my experience.  The jump does come when you
double again (or can't have colocation, even with just, say, 6 people),
because that's when the one team _must_ split into cooperating teams
(again in my experience: I _have_ seen -- thankfully not participated in
-- alleged single "teams" of 50 or more people, but I am not sure they
actually even managed to deploy any working code... whatever language
we're talking about, matters little, as here we're clashing with a
biological characteristic of human beings, probably connected to the
prehistorical size of optimal hunting bands or something!-).

> was leading worked on code that, if I recall, was somewhat over
> 100,000 lines of Python code including tests.  I don't recall
> whether that number was the largest piece, or combining several
> separate applications which ran together but in a distributed
> system...  I think there were close to 20 man years in the main
> bit.

I think this qualifies as large, assuming the separate applications had
to cooperate reasonably closely (i.e. acting as "components", even
though maybe in separate processes and "only" touching on the same
database or set of files or whatever).

> (And remembering that 1 line of Python code corresponds to
> some larger number, maybe five or ten, of C code, that should
> qualify it as a large project by many definitions.)

I agree.  There IS a persistent idea that codebase size is all that
really matters, so 100,000 lines of code are just as difficult to
develop and maintain whether they're assembly, C, or Python.  I think
this common idea overstates the case a bit (and even Capers Jones
agrees, though he tries to do so by distinguishing coding from other
development activities, which isn't _quite_ my motivation).

Part of why I recommend having no more than 2-3 k lines of functional
code in a single Python component (plus about as much again of unit
test, etc, to no more than 6-7k lines including blanks/cmts/docstrings,
as above explained) is that those (say) 2.5k lines can do a hell of a
_LOT_ of stuff, quite comparable in my experience to 10k-15k lines of
C++ or Java (and more than that of C, of course) -- on the order of
magnitude of 200-300 function points at least.  If you go much above
that, keeping the characteristics of cohesion and coherence becomes way
too hard.  So, a 100kSLOC Python project will have at least about 40
components, and 10k or so FPs, where a Java project with the same line
count might typically have 2-3K FPs spread into, say, 15 components.
(I'm thinking of functional effective lines, net of testing, comments,
docstrings, or any kind of code instrumentation for debug/profile/&c).

In other words: the Python project is _way_ bigger in functionality, and
therefore in needed/opportune internal granularity, than the Java one
with the same SLOCs.  Jones' estimates for Java's language level are
"10 to 20 function points per staff month".  He doesn't estimate Python,
but even if I'm right and the language level (FP/SLOC) is about 4-5
times Java's, according to Jones' tables that, per se, would only push
productivity to "30 to 50 function points per staff month" -- a
factor of less than three.
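
The arithmetic behind these figures can be spelled out as follows.  All
inputs are the numbers quoted above; taking midpoints of the quoted
ranges (2.5k lines and 250 FPs per component, 2.5k FPs for the Java
project, 15 and 40 FPs per staff month) is an assumption made here
purely for illustration.

```python
# Python project: 100 kSLOC of functional code
python_sloc = 100_000
sloc_per_component = 2_500     # midpoint of the recommended 2-3k lines
fp_per_component = 250         # midpoint of 200-300 function points

python_components = python_sloc // sloc_per_component
python_fp = python_components * fp_per_component

# Java project with the same line count: "2-3K FPs", midpoint taken
java_fp = 2_500
functionality_ratio = python_fp / java_fp

# Productivity ranges quoted from Jones' tables, midpoints taken
java_productivity = (10 + 20) / 2      # FPs per staff month
python_productivity = (30 + 50) / 2    # FPs per staff month
productivity_factor = python_productivity / java_productivity

print(python_components, python_fp, functionality_ratio,
      round(productivity_factor, 2))
# prints: 40 10000 4.0 2.67
```

So the same line count delivers roughly four times the functionality,
while the tables would predict a productivity gain of under three.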

(( Of course, for both Java and Python, and also C, C++, etc,
superimposed on all of these productivity estimates there _is_ the
possibility of reuse of the huge libraries of code available for these
languages -- most of all for Python, which is well supplied with tools and
technologies to leech^H^H^H^H^H ahem, I mean, fruitfully reuse good
existing libraries almost regardless of what language the libraries were
originally made _for_.  A reuse-oriented culture, particularly now that
so many good libraries are available under open-source terms, CAN in my
opinion easily boost overall productivity, in terms of functionality
delivered and deployed, by _AT LEAST_ a factor of 2 in any of these
languages.  But this, in a way, is a different issue... ))

> > So people don't get me wrong: I love python.  Most of my programming
> > friends call me "the python zealot" behind my back.  I just don't think
> > it's the right tool for every problem.
> 
> Neither do I.  The above project also involved some C and
> some assembly, plus some Javascript and possibly something else
> I've forgotten by now.  We just made efforts to use Python *as
> much as possible* and it paid off.

Hmmmm, yes, assembly may be unusual these days, but C extensions are
very common, pyrex ones rightfully becoming more so, Javascript quite
typical when you need to serve webpages that are richly interactive
without requiring round-trips to the server, and we shouldn't ignore the
role of XSLT and friends either.  And what large project is without some
SQL?  Exceedingly few, I think.

But Python can fruitfully sit in the center and easily amount to 80% or
90% of the codebase even in projects needing all of these other
technologies for specialized purposes...

> > Specifically, in my experience, statically-typed languages make it much
> > easier to say "okay, I'm fixing a bug in Class.Foo; here's all the
> > places where it's used."  This lets me see how Foo is actually used --
> > in a perfect world, Foo's documentation is precise and up to date, but
> > I haven't worked anywhere that this was always the case -- which lets
> > me make my fix with a reasonable chance of not breaking anything.
> > Compile-time type checking increases those chances.  Unit tests
> > increase that further, but relying on unit tests as your first and only
> > line of defense is suboptimal when there are better options.
> 
> But what if you already had tests which allowed you to do exactly
> the thing you describe?  Is there a need for "better options"
> at that point?  Are they really better?  When I do TDD, I can
> *trivially* catch all the cases where Class.Foo is used
> because they are all exercised by the tests.  Furthermore, I

Absolutely.  The main role of the unit tests is exactly to define all
the use cases of Foo and the expected results of such uses.  If the unit
tests are decent, and with TDD they _will_ be, they suffice to let you
change Foo's internals without breaking Foo's uses (refactoring).

One thing unit tests can't do, and Foo's documentation cannot either, is
to find out if any of Foo's abilities are _totally unused_ -- for that,
you do need to scour the codebase.  Trimming functionality that had
originally seemed necessary and was negotiated to be included, but turns
out to be overdesigned, is not a crucial activity (it's sure not worth
distorting a language to make such trimming faster), but it's a nice
periodic exercise.  Anything that's excised from the code, and tests,
and internal docs, is so much less to maintain in the future.  Of
course, you can't do that anyway if you "publish" components for outside
consumption by code you can't check or control; and even in a single
team situation you still need to check with others if they weren't
planning to use just tomorrow one of the capabilities you'd like to
remove today.

One interesting possibility is to instrument Foo to record all the uses
it gets, tracing them into a file or wherever, then run the system
through its paces -- all the unit tests of every component that depends
(even indirectly) on the one containing Foo, and all the existing
integration and acceptance tests.  A profiler can typically do it for
you, in any language, when used in "code coverage" mode.  If any part of
Foo's code has 0 coverage _except_ possibly by Foo's own unit tests,
that _does_ tell you something.  And it need have nothing to do with
typing, of course.  One case I recall from many years ago was something
like:

int foo(int x, int y) {
    if (x<23) { /* small fast case, get out of the way quick */
        /* a dozen lines of code for the small fast case */
    } else { /* the real thing, get to work! */
        /* six dozen lines of code for the real thing */
    }
}

where the whole 'real thing' _never_ happened to be exercised.  With a
little checking around, changing this to return an error code if x>=23
(it should never have happened, just as it never did) was a really nice
_snip_ (excised code goes to a vault and a pointer to it is left in a
comment here, of course, in case it's needed again in the future; but
meanwhile it doesn't need to get maintained or tested, maybe for years,
maybe forever...).
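
In Python, the "instrument Foo to record all the uses it gets" idea can
be sketched with a class decorator instead of a profiler.  Everything
here (the usage tally, record_use, instrument, unused) is a hypothetical
helper written for this example, and a coverage tool would give
finer-grained, per-line information than this per-method tally.

```python
import collections
import functools
import inspect

usage = collections.Counter()    # "ClassName.method" -> number of calls seen


def record_use(key, func):
    """Wrap func so that every call bumps the tally under `key`."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        usage[key] += 1
        return func(*args, **kwargs)
    return wrapper


def instrument(cls):
    """Class decorator: tally calls to every public plain method of cls."""
    for name, attr in list(vars(cls).items()):
        if inspect.isfunction(attr) and not name.startswith("_"):
            key = "%s.%s" % (cls.__name__, name)
            usage.setdefault(key, 0)         # make never-called methods visible
            setattr(cls, name, record_use(key, attr))
    return cls


def unused(cls):
    """Names of instrumented methods of cls that were never called."""
    prefix = cls.__name__ + "."
    return sorted(k for k, n in usage.items() if k.startswith(prefix) and n == 0)


@instrument
class Foo:
    def A(self):
        return "a"

    def B(self):
        return "b"

    def C(self):
        return "c"
```

After running the tests of every dependent component (but not Foo's own
unit tests) plus the integration and acceptance tests, unused(Foo) lists
the candidates for trimming; they still need checking with the other
teams before anything is removed.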

> can catch real bugs, not just typos and simple things involving
> using the wrong type.  A superset of the bugs your statically
> typed language tools are letting you catch.  But obviously
> I'm rehashing the argument, and one which has been discussed
> here many times, so I should let it go.

You surely won't get any disagreement from me about this -- and I don't
believe any static-typing enthusiast argues _against_ unit tests and
TDD, they just want BOTH, even though we claim (and C++/Java guru Robert
Martin himself strongly claims) that TDD and systematic unit testing
really make static typing rather redundant... you keep paying the full
price for that language feature, but don't get much benefit in return.

> >>Having experience with both approaches, and choosing one over
> >>the other, gives one greater credibility than having experience
> >>with just one approach, yet clinging to it...
> > 
> > You are incorrect if you assume I am unfamiliar with python.  
> 
> I assumed no such thing, just that you were unfamiliar with
> large projects in Python and yet were advising the OP on its
> suitability in that realm.  You're bright and experienced, and
> your comments have substance, but until you've actually
> participated in a large project with Python and seen it fail
> gloriously *because it was not statically typed*, I wouldn't
> put much weight on your comments in this area if I were the
> OP.  That's all I was saying...

I would gladly accept as relevant experiences with other languages that
are strictly but dynamically typed, such as, say, Smalltalk or Ruby or
Erlang, if project failures (or even, short of failures, severe
productivity hits) can indeed be traced, despite proper TDD/unit
testing, to the lack of statically checked typing.  I try to keep up
with the relevant literature (can't possibly manage for _all_ of it of
course) and don't recall any such experiences, but of course I may well
have missed some, particularly since not everything gets published.

Alex