[stdlib-sig] Breaking out the stdlib

Tue Sep 15 12:29:17 CEST 2009

When I said that I wanted things in the standard library to 'never
change until hell freezes over' I didn't mean that the standard library
shouldn't evolve by adding new modules.  What I meant was that the old,
perfectly good working modules that haven't needed changes for years
and years and years shouldn't get summarily yanked out because people
'don't want to maintain them any more'.  There is a case to be made
that code that refers to hardware that hasn't been used for years and
years can be removed, and perhaps other things which are similarly
unused at this point in time can be removed, but nobody is making that
claim about getopt and optparse.

Indeed, right now, if you write code that parses options you are
most likely doing one of 4 things:

a) using getopt
b) using optparse
c) using some third party module for parsing that is currently under
   development
d) writing your own option parser in python

Now, for years I have been telling people that d) was by far the poorest
way of doing things.  Don't reinvent the wheel, and all that.  But on
the day that you yank getopt and optparse out of the library, you will
have turned that into terrible advice to have given my poor hospital
clients.  And you will have turned all the time I spent ripping out
hand-made option parsers and replacing them with getopt, time that I
billed my clients for, into wasted time.  Instead of giving them a
robust solution that will require little or no maintenance, I have
given them a zillion scripts, _all of which will suddenly break one
day_.  And since the pure python parsers that I never got around to
converting will probably still be working, the people who did d) will
get the last laugh.

So -- real use case time -- the hospitals where I have done a lot of
work have some really weird equipment.  And using it costs real
money, uses up real lab supplies, and conceivably can ruin a sample
that you will find inconvenient or impossible to replace.

There are all sorts of weird dependencies among the various options
you can use to operate the devices.  If you have specified option K,
you may not also specify option M or N, and if you have specified
option L, you must also specify option Q, or option R and option S.

Thus the whole exercise of writing a script to use the equipment
becomes a matter of validating that the options you selected are
complete and non-contradictory, and then going out and exercising the
hardware.
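
A minimal sketch of that kind of validation on top of getopt, using the
K/M/N and L/Q/R/S rules described above.  The single-letter flags that
take no arguments, and the helper names validate() and parse(), are
assumptions made for illustration; they are not taken from any real
script.

```python
import getopt


def validate(opts):
    """Return a list of error strings for one set of parsed (flag, value) pairs."""
    given = {flag for flag, _value in opts}
    errors = []
    # Rule 1: K may not be combined with M or N.
    clash = given & {"-M", "-N"}
    if "-K" in given and clash:
        errors.append("-K may not be combined with " + ", ".join(sorted(clash)))
    # Rule 2: L requires Q, or both R and S.
    if "-L" in given and not ("-Q" in given or {"-R", "-S"} <= given):
        errors.append("-L requires -Q, or both -R and -S")
    return errors


def parse(argv):
    """Parse argv (without the program name) and reject invalid combinations."""
    # Assumption: every option is a plain flag with no argument.
    opts, args = getopt.getopt(argv, "KLMNQRS")
    errors = validate(opts)
    if errors:
        # Fail before anything touches the hardware.
        raise ValueError("; ".join(errors))
    return opts, args
```

Calling parse(sys.argv[1:]) at the top of a script means a bad
combination is refused before any sample or lab supply is used.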

You can build a validating option parser by hand (option d) or using
getopt (option a).  You will find it difficult to the point of near
impossibility to build one using optparse, because optparse specifically
rejects the notion of 'required options', which is the meat-and-potatoes
part of this app.  I found this out when optparse went into the standard
library, and was touted as being superior to getopt.

I tried it, and tried to subclass it, and talked to its author about
whether it could be changed to support my use case, and even volunteered
to write code to change it, but was firmly told that optparse worked
the way it did, on purpose, in order to prevent the sort of use that
I wanted to make of it, and that patches to make that possible were
entirely unwelcome.

At which point I went back to using getopt.  There are no particular
hard feelings about this --  I figure that the people who want to use
optparse can use it, and getopt is here for the people who won't, or
can't.  But when, later on, people suggested getting rid of getopt
because 'it was old' and 'optparse was better' I realised, long after
it was too late to do anything, that I should have been spearheading
the 'I don't want optparse in the standard library' effort, on the
grounds that it didn't support 'required arguments'.  That fight might
have become nasty.  Far better to allow multiple ways of doing things.

As far as I can tell most software packages go through three stages:

Stage A:  "rapid development"  Expect changes to the API.  New releases
          can and will break all the existing code out there.  There are
          lots of bugs, but it is hard to tell a bug from a thing that
          is incomplete sometimes.

Stage B:  "things have (mostly) settled down"  Major releases may break
          existing code.  Minor ones will not.  The bugs are being shaken
          out.  Development slows.

          There are two major variants in Stage B code.

          B-1: has an active maintainer(s)
          B-2: isn't being maintained any more

Stage C:  "until hell freezes over"  The author of the package has determined
          that he has solved his problem as well as he ever cares to.  He
          considers his package 'finished', 'done', or 'complete'.  No more
          bugs are being found.  No more development is planned.

          It also has two variants.

          C-1: The maintainer will fix major bugs should you ever find
               any.
          C-2: The maintainer won't.  He might even be dead.

Some software packages never reach stage C.  Forgetting, for the
moment, those that are abandoned by their authors while they are still
in stage B, we are still left with many that deal with the rapidly
changing world in such a way that their development can never be said
to be done.

Now software in Stage A doesn't need maintenance so much as development
in the first place.  And software in stage B requires a lot of
maintenance.  But most software in stage C requires little or no
maintenance, precisely because it is unchanging.  So, if you decide to
change how Exceptions are inherited, for instance, you may break a
whole lot of Stage C code, but fixing it is part of the general
problem of 'fixing exceptions everywhere', not part of 'fixing getopt'.

Now I think that there is some confusion around here between packages
that are in Stage C and packages that are in Stage B-2.
The B-2 packages are probably the core developers' greatest headache.
But I don't see the C packages as being troublesome at all.  If you
don't like them, it isn't because of the work that maintaining them is
costing you.  It may be that hatred for the B-2's has become a general
hatred of all packages with no maintainers, which is an understandable
mistake.  But from reading this list I get the distinct impression that
some people just hate C's _precisely because they are old and unchanging_,
and would continue to hate them for that reason even if I were in some
way able to guarantee that they would never need any maintenance ever
again.  These people are condemning the packages I love best for the
very reason they are the packages I love best.  And that is the attitude
I would like to change.

It boils down to a matter of trust.  My customers trust me to not give
them ticking time bombs that will all stop working one day, and I trust
you not to go about gratuitously removing perfectly working code that
is quietly sitting there, not needing any changes, and not bothering
anybody.  When you break that contract with me, my customers suffer,
I suffer, and the people who said 'You shouldn't have coded it in
Python in the first place, but picked a mature language like Java'
are completely vindicated.  Deciding that we want to be more flexible
than Java and actually retire some old packages doesn't mean that
we should condone retiring old packages because they are old and
haven't changed in a long while.  There are a great number of packages
that should be kept for this very reason, and I think that getopt is
a great example of one of them.

So, if we are going to reorganise the standard library, even if only
conceptually, I'd like to toss in my suggestions for improvement.

I'd like to make it possible to tag modules.

One good set of tags would be 'CPython', 'Jython', 'PyPy', etc.
When you want to build your own version of the standard library you
just get the modules that are tagged for you.
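
As a rough sketch of how such implementation tags could be used, assume
a plain mapping from module name to the set of implementations it is
tagged for.  The table contents, the name cpython_only_ext and the
helper modules_for() are all hypothetical; nothing like this exists in
the standard library.

```python
# Hypothetical tag table: module name -> implementations it is tagged for.
# The entries are illustrative only, not a claim about real support.
IMPLEMENTATION_TAGS = {
    "getopt": {"CPython", "Jython", "PyPy"},
    "optparse": {"CPython", "Jython", "PyPy"},
    "cpython_only_ext": {"CPython"},  # made-up module name
}


def modules_for(implementation, tags=IMPLEMENTATION_TAGS):
    """Return the sorted names of the modules tagged for one implementation."""
    return sorted(name for name, impls in tags.items() if implementation in impls)
```

Building 'your own version of the standard library' then amounts to
asking for modules_for("Jython") and packaging whatever comes back.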

Another good tag is 'who is maintaining this thing'.  Which could be
'no one'.  Along with this information I would like to know how long
this person has promised to maintain the thing, and at what point in
time (if ever) he expects this module to become finished (Stage C).

And then I would like something along the lines of my A, B and C
scheme, though of course the number of categories is likely to change
and they should have meaningful names (which is something I am
particularly poor at coming up with).

This would mean as a developer I could take a look at the standard
library and find out:

getopt:  maintained by (nobody).  Entered the standard library in 1990.
         Current status (dead as a doornail).  Expected to reach the
         status (dead as a doornail) by (already reached).
         Alternatives: optparse, argparse, optfunct* (not in the
         standard library)

argparse: maintained by (Steven Bethard).  Entered the standard library
          in 2009.  Current status (a moderate amount of change is
          happening).  Expected to reach the status (dead as a doornail)
          by (2012).  Has never reached the status 'dead as a doornail'.
          Alternatives: optparse, getopt, optfunct* (not in the standard
          library)

or maybe we will see

argparse: maintained by (Steven Bethard).  Entered the standard library
          in (2009).  Current status (a moderate amount of change is
          happening).  Expected to reach the status (dead as a doornail)
          by (never, Steve expects that this module will be under
          continuous development through its lifetime).  Has never reached
          the status 'dead as a doornail'.  Alternatives: optparse, getopt,
          optfunct* (not in the standard library)

we can also get things like:

unittest: maintained by (Michael Foord).  Entered the standard library in
          1999-or-whenever-the-real-date-was.  Current status (a moderate
          amount of change is happening).  Expected to reach the status
          (dead as a doornail) by (2012).  This module reached the status
          of 'dead as a doornail' in 2002 but was subsequently revived
          by Michael Foord in 2008.  Alternatives: py.test* (not in the
          standard library), nose* (not in the standard library)

elementtree: maintained by (no one).  Entered the standard library in
          2005-or-whenever-the-real-date-was.  Current status (no changes
          are happening).  Expected to reach the status of 'dead as a
          doornail' by (unknown.  Fredrik Lundh, the original author
          of the module, is no longer maintaining it.  It is incomplete,
          and a new maintainer is actively sought.)  No alternatives.

or maybe we get

elementtree: maintained by (no one).  Entered the standard library in
          2005-or-whenever-the-real-date-was.  Current status (no changes
          are happening).  Expected to reach the status of 'dead as a
          doornail' by (unknown.  Fredrik Lundh, the original author
          of the module, is no longer maintaining it.  It is quite finished,
          and the proposal has been made to call it dead as a doornail
          in 2010.)  No alternatives.
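
Listings like the ones above could be generated from a small record per
module.  The sketch below is one possible shape, not an existing
facility: the class name ModuleRecord, its field names and the helper
stable_modules() are assumptions, and the sample values are simply
copied from the getopt and argparse listings above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

DEAD = "dead as a doornail"
MODERATE = "a moderate amount of change is happening"


@dataclass
class ModuleRecord:
    name: str
    maintainer: Optional[str]   # None means nobody maintains it
    entered_stdlib: str         # kept as text, since some dates are unknown
    status: str
    expected_dead_by: str
    alternatives: List[str] = field(default_factory=list)

    def describe(self):
        """Render the record in roughly the layout used in the listings."""
        who = self.maintainer or "nobody"
        alts = ", ".join(self.alternatives) or "none"
        return (
            f"{self.name}: maintained by ({who}).  Entered the standard "
            f"library in {self.entered_stdlib}.  Current status ({self.status}).  "
            f"Expected to reach the status ({DEAD}) by ({self.expected_dead_by}).  "
            f"Alternatives: {alts}"
        )


def stable_modules(records):
    """Return the names of the modules whose current status is 'dead as a doornail'."""
    return [record.name for record in records if record.status == DEAD]


RECORDS = [
    ModuleRecord("getopt", None, "1990", DEAD, "already reached",
                 ["optparse", "argparse"]),
    ModuleRecord("argparse", "Steven Bethard", "2009", MODERATE, "2012",
                 ["optparse", "getopt"]),
]
```

With records like these, stable_modules(RECORDS) picks out getopt and
leaves argparse aside.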

I would find this very useful, as would anybody who wants to use
dead-as-a-doornail modules whenever possible.  And I think that it
would give a certain breathing space for python core developers --
breaking things when the library involved was tagged as 'under moderate
development' is a much less heinous sin than breaking the ones that are
'dead as a doornail'.  People like Michael would understand that when
they re-open something like unittest, they are taking on a
responsibility which comes with a much larger burden of 'don't break
existing code'.  After all, if I had wanted new unit testing features,
I would be using py.test or nose.  When I use unittest at all, these
days, you can take it as a very strong indication that either I, or my
customers, are vastly more interested in stability than new features,
and will be incensed if something breaks in the name of 'this is
better'.

So what do you think of this proposal?

Laura