[stdlib-sig] Breaking out the stdlib

Tue Sep 15 16:38:20 CEST 2009

On Tue, Sep 15, 2009 at 10:29 AM, Laura Creighton <lac at openend.se> wrote:
[snip]
> Indeed, right now, if you write code that parses options you are
> most likely doing one of 4 things:
>
> a) using getopt
> b) using optparse
> c) using some third party module for parsing that is currently under
>   development
> d) writing your own option parser in python
>
> Now, for years I have been telling people that d) was by far the poorest
> way of doing things.  Don't reinvent the wheel, and all that.  But on
> the day that you yank getopt and optparse out of the library, you will
> have made this terrible advice to give my poor hospital clients.  And
> you will have made all the time I spent ripping out hand-made option
> parsers and replacing them with getopt instead wasted time that I billed
> my clients for.  Instead of giving them a robust solution that will
> require little or no maintenance, I have given them a zillion scripts,
> _all of which will suddenly break one day_.  And since the pure python
> parsers that I never got around to converting will probably still be
> working, the people who did d) will get the last laugh.

This is incorrect. Code that runs correctly on Python x.y will not
"suddenly break one day": no-one from python-dev is going to reach
into your customers' systems and install some other version of Python
behind your/their back, against their will.

The code you have written will continue to work, indefinitely, until
the customer decides to upgrade their version of Python. When that
happens, they can choose to perform this migration correctly or
incorrectly. Upgrading correctly involves testing (your code comes
with a test suite, I assume) against the new version of Python, to
make sure things work, before deploying to production. This is no
different than upgrading versions of Postgres, gcc or Linux: it has
risks, but gcc doesn't upgrade itself on its own. You choose when to
do this, and you do so in a way that contains and mitigates the risk.

> So -- real use case time -- the hospitals where I have done a lot of
> work have some really weird equipment.  And using them costs real
> money, uses up real lab supplies, and conceivably can ruin a sample
> that you will find it inconvenient or impossible to replace.
>
> There are all sorts of weird dependencies among the various options
> you can use to operate the devices.  If you have specified option K,
> you may not also specify option M or N, and if you have specified
> option L, you must also specify option Q, or option R and option S.
>
> Thus the whole exercise of writing a script to use the equipment
> becomes a matter of validating that the options you selected are complete
> and non-contradictory, and then going out and exercising the hardware.
>
> You can build a validating option parser by hand (option d) or using
> getopt (option a).  You will find it difficult to the point of near
> impossibility to build one using optparse, because optparse specifically
> rejects the notion of 'required options' which is the meat-and-potatoes
> part of this app.  I found this out when optparse went into the standard
> library, and was touted as being superior to getopt.
>
> I tried it, and tried to subclass it, and talked to its author about
> whether it could be changed to support my use case, and even volunteered
> to write code to change it, but was firmly told that optparse worked
> the way it did, on purpose, in order to prevent the sort of use that
> I wanted to make of it, and that patches to make that possible were
> entirely unwelcome.

Frankly, this sounds like an *excellent* reason to get rid of optparse
and replace it with something more flexible. Steven Bethard has said
that he originally tried adding the features of argparse to optparse,
but found the code to be so poorly designed that it really couldn't be
changed in any meaningful way. If the Python community is finding
common argument-parsing scenarios that optparse doesn't support (such
as what you outlined above), to the point where people are hand-coding
argument parsers because that's easier than using optparse, then
optparse is a failure and we should look for something that better
serves the needs of developers like yourself.
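
For concreteness, the dependency rules quoted above (K excludes M and N;
L needs Q, or both R and S) can be checked after parsing. What follows is
only a minimal sketch, assuming argparse is available (it was a
third-party module at the time of this thread). The single-letter flags
mirror the letters in the quoted example; the helper names build_parser,
dependency_errors and parse are hypothetical, and the real device options
are not given anywhere in the thread.

```python
import argparse


def build_parser():
    # Flags K, L, M, N, Q, R, S stand in for the real (unknown) device options.
    parser = argparse.ArgumentParser(prog="device")
    for letter in "KLMNQRS":
        parser.add_argument("-" + letter, action="store_true")
    return parser


def dependency_errors(opts):
    """Return a list of rule violations; the list is empty if opts is valid."""
    errors = []
    # Rule 1: K may not be combined with M or N.
    if opts.K and (opts.M or opts.N):
        errors.append("-K may not be combined with -M or -N")
    # Rule 2: L requires Q, or else both R and S.
    if opts.L and not (opts.Q or (opts.R and opts.S)):
        errors.append("-L requires -Q, or both -R and -S")
    return errors


def parse(argv):
    parser = build_parser()
    opts = parser.parse_args(argv)
    errors = dependency_errors(opts)
    if errors:
        # parser.error() prints usage and the message, then exits with status 2,
        # so the hardware is never touched with an invalid combination.
        parser.error("; ".join(errors))
    return opts
```

Keeping the rule checks in a separate function means they can be unit
tested without invoking the parser's exit path, which matters when a
mistake costs lab supplies.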

> At which point I go back to using getopt.  There is no particular hard
> feelings about this --  I figure that the people who want to use
> optparse can use it, and getopt is here for the people who won't, or
> can't.  But when later in time people suggested getting rid of getopt
> because 'it was old' and 'optparse was better' I realised, long after
> it was too late to do anything, that I should have been spearheading
> the 'I don't want optparse in the standard library' effort, on the
> grounds that it didn't support 'required arguments'.  That fight might
> have become nasty.  Far better to allow multiple ways of doing things.

I sympathize with your experience, but disagree with your conclusion.
It doesn't sound like we have multiple ways of doing things at all: it
seems we have several different argument parsing libraries that,
despite starting from a common goal of "argument parsing", have
implemented divergent functionalities with only a relatively small
area of overlap. They actually do different things, and in different
ways.

This is a serious problem. If I've been happily using optparse in my
project, and one day discover that I need required arguments, I have
to switch to a totally different library? Libraries should be able to
scale to meet their users' needs. This is what Jacob Kaplan-Moss was
talking about in his recent PyCon Argentina/Brazil keynotes
(http://jacobian.org/TO): as the user has more complex/specific needs,
the library should be able to yield gracefully, either implementing
the desired functionality itself or making it possible for the user to
customize specific aspects of the library's behaviour without throwing
away/rewriting the entire library.

> As far as I can tell most software packages go through three stages:
[snip stage descriptions]
> Now software in Stage A doesn't need maintenance so much as development
> in the first place.  And software in stage B requires a lot of
> maintenance.  But most software in stage C requires little or no
> maintenance, precisely because it is unchanging.  So, if you decide to
> change how Exceptions are inherited, for instance, you may break a
> whole lot of Stage C: code, but fixing them is part of the general
> problem of 'fixing exceptions everywhere' not part of 'fixing getopt'.
>
> Now I think that there is some confusion here between packages who
> are in Stage C:  and packages who are in Stage B-2:  around here.
> The B-2 packages are probably the core developers' greatest headache.
> But I don't see the C packages as being troublesome at all.  If you
> don't like them, it isn't because of the work that maintaining is
> costing you.  It may be that hatred for the B-2's has become a general
> hatred of all packages with no maintainers, which is an understandable
> mistake.  But from reading this list I get the distinct impression that
> some people just hate C:s _precisely because they are old and unchanging_,
> and would continue to hate them for that reason even if I was in some
> way able to guarantee that they would never need any maintenance ever
> again.  These people are condemning the packages I love best for the
> reason they are the packages I love best.  And that is the attitude
> I would like to change.

Speaking as a core developer, I disagree. The problem with
unmaintained (in your terms, B-2) or intentionally-frozen (C-*)
packages is that they make it difficult for us to evolve and adapt
Python the language and Python the standard library: if no-one is
willing/available to update the code to account for language/library
changes, the frozen package will become pinned to a specific
known-good version (or range of versions) of Python. Over time, that
version of Python will become uncommon (as distros phase it out) and
unsupported (as python-dev end-of-lifes it). This is a problem for
users of that package, who may wish to use a newer version of Python for
performance or bug-fix reasons, and it is also a problem for
python-dev, since those frozen packages create inertia.

> It boils down to a matter of trust.  My customers trust me to not give
> them ticking time bombs that will all stop working one day, and I trust
> you not to go about gratuitously removing perfectly working code that
> is quietly sitting there, not needing any changes, and not bothering
> anybody.  When you break that contract with me, my customers suffer,
> I suffer, and the people who said 'You shouldn't have coded it in
> Python in the first place, but picked a mature language like Java'
> are completely vindicated.

The systems you have written for your customers are not autonomous
agents; presumably, you have not written code like "if today.year ==
2011: sudo apt-get upgrade python" into these systems that would
change the version of Python running without anyone asking. Human
beings control these upgrades. If the human being performs the upgrade
blindly, without taking appropriate risk mitigation steps, there's
very little that we can do to protect them. You left out a key element
in the chain of trust above: presumably, you trust your customers not
to violate the minimum requirements you've set out for the software
you've written for them. If you say "this software requires 2GB of
RAM" and the customer later decides to try running the machines with
only 256MB to save money instead, that's not your fault: it's theirs.
Likewise, if you say "this software runs on Python 2.5" and they
blindly install Python 3.1 instead, that's not your fault: it's
theirs.
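
A stated requirement like "this software runs on Python 2.5" can also be
enforced by the script itself, so that a blind upgrade fails loudly at
startup instead of misbehaving halfway through a run. This is only a
sketch: the SUPPORTED range and both function names are hypothetical, and
a real script would list whichever versions its test suite has actually
been run against.

```python
import sys

# Hypothetical tested range, as inclusive (major, minor) bounds.
SUPPORTED = ((3, 8), (3, 13))


def python_is_supported(version=None, supported=SUPPORTED):
    """Return True if the (major, minor) of version is within the tested range."""
    if version is None:
        version = sys.version_info
    low, high = supported
    return low <= tuple(version[:2]) <= high


def require_supported_python():
    # Call this first thing in the script, before touching any hardware.
    if not python_is_supported():
        sys.exit("untested Python %d.%d; run the test suite before deploying"
                 % sys.version_info[:2])
```

The check does not replace testing against the new version; it only
makes the human performing the upgrade notice that the testing has not
been done yet.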

As to the matter of Java's deprecation policy, I don't regard it as
"mature": I regard it as a sign of different requirements. Because you
can't know what browser version or JRE version a user's desktop is
running, stability is paramount for Java; "write once, run anywhere"
is not free, it has its costs. As Frank Wierzbicki has said (either in
this thread or the other one about argparse), the inability to ever
remove code from the libraries makes life difficult for Java's
developers -- who have to maintain this code -- as well as for
everyday Java engineers, who have to learn to navigate this maze of
deprecated vs non-deprecated solutions to the same problem. Java shops
generally end up with a list of Approved Java Classes so that new
hires and old pros alike don't get tripped up.

In Python, we don't have the luxury of a paid staff to work on our
libraries, to maintain the crufty, fragile,
we'd-like-to-get-rid-of-you-but-maybe-someone's-using-you-we-don't-really-know
modules. We rely almost exclusively on volunteer contributions, and
it's tough to find volunteers to work on crap code. It's one thing to
choose not to change something; it's another thing entirely not to be
able to change something, to have your hands tied by code you can't
see and no-one will change. As Python development slows, as stability
gets confused for permanence and stasis, I predict it will be harder
to attract enthusiastic, eager contributors. After all, who wants to
work on something you're not allowed to modify?

To speak more personally, and specifically to the issue of
getopt/optparse vs argparse: at Google, I'm part of the Python
readability team, which helps train the large numbers of Python
developers that the company produces. Part of this job involves
conducting detailed code reviews for new Python programmers,
explaining both Google style and idiomatic Python code generally,
suggesting library A over hand-written solution B. I am, frankly,
embarrassed whenever I have to explain the difference between getopt
and optparse, urllib and urllib2, popen2 vs os.popen* vs subprocess,
string.* vs str.*, etc. I cannot imagine how embarrassed I will be
when I have to explain why the standard library includes getopt,
optparse and argparse.

Collin Winter