exception handling in complex Python programs

Wed Aug 20 20:49:14 EDT 2008

On Aug 20, 10:59 am, Steven D'Aprano <st... at REMOVE-THIS-
cybersource.com.au> wrote:
> Oh goodie. Another programmer who goes out of his way to make it hard for
> other programmers, by destroying duck-typing.

Remember kids: personal attacks are cruise control for cool.

So this was a simplification - most of the asserts I've written don't
actually use isinstance, partly because typing isinstance takes too
long. The point is to create a barricade so that when something goes
wrong, you get an assertion error against the code you wrote, not an
exception against doing something like

  print("blah blah %s" % message)

where message turns out to be None. This is simply a way to make
debugging a more pleasant experience (quite valuable IMHO since
debugging is inherently difficult and can be quite aggravating). Here
is a sampling:

assert statelt.tag == 'stat'
assert len(path) > 0 and path[0] == '/'
assert self.__expr is not None

So here asserts are used to make distinctions that are more
fine-grained than type.
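
A minimal sketch of the barricade idea, built around the path assertion
from the sampling above. The function name split_path and the message
text are made up for illustration. Keep in mind that asserts are
skipped entirely when Python runs with -O, so this is a debugging aid,
not input validation.

```python
def split_path(path):
    # Barricade: fail here, in code I wrote, with a message that names
    # the actual problem (the path is empty or not absolute).
    assert len(path) > 0 and path[0] == '/', \
        "path must be non-empty and start with '/': %r" % (path,)
    # Past the barricade, the rest of the routine can rely on the
    # leading slash being there.
    return path[1:].split('/')
```

Calling split_path('usr/lib') stops at the assert with the offending
value in the message, instead of quietly returning a wrong result that
blows up somewhere else later.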

> > and other similar assertions in routines. The point is that EAFP
> > conflicts with the interest of reporting errors as soon as possible
>
> Not necessarily. Tell me how this conflicts with reporting errors as soon
> as possible:
>
> def do_something(filename):
>     try:
>         f = open(filename)
>     except IOError, e:
>         report_exception(e)  # use a GUI, log to a file, whatever...
>
> How could you report the exception any earlier than immediately?

Here is an example: a simple query tool for a tiny "mock SQL"
relational database. With a method (called "select_all") you can
perform the equivalent of a select query on the database. The contents
of the query are specified with triples of the form

[field, comparison_operator, value]

for instance ['name', operator.eq, cmd_name]. You can also specify
an "order by" field which is None by default. In the code written,
there is an assertion that the order-by field is either None or a
valid field name (we can't order by a nonexistent field!). If the
assertion isn't there, then I will get an error on this line:

  key_extractor = KeyExtractor(q_column_names.index(order_by_column))

In this particular case, I will get a ValueError (what does ValueError
mean again? And what is this KeyExtractor?) since the index method
will fail. I wrote the tiny relational database a long time ago, and I
really don't want to put pressure on my mental cache by thinking about
the internal logic of this chunk of code. After scratching my head for
a while, I'll probably figure it out. Now imagine that you instead get
an error on this line:

  assert order_by_column in q_column_names

Now the programming error slaps me with a fish and yells "STOP! YOU
CAN'T ORDER BY A FIELD THAT DOESN'T EXIST!!!". It will take about 2
seconds to figure out what went wrong. I just saved a minute figuring
out what the problem is. Multiply that by ten, and you've just
eliminated work in a potentially laborious debugging session.
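
Roughly what that looks like, as a sketch. This is not the actual
database code: the select_all signature, the tuple rows, and the
lambda standing in for KeyExtractor are all assumptions made for
illustration. Only q_column_names, order_by_column and the assertion
itself come from the real thing.

```python
def select_all(rows, q_column_names, order_by_column=None):
    # Hypothetical, stripped-down stand-in for the real query method.
    if order_by_column is not None:
        # The barricade: order_by_column is either None or a real field.
        assert order_by_column in q_column_names, \
            "can't order by nonexistent field %r" % (order_by_column,)
        # Without the assert above, a bad field name would fail on this
        # line instead, with a ValueError raised by list.index().
        key_index = q_column_names.index(order_by_column)
        rows = sorted(rows, key=lambda row: row[key_index])
    return list(rows)
```

With the assert in place, select_all(rows, ['name', 'size'], 'colour')
fails on the line that states the rule being broken. Take the assert
out and the same call fails one line later with a ValueError from
index(), which says nothing about ordering or fields.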

If you look at the history of the EAFP concept in Python, then you see
that it comes from Alex Martelli's Python in a Nutshell around pages
113-114. I don't think the code examples make the case for EAFP very
well (not that I know what EAFP is in the first place, given that it
is barely explained; I interpret it as "wrap questionable stuff in
try/except blocks"), and in any case there is practically no support
for using EAFP as the dominant error-handling paradigm. If you look at
Code Complete, then you'll see the opposite suggestion, namely that
exceptions should only be used for truly exceptional circumstances
(i.e. bugs). McConnell argues that try/except is an inherently complex
control structure so it should be used sparingly (just like balsamic
vinegar!). I happen to think that truth lies between these extremes,
but I'd err on using fewer try/except structures, not more. Using
several try/except blocks across multiple activation records sounds
like unreadable code to me.
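
For reference, EAFP is usually expanded as "easier to ask forgiveness
than permission" and LBYL as "look before you leap". Here are the two
styles side by side on a trivial dictionary lookup. The function names
are made up for illustration, and this is only meant to show the shape
of each style, not to settle the argument.

```python
def lookup_lbyl(table, key, default=None):
    # LBYL: check the precondition first, then act.
    if key in table:
        return table[key]
    return default


def lookup_eafp(table, key, default=None):
    # EAFP: just attempt the operation and handle the failure.
    try:
        return table[key]
    except KeyError:
        return default
```

Both return the same results for a plain dict. The difference is where
the "what if it isn't there" logic lives: in a condition up front, or
in an except clause after the fact.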

If shared objects are used pervasively, then I would predict that EAFP
will not provide adequate abstractions to control program complexity
(see http://research.microsoft.com/Users/simonpj/papers/stm/stm.pdf
and the Wikipedia article on software transactional memory). These
days you can switch to Stackless and use tasklets and atomic
operations (see http://www.stackless.com/wiki/Tasklets). There is a
debate between EAFP and LBYL here:
http://mail.python.org/pipermail/python-list/2003-May/205182.html
Martelli's posts in support of EAFP are heavily skewed towards a
multithreaded scenario and avoiding race conditions. IMHO, letting
locking and race condition concerns dictate your error-handling
paradigm is a case of the tail wagging the dog, especially when there
are alternatives to this particular tar pit: pipes or a shared-nothing
architecture.
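
To be fair to the race-condition argument, this is the kind of case it
is about, as I understand it. The sketch below is my own illustration
with made-up function names, not code from that thread.

```python
import os


def read_lbyl(path):
    # LBYL: there is a window between the exists() check and the open()
    # call in which another thread or process can remove the file, so
    # open() can still fail even though the check passed.
    if os.path.exists(path):
        with open(path) as f:
            return f.read()
    return None


def read_eafp(path):
    # EAFP: no window, since the attempt itself is the check.
    try:
        with open(path) as f:
            return f.read()
    except IOError:  # on Python 3, IOError is an alias of OSError
        return None
```

If nothing else touches the file, the two behave the same. With shared
state, only the second one reliably turns a missing file into None
rather than an unhandled exception.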

David