File handling: The easy and the hard way

Steve Holden steve at holdenweb.com
Thu Sep 30 10:37:49 EDT 2004


Hans-Joachim Widmaier wrote:

> Hi all.
> 
> Handling files is an extremely frequent task in programming, so most
> programming languages have an abstraction of the basic files offered by
> the underlying operating system. This is indeed also true for our language
> of choice, Python. Its file type allows some extraordinarily convenient
> access like:
> 
>     for line in open("blah"):
>         handle_line(line)
> 
> While this is very handy for interactive usage or throw-away scripts, I'd
> consider it a serious bug in "production quality" software. Tracebacks
> are fine for programmers, but end users really never should see any.
> Especially not when the error is not in the program itself, but rather
> just a mistyped filename. (Most of my helper scripts that we use to
> develop software handle files this way. And even my co-workers don't
> recognize 'file or directory not found' for what it is.) End users are
> entitled to error messages they can easily understand like "I could not
> open 'blaah' because there is no such file". Graceful error handling is
> even more important when a program isn't just run on a command line but
> with a GUI.
> 
I agree we really shouldn't expect users to have to see tracebacks, but 
that doesn't mean that exception handling has to be sophisticated.

Here's something I'd consider acceptable, which doesn't add hugely to 
the programming overhead for multiple files but doesn't burden the user 
with horrible tracebacks. I've used it to print itself, so you see how 
it works and what it contains all in the same output:

sholden at DELLBOY ~
$ ./ft.py one.py ft.py
Problem handling file one.py : [Errno 2] No such file or directory: 'one.py'
#!/usr/bin/python
#
# ft.py: simple multi-file processor with error handling
#
import sys

files = sys.argv[1:]

for f in files:
     try:
         for l in file(f):
             sys.stdout.write(l)
     except Exception, reason:
         print >> sys.stderr, "Problem handling file", f, ":", reason

I'm quite happy to let the process-termination housekeeping code, or 
perhaps (in some implementations) the Python housekeeping at garbage 
collection, close the file, which you might think is unduly sloppy. What 
can I say, the user pays if they don't want sloppy :-). But I'd consider 
this sufficiently close to "production quality" to be delivered to 
end-users. Clearly you can add file assignment to a variable and a 
try/finally to ensure it's closed. You gets what you pays for.
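For anyone who does want the tidier version, here is a minimal sketch of the same loop with the file bound to a variable and closed in a try/finally. It is written for Python 3, whereas the listing above is Python 2. The function name copy_files and its out/err parameters are my own illustrative assumptions, and it catches OSError rather than every Exception, which is a narrower choice than the script above makes.

```python
import sys

def copy_files(names, out=sys.stdout, err=sys.stderr):
    """Write each named file to out; report failures on err and carry on.

    Returns the number of files that could not be handled.
    (copy_files, out and err are hypothetical names for this sketch.)
    """
    failures = 0
    for name in names:
        try:
            f = open(name)
            try:
                for line in f:
                    out.write(line)
            finally:
                f.close()  # runs whether or not the loop above raised
        except OSError as reason:
            # Same message shape as the original script, sent to err
            print("Problem handling file", name, ":", reason, file=err)
            failures += 1
    return failures
```

Calling copy_files(sys.argv[1:]) from a script gives the same behaviour as ft.py, with the close made explicit. In current Python a "with open(name) as f:" block would do the same job as the inner try/finally.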

Naturally, if recovery is required rather than just error-reporting then 
the situation can be expected to be a little more complicated.

> Which means? Which means that all this convenient file handling that
> Python offers really should not be used in programs you give away. When I
> asked for a canonical file access pattern some months ago, this was the
> result:
> http://groups.google.com/groups?hl=de&lr=&ie=UTF-8&threadm=pan.2003.12.30.21.32.37.195763%40web.de&rnum=1&prev=/groups%3Fhl%3Dde%26lr%3D%26ie%3DUTF-8%26q%3Dfile%2Bpattern%2Bcanonical%26btnG%3DSuche%26meta%3D
> 
> Now I have some programs that read and write several files at once. And
> the reading and writing is done all over the place. If I really wanted to
> do it "right", my once clear and readily understandable code turns into a
> nightmare. This doesn't look like the language I love for its clarity and
> expressiveness any more.

Well, the more complex your processing gets the more complex your 
error-handling gets too, but I'd say you should look at some serious 
refactoring here - you appear to have what's sometimes called a "code 
smell" in extreme programming circles. See

     http://c2.com/cgi/wiki/?CodeSmell

You also appear to have a good nose, one of the distinctive properties 
of the conscientious programmer.

> Python, being a very high level language, needs a
> higher level file type, IMHO. This is, of course, much easier said than
> done. And renowned dimwits like me aren't expected to come up with solutions.

Don't talk yourself down! You have already shown sound instinct.

> I've thought about subclassing file, but to me it looks like it wouldn't
> help much. With all this try/except framing you need to insert a call
> level anyway (wondering if this new decorator stuff might help?). The best
> I've come up with so far is a vague idea for an error callback (if there
> isn't one, the well known exceptions might be raised) that gets called for
> whatever error occurred, like:
> 
> class File:
>     ...
>     def write(self, data):
>         while True:
>             try:
>                 self._write(data)
>             except IOError, e:
>                 if self.errorcallback:
>                     ret, dat = self.errorcallback(self, F_WRITE, e, data)
>                     if ret == F_RETURN:
>                         return dat
>                 else:
>                     raise
> 
> The callback could then write a nice error message, abort the program,
> maybe retry the operation (that's what the 'while True'-loop is for) or
> return whatever value to the original caller. Although the callback
> function will usually be more than a few lines, it can be reused. It can
> even be packed into your own file-error-handling module, something the
> original usage pattern can't.
> 
The problem that any such approach is likely to have can be summed up as 
"If processing is complicated then error-handling may also become 
complicated, and error-recovery even more so". You shouldn't expect it 
to be too simple, but if it's too complex then you might find that a 
restructuring of your program will yield better results.
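
To make the discussion concrete, here is a rough sketch of how the quoted callback idea might look in Python 3. It is only one possible reading of it, not a recommendation. The class name CallbackFile and the F_RETRY constant are hypothetical additions of mine; F_WRITE and F_RETURN come from the quoted code, though the values given to them here are assumed. The return on a successful write is also an addition: as quoted, the loop has no exit once the write succeeds.

```python
F_WRITE = "write"    # operation tag passed to the callback (value assumed)
F_RETURN = "return"  # callback result: give up and return a value (value assumed)
F_RETRY = "retry"    # hypothetical callback result: try the operation again

class CallbackFile:
    """Wrap a file-like object and route write errors to an optional callback."""

    def __init__(self, fileobj, errorcallback=None):
        self._file = fileobj
        self.errorcallback = errorcallback

    def write(self, data):
        while True:
            try:
                return self._file.write(data)  # success ends the loop
            except OSError as e:
                if self.errorcallback is None:
                    raise  # no callback: the usual exception propagates
                ret, dat = self.errorcallback(self, F_WRITE, e, data)
                if ret == F_RETURN:
                    return dat
                # any other result (e.g. F_RETRY) loops and tries again
```

A callback that always asks for a retry will spin for as long as the error persists, so a real one would need to count attempts or ask the user, which is exactly the kind of complication I mean.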

> If you still bear with me, you might as well sacrifice a few more seconds
> and tell me what you think about my rant. Is everything just fine as it is
> now? Or do I have a point? I always felt it most important to handle all
> errors a program may encounter gracefully, and the easier this is to do,
> the less likely it is a programmer will just sneak around the issue and
> let the interpreter/run time system/operating system handle it. (And yes,
> I'm guilty of not obeying it myself, as it can double or triple the time
> needed to write the whole program; just because it's so cumbersome.)
> 
There's an old rule-of-thumb, which may come from "The Mythical 
Man-Month", still quite a worthwhile read though probably at least 30 
years old now. Or it may not. It states that you can expect to spend 
three times as much effort producing a program product (something to be 
delivered to end-users) as producing a program (something you plan to 
use yourself, and write accordingly); and that a further factor of three 
is required to produce a programmed system product (a collection of 
programs which work together as a system and will be delivered to 
end-users) over just producing the program products individually.

This combined factor of nine is often referred to as "engineering 
effort", and includes

a) The creative exercise of imagination sufficient to anticipate the 
usual and unusual failure cases;
b) The creative exercise of programming skill sufficient to ensure that 
the failure cases still result in acceptable system behavior; and
c) The creative exercise of political skill sufficient to persuade a 
reluctant management that steps a) and b) are worth paying for.

The combination of all three components is to be found in beasts 
sometimes known as "software engineers", frequently held by some to be 
mythical.

If I were feeling cynical, I might sum this up by saying "Python is a 
programming language, not a f***ing magic wand". But that won't stop 
people from looking for the silver bullet that solves all their problems 
in a single line of code.

Hope this helps, and doesn't come across as critical. Your questions are 
reasonable, and show a sincere appreciation of the difficulties of 
producing high-quality software. And we don't ever want anything else, 
do we?

regards
  Steve


