[Tutor] Parsing data from a set of files iteratively

Thu May 31 03:41:19 CEST 2012

Spyros Charonis wrote:
> On Wed, May 30, 2012 at 8:16 AM, Steven D'Aprano <steve at pearwood.info> wrote:
[...]
>> There is little as painful as a program which prints "An error occurred"
>> and then *keeps working*. What does this mean? Can I trust that the
>> program's final result is correct? How can it be correct if an error
>> occurred? What error occurred? How do I fix it?
>>
> My understanding is that an except clause will catch a relevant error and
> raise an exception if there is one, discontinuing program execution.

No, the opposite. An except clause will catch the exception and *continue* 
execution past the end of the try...except block.

Python automatically raises exceptions and halts execution if you do nothing. 
For example:

py> for x in (1, 0, 2):
...     print(1/x)
...
1.0
Traceback (most recent call last):
   File "<stdin>", line 2, in <module>
ZeroDivisionError: division by zero

Notice that the first time around the loop, 1.0 is printed. The second time, 
an error occurs (1/0 is not defined), so Python raises an exception. Because I 
don't catch that exception, it halts execution of the loop and prints the 
traceback.

But a try...except clause will catch the exception and keep going:

py> for x in (1, 0, 2):
...     try:
...         print(1/x)
...     except ZeroDivisionError:
...         print('something bad happened')
...
1.0
something bad happened
0.5

There are good reasons for catching exceptions. Sometimes you can recover from 
the error, or skip the bad data. Sometimes one calculation fails but you can 
try another one. Or you might want to catch an exception of one type, and 
replace it with a different exception with a more appropriate error message.
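
As a rough sketch of that last case, here is one way to catch an exception and 
replace it with a more informative one. The function name parse_record_count 
and the wording of the message are invented for illustration; they are not 
from your script.

```python
def parse_record_count(text):
    """Convert a text field to an int, with a clearer error on failure.

    parse_record_count is a hypothetical helper used only for illustration.
    """
    try:
        return int(text)
    except ValueError as err:
        # Replace the generic message with one that says what was expected.
        # "from err" chains the original exception, so its traceback is
        # still available for debugging.
        raise ValueError("expected a record count, got %r" % text) from err
```

The important point is that the error is still an error: the caller sees an 
exception, only with a better message, and nothing is covered up.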

But all too often, I see beginners catching exceptions and just covering them 
up, or replacing useful tracebacks, which help with debugging, with bland and 
useless generic error messages like "An error occurred".

>>>     except SyntaxError:
>>>         print "Check Your Syntax!"
>> This except-clause is even more useless. SyntaxErrors happen when the
>> code is compiled, not run, so by the time the for-loop is entered, the
>> code has already been compiled and cannot possibly raise SyntaxError.
>>
> What I meant was, check the syntax of my pathname specification, i.e. check
> that I did not make a typo when writing the path of the directory I want to
> scan over. I realize syntax has a much more specific meaning in the context
> of programming - code syntax!

That's not what SyntaxError does in Python. Python only understands one form 
of syntax: *Python* syntax, not the syntax of pathnames to files. If you type 
the wrong pathname:

pathname = "My Documents!letters!personal!letter to my mother+doc"

Python will not raise SyntaxError. It will try to open the file called

My Documents!letters!personal!letter to my mother+doc

*exactly* as you typed it, and either succeed (if by some unimaginable fluke 
there happens to be a file of that name!) or fail. If it fails, you will get 
an OSError or IOError, depending on the nature of the failure reported by the 
operating system.
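
To see this for yourself, here is a small sketch that tries to open a file 
which certainly does not exist. The temporary directory is my own scaffolding, 
there only to guarantee that the name is missing. Note that in Python 3.3 and 
later, IOError is just another name for OSError, so catching OSError covers 
both.

```python
import os
import tempfile

# A bogus file name inside a fresh, empty temporary directory,
# so we can be sure that no such file exists.
with tempfile.TemporaryDirectory() as tmpdir:
    pathname = os.path.join(tmpdir, "letter to my mother+doc")
    try:
        with open(pathname) as f:
            data = f.read()
    except SyntaxError:
        # Never reached: a bad pathname is not a Python syntax error.
        outcome = "SyntaxError"
    except OSError as err:
        # Record the concrete exception class the operating system error
        # was mapped to, and show the message Python gives for free.
        outcome = type(err).__name__
        print("could not open file:", err)
```

On Python 3.3 or later, outcome should be "FileNotFoundError", which is a 
subclass of OSError.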

[...]
>> But you don't just get IOError for *missing* files, but also for
>> *unreadable* files, perhaps because you don't have permission to read
>> them, or perhaps because the file is corrupt and can't be read.
>>
> Understood, but given that the files I am reading and processing are
> standard ASCII text files, there is no good reason (which I can think of)
> that the files would be *unreadable*

*Any* file can be unreadable. The disk may develop a fault, and no longer be 
able to read the file's data blocks. Or the file system may be corrupted and 
the operating system can see that the file is there, but not where it is. If 
the file is on a network share, the network may have gone down halfway through 
reading the file. If it's on a USB stick or external hard drive, somebody 
might have unplugged it, or the connector might be wobbly.

Normally, for a script like this, failure to read a file should be considered 
a fatal error. If a file which *should* be there is no longer there, you 
should report the problem and halt. I recommend that you don't catch the 
exception at all, just let the traceback occur as normal.
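
A minimal sketch of what I mean, assuming your script does something like 
counting lines in each file (count_lines is a made-up name, and I am guessing 
at the processing step):

```python
import os

def count_lines(directory):
    """Return {filename: number of lines} for every entry in directory.

    There is deliberately no try...except here. If an entry cannot be
    read, the OSError propagates, the script stops, and the traceback
    names the file and the reason.
    """
    counts = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        with open(path) as f:
            counts[name] = sum(1 for line in f)
    return counts
```

If anything goes wrong, you get a full traceback for free, and you never end 
up with a result built from only some of the files.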

> I verified that I had read/write permissions for all my files, which are
> the default access privileges anyway (for the owner).

Fine, but I'm talking in general rather than about your specific case. In 
general, "file not found" is not the only error you can get. A remarkably 
large number of things can go wrong when reading files; fortunately, most of 
them are very rare.

Consider what your code does. First, you ask the operating system for a list 
of the files in a directory, using os.listdir. Then you expect that some of 
those files might be missing, and try to catch the exception. Is this 
reasonable? Do you actually expect the operating system will lie to you and 
say that files are there that actually don't exist?

For a robust application that runs in an environment where it is possible that 
files will routinely be created or destroyed at the same time that the 
application is running, a more careful and paranoid approach is appropriate. 
For a short script that you control, perhaps not so much.
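
If you did need the paranoid version, one possible sketch follows. It assumes 
Python 3.3 or later for FileNotFoundError, and read_all is a hypothetical 
name. It catches only the one error you expect, records it, and lets 
everything else halt the program as usual.

```python
import os

def read_all(directory):
    """Read every file still present in directory.

    Returns (contents, skipped): a dict of {name: text} and a list of
    names that vanished between the directory listing and the read.
    """
    contents = {}
    skipped = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        try:
            with open(path) as f:
                contents[name] = f.read()
        except FileNotFoundError:
            # The file was deleted after os.listdir reported it.
            # Record the fact rather than silently ignoring it.
            skipped.append(name)
        # Any other OSError (permissions, hardware fault, and so on)
        # is not caught here and still propagates.
    return contents, skipped
```

The caller can then report the skipped names, so the user knows exactly which 
files did not contribute to the final result.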

-- 
Steven