Sharing: File Reader Generator with & w/o Policy

Sat Mar 15 22:01:21 EDT 2014

On Sat, 15 Mar 2014 16:38:18 -0500, Mark H Harris wrote:

> hi folks, I am posting to share a File Reader Generator which I have
> been playing with, that simplifies reading of text files on-demand: like
> log files, config files, small record flat data-bases, &c.

Reading from files is already pretty simple. I would expect that it will 
be harder to learn the specific details of custom, specialised, file 
readers that *almost*, but not quite, do what you want, than to just 
write a couple of lines of code to do what you need when you need it. 
Particularly for interactive use, where robustness is less important than 
ease of use.

> I have two generators to share, one with & one without "policy". 

What's "policy"?

> The idea is to have the generator open and close the file (with error
> checking:  try-finish block) and then maintain its state for on-demand
> reading either into memory (as list or dict) or for in-line processing.
> 
> I will demonstrate the generators here, and then post the code
> following. The generator will be reading a path+filename of a local disk
> file and printing it as in this simple case without policy:
>
> >>> from my_utils import *
> >>> for record in fName(path+"my_fox"):
> 	      print(record)
> 
> The quick brown fox jumped
> over the lazy dog's tail.

What's "fName" mean? "File name"? That's a horribly misleading name, 
since it *takes* a file name as argument, it doesn't return one. That 
would be like renaming the len() function to "list", since it takes a 
list as argument. Function and class names should be descriptive, giving 
at least a hint as to what they do.

It looks to me that this fName just iterates over the lines in a file, 
which makes it pretty close to just:

for line in open(path + "my_fox"):
    print(line)
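
If you also want the file closed promptly and the line ends removed, a 
with statement covers it. A rough sketch only: the temporary directory 
and the sample text are my own scaffolding so that the snippet runs by 
itself, they are not from Mark's post.

```python
import os
import tempfile

# Scaffolding (an assumption for this example): create a throwaway "my_fox" file.
path = tempfile.mkdtemp() + os.sep
with open(path + "my_fox", "w") as f:
    f.write("The quick brown fox jumped\nover the lazy dog's tail.\n")

# The with statement closes the file as soon as the block ends,
# whether or not an error occurred while reading.
with open(path + "my_fox") as f:
    lines = [line.rstrip("\n") for line in f]

for line in lines:
    print(line)
```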

> The second generator adds "policy" to the generator processing and
> yields tuples, rather than strings. Each tuple contains the record
> number (from zero), and record length (minus the line end), and the
> record itself (stripped of the line end):

I presume that "record" here means "line", rather than an actual record 
from a flat file with fixed-width fields, or some delimiter other than 
newlines.

for i, line in enumerate(open(path + "my_fox")):
    print((i, len(line), line))

>  >>> for record in fnName(path+"my_fox"):
> 	      print(record)

What's "fnName" mean? Perhaps "filename name"? "function name"? Again, 
the name gives no hint as to what the function does.

> def fName(filename):
>      try:
>          fh = open(filename, 'r')
>      except FileNotFoundError as err_code:
>          print (err_code)

For interactive use, this is *just barely* acceptable as a (supposedly) 
user-friendly alternative to a stack trace. 

[Aside: I don't believe that insulating programmers from tracebacks does 
them any favours. Like the Dark Side of the Force, hiding errors is 
seductively attractive but ultimately harmful. Tracebacks are 
intimidating to beginners, but they are an essential weapon in the battle 
against buggy code, and reading them is a skill programmers have to 
learn. Hiding them just makes it harder to learn good debugging skills, 
and encourages programmers to treat errors as *something to hide* rather 
than *something to fix*.]

But as a reusable tool for use in non-interactive code, this function 
fails badly. By capturing the exception, it makes it painfully difficult 
for the caller to have control over error-handling. You cannot let the 
exception propagate to some other part of the application for handling; 
you cannot log the exception, or ignore it, or silently swallow the 
exception and try another file. The fName function makes the decision for 
you: it will print the error to standard output (not even standard 
error!) no matter what you want. That's the very essence of 
*user-hostile* for library code.

Worse, it's inconsistent! Some errors are handled normally, with an 
exception. It's only FileNotFoundError that is captured and printed. So 
if the user wants to re-use this function and do something with any 
exceptions, she has to use *two* forms of error handling:

(1) wrap it in try...except handler to capture any exception other 
    than FileNotFoundError; and

(2) intercept writes to standard out, capture the error message, and 
    reverse-engineer what went wrong.

instead of just one.
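
To make that concrete, here is a minimal sketch of the alternative: a 
reader that lets exceptions propagate, so that the caller chooses what to 
do about them. The names read_lines and first_readable are made up for 
illustration and are not part of Mark's module.

```python
def read_lines(filename):
    """Yield the lines of filename; errors from open() reach the caller."""
    with open(filename) as f:
        yield from f


def first_readable(filenames):
    """Return the lines of the first file that exists.

    Re-raises the last FileNotFoundError if none of the files exist.
    """
    last_error = None
    for name in filenames:
        try:
            # open() runs when the generator is first advanced, inside list().
            return list(read_lines(name))
        except FileNotFoundError as err:
            last_error = err  # the caller's decision: quietly try the next file
    if last_error is None:
        raise ValueError("no filenames given")
    raise last_error
```

Any other caller is free to make a different choice with the same 
read_lines: log the error, ignore it, or let it propagate further up.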

>      else:
>          while True:
>              linein = fh.readline()
>              if (linein!=''):
>                  yield(linein.strip('\n'))
>              else:
>                  break
>          fh.close()

Stripping newlines is surely better left to the user. What if they need 
to see the newline? With the newlines stripped automatically, the user 
cannot distinguish between a file which ends with a newline character and 
one which does not. Apart from that, this part is just a re-invention of 
the existing wheel. File objects are already iterable, and yield the 
lines of the file.
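
As a small illustration, using a hypothetical helper I've called 
ends_with_newline, keeping the line ends intact lets the caller answer a 
question that cannot be answered once they have been stripped:

```python
def ends_with_newline(pathname):
    """Return True if the final line of the file ends with a newline."""
    last = None
    with open(pathname) as f:
        for line in f:  # file objects yield each line with its newline intact
            last = line
    # An empty file has no last line, so it cannot end with a newline.
    return last is not None and last.endswith("\n")
```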

>      finally:
>          None

The finally clause is pointless, and not even written idiomatically as a 
do-nothing statement ("pass").
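
For comparison, a finally clause earns its keep when it does the cleanup. 
A sketch only, under a hypothetical name (lines_of), and letting errors 
from open() propagate rather than printing them:

```python
def lines_of(filename):
    """Yield the lines of filename, closing the file even if the caller stops early."""
    fh = open(filename)  # any error from open() propagates to the caller
    try:
        for line in fh:
            yield line
    finally:
        # Runs when the file is exhausted, when an error occurs,
        # and when the generator is closed before reaching the end.
        fh.close()
```

With the close in the finally clause, the file is also closed when the 
caller breaks out of the loop early and the generator is then closed or 
garbage collected. With the close placed after the loop, as in the quoted 
code, that case is left to whenever the file object itself is collected.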

> def fnName(filename):
>      try:
>          fh = open(filename, 'r')
>      except FileNotFoundError as err_code:
>          print (err_code)
>      else:
>          line_count = 0
>          while True:
>              linein = fh.readline()
>              if (linein!=''):
>                  lineout = linein.strip('\n')
>                  length = len(lineout)
>                  yield((line_count, length, lineout))
>                  line_count+=1
>              else:
>                  break
>          fh.close()
>      finally:
>          None

This function re-implements the fName function, except for a simple 
addition. It could be written as:

def fnName(filename):
    for count, line in enumerate(fName(filename)):
        yield (count, len(line), line)

-- 
Steven D'Aprano
http://import-that.dreamwidth.org/