Sharing: File Reader Generator with & w/o Policy

Mark H Harris harrismh777 at gmail.com
Sat Mar 15 23:34:52 EDT 2014


On 3/15/14 9:01 PM, Steven D'Aprano wrote:
> Reading from files is already pretty simple. I would expect that it will
> be harder to learn the specific details of custom, specialised, file
> readers that *almost*, but not quite, do what you want, than to just
> write a couple of lines of code to do what you need when you need it.
> Particularly for interactive use, where robustness is less important than
> ease of use.

    Yes. What I'm finding is that I'm coding the same 4-6 lines of code
with every file open (I do want error handling, at least for
FileNotFoundError), and I only want it to be two lines: read the file
into a list, with error handling.
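
For illustration, the repeated 4-6 lines could be folded into one helper so
that the call site stays at two lines. This is only a sketch: read_lines is
a hypothetical name, and returning an empty list for a missing file is an
assumed policy. It makes a missing file look the same as an empty one, so a
caller who cares about the difference would want the exception instead.

```python
import sys

def read_lines(path):
    """Return the file's records as a list, or [] if the file is missing."""
    try:
        with open(path) as f:
            # Assumed policy: drop the line end from each record.
            return [line.rstrip("\n") for line in f]
    except FileNotFoundError as err:
        # Report on stderr so normal output and error messages stay separate.
        print(f"File not found: {err.filename}", file=sys.stderr)
        return []
```

The call site is then just a for loop over read_lines(pathname).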

> What's "policy"?

    That's part of what I personally struggle with (frequently): do I
place the policy in the generator, or do I handle it on the outside? For
instance, I normally strip the line-end and I want to know the record
lengths. I may also want to know the record number in arrival
sequence. This policy can be handled in the generator, although I could
have handled it outside too.



> for i, line in enumerate(open(pathname + "my_fox")):
>      print((i, len(line), line))

I like it...  and this is where I've always been, until I finally said to
myself, yuk.  Yes, it technically works very well. But it's ugly. And I
don't mean it's technically ugly, I mean it's aesthetically ugly and not
easy for a user to read.  (I know that's all subjective.)

for line in getnumline(path+"my_foxy"):
       print(line)

In this case getnumline() is a generator wrapper around fName(). Of
course it doesn't do anything different from the two lines you listed, but
it is immediately easier to tell what is happening, even if you're not
an experienced Python programmer.
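
A rough sketch of how the two pieces could fit together. The body of
fName() is not shown in this message, so the version here is an assumption
(open the file, yield each line without its line end). getnumline() is then
only the numbering and length policy layered on top.

```python
def fName(filename):
    """Assumed behavior: yield each line of the file with its line end removed."""
    with open(filename) as f:
        for line in f:
            yield line.rstrip("\n")

def getnumline(filename):
    """Wrap fName() and yield (record number, record length, record) tuples."""
    for count, line in enumerate(fName(filename)):
        yield (count, len(line), line)
```

Record numbers start at 0 here because that is enumerate()'s default;
enumerate(..., start=1) would give 1-based numbering instead.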


> [Aside: I don't believe that insulating programmers from tracebacks does
> them any favours.

Yes. I think you're right about that.  But what if they're not
programmers; what if they're just application users who don't have a
clue what a traceback is, and just want to know that the file does not
exist?  And right away they realize that, oops, I spelled the filename
wrong.  Yeah, I struggle with this as I'm trying to simplify, because
personally I want to see the traceback info.

> Worse, it's inconsistent! Some errors are handled normally, with an
> exception. It's only FileNotFoundError that is captured and printed. So
> if the user wants to re-use this function and do something with any
> exceptions, she has to use *two* forms of error handling:

Yes. The exception handling needs to handle all normal errors.
>
> (1) wrap it in try...except handler to capture any exception other
>      than FileNotFoundError; and
>
> (2) intercept writes to standard out, capture the error message, and
>      reverse-engineer what went wrong.

Ok.


> Apart from stripping newlines, which is surely better left to the user
> (what if they need to see the newline? by stripping them automatically,
> the user cannot distinguish between a file which ends with a newline
> character and one which does not), this part is just a re-invention of
> the existing wheel. File objects are already iterable, and yield the
> lines of the file.

Yes, this is based on my use case, which never needs the line-ends; in
fact they are a pain. These files have variable-length records, and the
only thing the newline is used for is delimiting the records.
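
To make the trade-off concrete, here is a small sketch using in-memory
files (io.StringIO stands in for real files; the sample text is made up).
It shows the point raised above: once the line ends are stripped, a file
that ends with a newline and one that does not look identical. It also
shows why rstrip("\n") is probably a safer choice than a bare strip() when
record lengths matter.

```python
import io

with_newline = io.StringIO("alpha\nbeta\n")
without_newline = io.StringIO("alpha\nbeta")

# Iterating a file object yields each line with its newline still attached.
raw_a = list(with_newline)
raw_b = list(without_newline)

# After stripping, the difference between the two files is gone.
stripped_a = [line.rstrip("\n") for line in raw_a]
stripped_b = [line.rstrip("\n") for line in raw_b]

# rstrip("\n") removes only the newline; strip() also removes the
# surrounding spaces, which changes the measured record length.
record = "ab  \n"
length_keeping_spaces = len(record.rstrip("\n"))
length_losing_spaces = len(record.strip())
```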


>
> def fnName(filename):
>      for count, line in enumerate(fName(filename)):
>          yield (count, len(line), line)
>
I like this, thanks!   enumerate and I are becoming friends.

I like this case philosophically because it is a both | and.  The policy 
is contained in the wrapper generator using enumerate() and len() 
leaving the fName() generator to produce the line.

And you are right about another thing,  I just want to use this thing 
over and over.

for line in getnumline(filename):
     {whatever}

    There seems to be just one way of doing this (file reads), but
there are actually many ways of doing it. Is a file object really
better than a generator? Are there good reasons for using the generator?
Are there cases that absolutely call for a file object?
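
One concrete difference, offered as a sketch and not as an answer to which
is better: a generator function does not run any of its body until the
first item is requested, so a missing file is reported at iteration time
and not at call time, whereas a plain open() fails right away. The name
lines() and the temporary path below are made up for the example.

```python
import os
import tempfile

def lines(path):
    """Generator: the file is not opened until the first next() call."""
    with open(path) as f:
        for line in f:
            yield line

# A path that cannot exist: a file name inside a fresh, empty temp directory.
missing = os.path.join(tempfile.mkdtemp(), "no_such_file")

gen = lines(missing)          # no error here: the generator body has not run yet
try:
    next(gen)                 # open() is actually called now
    error_at_iteration = False
except FileNotFoundError:
    error_at_iteration = True

try:
    open(missing)             # a plain open() fails immediately
    error_at_open = False
except FileNotFoundError:
    error_at_open = True
```

On the other side, the with block inside the generator closes the file
once the loop runs to completion, which the bare
enumerate(open(...)) form leaves to the interpreter.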

marcus


