iterating over lines in a file

Alex Martelli alex at magenta.com
Fri Jul 21 18:25:21 EDT 2000


"Roger Upole" <rupole at compaq.net> wrote in message
news:vnLd5.27741$r9.839310 at news.easynews.com...
> I must take exception to your insistence that
>    "Everything should be done ONCE, and only ONCE.  In ONE
>    place in the code."

Hey, if you must, you must.  Maybe I should have said "must
be done" rather than "should be done", too:-).

> Besides sounding didactic,

Uh, is this supposed to be bad?  My prose's main defect is
generally that of being long and convoluted.  If for once
I've managed to be short and pithy, it was no doubt because
I was subconsciously quoting some Great Master.  Or maybe
channeling Him.  (Kent Beck...?  Ward Cunningham...?)

> it is also completely false.  In many cases,
> an initial read is necessary to determine how (or even if) the rest of a
> file will be processed.  Also, different code may need to be called if
> the file is empty.

Yes, such special cases do indeed come up (when one just
cannot control the specs/fileformats).  In this case, the
general structure tends to be:
    read the first piece if any
    if appropriate, do the rest

The "first piece" (a line, if you're lucky; but, often,
a bunch of them -- e.g., all lines up to the first empty
one, included) gets read *and processed*, *once*.  Then,
if appropriate, after the first piece has set up stuff,
"the rest" gets read and processed.

In other words, reading the first piece does not always
leave you with an already-read beginning-of-the-rest,
waiting to be processed before any further reading.
When it does (the read-item that tells you the first
piece is finished is not a separator/terminator token,
but is already the beginning of the-rest), then you do
have an input structure that may lend itself to your
favourite idiom (although in some cases, pushing the
inappropriate item back to be pseudo-reread next is
also a very useful idiom).
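A minimal sketch of that push-back idiom, for concreteness.
The PushbackReader class, its push_back method, and the
"#"-prefixed prologue format are all made up for this
illustration; nothing here is a standard library facility.

```python
import io


class PushbackReader:
    """Hypothetical wrapper: readline() plus a push-back slot."""

    def __init__(self, f):
        self.f = f
        self.pending = []  # lines pushed back, to be re-read first

    def readline(self):
        if self.pending:
            return self.pending.pop()
        return self.f.readline()

    def push_back(self, line):
        self.pending.append(line)


# Assumed input format: "#" lines are the prologue, the rest is the body.
reader = PushbackReader(io.StringIO("# a\n# b\nbody 1\nbody 2\n"))

prologue = []
while True:
    line = reader.readline()
    if not line.startswith("#"):
        # This line belongs to the body (or is "" at end of file):
        # hand it back so the body loop reads it again.
        reader.push_back(line)
        break
    prologue.append(line.rstrip("\n"))

body = []
while True:
    line = reader.readline()
    if not line:  # "" means end of file
        break
    body.append(line.rstrip("\n"))
```

Each loop reads in exactly one place, and neither needs to
know that the other one has already looked at a line.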

But then, the semantic role of the reading in the two
places differs.  If you pre-test, transform, &c, the
line just read, you will probably do it in different
ways for the prologue and the main-body.  E.g.,
consider:

    line = f.readline()
    while we_are_in_headers(line):
        process_as_headers(line)
        line = f.readline()
    while we_are_in_main(line):
        process_as_main(line)
        line = f.readline()

i.e., we're now reading-next-line in *three* places,
while the different things we're doing are only *two*.
Our code's structure does not ideally reflect its
internal logic; once again, the initial readline
stands out as an artificial construct.

We can restore the balance, and regain simplicity,
with a little bit of abstraction.  Maybe a bit too
extreme, but sort of nice, for example:

    headers = Filter(f, we_are_in_headers, process_as_headers)
    while headers.more():
        headers.next()

    main = Filter(f, we_are_in_main, process_as_main)
    while main.more():
        main.next()

Of course, there's no need for the while loops and
the more and next methods of the Filter class, which
in fact could perfectly well be a function and just
do everything itself.
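For the curious, one way such a Filter class might look.
This is only a sketch under my own assumptions: more() reads
one line and tests it, next() processes the line more() just
read, and the two predicates are invented stand-ins (headers
end at the first blank line, main runs to end of file).

```python
import io


class Filter:
    """Sketch: reads lines from f while testfun accepts them."""

    def __init__(self, f, testfun, procfun):
        self.f = f
        self.testfun = testfun
        self.procfun = procfun
        self.line = None  # line read by more(), not yet processed

    def more(self):
        # The one and only place where a line gets read.
        self.line = self.f.readline()
        return self.testfun(self.line)

    def next(self):
        self.procfun(self.line)


# Assumed predicates, for illustration only.
def we_are_in_headers(line):
    return line.strip() != ""  # false on a blank line and on "" at EOF


def we_are_in_main(line):
    return line != ""  # false only at end of file


seen_headers, seen_main = [], []
f = io.StringIO("Subject: hi\nFrom: me\n\nline one\nline two\n")

headers = Filter(f, we_are_in_headers, seen_headers.append)
while headers.more():
    headers.next()

main = Filter(f, we_are_in_main, seen_main.append)
while main.more():
    main.next()
```

Note that calling more() twice without a next() in between
silently drops a line, and that the line which ends the
headers (here, the blank separator) is consumed and gone.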

So, our main code can become:
    process(f, we_are_in_headers, process_as_headers)
    process(f, we_are_in_main, process_as_main)
utterly simple; and the function process loops just
once, readline's in just one place, and has all the
simplicity one could wish...:

def process(file, testfun, procfun):
    while 1:
        line = file.readline()
        # testfun must reject "" (end of file), or we never stop
        if not testfun(line):
            return
        procfun(line)

...well, all the simplicity except one little detail:
we _again_ have the "while 1:" idiom!-)
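To try this out, here is a self-contained run on an
in-memory file. The two predicates and the sample text are
assumptions of mine, chosen so that a blank line separates
headers from main; the one real requirement is that each
test function returns false for "" (what readline gives at
end of file), otherwise the loop never terminates.

```python
import io


def process(file, testfun, procfun):
    while 1:
        line = file.readline()
        if not testfun(line):
            return  # the rejected line is consumed, not pushed back
        procfun(line)


# Assumed predicates, for illustration only.
def we_are_in_headers(line):
    return line.strip() != ""  # stops at the blank separator, or at EOF


def we_are_in_main(line):
    return line != ""  # stops only at EOF


header_lines, main_lines = [], []
f = io.StringIO("To: you\nCc: them\n\nfirst\nsecond\n")

process(f, we_are_in_headers, header_lines.append)
process(f, we_are_in_main, main_lines.append)
```

Since process swallows the line that fails the test, this
layout suits a separator/terminator format; if the failing
line were already the start of the main body, it would be
lost.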


> Additionally, this overly simplistic programming style would be completely
> inadequate for most involved programming tasks.

The most involved tasks have the most ferocious need
for an extremistic pursuit of simplicity.  The
unpleasant issue of the "extra initial readline"
should, IMHO, rankle even more if the actual task
looms pretty complicated, because we surely _don't_
need extra complication when the task itself supplies
a lot.  Python has just about the right degree of
_sophistication_ to let me refactor things until they
are satisfactorily _simple_...

...and among simplicity's cornerstones, "code each
[different] thing ONCE", i.e., "in ONE place", stands
pretty high indeed.


Alex





