while true: !!!

Alex Martelli aleaxit at yahoo.com
Mon Dec 11 08:59:51 EST 2000


"Jaroslav Gergic" <j_gergic at yahoo.com> wrote in message
news:3A34CEA7.CD9D85AE at yahoo.com...
    [snip]
> >     file = open('file.input')
> >
> >     count = 0
> >     while 1:
> >         lines = file.readlines(8192)
> >         if not lines: break
> >         count = count + len(lines)

Pretty simple, classic, and fast.


> I hate it so much I prefer to write something like this:
>
> line = fh.readline()
> while(line != ""):
>   ... do something ...
>   line = fh.readline()

_Substantially_ slower: the first form you quote reads a batch
of lines worth roughly 8 KB at a time, which will typically
speed it up considerably, while here you read only one line
per call, making the runtime do quite a bit more work.
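If you want to measure the difference yourself, here is a minimal sketch, assuming Python 3 and a synthetic in-memory io.StringIO in place of a real file (the helper names count_by_line and count_by_batch are hypothetical). Both loops must agree on the count; the relative timings depend heavily on Python version, platform and whether a real disk file is involved, so none are claimed here.

```python
import io
import timeit

# Synthetic input: 10,000 short lines held in memory (an assumption for
# illustration; the thread is about real files on disk).
DATA = "".join("line %d\n" % i for i in range(10000))


def count_by_line(fh):
    # One readline() call per line, as in the quoted "while(line != '')" loop.
    count = 0
    line = fh.readline()
    while line != "":
        count = count + 1
        line = fh.readline()
    return count


def count_by_batch(fh, hint=8192):
    # readlines(hint) keeps reading whole lines until their total size reaches
    # about `hint`, so each call returns a batch; an empty list means EOF.
    count = 0
    while 1:
        lines = fh.readlines(hint)
        if not lines:
            break
        count = count + len(lines)
    return count


for func in (count_by_line, count_by_batch):
    seconds = timeit.timeit(lambda: func(io.StringIO(DATA)), number=20)
    print(func.__name__, func(io.StringIO(DATA)), "%.4f s" % seconds)
```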


> OK, it is ugly to write the same line of code twice,

Yes, particularly when such extra effort results in
a slowdown of one's program.

> but 'while true' or 'while 1' is very non-programmer construction
> it smells like VisualBasic GOTO.

"non-programmer"?  Ever heard of one Don Knuth, he of "The Art
of Computer Programming", TeX, etc?  If the 'non-programmers' I
happened to work with were as 'bad' as Dr. Knuth, I think I'd
be pretty happy.  Maybe I'm content with too little, and your
programming flair puts Dr. Knuth's to shame...?

Anyway, in his landmark 1974 article in ACM Computing Surveys,
"Structured Programming WITH go to Statements", Dr. Knuth made
the point that the generally desirable form of a loop statement
is (in pseudo-language):

    loop
        statements for part 1
        exit when condition
        statements for part 2
    repeat

which at the time no widely used language supported directly
('while' and 'until' loops being just special cases of it).  I
guess he did not know C, which already existed (not quite in
its modern form) and supports such forms as:

    for(;;) {
        statements for part 1;
        if(condition) break;
        statements for part 2;
    }

or, in exact and total equivalence:

    while(1) {
        statements for part 1;
        if(condition) break;
        statements for part 2;
    }

both of which are exact implementations of Knuth's general
loop form.
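In Python the same "exit in the middle" loop is spelled with 'while 1:' (or 'while True:') plus 'break'.  A minimal sketch, assuming Python 3; the task (collect lines up to the first blank line, as in a mail header) and the name read_header are illustrative assumptions, not anything from Knuth's article:

```python
import io


def read_header(fh):
    """Collect lines up to (not including) the first blank line or EOF."""
    header = []
    while 1:
        # Part 1: fetch the next item.
        line = fh.readline()
        # Exit when the condition holds: EOF ("") or a blank line.
        if line.strip() == "":
            break
        # Part 2: process the item; reached only if we did not exit.
        header.append(line.rstrip("\n"))
    return header


sample = io.StringIO("From: alex\nSubject: while 1\n\nbody text\n")
print(read_header(sample))          # ['From: alex', 'Subject: while 1']
print(repr(sample.readline()))      # 'body text\n'
```

The test sits exactly where the problem puts it, so neither part is duplicated and no flag is needed.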


Among the canonical (aka trivial) transformations of this
general loop form is one that introduces a flag variable (*NOT*
the best solution, by any means, for reasons Knuth explains
lucidly in his article):

    flag = true;
    while(flag) {
        statements for part 1;
        if(condition) {
            flag = false;
        } else {
            statements for part 2;
        }
    }

I'm not sure why one *would* want to use an inferior, slower,
less-clear, worse-coupled trivially-transformed pattern, but,
hey, whatever floats your boat.  So apply the canonical
transform to the beautiful code you hate, starting from:

    count = 0
    while 1:
        lines = file.readlines(8192)
        if not lines: break
        count = count + len(lines)

and produce the semantically equivalent:

    count = 0
    flag = 1
    while flag:
        lines = file.readlines(8192)
        if lines:
            count = count + len(lines)
        else:
            flag = 0

At least, while goofier and slower, it will still be faster
than the one-line-at-a-time variant!  (It's also easily sped
up a bit by using a number larger than 8192, on most boxes --
try it with 32768, for example).
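To convince yourself the two loops really are interchangeable (apart from speed and clarity), run both on the same input with both buffer-size hints mentioned above.  A minimal sketch, assuming Python 3 and an in-memory io.StringIO instead of a real file; the function names are hypothetical:

```python
import io


def count_while1(fh, hint=8192):
    # General loop form: read, exit in the middle, then process.
    count = 0
    while 1:
        lines = fh.readlines(hint)
        if not lines:
            break
        count = count + len(lines)
    return count


def count_flag(fh, hint=8192):
    # The flag-variable transform of the same loop.
    count = 0
    flag = 1
    while flag:
        lines = fh.readlines(hint)
        if lines:
            count = count + len(lines)
        else:
            flag = 0
    return count


data = "".join("row %d\n" % i for i in range(5000))
for size in (8192, 32768):
    a = count_while1(io.StringIO(data), size)
    b = count_flag(io.StringIO(data), size)
    print(size, a, b)   # each line: the hint, then 5000 twice
```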


> Do you know better construction to avoid both - duplicate lines
> and horrible 'while 1' construction?

No, I don't know any BETTER construction than the while 1
loop; that is because there is nothing horrible about having
a loop in the general (Knuth) form, and "while 1" is the way
such a loop is written in Python (in C you get the choice
between 'while(1)' and the identically equivalent 'for(;;)',
but that's just syntax-sugar-level difference).

I do know many WORSE constructions, coming from canonical or
not-so-canonical transformations, such as the one I just gave.


> Something like:
>
> while( "" != (line = fh.readline()) ):
>   ... do something ...

This is a slightly different issue, where you want to BOTH
set something ('line') AND return a value to be tested; this
is idiomatic in C (where assignment is an expression) but not
in Python (where it isn't).
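You can check this from Python itself: the C-style loop header from the question is rejected at compile time.  A minimal sketch, assuming a Python 3 interpreter; the error message wording varies between versions, so only the exception type is relied on:

```python
# The C idiom from the question, as a string of Python source.
c_style = 'while ("" != (line = fh.readline())):\n    pass\n'

try:
    compile(c_style, "<c_style>", "exec")
except SyntaxError as exc:
    rejected = True
    print("SyntaxError:", exc.msg)
else:
    rejected = False

print(rejected)   # True: '=' is a statement in Python, not an expression
```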

If you want to write set-something-AND-return-it in Python,
you can do it in several ways, of which the most Pythonic is
probably to use a class, e.g.:

class SetAndReturn:
    def setAs(self, data):
        self.data = data
        return data

line = SetAndReturn()
while line.setAs(fh.readline()):
    pass  # ... do stuff with line.data ...
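Filled in so that it runs end to end, the line-at-a-time version might look like this.  A minimal sketch, assuming Python 3 and an io.StringIO standing in for 'fh'; collecting the stripped lines is just a placeholder for "do stuff":

```python
import io


class SetAndReturn:
    """Remember a value and hand it back, so it can be tested in a while."""

    def setAs(self, data):
        self.data = data
        return data


fh = io.StringIO("alpha\nbeta\ngamma\n")
seen = []
line = SetAndReturn()
# readline() returns "" at end of file, which is false and ends the loop.
while line.setAs(fh.readline()):
    seen.append(line.data.rstrip("\n"))

print(seen)   # ['alpha', 'beta', 'gamma']
```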

This is still a 1-line-at-a-time (slow!) idiom, but making
it faster is trivial:

lines = SetAndReturn()
while lines.setAs(fh.readlines(32768)):
    pass  # ... do stuff with lines.data ...

you get a bunch of lines at a time, just like in the 'while 1'
idiom you detest so much, so the speed is essentially fine
(the slight overhead of working with instance objects and
attributes is not as bad as line-at-a-time file input).
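A runnable version of the batched variant, as a minimal sketch assuming Python 3 and in-memory input (the count_lines wrapper and its default hint of 32768 are illustrative choices).  It relies on readlines(hint) returning a non-empty list until end of file and then an empty, false list:

```python
import io


class SetAndReturn:
    def setAs(self, data):
        self.data = data
        return data


def count_lines(fh, hint=32768):
    # Each call yields a batch of whole lines; [] at EOF ends the loop.
    total = 0
    lines = SetAndReturn()
    while lines.setAs(fh.readlines(hint)):
        total = total + len(lines.data)
    return total


data = "".join("entry %d\n" % i for i in range(100000))
print(count_lines(io.StringIO(data)))   # 100000
```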


This IS concise, though many Pythonistas (typically without a
C background) would probably find it obscure when compared
to the 'while 1' approach.


Alex