standard input, for s in f, and buffering

Sun Mar 30 17:02:44 EDT 2008

One thing has annoyed me for quite some time; I apologize if it has
been discussed recently. If I run this program on Unix (Python 2.4.4
on Debian Linux)

    import sys
    for s in sys.stdin:
        # trailing comma: s already ends with a newline
        print '####', s,

and type the input on the keyboard rather than piping a file into it,
two annoying things happen:

- I don't see any output until I have entered a lot of input
  (approximately 8 KB). I expect pure Unix filters like this to process
  a line immediately -- that is what cat, grep and other utilities do,
  and also what Perl's while(<>) { ... } construct does.

- I have to type the EOF character *twice* to stop the program. This
  is also highly unusual.

If I saw this behavior in a program, as a long-time Unix user, I'd
call it a bug.

I realize this has to do with the extra read-ahead buffering documented for
file.next() and that I can work around it by using file.readline()
instead.

The problem is, "for s in f" is the elegant way of reading files line
by line. With readline(), I need a much uglier loop.  I cannot find a
better one than this:

    import sys
    while True:
        s = sys.stdin.readline()
        if not s:       # readline() returns '' only at EOF
            break
        print '####', s,

Also, "for s in f" works on any iterable f, whereas the readline()
loop only works on file-like objects. So I have to choose between two
evils: an ugly, non-idiomatic and limiting loop, or one which works
well until it is used interactively.

Is there a way around this?  Or are the savings in execution time or
I/O so large that everyone is willing to tolerate this bug?
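
The only alternative I can think of is the two-argument form of the
built-in iter(), which calls a function repeatedly until it returns a
sentinel value. Here is a minimal sketch, assuming that readline()
bypasses the read-ahead buffer as described above; the names
lines_unbuffered and prefix_lines are made up for illustration, and I
am not sure this counts as elegant:

```python
import sys

def lines_unbuffered(f):
    """Return an iterator over the lines of f that uses readline()."""
    # iter(callable, sentinel) calls f.readline() until it returns '',
    # which is what readline() returns at end of file.
    return iter(f.readline, '')

def prefix_lines(infile, outfile):
    """Copy infile to outfile line by line, prefixing each line."""
    for s in lines_unbuffered(infile):
        outfile.write('#### ' + s)
        outfile.flush()   # make each line visible immediately

# Hypothetical usage as a filter:
#     prefix_lines(sys.stdin, sys.stdout)
```

This keeps the "for s in ..." shape of the loop, but the wrapper still
needs an object with a readline() method, so it does not help with
arbitrary iterators.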

BR,
/Jorgen

-- 
  // Jorgen Grahn <grahn@        Ph'nglui mglw'nafh Cthulhu
\X/     snipabacken.se>          R'lyeh wgah'nagl fhtagn!