standard input, for s in f, and buffering

Paddy paddy3118 at googlemail.com
Tue Apr 1 01:27:39 EDT 2008


On Mar 31, 11:47 pm, Jorgen Grahn <grahn+n... at snipabacken.se> wrote:
> On 31 Mar 2008 06:54:29 GMT, Marc 'BlackJack' Rintsch <bj_... at gmx.net> wrote:
>
> > On Sun, 30 Mar 2008 21:02:44 +0000, Jorgen Grahn wrote:
>
> >> I realize this has to do with the extra read-ahead buffering documented for
> >> file.next() and that I can work around it by using file.readline()
> >> instead.
>
> >> The problem is, "for s in f" is the elegant way of reading files line
> >> by line. With readline(), I need a much uglier loop.  I cannot find a
> >> better one than this:
>
> >>     while 1:
> >>         s = sys.stdin.readline()
> >>         if not s: break
> >>         print '####', s,
>
> >> And also, "for s in f" works on any iterator f -- so I have to choose
> >> between two evils: an ugly, non-idiomatic and limiting loop, or one
> >> which works well until it is used interactively.
>
> >> Is there a way around this?  Or are the savings in execution time or
> >> I/O so large that everyone is willing to tolerate this bug?
>
> > You can use ``for line in lines:`` and pass ``iter(sys.stdin.readline,'')``
> > as iterable for `lines`.
>
> Thanks.  I wasn't aware that building an iterator was that easy. The
> tiny example program then becomes
>
> #!/usr/bin/env python
> import sys
>
> f = iter(sys.stdin.readline, '')
> for s in f:
>     print '####', s,
>
> It is still not the elegant interface I'd prefer, though. Maybe I do
> prefer handling file-like objects to handling iterators, after all.
>
> By the way, I timed the three solutions given so far using 5 million
> lines of standard input.  It went like this:
>
>   for s in file     :  1
>   iter(readline, ''):  1.30  (i.e. 30% worse than for s in file)
>   while 1           :  1.45  (i.e. 45% worse than for s in file)
>   Perl while(<>)    :  0.65
>
> I suspect most of the slowdown comes from the interpreter having to
> execute more user code, not from lack of extra heavy input buffering.
>
> /Jorgen
>
> --
>   // Jorgen Grahn <grahn@        Ph'nglui mglw'nafh Cthulhu
> \X/     snipabacken.se>          R'lyeh wgah'nagl fhtagn!

Hi Jorgen,
From the Python manpage:
       -u     Force stdin, stdout and stderr to be totally unbuffered.
              On systems where it matters, also put stdin, stdout and
              stderr in binary mode.  Note that there is internal
              buffering in xreadlines(), readlines() and file-object
              iterators ("for line in sys.stdin") which is not
              influenced by this option.  To work around this, you
              will want to use "sys.stdin.readline()" inside a
              "while 1:" loop.
Maybe try adding the Python -u option? Note, though, that the manpage
says -u alone does not affect the iterator's internal read-ahead, so
you would combine it with the readline() loop it suggests.
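Putting those together, a minimal sketch of what the manpage
recommends might look like this (Python 2, as in the rest of the
thread; "myfilter.py" is just an assumed script name):

#!/usr/bin/env python
import sys

# Run as:  python -u myfilter.py
# readline() does no read-ahead, unlike the file-object iterator,
# so each line is processed as soon as it arrives.
while 1:
    s = sys.stdin.readline()
    if not s:          # readline() returns '' at EOF
        break
    print '####', s,   # trailing comma: s already ends with '\n'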

Buffering is meant to speed up large amounts of I/O, but it is also
what caused the "many lines in before any output" behaviour you saw
originally. If the program will mainly be handling millions of lines
from a pipe or file, then why not leave the buffering in?
If you need the program to be both interactive- and batch-friendly,
you might need to add the ability to switch between the two modes,
as in the sketch below.
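One way to shape that switch (just a sketch; the lines() helper and
the isatty() test are my assumptions, not something from the thread):

import sys

def lines(f):
    # Interactive: iterate via readline() using two-argument iter(),
    # so each line is handled as soon as it is typed.  Batch: use the
    # normal (read-ahead buffered, and faster) file-object iterator.
    if f.isatty():
        return iter(f.readline, '')
    return f

for s in lines(sys.stdin):
    print '####', s,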

- Paddy.


