standard input, for s in f, and buffering

Mon Mar 31 18:47:26 EDT 2008

On 31 Mar 2008 06:54:29 GMT, Marc 'BlackJack' Rintsch <bj_666 at gmx.net> wrote:
> On Sun, 30 Mar 2008 21:02:44 +0000, Jorgen Grahn wrote:
>
>> I realize this has to do with the extra read-ahead buffering documented for
>> file.next() and that I can work around it by using file.readline()
>> instead.
>> 
>> The problem is, "for s in f" is the elegant way of reading files line
>> by line. With readline(), I need a much uglier loop.  I cannot find a
>> better one than this:
>> 
>>     while 1:
>>         s = sys.stdin.readline()
>>         if not s: break
>>         print '####', s ,
>> 
>> And also, "for s in f" works on any iterator f -- so I have to choose
>> between two evils: an ugly, non-idiomatic and limiting loop, or one
>> which works well until it is used interactively.
>> 
>> Is there a way around this?  Or are the savings in execution time or
>> I/O so large that everyone is willing to tolerate this bug?
>
> You can use ``for line in lines:`` and pass ``iter(sys.stdin.readline,'')``
> as iterable for `lines`.

Thanks.  I wasn't aware that building an iterator was that easy. The
tiny example program then becomes:

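#!/usr/bin/env python
import sys

# Call readline() until it returns '' (end of file); no read-ahead buffering.
f = iter(sys.stdin.readline, '')
for s in f:
    # The trailing comma suppresses print's own newline; s already ends in one.
    print '####', s,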
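
For reference, the two-argument form iter(callable, sentinel) calls the
callable repeatedly and stops as soon as it returns the sentinel. Below is a
minimal sketch of that behaviour, written for Python 3 and using io.StringIO
as a stand-in for sys.stdin so it can be run non-interactively. The helper
name read_lines is made up for illustration.

```python
import io

def read_lines(f):
    """Return an iterator over the lines of f, fetched one readline() at a time."""
    # readline() returns '' only at end of file, so an empty line ('\n')
    # does not end the iteration; only the sentinel '' does.
    return iter(f.readline, '')

# Assumption: an in-memory file stands in for sys.stdin.
fake_stdin = io.StringIO("first\n\nthird\n")
lines = list(read_lines(fake_stdin))
for s in lines:
    print('####', s, end='')
```

Note that the blank second line is passed through rather than treated as
end of input.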

It is still not the elegant interface I'd prefer, though. Maybe I do
prefer handling file-like objects to handling iterators, after all.
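
One way to keep a file-like object while still getting readline-based
iteration might be a thin wrapper. This is only a sketch under assumptions:
the class name LineIterFile is hypothetical, the code targets Python 3, and
io.StringIO again stands in for sys.stdin.

```python
import io

class LineIterFile(object):
    """Hypothetical wrapper: a file-like object whose iteration uses readline()."""

    def __init__(self, f):
        self._f = f

    def __iter__(self):
        # Iterate via readline() so no read-ahead buffer sits between
        # the underlying file and the caller.
        return iter(self._f.readline, '')

    def __getattr__(self, name):
        # Only called for attributes not found on the wrapper itself;
        # delegate those (read, readline, close, ...) to the wrapped file.
        return getattr(self._f, name)

f = LineIterFile(io.StringIO("a\nb\n"))
collected = [s for s in f]
```

With this, "for s in f" keeps working and the other file methods remain
available on the same object.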

By the way, I timed the three solutions given so far using 5 million
lines of standard input.  It went like this:

  loop                 relative run time (for s in file = 1.00)
  for s in file     :  1.00
  iter(readline, ''):  1.30  (i.e. 30% slower than for s in file)
  while 1           :  1.45  (i.e. 45% slower than for s in file)
  Perl while(<>)    :  0.65

I suspect most of the slowdown comes from the interpreter having to
execute more user code per line, not from losing the extra read-ahead
buffering that file.next() does.
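
A comparison of this kind could be sketched roughly as follows. This is not
the script used for the numbers above: it assumes Python 3, reads from an
in-memory io.StringIO instead of real standard input, uses a much smaller
input, and the function names are invented for illustration. Its results
will therefore not match the table.

```python
import io
import timeit

DATA = "x\n" * 10000  # assumed test input, far smaller than 5 million lines

def loop_for(f):
    n = 0
    for s in f:
        n += 1
    return n

def loop_iter(f):
    n = 0
    for s in iter(f.readline, ''):
        n += 1
    return n

def loop_while(f):
    n = 0
    while 1:
        s = f.readline()
        if not s:
            break
        n += 1
    return n

def measure(loop, repeat=3):
    # Best of `repeat` runs, each over a fresh in-memory file.
    return min(timeit.repeat(lambda: loop(io.StringIO(DATA)),
                             number=1, repeat=repeat))

timings = dict((loop.__name__, measure(loop))
               for loop in (loop_for, loop_iter, loop_while))
base = timings["loop_for"]
for name in ("loop_for", "loop_iter", "loop_while"):
    # Print each time relative to the plain for loop.
    print("%-12s %.2f" % (name, timings[name] / base))
```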

/Jorgen

-- 
  // Jorgen Grahn <grahn@        Ph'nglui mglw'nafh Cthulhu
\X/     snipabacken.se>          R'lyeh wgah'nagl fhtagn!