Lazy "for line in f" ?

Sun Jul 22 19:03:39 EDT 2007

Alexandre Ferrieux wrote:
> On Jul 22, 7:21 pm, Miles <semantic... at gmail.com> wrote:
>> On 7/22/07, Alexandre Ferrieux <alexandre.ferrieux at gmail dot com> wrote:
>>
>>> The Tutorial says about the "for line in f" idiom that it is "space-
>>> efficient".
>>> Short of further explanation, I interpret this as "doesn't read the
>>> whole file before spitting out lines".
>>> In other words, I would say "lazy". Which would be a Good Thing, a
>>> much nicer idiom than the usual while loop calling readline()...
>>> But when I use it on the standard input, be it the tty or a pipe, it
>>> seems to wait for EOF before yielding the first line.
>> It doesn't read the entire file, but it does use internal buffering
>> for performance.  On my system, it waits until it gets about 8K of
>> input before it yields anything.  If you need each line as it's
>> entered at a terminal, you're back to the while/readline (or
>> raw_input) loop.
> 
> How frustrating ! Such a nice syntax for such a crippled semantics...
> 
> Of course, I guess it is trivial to write another iterator doing
> exactly what I want.
> But nonetheless, it is disappointing not to have it with the standard
> file handles.
> And speaking about optimization, I doubt blocking on a full buffer
> gains much.
> For decades, libc's fgets() has been doing it properly (block-
> buffering when data come swiftly, but yielding lines as soon as they
> are complete)... Why is the Python library doing this ?
> 
What makes you think Python doesn't use the platform fgets()? As a 
matter of policy the Python library offers as thin as possible a shim 
over the C standard library when this is practical - as it is with "for 
line in f:". But in the case of file.next() (the file method called to 
iterate over the contents) it will actually use getc_unlocked() on 
platforms that offer it, though you can override that configuration 
feature by setting USE_FGETS_IN_GETLINE.

It's probably more to do with the buffering. If whatever is driving the 
file is using buffering itself, then it really doesn't matter what the 
Python library does, it will still have to wait until the sending buffer 
fills before it can get any data at all.
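
To illustrate the sending side, here is a rough sketch of a producer 
that flushes after every line, so that a reader on the other end of a 
pipe is not left waiting for the sender's buffer to fill. The name 
send_lines is made up for this example, and it assumes a current Python.

```python
import io
import sys

def send_lines(lines, out=sys.stdout):
    """Write each line and flush at once (hypothetical helper)."""
    for line in lines:
        out.write(line + "\n")
        # Hand the line on now instead of when the buffer fills.
        out.flush()

# An in-memory stream stands in for the pipe, just to show the call.
sent = io.StringIO()
send_lines(["one", "two"], sent)
```

Without the flush() call, output sent to a pipe is normally held back 
until the sender's buffer fills or the stream is closed.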

Try running stdin unbuffered (use python -u) and see if that makes any 
difference. It should, in the shell-driven case, for example.
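
If that does not help, the readline() loop Miles mentioned can be 
wrapped up as an iterator. This is only a sketch, assuming a current 
Python; lines_as_they_arrive and echo are names invented for the 
example, not anything in the standard library.

```python
import io
import sys

def lines_as_they_arrive(f):
    """Return an iterator that calls f.readline() once per line."""
    # readline() returns '' only at EOF, so iter() with '' as the
    # sentinel stops there and never reads ahead of the current line.
    return iter(f.readline, '')

def echo(src=sys.stdin, dst=sys.stdout):
    """Copy src to dst one line at a time (hypothetical helper)."""
    for line in lines_as_they_arrive(src):
        dst.write(line)
        dst.flush()  # push each line out immediately

# In-memory streams stand in for a terminal or pipe here.
demo_out = io.StringIO()
echo(io.StringIO("first\nsecond\n"), demo_out)
```

Calling echo() with no arguments would use the real stdin and stdout, 
which keeps the "for" syntax while reading one line per iteration.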

regards
  Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC/Ltd           http://www.holdenweb.com
Skype: holdenweb      http://del.icio.us/steve.holden