Lazy "for line in f" ?

Mon Jul 23 02:56:18 EDT 2007

On Jul 23, 1:03 am, Steve Holden <st... at holdenweb.com> wrote:
>
> What makes you think Python doesn't use the platform fgets()?

The fact that it adds that extra layer of buffering. Stdio is already
buffered; duplicating this is useless.

> ... in  the case of file.next() (the file method called to
> iterate over the contents) it will actually use getc_unlocked() on
> platforms that offer it, though you can override that configuration
> feature by setting USE_FGETS_IN_GETLINE

That does nothing. And anyway, stdio's getc() does not stubbornly block
on 8k either, so switching from getc() to fgets() seems orthogonal to
the problem.

> It's probably more to do with the buffering. If whatever is driving the
> file is using buffering itself, then it really doesn't matter what the
> Python library does, it will still have to wait until the sending buffer
> fills before it can get any data at all.

Nonsense. In all three cases (pipe, socket, terminal), I control the
writer and make sure that it writes in an unbuffered manner. To convince
you, here is an strace of the Python process while I type random lines
like "fdsfdsfds":

    read(0, "sdfsdf\n", 8192)               = 7
    read(0, "sdfds\n", 7168)                = 6

which proves that the Python process actually gets the lines one by
one, but buffers them internally... for much too long. Sigh.
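
For anyone who wants to reproduce the short reads without strace, here
is a minimal sketch. It assumes a POSIX pipe standing in for stdin; the
variable names are only illustrative. It shows that a read() asking for
up to 8192 bytes returns as soon as one line is available, the same
behaviour as in the trace above:

```python
import os

# A pipe stands in for stdin; the writer sends a single short line
# and keeps its end open, like someone typing at a terminal.
r, w = os.pipe()
os.write(w, b"sdfsdf\n")

# Ask for up to 8192 bytes. The call returns as soon as some data is
# available; it does not wait for the full 8192 bytes.
chunk = os.read(r, 8192)
print(repr(chunk), len(chunk))

os.close(w)
os.close(r)
```

So the delay is not at the system-call level; the data has already
reached the process when the loop body fails to run.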

> Try running stdin unbuffered (use python -u) and see if that makes any
> difference. It should, in the shell-driven case, for example.

No effect. As a matter of fact, -u is documented as affecting only
output (stdout and stderr).

So I'll reiterate the question: *why* does the Python library add that
extra layer of (hard-headed) buffering on top of stdio's?
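
In the meantime, a possible workaround is to call readline() directly
instead of iterating over the file. This is only a sketch, and it
assumes the read-ahead lives in the file iterator rather than in
readline(); the helper name lazy_lines is made up for illustration:

```python
import os

def lazy_lines(f):
    """Yield lines one at a time via readline(), stopping at EOF."""
    # iter(callable, sentinel) keeps calling f.readline() until it
    # returns the empty string, which signals end of file in text mode
    # (a file opened in binary mode would need b'' as the sentinel).
    for line in iter(f.readline, ''):
        yield line

# Demonstration with a pipe whose writer stays open between lines.
r, w = os.pipe()
reader = os.fdopen(r, 'r')

os.write(w, b"first\n")
lines = lazy_lines(reader)
first = next(lines)      # available as soon as the first line arrives

os.write(w, b"second\n")
os.close(w)              # closing the writer produces EOF for the reader
rest = list(lines)
reader.close()

print(repr(first), rest)
```

With sys.stdin in place of the pipe, the loop body would run once per
incoming line rather than once per filled buffer.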

-Alex