Flush stdin

Tue Oct 21 19:16:33 EDT 2014

On Mon, Oct 20, 2014 at 9:41 PM, Marko Rauhamaa <marko at pacujo.net> wrote:
> Dan Stromberg <drsalists at gmail.com>:
>
>> Often with TCP protocols, line buffered is preferred to character
>> buffered,
>
> Terminal devices support line buffering on write.

Yes, though that's not the only place it's useful.

> Line buffering on read is an illusion created by higher-level libraries.
> The low-level read function reads in blocks of bytes.

Actually, doesn't line buffering sometimes exist inside an OS kernel?
stty/termios/termio/sgtty are the relevant interfaces on *ix.
Supporting code: http://stromberg.dnsalias.org/~strombrg/ttype/ .  It
turns on character-at-a-time I/O in the tty driver via a variety of
methods for portability.  I wrote it in C before I took an interest in
Python.
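
For the Python-minded, here is a minimal sketch of the same idea using
only the termios module.  It is not a port of ttype; the names
char_at_a_time and read_one_key are made up for illustration, and I'm
assuming a POSIX system where fd 0 is a terminal.

```python
import copy
import os
import termios

LFLAG = 3  # position of the local-mode flags in the list tcgetattr() returns
CC = 6     # position of the control-character list


def char_at_a_time(attrs):
    """Return a copy of a tcgetattr()-style list with canonical mode off."""
    new = copy.deepcopy(attrs)
    new[LFLAG] &= ~termios.ICANON   # stop the kernel from assembling lines
    new[CC][termios.VMIN] = 1       # read() returns once 1 byte is available
    new[CC][termios.VTIME] = 0      # no inter-byte timeout
    return new


def read_one_key(fd=0):
    """Read a single byte from a tty without waiting for Enter."""
    old = termios.tcgetattr(fd)     # raises termios.error if fd is not a tty
    try:
        termios.tcsetattr(fd, termios.TCSADRAIN, char_at_a_time(old))
        return os.read(fd, 1)
    finally:
        termios.tcsetattr(fd, termios.TCSADRAIN, old)  # always restore
```

The point is that the line assembly being switched off here lives in
the tty driver, not in a userspace library.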

Also, here's some supporting documentation:
http://man7.org/linux/man-pages/man3/stdout.3.html - excerpt:
    Indeed, normally terminal input is line buffered in the kernel.

But even if line buffering (or even character buffering) were never in
the kernel, calling it an illusion is perhaps going a little far.
It's useful sometimes, irrespective of where it comes from.
"Illusion" has a bit of an undeserved pejorative connotation.

>> Also, it's a straightforward way of framing your data, to avoid
>> getting messed up by Nagle or fragmentation.
>
> Nagle affects the communication between the peer OS kernels and isn't
> directly related to anything the application does.

Actually, Nagle can cause two or more small writes to be merged into
a single packet, which is something an application must be able to
deal with: the data could show up in the receiving application as
fewer recv()'s than there were send()'s.  That's one reason why
something like http://stromberg.dnsalias.org/~strombrg/bufsock.html
can be helpful.
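
To make the framing point concrete, here is a rough sketch of
newline framing on top of recv().  This is not bufsock's API;
LineFramer and read_lines are hypothetical names, and I'm assuming
messages are terminated by a single b"\n".

```python
class LineFramer:
    """Reassemble newline-terminated messages from arbitrary recv() chunks."""

    def __init__(self):
        self._pending = b""

    def feed(self, chunk):
        """Add one recv() result; return the list of lines now complete."""
        self._pending += chunk
        # Everything before the last b"\n" is complete; the tail is kept
        # until more data arrives.
        *lines, self._pending = self._pending.split(b"\n")
        return lines


def read_lines(sock, bufsize=4096):
    """Yield complete lines from a connected socket until the peer closes."""
    framer = LineFramer()
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:
            break
        yield from framer.feed(chunk)
```

It doesn't matter whether two sends arrive merged into one recv() or
one send arrives split across two; the caller sees the same lines
either way.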

> Also, Nagle doesn't
> play any role with pipes.

Yes, but pipes aren't the only thing involved in the OP's question.
You "simplified" the problem down to pipes, but that doesn't really
capture the complete essence of the matter.  Nagle is one of the
reasons.

>>> ========================================================================
>>> $ bash ./test.sh | strace python3 ./test.py
>>> ...
>>> read(0, "x", 4096)                      = 1
>>> read(0, "x", 4096)                      = 1
>>> read(0, "x", 4096)                      = 1
>>> read(0, "x", 4096)                      = 1
>>> read(0, "x", 4096)                      = 1
>>> fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 3), ...}) = 0
>>> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
>>> 0) = 0x7f3143bab000
>>> write(1, "120\n", 4120
>>> )                    = 4
>>> ...
>> ========================================================================
>>
>> This is tremendously inefficient.  It demands a context switch for
>> every character.
>
> Inefficiency isn't an issue when you generate one byte a second.

Of course, but who's doing one byte per second?  You and I in our
tests, and perhaps some application developers with remarkably
undemanding I/O.  That doesn't really mean we should _recommend_ a
series of os.read(0, 1)'s.

> If data
> were generated at a brisker pace, "read(0, ..., 4096)" could get more
> bytes at a time. Notice that even if the Python code requests 5 bytes,
> CPython requests up to 4096 bytes in a single read.

Not if you use os.read(0, 1), for example, which was what you appeared
to be recommending.  os.read(0, 1) (when on a pipe) makes a call into
kernel space via a context switch, once for each os.read(0, 1).
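
A quick way to see the difference without strace is to count the
calls.  This is only a sketch: count_reads is a made-up helper, and it
assumes the payload is small enough to fit in the pipe buffer so the
single os.write() doesn't block.

```python
import os


def count_reads(payload, chunk_size):
    """Push payload through a pipe; count os.read() calls that return data."""
    read_fd, write_fd = os.pipe()
    try:
        os.write(write_fd, payload)   # assumes payload fits in the pipe buffer
    finally:
        os.close(write_fd)            # so the reader eventually sees EOF
    calls = 0
    try:
        while os.read(read_fd, chunk_size):
            calls += 1
    finally:
        os.close(read_fd)
    return calls


if __name__ == "__main__":
    data = b"x" * 1000
    print("1-byte reads:   ", count_reads(data, 1))
    print("4096-byte reads:", count_reads(data, 4096))
```

Each os.read() here is one read() system call, so the 1-byte version
makes one trip into the kernel per byte of payload.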

I guess I should add that when you do an os.read(0, 1) and see it
show up in strace, strace is showing system calls, that is,
kernel<->userspace interactions.  It does not show library calls, or
anything in an application that sits above the libraries.  ltrace
shows some of the library calls, but probably not all of them - I
haven't studied ltrace as much as I have strace.

Just wondering: Are we helping the OP?