Help with pipes, buffering and pseudoterminals

Sun Apr 5 20:11:09 EDT 2015

On 05Apr2015 12:20, Daniel Ellis <ellisd23 at gmail.com> wrote:
>I have a small little tool I'd like to make.  It essentially takes piped input, modifies the text in some way, and immediately prints the output.  The problem I'm having is that any output I pipe to the program seems to be buffered, removing the desired effect.

That depends on the upstream program; it does the buffering. The pipe itself 
presents received data downstream immediately.

However, as you've seen, almost every program buffers its standard output if 
the output is not a tty; this is automatic in the stdio C library and results 
in more efficient use of I/O.
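
As a rough illustration of the check that drives this decision, here is a 
minimal sketch (the function name is made up for the example) which runs a 
child Python process with its stdout on a pipe and asks it whether it thinks 
it is on a terminal. Python's own I/O layer makes the same kind of decision 
as stdio: line buffered on a tty, block buffered otherwise.

```python
import subprocess
import sys


def child_sees_tty():
    """Run a child with stdout on a pipe and report whether it sees a tty.

    The helper name is hypothetical; only standard library calls are used.
    """
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.isatty())"],
        stdout=subprocess.PIPE,
        check=True,
    )
    # The child prints "True" or "False"; on a pipe it reports False.
    return result.stdout.strip() == b"True"
```

A child that sees False here is the one that will hold its output back until 
its buffer fills or it exits.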

>From what I understand, I need to somehow have the input be retrieved via a pseudoterminal.

This is a rather gross hack, though sometimes all you can do. While some 
programs have an option to force unbuffered output, most do not. Attaching 
their output to a pty is one way to encourage them to at least line buffer 
their output.

However, you should bear in mind that the reason that programs line buffer to a 
terminal is that they presume they are in an interactive situation with a 
person watching. The program _may_ act differently in other ways as well, such 
as asking questions it might not otherwise ask in "batch" mode (where it might 
cautiously not ask and presume "no").

Also, output sent through a pty is subject to the line discipline in the 
terminal; terminals are funny things with much historical behaviour. At the 
least you probably want your pty in "raw" mode to avoid all sorts of stuff that 
can be done to your data.
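
A minimal sketch of putting a pty into raw mode, assuming a Unix system and 
using tty.setraw() from the standard library (the function name below is just 
for illustration). Without the setraw() call the line discipline will 
typically rewrite each newline as CR-LF on the way through:

```python
import os
import pty
import tty


def write_through_raw_pty(payload):
    """Write payload to the slave side of a raw-mode pty, read it from the master."""
    master_fd, slave_fd = pty.openpty()
    try:
        # Raw mode: no echo, no canonical line editing, no output post-processing.
        tty.setraw(slave_fd)
        os.write(slave_fd, payload)
        # Keep payload small: one read of up to 1024 bytes is all this sketch does.
        return os.read(master_fd, 1024)
    finally:
        os.close(master_fd)
        os.close(slave_fd)
```

With the pty in raw mode the bytes should arrive on the master side exactly as 
they were written.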

>The problem that I'm having is that most examples on the internet seem to assume I would like to launch a program in a forked pty process, which doesn't really fit my use case.

Indeed not, but not to worry. You don't need to fork.

>I've tried a number of things, but I seem to be unable to get even a basic understanding of how to use the pty module.

Have you ever used a pty from C? Do you know how ptys work (master side, 
slave side, etc.)?

>Here's a piece of code I whipped up just to try to get a feel for what is going on when I use pty.fork, but it doesn't seem to do what I think it should:
>
>    import pty
>    import os
>    import sys
>
>    pid, fd = pty.fork()
>    print pid, fd
>    sys.stdout.flush()
>    os.read(fd, 1024)
>
>This only seems to print from the parent process.

The documentation for pty.fork says:

  Return value is (pid, fd). Note that the child gets pid 0, and the fd is invalid.

So the child cannot use "fd". It further says that the child has its stdin and 
stdout attached to the pty, and that the pty is the child's controlling 
terminal (this means it is affected by things like "typing" ^C at the pty, 
etc).
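
For what it's worth, here is a sketch of how the two sides of pty.fork() 
differ, written for Python 3 and assuming a Unix system. The function name is 
invented for the example. The child writes to its stdout (the pty) and exits; 
only the parent touches "fd":

```python
import os
import pty


def fork_on_pty():
    """Fork with the child attached to a new pty; return what the child wrote."""
    pid, fd = pty.fork()
    if pid == 0:
        # Child: stdin/stdout/stderr are the pty; "fd" is invalid here.
        os.write(1, b"hello from the child\n")
        os._exit(0)
    # Parent: "fd" is the master side; collect whatever the child wrote.
    chunks = []
    while True:
        try:
            data = os.read(fd, 1024)
        except OSError:
            break  # Linux raises EIO once the child has closed the pty
        if not data:
            break  # other systems report end of file instead
        chunks.append(data)
    os.close(fd)
    os.waitpid(pid, 0)
    return b"".join(chunks)
```

Note that the child's output does not appear on your own terminal unless the 
parent reads it from "fd" and prints it.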

>I read that I need to do the os.read call for the fork to happen.  I've also tried printing *after* the os.read call.

Don't try to adapt fork-based tutorials to your needs. Understand ptys directly 
first.

>I realize this does very little to solve my overall goal, but I figure understanding what is going on is probably a worthwhile first step.

What you probably want to use is pty.openpty() instead. No fork. You will get 
back file descriptors for the master and slave sides of the pty. Then you can 
use these with the subprocess module to connect your input program. Or, 
guessing from your opening sentence, you can write a wrapper script whose whole 
purpose is to run a program on a pty.

Regarding terminology: a pseudoterminal (pty) is a device that looks like a 
traditional serial terminal. All terminal emulators like xterm use one, and so 
do other programs presenting a terminal session such as the sshd process 
handling an interactive remote login.

When you call pty.openpty() you are handed two file descriptors: one for the 
master side of the pty and one for the slave side. The slave side is the side 
that looks like a terminal, and is what a typical use would connect a child 
process to. The master side is the other side of the pty. When a program writes 
to the "slave" side, the output is available for read on the master side, much 
like a pipe. When a program writes to the master side, the output is available 
for read on the slave side, _as_ _if_ _typed_ at the terminal.
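
To make the two directions concrete, here is a small sketch (the function name 
is mine, and it assumes a Unix system with the pty left in its default mode). 
It writes on each side and reads on the other:

```python
import os
import pty


def pty_round_trip():
    """Send data across a pty in both directions; return what each side saw."""
    master_fd, slave_fd = pty.openpty()
    try:
        # Slave -> master: what a program "prints" on the terminal.
        os.write(slave_fd, b"output\n")
        seen_on_master = os.read(master_fd, 1024)
        # Master -> slave: arrives as if typed at the terminal.
        os.write(master_fd, b"typed\n")
        seen_on_slave = os.read(slave_fd, 1024)
        return os.isatty(slave_fd), seen_on_master, seen_on_slave
    finally:
        os.close(master_fd)
        os.close(slave_fd)
```

The slave side reports as a tty, which is what makes a program attached to it 
line buffer. In the default mode the data read from the master will usually 
have its newline rewritten as CR-LF by the line discipline, which is one 
reason to consider raw mode.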

A pty is not necessarily going to solve your problem unless you can get your 
input via the pty. From the sounds of it you're in this situation:

  command-generating-output | your-program

such that your input is attached to a pipe, and because 
"command-generating-output" is attached to a pipe it is block buffering its 
output, hence your problem.

You can't undo that situation after the fact.

To solve your problem via a pty you need to contrive to set up 
"command-generating-output" already attached to a pty. One way to do that is 
for "your-program" to open a pty and itself invoke "command-generating-output" 
with its output via the pty, which is why so many tutorials suppose a "fork" 
situation.

One typical way to do that is to pass the "command-generating-output" command 
name and args to your program, e.g.:

  your-program command-generating-output [args...]

Then your main program can gather that up:

  import sys
  command_generating_output = sys.argv[1:]

Then you can call pty.openpty(), and then use the slave file descriptor with 
subprocess.Popen to invoke command_generating_output. Thus the generating 
command will be talking to you via a pty instead of a pipe.
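
Putting that together, here is a minimal sketch of such a wrapper, assuming 
Python 3 on a Unix system. The name run_on_pty and the upper-casing "modify 
the text" step are placeholders of mine, not anything from your program:

```python
import errno
import os
import pty
import subprocess
import sys


def run_on_pty(argv):
    """Run argv with its stdout on a pty; yield output chunks as they arrive."""
    master_fd, slave_fd = pty.openpty()
    try:
        proc = subprocess.Popen(argv, stdout=slave_fd)
    finally:
        # Close our copy of the slave, else we never see end of output.
        os.close(slave_fd)
    try:
        while True:
            try:
                chunk = os.read(master_fd, 1024)
            except OSError as e:
                if e.errno == errno.EIO:
                    break  # Linux: every slave descriptor is now closed
                raise
            if not chunk:
                break  # other systems report end of file instead
            yield chunk
    finally:
        os.close(master_fd)
        proc.wait()


if __name__ == "__main__":
    command_generating_output = sys.argv[1:]
    for chunk in run_on_pty(command_generating_output):
        # Placeholder transformation: upper-case the text and pass it on at once.
        sys.stdout.buffer.write(chunk.upper())
        sys.stdout.flush()
```

The pty here is left in its default mode, so expect CR-LF line endings in the 
chunks unless you also put it into raw mode.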

Cheers,
Cameron Simpson <cs at zip.com.au>

It is interesting to think of the great blaze of heaven that we winnow
down to animal shapes and kitchen tools.        - Don DeLillo