Python 2.2.1 and select()

Derek Martin code at pizzashack.org
Thu Mar 27 02:21:59 EDT 2008


On Wed, Mar 26, 2008 at 07:11:15PM -0700, Noah Spurrier wrote:
> >def set_nonblock(fd):
> >	flags = fcntl.fcntl(fd, fcntl.F_GETFL)
> >	fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)
> >
> >Then in the function, after calling popen:
> >	set_nonblock(io.fromchild.fileno())
> >	set_nonblock(io.childerr.fileno())
> >
> >Yay for smart people.
> 
> You should still try Pexpect :-) As I recall there are also gotchas
> on the non-blocking trick. 

Well, you need to remember to read ALL the file descriptors (objects)
that select() returns; if you don't, your program will hang or spin.
It might also be the case that if the child is using stdio functions
for output, you'll need to set the buffering mode explicitly (which
you can theoretically do, see below).  Aside from those there are no
gotchas, and in fact the problem with my program had nothing to do
with stdio buffering modes.
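
To make that concrete, here's a rough sketch of such a select() loop
(modern Python on a Unix-like system; the sh command line is just a
stand-in child), reusing the set_nonblock() idea from the snippet
quoted above:

```python
import fcntl
import os
import select
import subprocess

def set_nonblock(fd):
    # Same trick as above: OR O_NONBLOCK into the descriptor's flags.
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)

# Stand-in child that writes to both STDOUT and STDERR.
child = subprocess.Popen(
    ["sh", "-c", "echo out; echo err >&2"],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE)

out_fd = child.stdout.fileno()
err_fd = child.stderr.fileno()
for fd in (out_fd, err_fd):
    set_nonblock(fd)

bufs = {out_fd: b"", err_fd: b""}
open_fds = set(bufs)
while open_fds:
    # select() blocks until at least one descriptor is readable; we
    # must then drain EVERY descriptor it reports, or risk spinning.
    readable, _, _ = select.select(list(open_fds), [], [])
    for fd in readable:
        chunk = os.read(fd, 4096)
        if chunk:
            bufs[fd] += chunk
        else:                       # EOF: child closed this stream
            open_fds.discard(fd)
child.wait()
```

The point is that STDOUT and STDERR stay separate the whole way
through, which is exactly what a combined-stream approach can't do.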

> Pexpect is 100% pure Python. No extra libs to install.

I looked at it, and (what I believe is) the very reason it manages to
solve this particular problem is also the reason it won't work for me:
it combines STDOUT and STDERR to one I/O stream.  The reason I'm
bothering with all this is because I need to keep them separate.  

Interestingly, were it not for that fact, I'm not sure that
pexpect wouldn't still suffer from the same problem that plagued my
original implementation.  I had to drag out W. R. Stevens to remind
myself of a few things before I continued with this discussion...
Even though it forces the program to use line buffering, read() would
still try to read until EOF, and if STDOUT and STDERR were separate
files, it seems likely that it would eventually block reading from one
file when the child program was sending its output to the other.  The
only way to prevent that problem, aside from non-blocking I/O, is to
do a read(1) (i.e. read one character at a time), which will use
silly amounts of CPU time.  But mixing stdio and non-stdio functions
is kind of funky, and I'm not completely sure what the behavior would
be in that case; I couldn't quickly find anything in Stevens to
suggest one way or the other.
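
The distinction shows up readily with a bare pipe (a quick sketch,
assuming a Unix-like system): os.read() returns whatever happens to
be available, while a file object's read() with no size argument
keeps calling read(2) until EOF:

```python
import os

r, w = os.pipe()
os.write(w, b"partial")      # the writer has NOT closed its end yet

# os.read() is a thin wrapper over the read(2) system call: it
# returns as soon as some data is available, up to the count asked.
data = os.read(r, 4096)      # returns b'partial' without waiting for EOF

# A file object's read() with no argument, by contrast, loops on
# read(2) until EOF; with the write end still open it would block here.
os.close(w)
os.close(r)
```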

Also, you could combine the streams yourself without using pexpect by
having your subproc use the shell to redirect STDERR to STDOUT, or (if
Python has it) using the dup() family of system calls to combine the
two in Python [i.e. dup2(1,2)].  As far as I can tell, the whole pseudo
terminal thing (though it definitely does have its uses) is a red
herring for this particular problem...  
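
Python does have dup2(), for what it's worth.  A sketch of both
approaches (again, the sh command line is just a stand-in):

```python
import os
import subprocess

# Easiest: let the subprocess module do the redirection for you.
merged = subprocess.run(
    ["sh", "-c", "echo out; echo err >&2"],
    stdout=subprocess.PIPE, stderr=subprocess.STDOUT).stdout

# Equivalently, the child can call dup2() itself after the fork,
# making descriptor 2 (stderr) a copy of descriptor 1 (stdout):
r, w = os.pipe()
pid = os.fork()
if pid == 0:                         # child
    os.dup2(w, 1)                    # stdout -> pipe write end
    os.dup2(1, 2)                    # stderr -> wherever stdout goes
    os.close(r)
    os.close(w)
    os.execvp("sh", ["sh", "-c", "echo out; echo err >&2"])
os.close(w)                          # parent keeps only the read end
out = b""
while True:
    chunk = os.read(r, 4096)
    if not chunk:                    # EOF: child exited, pipe closed
        break
    out += chunk
os.close(r)
os.waitpid(pid, 0)
```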

I also read (some of) the pexpect FAQ, and there are a number of
incorrect statements in it, particularly in the section "Why not just
use a pipe (popen())?"

 - a pipe, if programmed correctly, is perfectly fine for controlling
   interactive programs, most of the time.  You will almost certainly
   need to use non-blocking I/O, unless your communicating programs
   are perfectly synchronized, or else you'll have I/O deadlocks.  The
   only time a pipe isn't OK is where the program tries to use terminal
   services (e.g. writing to /dev/tty), in which case you will need a
   pseudo-terminal device (as the FAQ correctly points out with regard
   to entering passwords in SSH).  

 - Any application which contains "#include <stdio.h>" does not
   necessarily make use of the stdio library (which isn't really a
   separate library at all, it's part of the standard C library).
   The file stdio.h is just a C header file which contains
   declarations of the prototypes for stdio-related functions, and
   various constants.  It's often included in source files simply
   because it's so common to need it, or to make use of some constants
   defined there.  You're only actually using stdio if you use
   stdio functions in your program, which are:

   printf, fopen, getc, getchar, putc, scanf, gets, puts, etc.

   In particular, open(), read() and write() are *not* stdio
   functions, and do *not* buffer I/O.  They're Unix system calls, and
   the C functions by the same name are simply interfaces to those
   system calls.  There is a kernel I/O buffer associated with all of
   the streams you will use them on, but this is not a stdio buffer.

   I have not checked Python's code, but based on its behavior, I
   assume that its read() function is a wrapper around the Unix read()
   system call, and as such it is not using stdio at all, and thus the
   stdio buffers are not relevant (though if the child is using stdio
   functions, that could be an issue).

 - The FAQ states: "The STDIO lib will use block buffering when
   talking to a block file descriptor such as a pipe."  This is only
   true *by default* and indeed you can change the buffering mode of
   any stdio stream using the setbuf() and setvbuf() stdio functions
   (Python can do this for its own streams, e.g. via the buffering
   argument to os.fdopen()).  One caveat: setvbuf() changes the stdio
   stream in the process that calls it, so the parent controls the
   buffering of its *own* end of the pipe.

   The way popen() works is that it opens the pipe, and then forks a
   child process, which inherits all of the parent's open files.
   Properties of the underlying open file description (such as
   O_NONBLOCK) are shared between parent and child, because *it's the
   same file*. :)  Stdio buffers, however, live in each process's own
   memory, so once the child exec()s a new program, that program's
   stdio sets up its own buffering from scratch.
 
 - It states: "[when writing to a pipe] In this mode the currently
   buffered data is flushed when the buffer is full. This causes most
   interactive programs to deadlock."  That's misleading/false.

   Deadlocks can easily occur in this case, but it's definitely
   avoidable, and it's not *necessarily* because of buffering, per se.
   It's because you're trying to read or write to a file descriptor
   which is not ready to be read from or written to, which can be
   caused by a few things.  Stdio buffers could be the cause, or it
   could simply be that the parent is reading from one file
   descriptor while the child is writing to a different one.  Or
   (perhaps even more likely) the child is trying to read from the
   parent, so both are reading and no one is writing.  Non-blocking
   I/O allows you to recover from all such I/O synchronization
   problems, though your program will still need to figure out where
   it should be reading and/or writing in order to avoid an infinite
   loop.  But non-blocking I/O may not help with interactive programs
   that make use of the stdio library, unless you also explicitly set
   the buffering mode.
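
A small experiment (Python 3 on a Unix-like system) ties the last two
points together: non-blocking I/O reports "nothing to read yet"
instead of deadlocking, and a user-space buffer on the writer's side
is exactly why the data may not be there:

```python
import fcntl
import os

r, w = os.pipe()

# Put the read end into non-blocking mode, as in the snippet up top.
flags = fcntl.fcntl(r, fcntl.F_GETFL)
fcntl.fcntl(r, fcntl.F_SETFL, flags | os.O_NONBLOCK)

wf = os.fdopen(w, "wb")     # a buffered (stdio-style) writer over fd w
wf.write(b"hello")          # lands in the user-space buffer; no write(2) yet

starved = False
try:
    os.read(r, 4096)        # a blocking read would sit here forever
except BlockingIOError:     # EAGAIN: pipe is empty, but we can recover
    starved = True

wf.flush()                  # now the buffered bytes reach the pipe
data = os.read(r, 4096)     # the b'hello' that was stuck in the buffer
wf.close()
os.close(r)
```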

All of the above are explained in much more detail in W. Richard
Stevens' "Advanced Programming in the Unix Environment" than I can
possibly go into here.  See chapter 3 for the stdio stuff, chapter 12
for non-blocking I/O, and chapter 14 for discussions about pipes,
popen(), and how buffering modes can be controlled by your program and
how that affects the child.

Don't get me wrong... pexpect is useful.  But some of the problems
you're trying to solve with it have other solutions, which in some
cases might be more efficiently done in those other ways.

-- 
Derek D. Martin
http://www.pizzashack.org/
GPG Key ID: 0x81CFE75D
