read stdout/stderr without blocking

Thu Sep 15 12:19:03 EDT 2005

In article <ebGdnZ2dnZ1wQWfKnZ2dnTtdu96dnZ2dRVn-zZ2dnZ0 at powergate.ca>,
 Peter Hansen <peter at engcorp.com> wrote:

> Jacek Popławski wrote:
> > Grant Edwards wrote:
> > 
> >> On 2005-09-12, Jacek Popławski <jpopl at interia.pl> wrote:
> >>
> >>>>        ready = select.select(tocheck, [], [], 0.25)  ## continues after 0.25s
> >>>>        for file in ready[0]:
> >>>>            try:
> >>>>                text = os.read(file, 1024)
> >>>
> >>>
> >>> How do you know here that you should read 1024 characters?
> >>> What will happen when the output is shorter?
> >>
> >> It will return however much data is available.
> > 
> > My tests showed that it will block.
> 
> Not if you use non-blocking sockets, as I believe you are expected to 
> when using select().

On the contrary, you need non-blocking sockets only if
you don't use select.  select() waits until a read (or write)
would not block; it's like "if dict.has_key(x):" instead of
"try: val = dict[x]; except KeyError:".  I suppose you
knew that, but may have read some obscure line of reasoning
that makes non-blocking I/O out to be necessary anyway.
Who knows, but it certainly isn't necessary in this case.
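A minimal sketch of that distinction, assuming Python 3.5+ on UNIX (select() accepts pipe descriptors only on UNIX; on Windows it works with sockets alone). It uses a bare os.pipe() as a stand-in for a child process's pipe. The first half is the "has_key" style: ask select() first, then os.read() the ready descriptor, which returns whatever is available (here 5 bytes) rather than waiting for the full 1024. The second half is the "try/except" style, which only works once the descriptor has been switched to non-blocking mode with os.set_blocking().

```python
import os
import select

r, w = os.pipe()
assert os.get_blocking(r)  # an ordinary pipe: reads on it can block

# Like "if key in d": ask select() first.  Nothing has been written,
# so it times out and we never call os.read() on an empty pipe.
empty, _, _ = select.select([r], [], [], 0.25)
print(empty)  # → []

os.write(w, b"hello")
ready, _, _ = select.select([r], [], [], 0.25)
# Only 5 bytes are available; os.read() returns them instead of waiting
# for 1024, and it cannot block because select() said r was ready.
data = os.read(r, 1024) if r in ready else b""
print(data)  # → b'hello'

# Like "try: d[key] except KeyError": skip select(), make the descriptor
# non-blocking, and treat "no data yet" as an exception.
os.set_blocking(r, False)
try:
    os.read(r, 1024)
    would_block = False
except BlockingIOError:
    would_block = True
print(would_block)  # → True

os.close(r)
os.close(w)
```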

I don't recall the beginning of this thread, so I'm not sure
if this is the usual wretched exercise of trying to make this
work on both UNIX and Windows, but there are strong signs
of the usual confusion between os.read (a.k.a. posix.read) and
the file object's read() method.  Let's forget about Windows
for the moment.

The above program looks fine to me, but it will not work
reliably if file object read() is substituted for os.read().
In that case, C library buffering will read more than 1024
bytes if it can, and the extra data sitting in the buffer is
invisible to select().  So there's no guarantee select() will
return in a timely manner, even though the next read() would
return right away.

Reading one byte at a time won't resolve this problem; obviously
it will only make it worse.  The only reason to read one byte at
a time is to get terminator-based read semantics (specifically
readline()) on an unbuffered file.  That's what happens, at the
system call level where it's expensive, when you turn off stdio
buffering and then call readline().
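To see the buffering problem concretely, here is a minimal sketch assuming Python 3 on UNIX, where os.fdopen(fd, "rb") returns an io.BufferedReader. In Python 3 the buffering is done by the io module rather than C stdio, but the effect is the same: a single readline() pulls both lines out of the pipe into the buffer, so select() reports the descriptor as idle while a complete line is still waiting to be read.

```python
import os
import select

r, w = os.pipe()
os.write(w, b"line1\nline2\n")  # two lines arrive in one chunk

f = os.fdopen(r, "rb")  # buffered file object wrapping descriptor r
first = f.readline()    # the buffer pulls in BOTH lines from the pipe
print(first)            # → b'line1\n'

# The pipe itself is now empty (and the write end is still open),
# so select() reports r as not ready ...
ready, _, _ = select.select([r], [], [], 0)
print(ready)            # → []

# ... even though the next readline() returns immediately from the buffer.
second = f.readline()
print(second)           # → b'line2\n'

f.close()   # also closes r
os.close(w)
```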

In the C vs. Python example, C's read() corresponds to os.read(),
and fread() corresponds to file object read(); so of course C
read() works where file object read() doesn't.

Use select() and os.read() (on UNIX) and you can avoid blocking
on a pipe.  That's essential if, as I read it, there are supposed
to be two separate pipes from the same process: if one is allowed
to fill up, that process will block, causing a deadlock if the
reading process is blocked on the other pipe.
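Putting the pieces together for the two-pipe case, here is a minimal sketch assuming Python 3 on UNIX and the subprocess module; `read_both` and the child script are hypothetical names chosen for illustration, not anything from the thread. The loop never blocks on one pipe while the other fills: it select()s on both, os.read()s whichever is ready, and treats an empty read as EOF. (In Python 3, Popen.communicate() solves the same problem, and the subprocess documentation recommends it over reading .stdout/.stderr directly for exactly this deadlock reason.)

```python
import os
import select
import subprocess
import sys


def read_both(argv, timeout=0.25):
    """Run argv and collect its stdout and stderr without deadlocking.

    Hypothetical helper; UNIX only, since select() on Windows
    does not accept pipe descriptors.
    """
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out_fd = proc.stdout.fileno()
    err_fd = proc.stderr.fileno()
    chunks = {out_fd: [], err_fd: []}
    open_fds = [out_fd, err_fd]
    while open_fds:
        # Wait until at least one pipe has data (or EOF); never block on just one.
        ready, _, _ = select.select(open_fds, [], [], timeout)
        for fd in ready:
            data = os.read(fd, 1024)  # whatever is available, up to 1024 bytes
            if data:
                chunks[fd].append(data)
            else:
                open_fds.remove(fd)  # empty read: the child closed this pipe (EOF)
    proc.stdout.close()
    proc.stderr.close()
    returncode = proc.wait()
    return returncode, b"".join(chunks[out_fd]), b"".join(chunks[err_fd])


if __name__ == "__main__":
    # Hypothetical child: writes 100 KB to stderr before touching stdout.
    # That exceeds a typical pipe buffer, so reading stdout first with
    # proc.stdout.read() would deadlock; the select() loop does not.
    child = (
        "import sys\n"
        "sys.stderr.write('e' * 100000)\n"
        "sys.stderr.flush()\n"
        "sys.stdout.write('o' * 100000)\n"
    )
    rc, out, err = read_both([sys.executable, "-c", child])
    print(rc, len(out), len(err))  # → 0 100000 100000
```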

Hope I'm not missing anything here. I just follow this group
to answer this question over and over, so after a while it
gets sort of automatic.

   Donn Cave, donn at u.washington.edu