Do subprocess.PIPE and subprocess.STDOUT same time

Tue May 9 19:17:13 EDT 2023

On 5/9/23, Thomas Passin <list1 at tompassin.net> wrote:
>
> I'm not sure if this exactly fits your situation, but if you use
> subprocess with pipes, you can often get a deadlock because the stdout
> (or stderr, I suppose) pipe has a small capacity and fills up quickly
> (at least on Windows),

The pipe size is relatively small on Windows only because
subprocess.Popen uses the default pipe size when it calls WinAPI
CreatePipe(). The default size is 4 KiB, which actually should be big
enough for most cases. If some other pipe size is passed, the value is
"advisory", meaning that it has to be within the allowed range (but
there's no practical limit on the size) and that it gets rounded up to
an allocation boundary (e.g. a multiple of the system's virtual-memory
page size). For example, here's a 256 MiB pipe:

    >>> import _winapi
    >>> hr, hw = _winapi.CreatePipe(None, 256*1024*1024)
    >>> _winapi.WriteFile(hw, b'a' * (256*1024*1024))
    (268435456, 0)
    >>> data = _winapi.ReadFile(hr, 256*1024*1024)[0]
    >>> len(data) == 256*1024*1024
    True

> then it blocks until it is emptied by a read.
> But if you aren't polling, you don't know there is something to read so
> the pipe never gets emptied.  And if you don't read it before the pipe
> has filled up, you may lose data.

If there's just one pipe, then there's no potential for deadlock, and
no potential to lose data. If a timeout is specified, however, then
communicate() still has to use I/O polling or a thread, so that it can
honor the timeout instead of blocking indefinitely.
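
As a rough illustration of the single-pipe case, here's a minimal
sketch that merges stderr into the one stdout pipe and lets
communicate() drain it. The child script, the run_merged() helper and
the 30-second timeout are made up for the example; they aren't from
the original question.

```python
import subprocess
import sys

# Hypothetical child: writes far more than a small pipe buffer holds,
# to both of its standard output streams.
CHILD = (
    "import sys\n"
    "for i in range(20000):\n"
    "    print('out', i)\n"
    "    print('err', i, file=sys.stderr)\n"
)

def run_merged(timeout=30):
    """Run the child with stderr redirected into the single stdout pipe."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,      # the only pipe the parent has to read
        stderr=subprocess.STDOUT,    # child's stderr shares that same pipe
    )
    try:
        out, _ = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Documented pattern: kill the child, then collect what is left.
        proc.kill()
        out, _ = proc.communicate()
    return proc.returncode, out

returncode, output = run_merged()
```

Because the two streams share one pipe, the relative order of the
"out" and "err" lines in the result depends on the child's buffering.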

Note that there's a bug in subprocess on Windows. Popen._communicate()
should create a new thread for each pipe. However, it actually calls
stdin.write() on the current thread, which could block and ignore the
specified timeout. For example, in the following case the timeout of 5
seconds is ignored:

    >>> import subprocess, time
    >>> cmd = 'python -c "import time; time.sleep(20)"'
    >>> t0 = time.time(); p = subprocess.Popen(cmd, stdin=subprocess.PIPE)
    >>> r = p.communicate(b'a'*4097, timeout=5); t1 = time.time() - t0
    >>> t1
    20.2162926197052

There's a potential for deadlock when two or more pipes are accessed
synchronously by two threads (e.g. one thread in each process). For
example, reading from one of the pipes blocks one of the threads
because the pipe is empty, while at the same time writing to the other
pipe blocks the other thread because the pipe is full. However, there
will be no deadlock if at least one of the threads always polls the
pipes to ensure that they're ready (i.e. data is available to be read,
or at least PIPE_BUF bytes can be written without blocking), which is
how communicate() is implemented on POSIX. Alternatively, one of the
processes can use a separate thread for each pipe, which is how
communicate() is implemented on Windows.
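
The thread-per-pipe approach can be sketched roughly as follows. This
is a simplified illustration and not the actual subprocess code: the
communicate_with_threads() helper and the echoing child are
hypothetical, and there's no timeout handling. With this much input, a
naive "write everything to stdin, then read stdout" sequence on a
single thread would be expected to block once the pipes fill up.

```python
import subprocess
import sys
import threading

# Hypothetical child: echoes each input line to stdout and an
# upper-cased copy to stderr, so all three pipes are busy at once.
CHILD = (
    "import sys\n"
    "for line in sys.stdin.buffer:\n"
    "    sys.stdout.buffer.write(line)\n"
    "    sys.stderr.buffer.write(line.upper())\n"
)

def communicate_with_threads(args, input_data):
    """Feed stdin and drain stdout/stderr, one thread per pipe."""
    proc = subprocess.Popen(
        args,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    results = {}

    def reader(name, pipe):
        # Blocks only this thread until the child closes its end.
        with pipe:
            results[name] = pipe.read()

    def writer(pipe, data):
        try:
            with pipe:               # closing signals EOF to the child
                pipe.write(data)
        except OSError:
            pass                     # child exited before reading it all

    threads = [
        threading.Thread(target=reader, args=("stdout", proc.stdout)),
        threading.Thread(target=reader, args=("stderr", proc.stderr)),
        threading.Thread(target=writer, args=(proc.stdin, input_data)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return proc.wait(), results["stdout"], results["stderr"]

payload = b"hello\n" * 100000        # well beyond a small pipe buffer
rc, out, err = communicate_with_threads(
    [sys.executable, "-c", CHILD], payload
)
```

Since no thread ever waits on more than one pipe, a full stdout pipe
can't keep the stdin writer or the stderr reader from making progress.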

Note that there are problems with the naive implementation of the
reader threads on Windows, in particular if a pipe handle leaks to
descendants of the child process, which prevents the pipe from
closing. A better implementation on Windows would use named pipes
opened in asynchronous mode on the parent side and synchronous mode on
the child side. The parent then just needs a loop that handles I/O
completion using events, APCs, or an I/O completion port.