subprocess and stdin.write(), stdout.read()

Tue Mar 24 15:52:18 EDT 2015

On Tue, Mar 24, 2015 at 12:08 PM, Tobiah <toby at tobiah.org> wrote:
> The docs for the subprocess.Popen() say:
>
>         Use communicate() rather than .stdin.write, .stdout.read
>         or .stderr.read to avoid deadlocks due to any of the other
>         OS pipe buffers filling up and blocking the child process
>
> But if I want to send a string to stdin, how can I do that without
> stdin.write()?
>
> This seems to work:
>
> import subprocess as s
>
> thing = """
>         hey
>         there
>         foo man is here
>         hey foo
>         man is there
>         so foo
> """
> p = s.Popen(['grep', 'foo'], stdin = s.PIPE, stdout = s.PIPE)
> p.stdin.write(thing)
> print p.communicate()
>
> ######################
>
> ('\they foo\n \tfoo there\n', None)
>
>
> Will this always avoid the deadlock problem?

What you should do is use "print p.communicate(thing)". That will
always avoid the deadlock issue.
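Here is a minimal sketch of that fix in Python 3 (the thread itself uses Python 2). With text=True (Python 3.7+), communicate() accepts and returns str instead of bytes. To keep the sketch self-contained and portable, GREP_FOO is a hypothetical stand-in for `grep foo`: a tiny Python child that echoes only the lines containing "foo".

```python
import subprocess
import sys

# Hypothetical stand-in for ['grep', 'foo']: prints only lines containing "foo".
GREP_FOO = [sys.executable, "-c",
            "import sys\n"
            "for line in sys.stdin:\n"
            "    if 'foo' in line:\n"
            "        sys.stdout.write(line)\n"]

thing = """
        hey
        there
        foo man is here
        hey foo
        man is there
        so foo
"""

p = subprocess.Popen(GREP_FOO, stdin=subprocess.PIPE,
                     stdout=subprocess.PIPE, text=True)
# communicate() sends `thing`, closes stdin so the child sees EOF, and
# drains stdout at the same time, so neither side can block on a full pipe.
out, err = p.communicate(thing)
print(out)  # err is None because stderr was not piped
```

On Python 3.7+, subprocess.run(GREP_FOO, input=thing, stdout=subprocess.PIPE, text=True) wraps the same communicate() call and also waits for the exit status.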

Your code MAY deadlock in some cases: the stdin pipe can fill up
completely while the other process is not reading it, because it is
blocked waiting for you to read its output. This means you must keep
reading from stdout AND stderr (whichever are piped) whenever you might
be waiting on the process, such as when writing to stdin, calling
.wait(), or looping on .poll().
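The failure mode above can be reproduced safely by putting a timeout on it. This Python 3 sketch (assuming CPython on a POSIX-like system; CAT and stdin_write_blocks are hypothetical names, and a tiny Python child stands in for `cat`) writes to the child's stdin from a background thread while never reading its stdout, then reports whether the write is still stuck after two seconds. Once the child's stdout pipe is full, the child stops reading stdin, so a large enough write never completes.

```python
import subprocess
import sys
import threading

# Hypothetical stand-in for `cat`: copies stdin to stdout.
CAT = [sys.executable, "-c",
       "import shutil, sys\n"
       "shutil.copyfileobj(sys.stdin.buffer, sys.stdout.buffer)\n"]


def stdin_write_blocks(nbytes, timeout=2.0):
    """Write nbytes to the child's stdin without ever reading its stdout.
    Return True if the write is still stuck after `timeout` seconds."""
    p = subprocess.Popen(CAT, stdin=subprocess.PIPE, stdout=subprocess.PIPE)

    def writer():
        try:
            p.stdin.write(b"x" * nbytes)
            p.stdin.close()  # EOF lets the child finish
        except OSError:
            pass  # BrokenPipeError once the child is killed below

    t = threading.Thread(target=writer, daemon=True)
    t.start()
    t.join(timeout)
    blocked = t.is_alive()

    # Clean up: killing the child breaks the pipe and unblocks the writer.
    p.kill()
    p.wait()
    t.join()
    for stream in (p.stdin, p.stdout):
        try:
            stream.close()
        except OSError:
            pass
    return blocked


print(stdin_write_blocks(1_000))            # small write fits in the pipe buffer
print(stdin_write_blocks(8 * 1024 * 1024))  # stalls once the pipes are full
```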

Popen.communicate() takes care of that issue internally. You can write
your own variations (useful if you need to process stdout to produce
stdin, for example), but you must use either select or threads to make
sure stdout and stderr keep being read. You should also pay attention
to the note on communicate(): it keeps all of the output in memory, so
if potentially large amounts of data will be produced, you may need to
write your own method to avoid paging or out-of-memory (OOM) problems.
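A minimal Python 3 sketch of such a variation, using threads rather than select (on Windows, select() only works with sockets, so threads are the portable choice). pump, CHILD and LineCounter are hypothetical names. Output is handed to callbacks one line at a time instead of being collected, which also sidesteps the memory concern with communicate(); roughly 1 MB of input is pushed through, far more than a pipe buffer holds.

```python
import subprocess
import sys
import threading


def pump(cmd, chunks, on_stdout_line, on_stderr_line):
    """Hypothetical hand-rolled variant of communicate(): feed `chunks` to
    the child's stdin while two threads stream stdout and stderr line by
    line to callbacks, so no output is accumulated in memory."""
    p = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE, text=True)

    def feed():
        try:
            for chunk in chunks:
                p.stdin.write(chunk)
            p.stdin.close()  # EOF tells the child there is no more input
        except BrokenPipeError:
            pass  # the child exited without reading everything

    def drain(stream, callback):
        for line in stream:
            callback(line)

    threads = [threading.Thread(target=feed),
               threading.Thread(target=drain, args=(p.stdout, on_stdout_line)),
               threading.Thread(target=drain, args=(p.stderr, on_stderr_line))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    p.stdout.close()
    p.stderr.close()
    return p.wait()


class LineCounter:
    """Callback that counts lines instead of storing them."""
    def __init__(self):
        self.n = 0

    def __call__(self, line):
        self.n += 1


# Hypothetical child: lines containing "foo" go to stdout, the rest to stderr.
CHILD = [sys.executable, "-c",
         "import sys\n"
         "for line in sys.stdin:\n"
         "    (sys.stdout if 'foo' in line else sys.stderr).write(line)\n"]

out_counter, err_counter = LineCounter(), LineCounter()
lines = ("foo %d\n" % i if i % 2 else "bar %d\n" % i for i in range(100_000))
rc = pump(CHILD, lines, out_counter, err_counter)
print(rc, out_counter.n, err_counter.n)
```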

In the example you provide, you will probably never hit the deadlock,
as the data being written is small enough that it should never fill
the buffers (their size is platform dependent; on modern Linux a pipe
holds 64 KiB by default). Additionally, if you know the process never
produces output on stdout or stderr, you can ignore them (but then,
why would you pipe them?).
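To check the buffer size on a given machine instead of relying on a rule of thumb, this Python 3 sketch (POSIX only; pipe_capacity is a hypothetical helper) fills an empty pipe through a non-blocking write end and counts the bytes accepted before a write would block. On a typical Linux system with 4 KiB pages it should report 65536; other platforms differ.

```python
import os


def pipe_capacity(chunk=1024):
    """Count how many bytes fit into an empty OS pipe before a writer
    would block. POSIX only: relies on a non-blocking write end."""
    r, w = os.pipe()
    try:
        os.set_blocking(w, False)
        data = b"\0" * chunk
        total = 0
        while True:
            try:
                total += os.write(w, data)  # may be a partial write
            except BlockingIOError:
                return total  # pipe is full: a blocking write would stall here
    finally:
        os.close(r)
        os.close(w)


print(pipe_capacity())
```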

>
> This also works:
>
> p = s.Popen(['grep', 'foo'], stdin = s.PIPE, stdout = s.PIPE)
> p.stdin.write(thing)
> p.stdin.close()
> print p.stdout.read()
>
> Is that vulnerable to deadlock?  Is there a better way
> to write to and read from the same process?

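This can deadlock for the same reason: with enough matching input,
grep fills the stdout pipe and stops reading stdin while you are still
blocked in stdin.write(). It gets worse if stderr is piped too: a
process that writes too much to stderr may stall waiting for you to
read it, while you are waiting for it to close stdout.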
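Applied to the second example, here is a Python 3 sketch that avoids both stalls. stdin is fed from a background thread, stdout is read on the main thread, and stderr=subprocess.STDOUT merges stderr into the same pipe so there is only one output stream to drain (stderr=subprocess.DEVNULL works too if the errors are not wanted). GREP_FOO again stands in for grep, and REPEAT is an arbitrary value chosen to exceed typical pipe buffers.

```python
import subprocess
import sys
import threading

# Hypothetical stand-in for ['grep', 'foo'].
GREP_FOO = [sys.executable, "-c",
            "import sys\n"
            "for line in sys.stdin:\n"
            "    if 'foo' in line:\n"
            "        sys.stdout.write(line)\n"]

thing = """
        hey
        there
        foo man is here
        hey foo
        man is there
        so foo
"""
REPEAT = 20_000  # arbitrary: about 2 MB of input, far beyond a pipe buffer

# stderr=STDOUT merges both streams into one pipe, so stderr can never
# fill up unread while we wait on stdout.
p = subprocess.Popen(GREP_FOO, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                     stderr=subprocess.STDOUT, text=True)


def feed():
    # A full stdin pipe now only blocks this thread, not the reader below.
    try:
        p.stdin.write(thing * REPEAT)
        p.stdin.close()
    except BrokenPipeError:
        pass  # the child exited before reading all of its input


t = threading.Thread(target=feed)
t.start()
out = p.stdout.read()  # returns at EOF, once the child closes its stdout
t.join()
p.stdout.close()
print(p.wait(), len(out.splitlines()))
```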