[Tutor] subprocess.Popen / proc.communicate issue

Cameron Simpson cs at zip.com.au
Thu Mar 30 18:43:06 EDT 2017


On 30Mar2017 13:51, bruce <badouglas at gmail.com> wrote:
>Trying to understand the "correct" way to run a sys command ("curl")
>and to get the potential stderr. Checking Stackoverflow (SO), implies
>that I should be able to use a raw/text cmd, with "shell=true".

I strongly recommend avoiding shell=True if you can. It has many problems. All 
stackoverflow advice needs to be considered with caution. However, that is not 
the source of your deadlock.

>If I leave the stderr out, and just use
>     s=proc.communicate()
>the test works...
>
>Any pointers on what I might inspect to figure out why this hangs on
>the proc.communicate process/line??

When it is hung, run "lsof" on the processes from another terminal i.e. lsof 
the python process and also lsof the curl process. That will make clear the 
connections between them, particularly which file descriptors ("fd"s) are 
associated with what.

Then run "strace" on the processes. That should show you what system calls are 
in progress in each process.

My expectation is that you will see Python reading from one file descriptor and 
curl writing to a different one, and neither progressing.

Personally I avoid .communicate and do more work myself, largely to know 
precisely what is going on with my subprocesses.

The difficulty with two pipes is that Python must read both stderr and stdout 
separately. The docs recommend .communicate precisely because it services both 
at once; the trouble comes when they are read sequentially: read one, then read 
the other. That is just great if the command is "short" and writes a small 
enough amount of data to each. The command runs, writes, and exits. Python 
reads one and sees EOF after the data, because the command has exited. Then 
Python reads the other and collects the data and sees EOF because the command 
has exited.

However, if the output of the command is large on whatever stream Python reads 
_second_, the command will stall writing to that stream. This is because Python 
is not reading the data, and therefore the buffers fill (stdio in curl plus the 
buffer in the pipe). So the command ("curl") stalls waiting for data to be 
consumed from the buffers. And because it has stalled, the command does not 
exit, and therefore Python does not see EOF on the _first_ stream. So it sits 
waiting for more data, never reading from the second stream.
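
A minimal sketch of one way to sidestep the stall described above when you 
manage the pipes yourself: give each pipe its own reader thread, so both are 
drained at the same time. The helper name run_and_drain and the noisy child 
command are made up for illustration, and this is not a claim about how 
.communicate is implemented.

```python
import subprocess
import sys
import threading

def run_and_drain(args):
    """Run args, reading stdout and stderr concurrently; return (rc, out, err)."""
    proc = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    chunks = {}

    def drain(name, stream):
        # Each thread blocks on its own pipe, so neither pipe can fill up
        # while the other one is being waited on.
        chunks[name] = stream.read()
        stream.close()

    threads = [
        threading.Thread(target=drain, args=("out", proc.stdout)),
        threading.Thread(target=drain, args=("err", proc.stderr)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    returncode = proc.wait()
    return returncode, chunks["out"], chunks["err"]

# Hypothetical noisy child: writes 200000 bytes to stderr first, then a
# little to stdout. That is more than a typical pipe buffer holds.
child = [sys.executable, "-c",
         "import sys; sys.stderr.write('e' * 200000); sys.stdout.write('done')"]
rc, out, err = run_and_drain(child)
```

Reading proc.stdout to EOF and only then touching proc.stderr would risk 
exactly the stall above with this child, since it fills stderr first.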

[...snip...]
>  cmd='[r" curl -sS '
>  #cmd=cmd+'-A  "Mozilla/5.0 (X11; Linux x86_64; rv:38.0)
>Gecko/20100101 Firefox/38.0"'
>  cmd=cmd+"-A  '"+user_agent+"'"
>  ##cmd=cmd+'   --cookie-jar '+cname+' --cookie '+cname+'    '
>  cmd=cmd+'   --cookie-jar '+ff+' --cookie '+ff+'    '
>  #cmd=cmd+'-e "'+referer+'"   -d "'+tt+'"  '
>  #cmd=cmd+'-e "'+referer+'"    '
>  cmd=cmd+"-L '"+url1+"'"+'"]'
>  #cmd=cmd+'-L "'+xx+'" '

Might I recommend something like this:

  cmd_args = [ 'curl', '-sS' ]
  cmd_args.extend( [ '-A', user_agent ] )
  cmd_args.extend( [ '--cookie-jar', ff, '--cookie', ff ] )
  cmd_args.extend( [ '-L', url1 ] )

and using shell=False. This totally avoids any need to "quote" strings in the 
command, because the shell is not parsing the string - you're invoking "curl" 
directly instead of asking the shell to read a string and invoke "curl" for 
you.

Constructing shell commands is tedious and fiddly; avoid it when you don't need 
to.
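
To see that no quoting is needed, here is a small sketch. It assumes 
placeholder values for user_agent, ff and url1 (pick your own), and it uses a 
child Python as a stand-in for curl so that it runs anywhere: the child just 
reports the arguments it received.

```python
import json
import subprocess
import sys

# Placeholder values (assumptions), chosen to contain spaces and quotes.
user_agent = "Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Firefox/38.0"
ff = "cookie jar.txt"
url1 = "http://example.com/?q='quoted'&x=1"

# Stand-in for curl: a child Python that prints its argument list as JSON.
echo_argv = [sys.executable, "-c",
             "import json, sys; print(json.dumps(sys.argv[1:]))"]

curl_options = ["-sS"]
curl_options.extend(["-A", user_agent])
curl_options.extend(["--cookie-jar", ff, "--cookie", ff])
curl_options.extend(["-L", url1])

# No shell=True: each list element reaches the child as one argument.
proc = subprocess.Popen(echo_argv + curl_options, stdout=subprocess.PIPE)
out, _ = proc.communicate()
received = json.loads(out.decode())
```

The list in received should match curl_options element for element, spaces 
and quote marks included, with no escaping on your part.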

>  try_=1

It is preferable to say:

  try_ = True

>  while(try_):

You don't need any brackets here:

  while try_:

More readable, because less punctuation.

>    proc=subprocess.Popen(cmd,
>shell=True,stdout=subprocess.PIPE,stderr=subprocess.PIPE)

  proc = subprocess.Popen(cmd_args,
           stdout=subprocess.PIPE,
           stderr=subprocess.PIPE)

>    s,err=proc.communicate()
>    s=s.strip()
>    err=err.strip()
>    if(err==0):
>      try_=''

It is preferable to say:

  try_ = False

Also, you should be looking at proc.returncode, _not_ err. Many programs write 
informative messages to stderr, and a nonempty stderr does not imply failure.

Instead, by convention programs set their exit status to 0 for success and to 
various nonzero values for failure. So check:

  if proc.returncode == 0:
    try_ = False
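
A small sketch of why stderr is the wrong thing to test. The helper run and 
the two child snippets are invented for the demonstration: one child succeeds 
while writing to stderr, the other fails without writing anything.

```python
import subprocess
import sys

def run(code):
    """Run a snippet in a child Python; return (returncode, stripped stderr)."""
    proc = subprocess.Popen([sys.executable, "-c", code],
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    _, err = proc.communicate()
    return proc.returncode, err.strip()

# Succeeds (exit status 0) but still writes a message to stderr.
chatty_rc, chatty_err = run("import sys; sys.stderr.write('note: retrying\\n')")

# Fails with exit status 3 and writes nothing at all.
silent_rc, silent_err = run("import sys; sys.exit(3)")
```

Testing err would get both cases wrong; testing the return code gets both 
right.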

Or you could bypass try_ altogether and go:

  while True:
    ... subprocess ...
    if proc.returncode == 0:
      break

That may not fit your larger scheme.
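
If it does fit, here is a sketch of that loop with a cap on the number of 
attempts, so that a command which always fails does not spin forever. The 
names run_with_retries, max_tries and delay are my own inventions, and the 
child commands are stand-ins for curl.

```python
import subprocess
import sys
import time

def run_with_retries(cmd_args, max_tries=3, delay=0.1):
    """Run cmd_args until it exits 0 or max_tries attempts are used up."""
    for attempt in range(1, max_tries + 1):
        proc = subprocess.Popen(cmd_args,
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE)
        out, err = proc.communicate()
        if proc.returncode == 0:
            break
        if attempt < max_tries:
            # Brief pause before trying again.
            time.sleep(delay)
    return proc.returncode, out, err, attempt

# Stand-in commands: one that succeeds first time, one that always fails.
ok = run_with_retries([sys.executable, "-c", "print('hello')"])
bad = run_with_retries([sys.executable, "-c", "import sys; sys.exit(2)"])
```

The caller gets the final exit status back along with the output, and can 
decide what to do when every attempt has failed.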

Cheers,
Cameron Simpson <cs at zip.com.au>

