[Tutor] subprocess.Popen / proc.communicate issue

Thu Mar 30 18:51:44 EDT 2017

I wrote a long description of how .communicate can deadlock.

Then I read the doco more carefully and saw this:

  Warning: Use communicate() rather than .stdin.write, .stdout.read
  or .stderr.read to avoid deadlocks due to any of the other OS
  pipe buffers filling up and blocking the child process.

This suggests that .communicate services the pipes concurrently (with threads,
or with select/poll style multiplexing), sending and gathering data
independently, and that therefore the deadlock situation may not arise.

See what lsof and strace tell you; all my other advice stands regardless, and
the deadlock description may or may not be relevant. Still worth reading and
understanding it when looking at this kind of problem.
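
One way to test that reading of the warning is a small experiment. This is a
minimal sketch, not a statement about how .communicate is implemented: the
child here is a stand-in Python one-liner (an assumption, not curl) that writes
1 MiB to each of stdout and stderr, much more than an OS pipe buffer usually
holds. If .communicate read the two streams strictly one after the other, this
would be expected to hang.

```python
import subprocess
import sys

# Hypothetical stand-in for the real command: floods stdout, then stderr.
CHILD_CODE = (
    "import sys\n"
    "sys.stdout.write('o' * (1 << 20))\n"
    "sys.stderr.write('e' * (1 << 20))\n"
)

proc = subprocess.Popen(
    [sys.executable, "-c", CHILD_CODE],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)

# communicate() collects both streams and waits for the child to exit.
out, err = proc.communicate()
print(len(out), len(err), proc.returncode)
```

If this returns promptly with both lengths intact, the hang is probably
somewhere other than in .communicate itself.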

Cheers,
Cameron Simpson <cs at zip.com.au>

On 31Mar2017 09:43, Cameron Simpson <cs at zip.com.au> wrote:
>On 30Mar2017 13:51, bruce <badouglas at gmail.com> wrote:
>>Trying to understand the "correct" way to run a sys command ("curl")
>>and to get the potential stderr. Checking Stackoverflow (SO), implies
>>that I should be able to use a raw/text cmd, with "shell=true".
>
>I strongly recommend avoiding shell=True if you can. It has many 
>problems. All stackoverflow advice needs to be considered with 
>caution. However, that is not the source of your deadlock.
>
>>If I leave the stderr out, and just use
>>    s=proc.communicate()
>>the test works...
>>
>>Any pointers on what I might inspect to figure out why this hangs on
>>the proc.communicate process/line??
>
>When it is hung, run "lsof" on the processes from another terminal 
>i.e. lsof the python process and also lsof the curl process. That will 
>make clear the connections between them, particularly which file 
>descriptors ("fd"s) are associated with what.
>
>Then run "strace" on the processes. That should show you what system 
>calls are in progress in each process.
>
>My expectation is that you will see Python reading from one file 
>descriptor and curl writing to a different one, and neither 
>progressing.
>
>Personally I avoid .communicate and do more work myself, largely to 
>know precisely what is going on with my subprocesses.
>
>The difficulty with .communicate is that Python must read both stderr 
>and stdout separately, but it will be doing that sequentially: read 
>one, then read the other. That is just great if the command is "short" 
>and writes a small enough amount of data to each. The command runs, 
>writes, and exits. Python reads one and sees EOF after the data, 
>because the command has exited. Then Python reads the other and 
>collects the data and sees EOF because the command has exited.
>
>However, if the output of the command is large on whatever stream 
>Python reads _second_, the command will stall writing to that stream. 
>This is because Python is not reading the data, and therefore the 
>buffers fill (stdio in curl plus the buffer in the pipe). So the 
>command ("curl") stalls waiting for data to be consumed from the 
>buffers. And because it has stalled, the command does not exit, and 
>therefore Python does not see EOF on the _first_ stream. So it sits 
>waiting for more data, never reading from the second stream.
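
For anyone who does read the pipes by hand, the stall described above can be
avoided by draining both streams at the same time. The following is a rough
sketch under stated assumptions: the child is a stand-in Python one-liner
rather than curl, and drain() is a hypothetical helper name. A second thread
empties stderr while the main thread empties stdout, so neither pipe can fill
up unread.

```python
import subprocess
import sys
import threading

# Hypothetical stand-in child: writes a lot to stderr, then to stdout.
CHILD_CODE = (
    "import sys\n"
    "sys.stderr.write('e' * 200000)\n"
    "sys.stdout.write('o' * 200000)\n"
)


def drain(stream, chunks):
    """Read a pipe until EOF, appending each chunk of bytes to chunks."""
    for chunk in iter(lambda: stream.read(8192), b""):
        chunks.append(chunk)


proc = subprocess.Popen(
    [sys.executable, "-c", CHILD_CODE],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)

# Drain stderr in a background thread so the child never blocks on it.
err_chunks = []
err_thread = threading.Thread(target=drain, args=(proc.stderr, err_chunks))
err_thread.start()

out = proc.stdout.read()   # the main thread reads stdout to EOF
err_thread.join()          # stderr has reached EOF once this returns
proc.stdout.close()
proc.stderr.close()
returncode = proc.wait()
err = b"".join(err_chunks)
```

Reading proc.stdout to EOF and only then reading proc.stderr, with no thread,
is the sequential pattern that can stall.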
>
>[...snip...]
>> cmd='[r" curl -sS '
>> #cmd=cmd+'-A  "Mozilla/5.0 (X11; Linux x86_64; rv:38.0)
>>Gecko/20100101 Firefox/38.0"'
>> cmd=cmd+"-A  '"+user_agent+"'"
>> ##cmd=cmd+'   --cookie-jar '+cname+' --cookie '+cname+'    '
>> cmd=cmd+'   --cookie-jar '+ff+' --cookie '+ff+'    '
>> #cmd=cmd+'-e "'+referer+'"   -d "'+tt+'"  '
>> #cmd=cmd+'-e "'+referer+'"    '
>> cmd=cmd+"-L '"+url1+"'"+'"]'
>> #cmd=cmd+'-L "'+xx+'" '
>
>Might I recommend something like this:
>
> cmd_args = [ 'curl', '-sS' ]
> cmd_args.extend( [ '-A', user_agent ] )
> cmd_args.extend( [ '--cookie-jar', ff, '--cookie', ff ] )
> cmd_args.extend( [ '-L', url1 ] )
>
>and using shell=False. This totally avoids any need to "quote" strings 
>in the command, because the shell is not parsing the string - you're 
>invoking "curl" directly instead of asking the shell to read a string 
>and invoke "curl" for you.
>
>Constructing shell commands is tedious and fiddly; avoid it when you 
>don't need to.
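
To illustrate why the list form needs no quoting, here is a small sketch. The
child is an assumed stand-in (a Python one-liner that prints its arguments as
JSON) so the example runs without curl, and the "tricky" value is invented
purely for illustration. Each list element should arrive in the child as
exactly one argument, whatever quotes, spaces or shell metacharacters it
contains.

```python
import json
import subprocess
import sys

user_agent = "Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Firefox/38.0"
# Invented value full of characters a shell would treat specially.
tricky = "it's \"quoted\" & has $HOME; and spaces"

# Hypothetical stand-in for curl: echoes the arguments it received as JSON.
child = [sys.executable, "-c", "import json, sys; print(json.dumps(sys.argv[1:]))"]

cmd_args = list(child)
cmd_args.extend(["-A", user_agent])
cmd_args.extend(["-d", tricky])

# No shell is involved, so nothing in cmd_args is parsed or expanded.
result = subprocess.run(cmd_args, stdout=subprocess.PIPE, check=True)
received = json.loads(result.stdout.decode())
print(received)
```

With the real command, the same list construction applies with 'curl' as the
first element.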
>
>> try_=1
>
>It is preferable to say:
>
> try_ = True
>
>> while(try_):
>
>You don't need any brackets here:
>
> while try_:
>
>More readable, because less punctuation.
>
>>   proc=subprocess.Popen(cmd,
>>shell=True,stdout=subprocess.PIPE,stderr=subprocess.PIPE)
>
> proc = subprocess.Popen(cmd_args,
>          stdout=subprocess.PIPE,
>          stderr=subprocess.PIPE)
>
>>   s,err=proc.communicate()
>>   s=s.strip()
>>   err=err.strip()
>>   if(err==0):
>>     try_=''
>
>It is preferable to say:
>
> try_ = False
>
>Also, you should be looking at proc.returncode, _not_ err. Many 
>programs write informative messages to stderr, and a nonempty stderr 
>does not imply failure.
>
>Instead, programs conventionally set their exit status to 0 for success 
>and to various nonzero values for failure. So check:
>
> if proc.returncode == 0:
>   try_ = False
>
>Or you could bypass try_ altogether and go:
>
> while True:
>   ... subprocess ...
>   if proc.returncode == 0:
>     break
>
>That may not fit your larger scheme.
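
Putting the returncode check and the loop together might look like the sketch
below. Several things here are assumptions rather than anything from the
original code: run_with_retries and max_tries are hypothetical names, the cap
on attempts is added so the loop cannot spin forever, and the two child
commands are stand-in Python one-liners instead of curl.

```python
import subprocess
import sys


def run_with_retries(cmd_args, max_tries=3):
    """Run cmd_args until it exits with status 0 or max_tries is used up.

    Returns (returncode, stdout_bytes, stderr_bytes, attempts_made).
    """
    for attempt in range(1, max_tries + 1):
        proc = subprocess.Popen(
            cmd_args,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
        out, err = proc.communicate()
        if proc.returncode == 0:   # success is judged by exit status alone
            break
    return proc.returncode, out, err, attempt


# Stand-in child that is chatty on stderr but succeeds.
chatty_ok = [
    sys.executable, "-c",
    "import sys; sys.stderr.write('note: all fine\\n'); print('body')",
]
# Stand-in child that always fails with a nonzero exit status.
failing = [sys.executable, "-c", "import sys; sys.exit(22)"]

ok_result = run_with_retries(chatty_ok)
fail_result = run_with_retries(failing)
```

The first command has a nonempty stderr and still counts as a success on the
first attempt; the second is retried until the cap is reached and its nonzero
status is returned to the caller.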
>
>Cheers,
>Cameron Simpson <cs at zip.com.au>