[Tutor] subprocess.Popen / proc.communicate issue
Cameron Simpson
cs at zip.com.au
Thu Mar 30 18:43:06 EDT 2017
On 30Mar2017 13:51, bruce <badouglas at gmail.com> wrote:
>Trying to understand the "correct" way to run a sys command ("curl")
>and to get the potential stderr. Checking Stackoverflow (SO), implies
>that I should be able to use a raw/text cmd, with "shell=true".
I strongly recommend avoiding shell=True if you can. It has many problems. All
stackoverflow advice needs to be considered with caution. However, that is not
the source of your deadlock.
>If I leave the stderr out, and just use
> s=proc.communicate()
>the test works...
>
>Any pointers on what I might inspect to figure out why this hangs on
>the proc.communicate process/line??
When it is hung, run "lsof" on the processes from another terminal i.e. lsof
the python process and also lsof the curl process. That will make clear the
connections between them, particularly which file descriptors ("fd"s) are
associated with what.
Then run "strace" on the processes. That should show you what system calls are in
progress in each process.
My expectation is that you will see Python reading from one file descriptor and
curl writing to a different one, and neither progressing.
Personally I avoid .communicate and do more work myself, largely to know
precisely what is going on with my subprocesses.
The difficulty with .communicate is that Python must read both stderr and
stdout separately, but it will be doing that sequentially: read one, then read
the other. That is just great if the command is "short" and writes a small
enough amount of data to each. The command runs, writes, and exits. Python
reads one and sees EOF after the data, because the command has exited. Then
Python reads the other and collects the data and sees EOF because the command
has exited.
However, if the output of the command is large on whatever stream Python reads
_second_, the command will stall writing to that stream. This is because Python
is not reading the data, and therefore the buffers fill (stdio in curl plus the
buffer in the pipe). So the command ("curl") stalls waiting for data to be
consumed from the buffers. And because it has stalled, the command does not
exit, and therefore Python does not see EOF on the _first_ stream. So it sits
waiting for more data, never reading from the second stream.
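One way to do the reading by hand without hitting the stall described above is
to give each pipe its own reader, so that neither stream is left unread while
the other is being collected. This is only a sketch under assumptions:
run_and_drain and drain are made-up names, threads are assumed to be
acceptable, and a small Python child stands in for "curl" so that the example
runs anywhere.

```python
import subprocess
import sys
import threading

def run_and_drain(argv):
    """Run argv, reading stdout and stderr at the same time."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    chunks = {}

    def drain(name, stream):
        # Each pipe gets its own reader, so neither can fill up unread.
        chunks[name] = stream.read()
        stream.close()

    readers = [
        threading.Thread(target=drain, args=("out", proc.stdout)),
        threading.Thread(target=drain, args=("err", proc.stderr)),
    ]
    for t in readers:
        t.start()
    for t in readers:
        t.join()
    # Both pipes have reached EOF, so the child is finished or about to be.
    return proc.wait(), chunks["out"], chunks["err"]

# Stand-in child: writes about 1 MB to stderr, then a short string to stdout.
child = [sys.executable, "-c",
         "import sys; sys.stderr.write('e' * 1000000); sys.stdout.write('done')"]
code, out, err = run_and_drain(child)
```

The large write goes to stderr on purpose: reading stdout to EOF first and only
then reading stderr is exactly the pattern that would stall with this child.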
[...snip...]
> cmd='[r" curl -sS '
> #cmd=cmd+'-A "Mozilla/5.0 (X11; Linux x86_64; rv:38.0)
>Gecko/20100101 Firefox/38.0"'
> cmd=cmd+"-A '"+user_agent+"'"
> ##cmd=cmd+' --cookie-jar '+cname+' --cookie '+cname+' '
> cmd=cmd+' --cookie-jar '+ff+' --cookie '+ff+' '
> #cmd=cmd+'-e "'+referer+'" -d "'+tt+'" '
> #cmd=cmd+'-e "'+referer+'" '
> cmd=cmd+"-L '"+url1+"'"+'"]'
> #cmd=cmd+'-L "'+xx+'" '
Might I recommend something like this:
cmd_args = [ 'curl', '-sS' ]
cmd_args.extend( [ '-A', user_agent ] )
cmd_args.extend( [ '--cookie-jar', ff, '--cookie', ff ] )
cmd_args.extend( [ '-L', url1 ] )
and using shell=False. This totally avoids any need to "quote" strings in the
command, because the shell is not parsing the string - you're invoking "curl"
directly instead of asking the shell to read a string and invoke "curl" for
you.
Constructing shell commands is tedious and fiddly; avoid it when you don't need
to.
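To see that no quoting is needed, here is a small sketch. It assumes a
stand-in child (a Python one-liner that echoes its first argument) in place of
"curl", and the user_agent value is invented for the demonstration.

```python
import subprocess
import sys

# Hypothetical value containing spaces, quotes and a shell metacharacter.
user_agent = "Mozilla/5.0 (X11; Linux x86_64) 'quoted' $HOME"

# Stand-in child that echoes back its first argument exactly as received.
cmd_args = [sys.executable, "-c", "import sys; sys.stdout.write(sys.argv[1])"]
cmd_args.extend([user_agent])

# No shell=True: the list is handed to the child as its argument vector.
proc = subprocess.Popen(cmd_args, stdout=subprocess.PIPE)
out, _ = proc.communicate()
received = out.decode()
```

The child receives the string unchanged, with the quotes and the "$HOME" left
alone, because no shell ever looks at it.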
> try_=1
It is preferable to say:
try_ = True
> while(try_):
You don't need any brackets here:
while try_:
More readable, because less punctuation.
> proc=subprocess.Popen(cmd,
>shell=True,stdout=subprocess.PIPE,stderr=subprocess.PIPE)
proc = subprocess.Popen(cmd_args,
                        stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE)
> s,err=proc.communicate()
> s=s.strip()
> err=err.strip()
> if(err==0):
> try_=''
It is preferable to say:
try_ = False
Also, you should be looking at proc.returncode, _not_ err. Many programs write
informative messages to stderr, and a nonempty stderr does not imply failure.
Instead, programs conventionally set their exit status to 0 for success and to
various nonzero values for failure. So check:
if proc.returncode == 0:
    try_ = False
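A quick way to convince yourself of the difference is the sketch below. The
helper run and both one-line children are made up for illustration; they stand
in for a real command such as "curl".

```python
import subprocess
import sys

def run(code):
    """Run a Python one-liner as a stand-in child; return (returncode, stderr)."""
    proc = subprocess.Popen([sys.executable, "-c", code],
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    _, err = proc.communicate()
    return proc.returncode, err.strip()

# Succeeds, but still prints a note on stderr.
ok_code, ok_err = run("import sys; sys.stderr.write('note: redirected')")

# Fails silently: nothing on stderr, nonzero exit status.
bad_code, bad_err = run("import sys; sys.exit(3)")
```

Testing stderr would get both cases wrong: the first child succeeded with a
nonempty stderr, and the second failed with an empty one.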
Or you could bypass try_ altogether and go:
while True:
    ... subprocess ...
    if proc.returncode == 0:
        break
That may not fit your larger scheme.
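If the command can fail persistently, a bare "while True" never finishes. One
possible variation, assuming a cap on attempts is acceptable, is sketched here.
MAX_TRIES and succeeded are invented names, and the stand-in command always
fails so that the cap is actually exercised.

```python
import subprocess
import sys

MAX_TRIES = 3  # hypothetical limit, not part of the original code

# Stand-in command that always exits with status 1.
cmd_args = [sys.executable, "-c", "import sys; sys.exit(1)"]

tries = 0
succeeded = False
for tries in range(1, MAX_TRIES + 1):
    proc = subprocess.Popen(cmd_args,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    out, err = proc.communicate()
    if proc.returncode == 0:
        succeeded = True
        break
```

After the loop, succeeded says whether any attempt exited with status 0, and
tries says how many attempts were made.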
Cheers,
Cameron Simpson <cs at zip.com.au>