[Tutor] Newbie Wondering About Threads

Damon Timm damontimm at gmail.com
Sun Dec 7 14:58:52 CET 2008


On Sun, Dec 7, 2008 at 12:33 AM, Martin Walsh <mwalsh at mwalsh.org> wrote:
> I'm not certain this completely explains the poor performance, if at
> all, but the communicate method of Popen objects will wait until EOF is
> reached and the process ends. So IIUC, in your example the process 'p'
> runs to completion and only then is its stdout (p.communicate()[0])
> passed to stdin of 'p2' by the outer communicate call.
>
> You might try something like this (untested!) ...
>
> p1 = subprocess.Popen(
>    ["flac","--decode","--stdout","test.flac"],
>    stdout=subprocess.PIPE, stderr=subprocess.PIPE
> )
> p2 = subprocess.Popen(
>    ["lame","-","test.mp3"], stdin=p1.stdout, # <--
>    stdout=subprocess.PIPE, stderr=subprocess.PIPE
> )
> p2.communicate()

That did the trick!  Got it back down to 20s ... which is what it was
taking on the command line.  Thanks for that!
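
For anyone searching the archives later, here is the full version of
what worked for me (just a sketch, using the same test file names):

import subprocess

# Decode the flac file to a WAV stream on stdout.
p1 = subprocess.Popen(
    ["flac", "--decode", "--stdout", "test.flac"],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE
)

# lame reads the decoded audio from stdin ("-") while flac is still
# writing it, so the two programs run at the same time.
p2 = subprocess.Popen(
    ["lame", "-", "test.mp3"], stdin=p1.stdout,
    stdout=subprocess.PIPE, stderr=subprocess.PIPE
)

# Close our copy of the pipe so flac sees a broken pipe if lame exits
# early, then wait for the encoder to finish.
p1.stdout.close()
p2.communicate()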

> Here is my simplistic, not-very-well-thought-out, attempt in
> pseudo-code, perhaps it will get you started ...
>
> paths = ["file1.flac","file2.flac", ... "file11.flac"]
> procs = []
> while paths or procs:
>    procs = [p for p in procs if p.poll() is None]
>    while paths and len(procs) < 2:
>        flac = paths.pop(0)
>        procs.append(Popen(['...', flac], ...))
>    time.sleep(1)

I think I got a little lost with the "procs = [p for p in procs if
p.poll() is None]" statement -- I'm not sure exactly what that is
doing ... but otherwise, I think that makes sense ... will have to try
it out (if not one of the more "robust" thread pool suggestions
below).
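
If I am reading the docs right, poll() returns None while the child
process is still running and its return code once it has exited, so
that list comprehension just keeps the conversions that have not
finished yet.  Here is my (untested) attempt at fleshing out the loop,
with made-up file names and the same flac/lame pipe as above:

import subprocess
import time

def start_conversion(flac_path):
    """Start one flac -> lame pipeline; return the lame process."""
    mp3_path = flac_path.rsplit(".", 1)[0] + ".mp3"
    dec = subprocess.Popen(
        ["flac", "--decode", "--stdout", flac_path],
        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    enc = subprocess.Popen(
        ["lame", "-", mp3_path], stdin=dec.stdout,
        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    dec.stdout.close()  # let flac see a broken pipe if lame dies early
    return enc

paths = ["file1.flac", "file2.flac", "file3.flac"]  # made-up names
procs = []
MAX_PROCS = 2

while paths or procs:
    # poll() is None means "still running", so drop the finished ones
    procs = [p for p in procs if p.poll() is None]
    # top the pool back up to MAX_PROCS running conversions
    while paths and len(procs) < MAX_PROCS:
        procs.append(start_conversion(paths.pop(0)))
    time.sleep(1)  # don't hammer the CPU while waiting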

On Sun, Dec 7, 2008 at 2:58 AM, Lie Ryan <lie.1296 at gmail.com> wrote:
> I think when you do that (p2.wait() then p3.wait() ), if p3 finishes
> first, you wouldn't start another p3 until p2 have finished (i.e. until
> p2.wait() returns) and if p2 finishes first, you wouldn't start another
> p2 until p3 finishes (i.e. until p3.wait() returns ).
>
> The solution would be to start and wait() the subprocessess in two
> threads. Use threading module or -- if you use python2.6 -- the new
> multiprocessing module.
>
> Alternatively, you could do a "non-blocking wait", i.e. poll the thread.
>
> while True:
>    if p1.poll(): # start another p1
>    if p2.poll(): # start another p2

Yeah, looks like it - I think the trick, for me, will be getting a
dynamic list that can be iterated through ... I experimented a little
with the .poll() function and I think I follow how it works ... but
really, I am going to have to do a little more "pre-thinking" than
I had to do with the bash version ... not sure if I should create a
class containing the list of flac files or just a number of functions
to handle the list ... whichever way it ends up being, it is going to
take a little thought to get it straightened out.  And the
object-oriented part is different from bash -- so, I have to "think
different" too.

On Sun, Dec 7, 2008 at 8:31 AM, Kent Johnson <kent37 at tds.net> wrote:
> A simple way to do this would be to use poll() instead of wait(). Then
> you can check both processes for completion in a loop and start a new
> process when one of the current ones ends. You could keep the list of
> active processes in a list. Make sure you put a sleep() in the polling
> loop, otherwise the loop will consume your CPU!

Thanks for that tip - I already maxed out my CPU and had to abort the
first time (without the sleep() call) ... smile.

> Another approach is to use a thread pool with one worker for each
> process. The thread would call wait() on its child process; when it
> finishes the thread will take a new task off the queue. There are
> several thread pool recipes in the Python cookbook, for example
> http://code.activestate.com/recipes/203871/
> http://code.activestate.com/recipes/576576/ (this one has many links
> to other pool implementations)

Oh neat!  I will be honest, more than one screenful of code and I get
a little overwhelmed (at this point) but I am going to check that
idea out.  I was thinking something along these lines, where I can
send all the input/output variables along with a number argument
(threads) to a class/function that would then handle everything ... so
using a thread pool may make sense ...

Looks like I would create a loop that went through the list of all the
files to be converted and then sent them all off, one by one, to the
thread pool -- which would then just dish them out so that no more
than 2 (if I chose that) would be converting at a time?  I gotta try
and wrap my head around it ... also, I will be using two subprocesses
to accomplish a single command (one writing to stdout and the other
reading from stdin) as well ... so they have to be packaged together
somehow ... hmm!
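
Thinking out loud, maybe something like this (untested, file names
made up): a couple of worker threads pulling flac paths off a Queue,
each one starting the flac -> lame pipe and waiting on it before
grabbing the next file.

import subprocess
import threading
import Queue  # named "queue" in Python 3

def convert(flac_path):
    """Run one flac -> lame pipeline and wait for it to finish."""
    mp3_path = flac_path.rsplit(".", 1)[0] + ".mp3"
    dec = subprocess.Popen(
        ["flac", "--decode", "--stdout", flac_path],
        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    enc = subprocess.Popen(
        ["lame", "-", mp3_path], stdin=dec.stdout,
        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    dec.stdout.close()
    enc.communicate()  # blocks this worker thread only

def worker(queue):
    # All paths are queued before the threads start, so an empty
    # queue means there is no more work for this worker.
    while True:
        try:
            flac_path = queue.get_nowait()
        except Queue.Empty:
            return
        convert(flac_path)

paths = ["file1.flac", "file2.flac", "file11.flac"]  # made-up names
work = Queue.Queue()
for path in paths:
    work.put(path)

threads = [threading.Thread(target=worker, args=(work,))
           for _ in range(2)]  # 2 = simultaneous conversions
for t in threads:
    t.start()
for t in threads:
    t.join()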

Great help, everyone.  Not quite as simple as single-threading, but I
am learning quite a bit.  One day, I will figure it out.  Smile.

Damon

