creating pipelines in python

Stefan Behnel stefan_ml at behnel.de
Wed Nov 25 16:46:58 EST 2009


per, 25.11.2009 17:42:
> Thanks to all for your replies.  i want to clarify what i mean by a
> pipeline.  a major feature i am looking for is the ability to chain
> functions or scripts together, where the output of one script -- which
> is usually a file -- is required for another script to run.  so one
> script has to wait for the other.  i would like to do this over a
> cluster, where some of the scripts are distributed as separate jobs on
> a cluster but the results are then collected together.  so the ideal
> library would have easy facilities for expressing things like this:
> script X and Y run independently, but script Z depends on the output
> of X and Y (which is such and such file or file flag).
> 
> is there a way to do this? i prefer not to use a framework that
> requires control of the clusters etc. like Disco, but something that's
> light weight and simple. right now ruffus seems most relevant but i am
> not sure -- are there other candidates?

As others have pointed out, a Unix pipe approach might be helpful if you
want the processes to run in parallel. You can send the output of one
process to stdout, a network socket, an HTTP channel or whatever, and have
the next process read it and work on it while it's being generated by the
first process.
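
To make the pipe idea concrete, here is a rough sketch using the standard
subprocess module. The two stages are hypothetical one-liners standing in for
your real scripts (the names PRODUCER, CONSUMER and run_pipeline are made up
for this example); the point is only that the second process starts reading
while the first is still writing.

```python
import subprocess
import sys

# Hypothetical stages: tiny inline scripts standing in for real pipeline steps.
PRODUCER = "for i in range(5): print(i)"
CONSUMER = "import sys\nfor line in sys.stdin: print(int(line) * 2)"


def run_pipeline():
    """Run PRODUCER | CONSUMER and return the consumer's output as a list of strings."""
    # First stage writes to a pipe instead of the terminal.
    producer = subprocess.Popen(
        [sys.executable, "-c", PRODUCER],
        stdout=subprocess.PIPE,
    )
    # Second stage reads the first stage's stdout as its own stdin.
    consumer = subprocess.Popen(
        [sys.executable, "-c", CONSUMER],
        stdin=producer.stdout,
        stdout=subprocess.PIPE,
        text=True,
    )
    # Close our copy of the pipe so the producer notices if the consumer exits early.
    producer.stdout.close()
    output, _ = consumer.communicate()
    producer.wait()
    return output.split()


if __name__ == "__main__":
    print(run_pipeline())
```

Both child processes run at the same time here, so this covers the "in
parallel" case; it does not by itself handle the "Z waits for files from X and
Y" case from your mail.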

Looking into generators is still a good idea, even if you go for a pipe
approach. See the link posted by Wolodja Wentland.
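
As a minimal illustration of the generator style (my own toy example, not taken
from that link; the stage names read_numbers, square and running_total are
invented), each stage pulls items from the previous one lazily, so nothing is
computed until the last stage is consumed:

```python
def read_numbers(lines):
    """Parse non-empty lines into integers, one at a time."""
    for line in lines:
        line = line.strip()
        if line:
            yield int(line)


def square(numbers):
    """Square each incoming number."""
    for n in numbers:
        yield n * n


def running_total(numbers):
    """Yield the cumulative sum after each incoming number."""
    total = 0
    for n in numbers:
        total += n
        yield total


# Any iterable of lines works here, e.g. an open file or sys.stdin.
raw = ["1\n", "2\n", "\n", "3\n"]
result = list(running_total(square(read_numbers(raw))))
print(result)
```

Since the first stage accepts any iterable of lines, the same chain can be fed
from sys.stdin, which is how it combines with the pipe approach.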

Stefan



More information about the Python-list mailing list