creating pipelines in python

per perfreem at gmail.com
Wed Nov 25 11:42:01 EST 2009


Thanks to all for your replies.  i want to clarify what i mean by a
pipeline.  a major feature i am looking for is the ability to chain
functions or scripts together, where the output of one script -- which
is usually a file -- is required for another script to run.  so one
script has to wait for the other.  i would like to do this over a
cluster, where some of the scripts are distributed as separate jobs on
a cluster but the results are then collected together.  so the ideal
library would have easy facilities for expressing things like this:
script X and Y run independently, but script Z depends on the output
of X and Y (which is such and such file or file flag).
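
To make the requirement concrete, here is a minimal standard-library sketch
of that dependency pattern. Everything in it is an assumption made for
illustration: the helper names (run_stage, run_pipeline), the file names, and
the inline one-line "scripts" that stand in for real ones. It runs the stages
as local processes; on a cluster the subprocess call would presumably be
swapped for whatever job submission command the site uses.

```python
import subprocess
import sys
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Hypothetical stand-ins for real scripts. Each one receives its output path
# as the first argument, followed by any input paths.
X_CODE = "import sys; open(sys.argv[1], 'w').write('x')"
Y_CODE = "import sys; open(sys.argv[1], 'w').write('y')"
Z_CODE = (
    "import sys; "
    "open(sys.argv[1], 'w').write(''.join(open(p).read() for p in sys.argv[2:]))"
)


def run_stage(code, output, inputs=()):
    """Run one stage in a separate Python process once its inputs exist."""
    missing = [str(p) for p in inputs if not Path(p).exists()]
    if missing:
        raise FileNotFoundError("missing inputs: %s" % ", ".join(missing))
    # Assumption: runs locally; a cluster version would submit a job here.
    subprocess.run(
        [sys.executable, "-c", code, str(output), *map(str, inputs)],
        check=True,  # raise if the stage exits with a non-zero status
    )
    if not Path(output).exists():
        raise RuntimeError("stage did not produce %s" % output)
    return Path(output)


def run_pipeline(workdir):
    """Run X and Y independently, then Z on both of their output files."""
    workdir = Path(workdir)
    x_out, y_out, z_out = workdir / "x.out", workdir / "y.out", workdir / "z.out"
    with ThreadPoolExecutor(max_workers=2) as pool:
        fx = pool.submit(run_stage, X_CODE, x_out)
        fy = pool.submit(run_stage, Y_CODE, y_out)
        # result() blocks until each stage finishes and re-raises its errors.
        inputs = [fx.result(), fy.result()]
    return run_stage(Z_CODE, z_out, inputs)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(run_pipeline(tmp).read_text())
```

The waiting is done by the futures rather than by polling for files, and the
file checks only guard against a stage that exits cleanly without writing its
output.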

is there a way to do this? i prefer not to use a framework that
requires control of the clusters etc. like Disco, but something that's
lightweight and simple. right now ruffus seems most relevant but i am
not sure -- are there other candidates?

thank you.

On Nov 23, 4:02 am, Paul Rudin <paul.nos... at rudin.co.uk> wrote:
> per <perfr... at gmail.com> writes:
> > hi all,
>
> > i am looking for a python package to make it easier to create a
> > "pipeline" of scripts (all in python). what i do right now is have a
> > set of scripts that produce certain files as output, and i simply have
> > a "master" script that checks at each stage whether the output of the
> > previous script exists, using functions from the os module. this has
> > several flaws and i am sure someone has thought of nice abstractions
> > for making these kinds of wrappers easier to write.
>
> > does anyone have any recommendations for python packages that can do
> > this?
>
> Not entirely what you're looking for, but the subprocess module is
> easier to work with for this sort of thing than os. See e.g. <http://docs.python.org/library/subprocess.html#replacing-shell-pipeline>
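
For reference, the shell-pipeline replacement that the subprocess
documentation describes looks roughly like the sketch below. The pipe() helper
and the two inline commands are made up for illustration; only Popen, PIPE and
communicate() come from the module itself.

```python
import subprocess
import sys


def pipe(producer_cmd, consumer_cmd):
    """Run the equivalent of `producer | consumer`; return the consumer's stdout."""
    p1 = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    p2 = subprocess.Popen(consumer_cmd, stdin=p1.stdout, stdout=subprocess.PIPE)
    # Close our copy of the pipe so the producer sees a broken pipe
    # if the consumer exits early.
    p1.stdout.close()
    output = p2.communicate()[0]
    p1.wait()
    return output


# Hypothetical commands: the producer prints two lines, the consumer
# upper-cases whatever it reads on stdin.
producer = [sys.executable, "-c", "print('alpha'); print('beta')"]
consumer = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]

if __name__ == "__main__":
    print(pipe(producer, consumer).decode())
```

This chains processes through stdin/stdout only, so it does not cover the
file-based dependencies between stages discussed above.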




More information about the Python-list mailing list