creating pipelines in python

Lie Ryan lie.1296 at gmail.com
Sun Nov 22 21:28:56 EST 2009


per wrote:
> hi all,
> 
> i am looking for a python package to make it easier to create a
> "pipeline" of scripts (all in python). what i do right now is have a
> set of scripts that produce certain files as output, and i simply have
> a "master" script that checks at each stage whether the output of the
> previous script exists, using functions from the os module. this has
> several flaws and i am sure someone has thought of nice abstractions
> for making these kind of wrappers easier to write.
> 
> does anyone have any recommendations for python packages that can do
> this?
> 
> thanks.

You're currently implementing a pseudo-pipeline: 
http://en.wikipedia.org/wiki/Pipeline_%28software%29#Pseudo-pipelines

If you want to create a Unix-style, byte-stream-oriented pipeline, have 
all scripts write their output to stdout and read their input from stdin 
(i.e. read with raw_input and write with print). Since Unix pipelines are 
byte-oriented, you will need to parse the input and format the output 
according to a format agreed between the scripts. A more general approach 
could use more than two streams; you can use file-like objects to 
represent each stream.
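A minimal sketch of one byte-stream stage, written for Python 3 (where raw_input became input, though iterating over the stream is simpler). It assumes an agreed format of one tab-separated "name<TAB>count" record per line; that format and the stage name `double_counts` are hypothetical. Writing the stage against file-like objects instead of hard-coding sys.stdin/sys.stdout lets the same function run in a shell pipeline or on in-memory streams.

```python
import io


def double_counts(infile, outfile):
    """Hypothetical stage: parse 'name<TAB>count' lines and double each count."""
    for line in infile:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        name, count = line.split("\t")                # parse the agreed input format
        outfile.write(f"{name}\t{int(count) * 2}\n")  # emit the agreed output format


# Demo on in-memory streams; in a real script you would call
# double_counts(sys.stdin, sys.stdout) instead.
source = io.StringIO("apples\t3\npears\t5\n")
sink = io.StringIO()
double_counts(source, sink)
print(sink.getvalue(), end="")
```

Run as a script with sys.stdin/sys.stdout, such a stage slots into a shell pipeline like `cat data.tsv | python stage.py | python next_stage.py`.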

For a more Pythonic pipeline, you can rewrite your scripts as 
generators and use generator/list comprehensions that read objects from 
a FIFO queue and write objects to another FIFO queue (a queue can be 
implemented using a list, but take a look at Queue.Queue in the standard 
library). Basically, an object pipeline: 
http://en.wikipedia.org/wiki/Pipeline_%28software%29#Object_pipelines
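A minimal sketch of such an object pipeline, written for Python 3 (where Queue.Queue lives in the `queue` module). Each stage is a function that is handed an input queue and an output queue and runs in its own thread; the stage names, the use of threads, and the end-of-stream `SENTINEL` object are assumptions chosen for illustration, not part of any library API.

```python
import queue
import threading

SENTINEL = object()  # assumed convention: marks the end of the stream


def items(q):
    """Yield objects from a FIFO queue until the sentinel arrives."""
    while True:
        item = q.get()
        if item is SENTINEL:
            return
        yield item


def square_stage(in_q, out_q):
    """Hypothetical stage: square every number it receives."""
    for x in (n * n for n in items(in_q)):
        out_q.put(x)
    out_q.put(SENTINEL)  # propagate end-of-stream downstream


def keep_even_stage(in_q, out_q):
    """Hypothetical stage: forward only even numbers."""
    for x in (n for n in items(in_q) if n % 2 == 0):
        out_q.put(x)
    out_q.put(SENTINEL)


def run_pipeline(data):
    q1, q2, q3 = queue.Queue(), queue.Queue(), queue.Queue()
    workers = [
        threading.Thread(target=square_stage, args=(q1, q2)),
        threading.Thread(target=keep_even_stage, args=(q2, q3)),
    ]
    for w in workers:
        w.start()
    for item in data:       # feed the first queue
        q1.put(item)
    q1.put(SENTINEL)
    results = list(items(q3))  # drain the last queue
    for w in workers:
        w.join()
    return results


print(run_pipeline(range(1, 7)))
```

Because every stage only sees its two queues, stages can be reordered or reused without knowing about each other; with one thread per stage and FIFO queues, item order is preserved.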

For a Unix-style pipeline, shell/batch scripts are the best tool, 
though you can also use the subprocess module and redirect each process's 
stdin and stdout. For an object pipeline, nothing is simpler than 
passing an input and an output queue to each script.

For in-script pipelines (cf. inter-script pipelines), you can use 
generator/list comprehensions and iterators. There are several modules 
that aim to provide slightly neater syntax than comprehensions, e.g. 
http://code.google.com/p/python-pipeline/, though I personally prefer 
comprehensions.
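A sketch of an in-script pipeline built only from chained generator expressions, in the comprehension style preferred above; the log lines and variable names are made-up examples. Each stage is lazy, so items flow through one at a time and nothing is materialized until `list()` consumes the last stage. `dict.fromkeys` serves as an order-preserving de-duplicator, which relies on dicts keeping insertion order (guaranteed since Python 3.7).

```python
# Hypothetical input: a few log lines held in memory.
log_lines = [
    "INFO start",
    "ERROR disk full",
    "INFO retry",
    "ERROR disk full",
    "ERROR timeout",
]

# Each stage is a lazy generator expression consuming the previous one.
errors = (line for line in log_lines if line.startswith("ERROR"))
messages = (line.split(" ", 1)[1] for line in errors)
unique = list(dict.fromkeys(messages))  # de-duplicate, keeping first-seen order

print(unique)  # ['disk full', 'timeout']
```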