Pickle based workflow - looking for advice

Fabien fabien.maussion at gmail.com
Mon Apr 13 13:35:28 EDT 2015


On 13.04.2015 19:08, Peter Otten wrote:
> How about a file-based workflow?
>
> Write distinct scripts, e. g.
>
> a2b.py that reads from *.a and writes to *.b
>
> and so on. Then use a plain old makefile to define the dependencies.
> Whether .a uses pickle, .b uses json, and .z uses csv is but an
> implementation detail that only its producers and consumers need to know.
> Testing an arbitrary step is as easy as invoking the respective script with
> some prefabricated input and checking the resulting output file(s).

I think I like the idea because it is more durable. The data I
manipulate come in specific formats which are very efficient. With
pickle I was being kind of "lazy" and, well, saved myself a couple of
read/write routines.

Still, your idea is probably more elegant.
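
As a rough sketch of what one such step could look like, assuming the
*.a files are pickles and the *.b files are JSON (the formats, the
`convert` name and the command line are only illustrative and not taken
from any real pipeline):

```python
# a2b.py: hypothetical step that reads a pickle (*.a) and writes JSON (*.b).
import json
import pickle
import sys


def convert(src_path, dst_path):
    """Load the pickled object from src_path and dump it as JSON to dst_path."""
    with open(src_path, "rb") as src:
        data = pickle.load(src)
    # Assumes the pickled object is JSON-serialisable
    # (dicts, lists, numbers, strings).
    with open(dst_path, "w") as dst:
        json.dump(data, dst)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python a2b.py input.a output.b
    convert(sys.argv[1], sys.argv[2])
```

Only this script and whatever consumes the *.b file would need to know
that JSON is involved; a makefile rule would just map each *.b to its
*.a.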

With multiprocessing, do I have to worry about processes writing
simultaneously to *different* files? I guess the OS takes good care of
this stuff but I'm not an expert.
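
To make the question concrete, here is a minimal sketch of the pattern,
assuming one output file per task. The names `output_path`,
`write_result` and `run_all` and the file naming scheme are made up for
illustration. Each worker opens only its own file, so there is no shared
file handle to protect. The write-to-temporary-then-rename step is an
extra precaution so that a half-written file never appears under its
final name.

```python
# Sketch: each task writes to its own file, so workers never share a file handle.
import os
from multiprocessing import Pool


def output_path(out_dir, task_id):
    """Return a file name that is unique per task (hypothetical naming scheme)."""
    return os.path.join(out_dir, "result_{:04d}.txt".format(task_id))


def write_result(task):
    """Worker: compute something trivial and write it to the task's own file."""
    out_dir, task_id = task
    path = output_path(out_dir, task_id)
    tmp_path = path + ".tmp"
    # Write under a temporary name first, then rename, so that a reader
    # (or make) never sees a half-written file under the final name.
    with open(tmp_path, "w") as f:
        f.write(str(task_id ** 2))
    os.replace(tmp_path, path)
    return path


def run_all(out_dir, n_tasks, processes=4):
    """Run write_result for task ids 0..n_tasks-1 in a pool of worker processes."""
    tasks = [(out_dir, i) for i in range(n_tasks)]
    with Pool(processes=processes) as pool:
        return pool.map(write_result, tasks)
```

`run_all` should be called from under an `if __name__ == "__main__":`
guard, which multiprocessing needs on platforms that start workers by
spawning a fresh interpreter.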

Thanks,

Fabien



