Storing the state of script between steps

F.R. anthra.norell at bluewin.ch
Sat Feb 22 09:56:41 EST 2014


On 02/21/2014 09:59 PM, Denis Usanov wrote:
> Good evening.
>
> First of all, I would like to apologize for the name of the topic. I really didn't know how to name it better.
>
> I mostly develop automation scripts in Python, for things such as deployment (it's not about fabric and may not involve ssh at all), testing and so on. In these scripts I have an abstraction called a "step".
>
> Some code:
>
> class IStep(object):
>      def run(self):
>          raise NotImplementedError()
>
> And the concrete steps:
>
> class DeployStep: ...
> class ValidateUSBFlash: ...
> class SwitchVersionS: ...
>
> In each of them I implement the run method.
> Then I use a "builder" class which can add steps to an internal list and has a method "start" that runs all the steps one by one.
>
> And I like this. It's a loosely coupled system. It works fine in simple cases. But sometimes a step has to use the results of previous steps, and that is where I have problems. Until now I kept an internal dict in the "builder", named it "world" and passed it to the run() method of each step. It worked, but I disliked it.
>
> How would you solve this problem? I understand that it's more of an architecture question than a Python one.
>
> I bet I wouldn't have asked this if I had worked with a functional programming language.

A few months ago I posted a summary of a data transformation framework, 
inviting commentary 
(https://mail.python.org/pipermail/python-list/2013-August/654226.html). 
It didn't meet with much interest and I forgot about it. Now that 
someone is looking for something along these lines, as I understand his 
post, there might be some interest after all.


My module is called TX. A base class "Transformer" handles the flow of 
data. A custom Transformer defines a method "T.transform (self)" which 
transforms input to output. Transformers are callable, taking input as 
an argument and returning the output:

     transformed_input = T (some_input)

A Transformer object retains both input and output after a run. If it is 
called a second time without input, it simply returns its output, 
without needlessly repeating its job:

     same_transformed_input = T ()
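
The behaviour described so far can be sketched in a few lines. This is not 
the TX code itself, only a minimal illustration: the attribute names 
"input" and "output", the body of "__call__" and the example class "Upper" 
are assumptions.

```python
class Transformer:
    """Hypothetical base class: callable, remembers its last input and output."""

    def __init__(self):
        self.input = None
        self.output = None

    def transform(self):
        # Subclasses read self.input and set self.output.
        raise NotImplementedError

    def __call__(self, data=None):
        if data is not None:
            # New input: store it and run the transformation.
            self.input = data
            self.output = None
            self.transform()
        # Without input, return the retained output without recomputing.
        return self.output


class Upper(Transformer):
    """Illustrative Transformer that upper-cases a string and counts its runs."""

    runs = 0

    def transform(self):
        self.runs += 1
        self.output = self.input.upper()


T = Upper()
first = T("some input")  # runs transform()
again = T()              # returns the retained output, no second run
```

Here "again" equals "first", and "T.runs" stays at 1 because the second 
call did not repeat the work.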

Because of this IO design, Transformers nest:

     csv_text = CSV_Maker (Data_Line_Picker (Line_Splitter (
         File_Reader ('1st-quarter-2013.statement'))))

A better alternative to nesting is to build a Chain:

     Statement_To_CSV = TX.Chain (File_Reader, Line_Splitter,
                                  Data_Line_Picker, CSV_Maker)

A Chain is functionally equivalent to a Transformer:

     csv_text = Statement_To_CSV ('1st-quarter-2013.statement')
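
One way such a Chain could work is to pass the output of each element to 
the next and to behave like a Transformer itself. The sketch below is an 
assumption about the mechanics, not the TX implementation; the three small 
element classes are stand-ins invented for the example.

```python
class Transformer:
    """Hypothetical base class, reduced to the essentials."""

    def __init__(self):
        self.input = None
        self.output = None

    def transform(self):
        raise NotImplementedError

    def __call__(self, data=None):
        if data is not None:
            self.input = data
            self.output = None
            self.transform()
        return self.output


class Chain(Transformer):
    """Runs its elements in order, feeding each one the previous output."""

    def __init__(self, *elements):
        super().__init__()
        self.elements = list(elements)

    def transform(self):
        data = self.input
        for element in self.elements:
            data = element(data)
        self.output = data

    def __getitem__(self, index):
        return self.elements[index]


class Line_Splitter(Transformer):
    def transform(self):
        self.output = self.input.splitlines()


class Field_Splitter(Transformer):
    def transform(self):
        self.output = [line.split() for line in self.input]


class CSV_Maker(Transformer):
    def transform(self):
        self.output = "\n".join(",".join(fields) for fields in self.input)


To_CSV = Chain(Line_Splitter(), Field_Splitter(), CSV_Maker())
csv_text = To_CSV("a b c\n1 2 3")
lines = To_CSV[0]()  # intermediate result retained by the first element
```

Because every element keeps its own output, the intermediate list of 
lines is still available from the first element after the run.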

Since Transformers retain their data, developing or debugging a Chain is 
a relatively simple affair. If a Chain fails, the method "show ()" 
displays the innards of its elements one by one. The failing element is 
the first one that has no output. "show ()" also displays any messages 
that the method "transform (self)" logged with "self.log (message)". 
While the failing element is being fixed, the preceding element keeps 
providing the original input for testing until the repair is done.
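
The debugging idea can be illustrated as follows. The sketch assumes that 
each element stores its log messages in a list and that "show ()" reports 
one line per element; the report format and the example classes are made 
up for the illustration and are not taken from TX.

```python
class Transformer:
    """Hypothetical base class that retains input, output and log messages."""

    def __init__(self):
        self.input = None
        self.output = None
        self.messages = []

    def log(self, message):
        self.messages.append(message)

    def transform(self):
        raise NotImplementedError

    def __call__(self, data=None):
        if data is not None:
            self.input, self.output, self.messages = data, None, []
            self.transform()
        return self.output


class Chain(Transformer):
    def __init__(self, *elements):
        super().__init__()
        self.elements = list(elements)

    def transform(self):
        data = self.input
        for element in self.elements:
            data = element(data)
        self.output = data

    def show(self):
        """Return one status line per element; the first without output failed."""
        return [
            f"{type(e).__name__}: output={e.output!r} log={e.messages}"
            for e in self.elements
        ]


class Words(Transformer):
    def transform(self):
        self.output = self.input.split()


class Numbers(Transformer):
    def transform(self):
        for word in self.input:
            if not word.isdigit():
                self.log(f"not a number: {word!r}")
                raise ValueError(word)
        self.output = [int(word) for word in self.input]


class Total(Transformer):
    def transform(self):
        self.output = sum(self.input)


chain = Chain(Words(), Numbers(), Total())
try:
    chain("1 2 x")
except ValueError:
    pass

report = chain.show()
# Words has output, Numbers has none (the failing element), Total never ran.
retry_input = chain.elements[0]()  # the preceding element still holds its output
```

While "Numbers" is being repaired, "retry_input" can be fed to it directly 
without rerunning the elements before it.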

Since a Chain is functionally equivalent to a Transformer, a Chain can 
be placed into a containing Chain alongside Transformers:

     Table_Maker = TX.Chain (TX.File_Reader (), TX.Line_Splitter (),
                             TX.Table_Maker ())
     Table_Writer = TX.Chain (Table_Maker, Table_Formatter,
                              TX.File_Writer (file_name = '/home/xy/office/addresses-4214'))
     DB_Writer = TX.Chain (Table_Maker, DB_Formatter,
                           TX.DB_Writer (table_name = 'contacts'))

Better:

     Splitter = TX.Splitter (TX.Table_Writer (), TX.DB_Writer ())
     Table_Handler = TX.Chain (Table_Maker, Splitter)

     Table_Handler ('home/xy/Downloads/report-4214')  # Writes to both file and DB
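
A Splitter of this kind could simply hand the same input to each of its 
branches. The following sketch assumes that its output is the list of 
branch outputs, which the post does not specify. The in-memory branches 
"As_Text" and "Row_Count" are hypothetical stand-ins for the file and DB 
writers.

```python
class Transformer:
    """Hypothetical base class, reduced to the essentials."""

    def __init__(self):
        self.input = None
        self.output = None

    def transform(self):
        raise NotImplementedError

    def __call__(self, data=None):
        if data is not None:
            self.input = data
            self.output = None
            self.transform()
        return self.output


class Chain(Transformer):
    def __init__(self, *elements):
        super().__init__()
        self.elements = list(elements)

    def transform(self):
        data = self.input
        for element in self.elements:
            data = element(data)
        self.output = data


class Splitter(Transformer):
    """Feeds the same input to every branch and collects the results."""

    def __init__(self, *branches):
        super().__init__()
        self.branches = list(branches)

    def transform(self):
        self.output = [branch(self.input) for branch in self.branches]


class Lines(Transformer):
    def transform(self):
        self.output = self.input.splitlines()


class Fields(Transformer):
    def transform(self):
        self.output = [line.split(",") for line in self.input]


class As_Text(Transformer):
    def transform(self):
        self.output = "\n".join(" | ".join(row) for row in self.input)


class Row_Count(Transformer):
    def transform(self):
        self.output = len(self.input)


Table_Maker = Chain(Lines(), Fields())  # a Chain nested inside another Chain
Table_Handler = Chain(Table_Maker, Splitter(As_Text(), Row_Count()))
text, count = Table_Handler("a,b\nc,d")
```

The table is built once by "Table_Maker" and then consumed by both 
branches.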


If a structure grows too complex to remember, the method "show_tree ()" 
displays something like this:

     Chain
     Chain[0] - Chain
     Chain[0][0] - Quotes
     Chain[0][1] - Adjust Splits
     Chain[1] - Splitter
     Chain[1][0] - Chain
     Chain[1][0][0] - High_Low_Range
     Chain[1][0][1] - Splitter
     Chain[1][0][1][0] - Trailing_High_Low_Ratio
     Chain[1][0][1][1] - Standard Deviations
     Chain[1][1] - Chain
     Chain[1][1][0] - Trailing Trend
     Chain[1][1][1] - Pegs
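
A listing in this style can be produced by a short recursive walk over 
the nested containers. The sketch below is a guess at the approach, 
written as a free function that returns the lines rather than as the TX 
method; the "children" attribute and the leaf classes are assumptions.

```python
class Transformer:
    """Hypothetical leaf element; it has no children."""

    children = ()


class Container(Transformer):
    """Common base for elements that hold other elements."""

    def __init__(self, *children):
        self.children = list(children)

    def __getitem__(self, index):
        return self.children[index]


class Chain(Container):
    pass


class Splitter(Container):
    pass


class Quotes(Transformer):
    pass


class Adjust_Splits(Transformer):
    pass


class Pegs(Transformer):
    pass


def show_tree(node, path=None):
    """Return one line per element, labelled with its index path."""
    name = type(node).__name__
    if path is None:
        path = name          # the root is listed by its class name only
        lines = [path]
    else:
        lines = [f"{path} - {name}"]
    for i, child in enumerate(node.children):
        lines += show_tree(child, f"{path}[{i}]")
    return lines


tree = Chain(Chain(Quotes(), Adjust_Splits()), Splitter(Pegs()))
listing = show_tree(tree)
```

The index paths in the listing match the subscripts used to reach an 
element, so "tree[1][0]" is the element listed as "Chain[1][0] - Pegs".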

Following a run, all intermediary formats are accessible:

     standard_deviations = C[1][0][1][1]()

     TM = TX.Table_Maker ()
     TM (standard_deviations).write ()

          0      | 1      | 2     |

          116.49 | 132.93 | 11.53 |
          115.15 | 128.70 | 11.34 |
            1.01 |   0.00 |  0.01 |

A Transformer takes parameters, either at construction time or by means 
of the method "T.set (key = parameter)". A File Reader receives no 
payload, so as a convenience it accepts the file name as its input 
argument. A File Writer does receive a payload, so its file name must be 
set by keyword:

     File_Writer = TX.File_Writer (file_name = '/tmp/memos-with-dates-1')
     File_Writer (input)  # Writes file
     File_Writer.set (file_name = '/tmp/memos-with-dates-2')
     File_Writer ()  # Writes the same thing to the second file
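
Keyword parameters of this kind might be handled as in the sketch below. 
It assumes that changing a parameter marks the Transformer as stale, so 
that the next call without input reruns it on the retained input; the 
post does not say how TX decides this. The "keywords" tuple, the "stale" 
flag and the temporary file names are inventions for the example.

```python
import os
import tempfile


class Transformer:
    """Hypothetical base class with keyword parameters."""

    keywords = ()  # parameter names a subclass accepts

    def __init__(self, **params):
        self.input = None
        self.output = None
        self.params = {}
        self.set(**params)

    def set(self, **params):
        for key in params:
            if key not in self.keywords:
                raise KeyError(f"unknown parameter: {key}")
        self.params.update(params)
        self.stale = True  # assumption: a changed parameter forces a rerun

    def transform(self):
        raise NotImplementedError

    def __call__(self, data=None):
        if data is not None:
            self.input = data
            self.stale = True
        if self.stale and self.input is not None:
            self.transform()
            self.stale = False
        return self.output


class File_Writer(Transformer):
    keywords = ("file_name",)

    def transform(self):
        with open(self.params["file_name"], "w") as f:
            f.write(self.input)
        self.output = self.input  # pass the payload through


folder = tempfile.mkdtemp()
first = os.path.join(folder, "memos-1")
second = os.path.join(folder, "memos-2")

writer = File_Writer(file_name=first)
writer("memo text")           # writes the first file
writer.set(file_name=second)
writer()                      # writes the same payload to the second file
```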



That's about it. I am very pleased with the design. I developed it to 
wrap a growing jungle of existing modules and classes having no 
interconnectability and no common input-output specifications. The 
improvement in terms of work time and resource management is enormous. I 
would share the base class and a few custom classes that are autonomous 
enough not to require surgical extraction from the jungle.

Writing a custom class requires no more than defining private keywords, 
if any, and writing the method "transform (self)", or "process_record 
(self, record)" if the input is a list of records, which it often is. 
The modular design encourages having each Transformer do just one simple 
thing that is easy to write and easy to debug. Complexity comes from 
assembling simple Transformers in a great variety of configurations.
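
As an illustration of how little a custom class might need, the sketch 
below assumes that the base class's default "transform" applies 
"process_record" to every record of a list input. That default and the 
example class "Add_Total" are assumptions, not the TX code.

```python
class Transformer:
    """Hypothetical base class with a record-wise default transform."""

    def __init__(self):
        self.input = None
        self.output = None

    def process_record(self, record):
        raise NotImplementedError

    def transform(self):
        # Default for list input: handle the records one at a time.
        self.output = [self.process_record(record) for record in self.input]

    def __call__(self, data=None):
        if data is not None:
            self.input = data
            self.output = None
            self.transform()
        return self.output


class Add_Total(Transformer):
    """Custom class: only process_record needs to be written."""

    def process_record(self, record):
        return record + [sum(record)]


with_totals = Add_Total()
rows = with_totals([[1, 2], [3, 4]])
```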


Frederic



