Storing the state of script between steps

Fri Feb 21 18:30:59 EST 2014

On 21Feb2014 12:59, Denis Usanov <usanovdd at gmail.com> wrote:
> I mostly develop on Python some automation scripts such as deployment (it's not about fabric and may be not ssh at all), testing something, etc. In this terms I have such abstraction as "step".
> 
> Some code:
> 
> class IStep(object):
>     def run(self):
>         raise NotImplementedError()
> 
> And the certain steps:
> 
> class DeployStep: ...
> class ValidateUSBFlash: ...
> class SwitchVersionS: ...
> 
> Where I implement run method. 
> Then I use some "builder" class which can add steps to internal list and has a method "start" running all step one by one.
> 
> And I like this. It's loosely coupled system. It works fine in simple cases. But sometimes some steps have to use the results from previous steps. And now I have problems. Before now I had internal dict in "builder" and named it as "world" and passed it to each run() methods of steps. It worked but I disliked this. 

Can you qualify exactly what you dislike about it?

I have a similar system I'm working on which chains operational
steps, and each step can queue multiple following steps.
It is still somewhat alpha.

I think it has pretty much the same state issue that you describe:
you need to keep state around, but passing it to each step feels
clunky: you have this state parameter that you need to maintain and
pass around all the time.

My wishlist for state is twofold: I'd like it to be more implicit,
for example held somewhere such as the program scope instead of
being passed explicitly, and I'd like to be able to ignore it
entirely in steps which don't care about the state.

My solution is threefold at present:

First up, the core algorithm/framework always passes the state
variable around. So every "step" function looks somewhat like this:

  def step(self, argument, state):

where "state" is an object instead of a dict; otherwise it is essentially
the same as your dict-based approach. "argument" is the item to
be worked on in this step; my framework looks like a UNIX shell
pipeline, where arguments are passed down the pipeline from step
to step.

Second, steps which do not care about the state are written like this:

  def step(self, argument):

and installed via a wrapper:

  def step_full(self, argument, state):
    return step(self, argument)

to make it easier to write the simple case.
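
One way to install such a wrapper is a decorator. This is only a
sketch, not what my framework actually does: the names "stateless"
and "Upper" are made up for illustration.

```python
import functools

def stateless(func):
    """Adapt a step that ignores state to the (self, argument, state) form."""
    @functools.wraps(func)
    def step_full(self, argument, state):
        # The state is accepted but deliberately not passed on.
        return func(self, argument)
    return step_full

class Upper:
    # Written in the simple form, callable in the full form.
    @stateless
    def step(self, argument):
        return argument.upper()

result = Upper().step("abc", {"unused": True})
```

The caller always supplies the state; the decorated method never sees it.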

In your setup, I'd be writing each Step class as a subclass of a
generic step class that incorporates the wrapper:

  class GenericStep:

    def step(self, argument, state):
      return self.stateless_step(argument)

and then classes which do not care about the state would look like
this:

  class SimpleOperation(GenericStep):

    def stateless_step(self, argument):
      ... do stuff with argument ...

and classes which do operate on the state look like this:

  class StepWithSideEffects(GenericStep):

    def step(self, argument, state):
      ... do stuff with argument and modify state ...

From the outside you call .step(....) with the full argument list.
But on the inside you define the method which is the simplest
mapping to what the step does.

That leaves you freer to choose the style for each step function,
using the less cluttered form when you don't care about the state.
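
Put together as something runnable, the classes above might look
like the sketch below. The concrete steps (Double, CountCalls) and
the use of a plain dict for the state are assumptions made to keep
the example short.

```python
class GenericStep:
    def step(self, argument, state):
        # Default: drop the state and defer to the simple form.
        return self.stateless_step(argument)

    def stateless_step(self, argument):
        raise NotImplementedError("override step() or stateless_step()")

class Double(GenericStep):
    # Does not care about the state, so only the simple form is defined.
    def stateless_step(self, argument):
        return argument * 2

class CountCalls(GenericStep):
    # Cares about the state, so the full form is overridden.
    def step(self, argument, state):
        state["calls"] = state.get("calls", 0) + 1
        return argument

state = {}
value = 21
for s in (Double(), CountCalls(), CountCalls()):
    # The caller uses the same full signature for every step.
    value = s.step(value, state)
```

The loop never needs to know which style a given step was written in.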

Third, in my scheme the return from step() is a sequence of
(new_argument, new_state) tuples because each step can fire multiple
following steps. Depending on the operations in the step, new_argument
might just be the original argument, and new_state will usually be
the original state. But sometimes new_state will be a shallow copy
of the original, with one or more parts deep copied. This is to
accommodate the implicit branching of state you might imagine in a
pipeline: each of the following steps might want its own independent
state from that point onward.
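
As a rough sketch of that branching, using the standard copy module:
the State class, its attributes and the branch_state() helper are
all hypothetical names, not part of my framework.

```python
import copy

class State:
    def __init__(self, config, seen):
        self.config = config   # shared between branches
        self.seen = seen       # each branch wants its own

def branch_state(state, deep_attrs=()):
    """Shallow copy the state, deep copying only the named attributes."""
    new_state = copy.copy(state)
    for name in deep_attrs:
        setattr(new_state, name, copy.deepcopy(getattr(state, name)))
    return new_state

original = State(config={"host": "a"}, seen=["x"])
branch = branch_state(original, deep_attrs=("seen",))
branch.seen.append("y")   # does not affect original.seen
```

Everything not named in deep_attrs is still shared with the original
state, so changes to it are visible in both branches.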

Again, there's some wrapper logic in the GenericStep class to handle the return
value, much as with the step() and stateless_step() calls: a simple step might
just think in terms of the argument and ignore the state entirely:

  def one_to_one_step(self, argument):
    ... do stuff with argument ...
    new_argument = ... blah ...
    return new_argument

  def one_to_many_step_with_state(self, argument, state):
    return [ (self.one_to_one_step(argument), state) ]

You can see the "full" step just calls the simple step and then
repackages the result for passing down the pipeline.  This lets you
write the "one_to_one_step" method simply, without clutter.
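
To show how that return convention might be consumed, here is a
minimal sketch of a driver that fans each result out to the next
step. The run_pipeline() function and the Strip and SplitWords steps
are invented for the example, and the state is a plain dict for
brevity.

```python
class GenericStep:
    def step(self, argument, state):
        # Repackage the simple result as a one-element sequence of
        # (new_argument, new_state) tuples.
        return [(self.one_to_one_step(argument), state)]

    def one_to_one_step(self, argument):
        raise NotImplementedError("override step() or one_to_one_step()")

class Strip(GenericStep):
    def one_to_one_step(self, argument):
        return argument.strip()

class SplitWords(GenericStep):
    # One-to-many: each word becomes an argument for the next step.
    def step(self, argument, state):
        return [(word, state) for word in argument.split()]

def run_pipeline(steps, argument, state):
    items = [(argument, state)]
    for s in steps:
        # Feed every (argument, state) pair to the step and flatten.
        items = [out for arg, st in items for out in s.step(arg, st)]
    return items

results = run_pipeline([Strip(), SplitWords()], "  a b  ", {})
words = [arg for arg, _ in results]
```

Here both words carry the same state object; a step that wanted
independent state per branch would return a copy instead.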

Hopefully this will give you some ideas for keeping the simple steps simple
while still accommodating the complex cases with state.

> I bet I wouldn't have asked this if I had worked with some of functional programming languages.

Possibly, but you still need state.

Cheers,
-- 
Cameron Simpson <cs at zip.com.au>

Mark Crowder <crowder at spdc.ti.com>: Possessor of a mind not merely twisted,
                                    but actually sprained!