[Baypiggies] I need some help architecting the big picture

Shannon -jj Behrens jjinux at gmail.com
Tue Apr 29 07:09:16 CEST 2008


On Mon, Apr 28, 2008 at 3:17 PM, Jeff Younker <jeff at drinktomi.com> wrote:
> > Anyway, so I have all this customer-specific logic, and all these data
> > pipelines.  How do I pull it together into something an operator would
> > want to use?  Is the idea of an operator appropriate?  I'm pretty sure
> > this is an "operations" problem.
> >
>
>  The pipeline is a product that development delivers to operations.
>  Operations maintains and monitors it.  You do not want a
>  system where an operator coordinates it on a permanent basis.
>  The pipeline should just chug along.  Data gets fed in, information
>  is spit out.
>
>  Pushing pipeline development off to "operations" is a sure way of
>  making your process melt down eventually.  You end up with a
>  system where huge chunks of logic are handled by one group and
>  huge chunks are handled by another, and nobody actually
>  understands how the system works.
>
>  That said, you'll need an interface so that operations can see
>  what is happening with the pipeline.  They need this to
>  troubleshoot the pipeline.  A simple one may just summarize data from logs.
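>
>  As a rough sketch of that idea (the log format is an assumption:
>  one record per line, with the severity as the first field):
>
>      #!/usr/bin/env python
>      # Summarize pipeline logs by severity so operations can see
>      # at a glance whether the pipeline is healthy.
>      import sys
>      from collections import defaultdict
>
>      counts = defaultdict(int)
>      for line in sys.stdin:
>          fields = line.split(None, 1)
>          if fields:
>              counts[fields[0]] += 1
>      for severity, count in sorted(counts.items()):
>          print '%s: %d' % (severity, count)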
>
>
> >
> > Currently, all of the Python scripts take all their settings on the
> > command line.  I'm thinking that the settings belong in an included
> > Makefile that just contains settings.  By keeping the Python dumb, I'm
> > attempting to follow the "tools, not policy" idea.
> > ...
> >
> > Is there an easier way to share data like database connection
> > information between the Makefile and Python other than passing it in
> > explicitly via command line arguments?
> >
>
>  Command options are just like method arguments.  Too many mandatory
>  ones being passed all over the place are an indication that they need to
>  be replaced by a single entity.  Pass around a reference to a config file
>  instead.
>
>  Use this config file everywhere.  Configuration changes should only be
>  made in one place. Distributing configuration throughout a pipeline
>  system is a recipe for long term failure.
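>
>  For example, each script could take a single option pointing at
>  the shared file (a sketch; the option name is arbitrary):
>
>      from optparse import OptionParser
>
>      # Every pipeline script accepts one --config option instead
>      # of a dozen mandatory settings.
>      parser = OptionParser()
>      parser.add_option('-c', '--config', dest='config',
>                        help='path to the shared config file')
>      options, args = parser.parse_args()
>      if not options.config:
>          parser.error('--config is required')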
>
>  A Python file that is sourced is a wonderful config file format.  Java
>  style properties files work too.  Simple key-value shell scripts can
>  be eval'd as Python too.  I imagine you already have a config system
>  for your web front end.  Consider re-using that.
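>
>  A minimal sketch of the Python-file approach (the file name and
>  setting names here are invented for illustration):
>
>      # settings.py might contain plain assignments:
>      #     DB_HOST = 'db.internal'
>      #     DB_NAME = 'pipeline'
>      #
>      # Each script then sources the file it was handed:
>      settings = {}
>      execfile('settings.py', settings)
>      print settings['DB_HOST']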
>
>  Depending upon how many machines you have interacting, you
>  may need a distributed config system.  Publishing a file via HTTP
>  is an easy solution.
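>
>  Fetching the published file is then one call per script (the URL
>  is a placeholder):
>
>      import urllib2
>
>      # Pull the shared settings from the config server and
>      # evaluate them like a local Python config file.
>      source = urllib2.urlopen('http://config.internal/settings.py').read()
>      settings = {}
>      exec source in settings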
>
>
>
> > Does the idea of driving everything from Makefiles make sense?
> >
>
>
>  It sounds to me like a horrible hack that will break down when
>  you start wanting to do recovery and pipeline monitoring.
>
>  Consider writing a simple queue management command.  It looks
>  for work in one bin, calls an external command to process the work,
>  and then dumps it into the next.  The bins can be as simple as
>  directories:
>
>  File A1 goes into bin A/pending
>  A1 is picked up by job A
>  A/pending/A1 gets moved to A/consuming/A1
>  A/consuming/A1 is processed to B/producing/B1
>  A/consuming/A1 is moved to A/consumed/A1
>  B/producing/B1 is moved to B/pending/B1
>
>  Writing such a simple queue manager should be straightforward.
>  Then your tool chain becomes nothing more than a series of calls
>  to the managers.  Or you could have each queue command
>  daemonize itself and then poll the queues every so often.
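>
>  A bare-bones version of one such step might look like this (a
>  sketch: it assumes the bin directories already exist and that the
>  external command reads its first argument and writes its second):
>
>      import os
>      import subprocess
>
>      def run_step(bin_a, bin_b, command):
>          """Push every pending file through one A -> B step."""
>          pending = os.path.join(bin_a, 'pending')
>          for name in os.listdir(pending):
>              src = os.path.join(bin_a, 'consuming', name)
>              dst = os.path.join(bin_b, 'producing', name)
>              # Claim the work item; rename is atomic on one filesystem.
>              os.rename(os.path.join(pending, name), src)
>              subprocess.check_call([command, src, dst])
>              os.rename(src, os.path.join(bin_a, 'consumed', name))
>              os.rename(dst, os.path.join(bin_b, 'pending', name))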
>
>
>
> > I'm having a little bit of a problem with testing.  I don't have a way
> > of testing any Python code that talks to a database because the Python
> > scripts are all dumb about how to connect to the database.  I'm
> > thinking I might need to setup a "pretend" customer with a test
> > database to test all of that logic.
> >
>
>  Standard unit testing stuff should work.  Use mock objects to
>  stub out the database connection.
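>
>  For instance, with hand-rolled stubs standing in for the
>  connection (load_rows is a made-up stand-in for your real code):
>
>      import unittest
>
>      def load_rows(conn):
>          # Stand-in for real pipeline code that takes a connection.
>          cursor = conn.cursor()
>          cursor.execute('SELECT * FROM widgets')
>          return cursor.fetchall()
>
>      class StubCursor(object):
>          def execute(self, sql, params=None):
>              self.sql = sql
>          def fetchall(self):
>              return [('fake', 'row')]
>
>      class StubConnection(object):
>          def cursor(self):
>              return StubCursor()
>
>      class LoaderTest(unittest.TestCase):
>          def test_load_rows_uses_the_connection(self):
>              rows = load_rows(StubConnection())
>              self.assertEqual(rows, [('fake', 'row')])
>
>      if __name__ == '__main__':
>          unittest.main()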
>
>  I actually do all of my scripts via a little harness that handles
>  all the generic command line setup.  Scripts subclass the
>  tframe.Framework object (which I'm releasing as soon as
>  I'm done with the damn book), and the script body goes in a
>  run(options, args) method.  Testing involves instantiating the
>  script's Framework class and then poking at it.
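>
>  In outline, such a harness might look like this (a guess at the
>  shape; none of this is the real tframe API):
>
>      from optparse import OptionParser
>
>      class Framework(object):
>          """Generic harness: parses options, then calls run()."""
>          def main(self, argv):
>              options, args = OptionParser().parse_args(argv)
>              return self.run(options, args)
>          def run(self, options, args):
>              raise NotImplementedError
>
>      class MyScript(Framework):
>          def run(self, options, args):
>              # Script body; a test just instantiates MyScript and
>              # calls run() with canned options and args.
>              return 0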
>
>  - Jeff Younker - jeff at drinktomi.com -

Ok, this is another approach.  I'm going to have to think about it some more.

Thanks,
-jj

-- 
I, for one, welcome our new Facebook overlords!
http://jjinux.blogspot.com/

