[Baypiggies] I need some help architecting the big picture
Shannon -jj Behrens
jjinux at gmail.com
Tue Apr 29 07:09:16 CEST 2008
On Mon, Apr 28, 2008 at 3:17 PM, Jeff Younker <jeff at drinktomi.com> wrote:
> > Anyway, so I have all this customer-specifc logic, and all these data
> > pipelines. How do I pull it together into something an operator would
> > want to use? Is the idea of an operator appropriate? I'm pretty sure
> > this is an "operations" problem.
> >
>
> The pipeline is a product that development delivers to operations.
> Operations maintains and monitors it. You do not want a
> system where an operator coordinates it on a permanent basis.
> The pipeline should just chug along. Data gets fed in, information
> is spit out.
>
> Pushing pipeline development off to "operations" is a sure way of
> making your process melt down eventually. You end up with a
> system where huge chunks of logic are handled by one group and
> huge chunks are handled by another, and nobody actually
> understands how the system works.
>
> That said, you'll need an interface so that operations can see
> what is happening with the pipeline. They need this to
> troubleshoot the pipeline. A simple one may just summarize data from logs.
>
>
> >
> > Currently, all of the Python scripts take all their settings on the
> > command line. I'm thinking that the settings belong in an included
> > Makefile that just contains settings. By keeping the Python dumb, I'm
> > attempting to follow the "tools, not policy" idea.
> > ...
> >
> > Is there an easier way to share data like database connection
> > information between the Makefile and Python other than passing it in
> > explicitly via command line arguments?
> >
>
> Command options are just like method arguments. Too many mandatory
> ones being passed all over the place are an indication that they need to
> be replaced by a single entity. Pass around a reference to a config file
> instead.
>
> Use this config file everywhere. Configuration changes should only be
> made in one place. Distributing configuration throughout a pipeline
> system is a recipe for long term failure.
>
> A Python file that is sourced is a wonderful config file format. Java
> style properties files work too. Simple key-value shell scripts can
> be eval'd as Python too. I imagine you already have a config system
> for your web front end. Consider re-using that.
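
To illustrate the sourced-Python-file approach Jeff mentions, here is a minimal sketch (the settings file name and keys are hypothetical, not anything from the thread):

```python
# Minimal sketch: a plain Python file "sourced" as configuration.
# The keys below are hypothetical examples.
import os
import tempfile

SETTINGS = """
DB_HOST = "localhost"
DB_NAME = "pipeline"
DB_USER = "etl"
"""

def load_config(path):
    """Exec a Python settings file and return its names as a dict."""
    config = {}
    with open(path) as f:
        exec(f.read(), config)
    config.pop("__builtins__", None)  # exec() adds this; drop it
    return config

# Demonstration: write a settings file out, then "source" it back in.
fd, path = tempfile.mkstemp(suffix=".py")
with os.fdopen(fd, "w") as f:
    f.write(SETTINGS)
config = load_config(path)
os.unlink(path)
print(config["DB_HOST"])  # localhost
```

Every script (and the Makefile, via a thin shell wrapper) can then read the same file, so configuration changes happen in exactly one place.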
>
> Depending upon how many machines you have interacting you
> may need a distributed config system. Publishing a file via HTTP
> is an easy solution.
>
>
>
> > Does the idea of driving everything from Makefiles make sense?
> >
>
>
> It sounds to me like a horrible hack that will break down when
> you start wanting to do recovery and pipeline monitoring.
>
> Consider writing a simple queue management command. It looks
> for work in one bin, calls an external command to process the work,
> and then dumps it into the next. The bins can be as simple as
> directories:
>
> File A1 goes into bin A/pending
> A1 is picked up by job A
> A/pending/A1 gets moved to A/consuming/A1
> A/consuming/A1 is processed to B/producing/B1
> A/consuming/A1 is moved to A/consumed/A1
> B/producing/B1 is moved to B/pending/B1
>
> Writing such a simple queue manager should be straightforward.
> Then your tool chain becomes nothing more than a series of calls
> to the managers. Or you could have each queue command
> daemonize itself and then poll the queues every so often.
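
The bin scheme above can be sketched in a few lines of Python. This is only an illustration: the `process()` step stands in for the real external command, and the A1-to-B1 naming rule is invented for the demo:

```python
# Minimal file-based queue manager following the bin scheme above.
# process() is a stand-in for calling the real external command.
import os
import shutil

def process(src, dst):
    # Stand-in transformation; a real manager would shell out here.
    with open(src) as fin, open(dst, "w") as fout:
        fout.write(fin.read().upper())

def run_queue(bin_a, bin_b):
    """Move each pending item through consuming -> producing -> pending."""
    for name in os.listdir(os.path.join(bin_a, "pending")):
        pending = os.path.join(bin_a, "pending", name)
        consuming = os.path.join(bin_a, "consuming", name)
        shutil.move(pending, consuming)            # A/pending -> A/consuming

        out_name = name.replace("A", "B")          # hypothetical naming rule
        producing = os.path.join(bin_b, "producing", out_name)
        process(consuming, producing)              # A/consuming -> B/producing

        shutil.move(consuming,
                    os.path.join(bin_a, "consumed", name))     # -> A/consumed
        shutil.move(producing,
                    os.path.join(bin_b, "pending", out_name))  # -> B/pending

# Set up the bins and one work item, then run the manager once.
for d in ("A/pending", "A/consuming", "A/consumed",
          "B/producing", "B/pending"):
    os.makedirs(d, exist_ok=True)
with open("A/pending/A1", "w") as f:
    f.write("work item")
run_queue("A", "B")
print(os.listdir("B/pending"))  # ['B1']
```

Because every state transition is a directory rename, a crashed run leaves its work item visibly stranded in `consuming/`, which makes recovery and monitoring straightforward.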
>
>
>
> > I'm having a little bit of a problem with testing. I don't have a way
> > of testing any Python code that talks to a database because the Python
> > scripts are all dumb about how to connect to the database. I'm
> > thinking I might need to setup a "pretend" customer with a test
> > database to test all of that logic.
> >
>
> Standard unit testing stuff should work. Use mock objects to
> stub out the database connection.
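
A small illustration of stubbing the database connection with a mock. The function under test and the query are hypothetical; `unittest.mock` ships with modern Python (it was the third-party `mock` package in 2008):

```python
# Sketch: unit-testing pipeline logic with the DB connection stubbed out.
# The function under test and the SQL are hypothetical examples.
from unittest import mock

def count_active_customers(conn):
    """Pipeline logic under test: runs a query through the given connection."""
    cur = conn.cursor()
    cur.execute("SELECT COUNT(*) FROM customers WHERE active = 1")
    return cur.fetchone()[0]

# Build a mock connection whose cursor returns a canned row.
conn = mock.Mock()
conn.cursor.return_value.fetchone.return_value = (42,)

print(count_active_customers(conn))  # 42
# Verify the query actually went through the stub:
conn.cursor.return_value.execute.assert_called_once()
```

Because the scripts take their connection as an argument rather than building it themselves, no real database (or "pretend" customer) is needed for the unit tests.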
>
> I actually do all of my scripts via a little harness that handles
> all the generic command line setup. Scripts subclass the
> tframe.Framework object (which I'm releasing as soon as
> I'm done with the damn book), and the script body goes in a
> run(options, args) method. Testing involves instantiating the
> script's Framework class and then poking at it.
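
Jeff's tframe isn't public, but the harness pattern he describes can be sketched like this. The class and option names are illustrative only, not tframe's actual API:

```python
# Sketch of the script-harness pattern described above. NOT tframe's
# real API -- just the idea: the base class owns generic command-line
# setup, and each script's body lives in run(options, args).
import optparse

class Framework(object):
    """Base class: handles generic option parsing, then calls run()."""
    def __init__(self):
        self.parser = optparse.OptionParser()
        self.parser.add_option("--config", dest="config",
                               help="path to the shared config file")

    def main(self, argv=None):
        options, args = self.parser.parse_args(argv)
        return self.run(options, args)

    def run(self, options, args):
        raise NotImplementedError

class LoadCustomers(Framework):
    """A hypothetical pipeline script."""
    def run(self, options, args):
        # The real script body would go here; return something testable.
        return {"config": options.config, "files": args}

# Testing means instantiating the class and poking at it directly:
result = LoadCustomers().main(["--config", "settings.py", "a.csv"])
print(result)  # {'config': 'settings.py', 'files': ['a.csv']}
```

The payoff is that tests never touch `sys.argv` or spawn subprocesses; they call `main()` with a list and inspect the return value.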
>
> - Jeff Younker - jeff at drinktomi.com -
Ok, this is another approach. I'm going to have to think about it some more.
Thanks,
-jj
--
I, for one, welcome our new Facebook overlords!
http://jjinux.blogspot.com/