[Baypiggies] clustering

Thu Aug 31 00:39:08 CEST 2006

On 8/30/06, Shannon -jj Behrens <jjinux at gmail.com> wrote:
>
> parallelizable pieces, but I don't understand the admin side of it.  My
> current data set is about 16 gigs, and I need to do things like run
> filters over strings, make sure strings are unique, etc.  I'll be
> using Python wherever possible.

Sounds like fun :-)

> * Do I have to run a particular Linux distro?  Do they all have to be
> the same, or can I just set up a daemon on each machine?

You can use just about any Linux distro, but it's easier if all the 'compute'
nodes run the same one.  That lets you boot the nodes via TFTP and keep only
one 'compute root image' to juggle.

> * What does "Beowulf" do for me?

It's the basic cluster infrastructure.

> * How do I admin all the boxes without having to enter the same command n
> times?

TFTP boot with a single 'compute' image.  There are also a bunch of cluster
admin tools, including tools for building images for cluster nodes - check
freshmeat.
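
For the "same command n times" part, here is a minimal sketch of the idea in
Python, not one of the freshmeat tools.  It assumes key-based (passwordless)
ssh to every node already works; the host names and the helper names
(ssh_argv, run_on_host, run_everywhere) are made up for illustration.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ssh_argv(host, command):
    """Build the argument list that runs `command` on `host` over ssh."""
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    return ["ssh", "-o", "BatchMode=yes", host, command]


def run_on_host(host, command, timeout=30):
    """Run `command` on one host; return (exit code, captured stdout)."""
    result = subprocess.run(
        ssh_argv(host, command),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout


def run_everywhere(hosts, command, runner=run_on_host):
    """Run `command` on every host concurrently; return {host: result}."""
    with ThreadPoolExecutor(max_workers=max(1, len(hosts))) as pool:
        futures = {host: pool.submit(runner, host, command) for host in hosts}
        return {host: future.result() for host, future in futures.items()}


if __name__ == "__main__":
    # Hypothetical node names; replace with the real compute nodes.
    nodes = ["node01", "node02", "node03"]
    for host, (code, out) in run_everywhere(nodes, "uptime").items():
        print(host, code, out.strip())
```

The runner argument is only there so the fan-out logic can be tried without
touching the network, by passing in a stand-in function.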

> * I've heard that MPI is good and standard.  Should I use it?  Can I
> use it with Python programs?

I've never worked with it - but it does appear to be the 'standard' for
cluster work.
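
Whatever ends up moving the data between nodes (MPI or something else), the
"make sure strings are unique" job splits up naturally if every copy of a
string is sent to the same worker.  A rough sketch of that partitioning step
follows; it makes no MPI calls, and the function names and the worker count
are assumptions for illustration.  It uses hashlib rather than the built-in
hash() because hash() on strings can differ from one Python process to the
next.

```python
import hashlib


def bucket_for(s, n_workers):
    """Map a string to a worker index that is stable across processes."""
    digest = hashlib.md5(s.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_workers


def partition(strings, n_workers):
    """Split strings into n_workers buckets; equal strings share a bucket."""
    buckets = [[] for _ in range(n_workers)]
    for s in strings:
        buckets[bucket_for(s, n_workers)].append(s)
    return buckets


def unique_per_bucket(bucket):
    """The work a single node would do: deduplicate its own bucket."""
    return set(bucket)


if __name__ == "__main__":
    data = ["spam", "eggs", "spam", "ham", "eggs"]
    # Each bucket could be handed to a different machine.
    for index, bucket in enumerate(partition(data, 3)):
        print(index, sorted(unique_per_bucket(bucket)))
```

Because duplicates always land in the same bucket, the per-node results can
simply be concatenated; no cross-node comparison is needed afterwards.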

> * Is there anything better than NFS that I could use to access the data?

Personally, I just s/NFS/Samba/ these days.  Given some higher-end hardware,
you might want to look at GFS.