Renting CPU time for a Python script
Fernando Pérez
fperez528 at yahoo.com
Fri Jul 19 14:00:55 EDT 2002
Zachary Bortolot wrote:
> Hello Everyone,
>
> I am a graduate student conducting research on using computer vision
> techniques to process digital air photos. As part of my research I am
> using a genetic optimization routine that I wrote in Python to find
> values for several key parameters. Unfortunately the genetic
> algorithm is quite CPU intensive, and each run requires anywhere from
> five to twelve days to complete on a 450MHz Pentium II. This is
> problematic since I have to run the program in a computing lab that is
> frequently used for teaching, which means that I am often unable to
> complete my runs. I would like to know if anyone has any suggestions
> on where I might go to rent CPU time that is Python and PIL friendly
> (the university I am at does have an AIX-based mainframe, but Python
> is not installed and users are only given 600k of disk space). Speed
> is not a major issue for me and the program does not use a lot of
> memory or disk space. However, stability is a definite must. Thanks
> in advance for any advice or suggestions!
5 to 12 days is a lot. Have you profiled this thing to find the bottlenecks?
At that point, a few weeks spent rewriting the time-critical parts in C would
be time well spent, I think. On the other hand, if you've already optimized
this to death and 5 to 12 days is the best you can get, good luck.
At any rate, any algorithm that takes that long to run should incorporate
checkpointing and restarting capabilities. That way, if a machine crashes or
your lab stops your runs, you only lose a few hours of CPU time. If you can
save the state of the code in an object, you can even (using pickle) very
easily move a job from one machine to another in mid-run. The stability of
your environment should then be a non-issue as far as the survival of your
runs is concerned.
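Something along these lines is all it takes -- a minimal sketch, assuming the
GA's state (population, generation counter, etc.) lives in a single object;
GAState and the loop body are hypothetical stand-ins for the real algorithm:

```python
import os
import pickle
import random

CHECKPOINT = 'ga_state.pkl'

class GAState:
    """Hypothetical container for everything the run needs to resume."""
    def __init__(self):
        self.generation = 0
        self.population = [random.random() for _ in range(10)]

def load_or_init():
    # Resume from the last checkpoint if one exists, else start fresh.
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT, 'rb') as f:
            return pickle.load(f)
    return GAState()

def save(state):
    # Write to a temp file, then rename, so a crash mid-write
    # can never leave a corrupt checkpoint behind.
    tmp = CHECKPOINT + '.tmp'
    with open(tmp, 'wb') as f:
        pickle.dump(state, f)
    os.replace(tmp, CHECKPOINT)

state = load_or_init()
for _ in range(5):           # stand-in for "evolve until converged"
    state.generation += 1    # one real GA generation would go here
    save(state)              # checkpoint every generation (or every N)
```

Since the checkpoint file is just a pickle, copying it to another box and
restarting the script there resumes the run exactly where it left off.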
So I'd recommend:
1- code in automatic checkpointing and self-restarting abilities. It's fairly
easy to do, and saves a lot of headaches.
2- profile your code. See whether any bottlenecks are left in the Python
layer. Without seeing your code I can't say, but if they are numerical,
Numeric might help. If not, look at weave (scipy.org); it's often enough, and
faster than writing a full-blown extension by hand. Pyrex is another option.
Finally, writing the extension yourself isn't really that difficult: SWIG is
pretty good (and weave can help there quite a bit as well).
3- once your code is optimized in C and self-restarting, you can distribute
runs across that lab without any problem. Just have your jobs fetch their
state from a server on the network so they can migrate automatically, and you
should be able to 'fire and forget'. With a central job manager keeping track
of what's been done, you can set up runs for a month and collect the results
at the end.
Cheers,
f.