Renting CPU time for a Python script
Fernando Pérez
fperez528 at yahoo.com
Fri Jul 19 14:00:55 EDT 2002
Zachary Bortolot wrote:
> Hello Everyone,
>
> I am a graduate student conducting research on using computer vision
> techniques to process digital air photos. As part of my research I am
> using a genetic optimization routine that I wrote in Python to find
> values for several key parameters. Unfortunately the genetic
> algorithm is quite CPU intensive, and each run requires anywhere from
> five to twelve days to complete on a 450MHz Pentium II. This is
> problematic since I have to run the program in a computing lab that is
> frequently used for teaching, which means that I am often unable to
> complete my runs. I would like to know if anyone has any suggestions
> on where I might go to rent CPU time that is Python and PIL friendly
> (the university I am at does have an AIX-based mainframe, but Python
> is not installed and users are only given 600k of disk space). Speed
> is not a major issue for me and the program does not use a lot of
> memory or disk space. However, stability is a definite must. Thanks
> in advance for any advice or suggestions!
5 to 12 days is a lot. Have you profiled this thing to find the bottlenecks?
At that point, a few weeks spent rewriting the time-critical parts in C would
be time well spent, I think. On the other hand, if you've already optimized
this to death and 5 to 12 days is the best you can get, good luck.
At any rate, any algorithm that takes that long to run should incorporate
checkpointing and restarting capabilities. That way, if a machine crashes or
your lab stops your runs, you only lose a few hours of CPU time. If you can
save the state of the code in an object, you can even (using pickle) very
easily move a job from one machine to another in mid-run. The stability of
your environment should then be a non-issue as far as the survival of your
runs is concerned.
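Something along these lines is all it takes -- a minimal sketch, assuming the
GA's state (population, generation counter, etc.) lives in a single object;
GAState and the loop body are hypothetical stand-ins for the real algorithm:

```python
import os
import pickle
import random

CHECKPOINT = 'ga_state.pkl'

class GAState:
    """Hypothetical container for everything the run needs to resume."""
    def __init__(self):
        self.generation = 0
        self.population = [random.random() for _ in range(10)]

def load_or_init():
    # Resume from the last checkpoint if one exists, else start fresh.
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT, 'rb') as f:
            return pickle.load(f)
    return GAState()

def save(state):
    # Write to a temp file, then rename, so a crash mid-write
    # can never leave a corrupt checkpoint behind.
    tmp = CHECKPOINT + '.tmp'
    with open(tmp, 'wb') as f:
        pickle.dump(state, f)
    os.replace(tmp, CHECKPOINT)

state = load_or_init()
for _ in range(5):           # stand-in for "evolve until converged"
    state.generation += 1    # one real GA generation would go here
    save(state)              # checkpoint every generation (or every N)
```

Since the checkpoint file is just a pickle, copying it to another box and
restarting the script there resumes the run exactly where it left off.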
So I'd recommend:
1- code in automatic checkpointing and self-restarting abilities. It's fairly
easy to do, and saves a lot of headaches.
2- profile your code. See whether any bottlenecks are left in the Python
layer. Without seeing your code I can't say, but if they are numerical,
Numeric might help. If not, look at weave (scipy.org); it's often enough, and
faster than writing a full-blown extension by hand. Pyrex is another option.
Finally, writing the extension yourself isn't really that difficult: SWIG is
pretty good (and weave can help there quite a bit as well).
3- once your code is optimized in C and self-restarting, you can distribute
runs across that lab without any problem. Just have your jobs fetch their
state from a server on the network so they can migrate automatically, and you
should be able to 'fire and forget'. With a central job manager keeping track
of what's been done, you can set up runs for a month and collect the results
at the end.
Cheers,
f.