Using Python for programming algorithms

Sun May 18 15:50:30 EDT 2008

On May 18, 4:20 pm, David C. Ullrich <dullr... at sprynet.com> wrote:

> Are you going to be doing research _about_ the
> algorithms in question or is it going to be research
> _using_ these algorithms to draw conclusions
> about other things?
>
> Most of the replies seem to be assuming the latter.
> If it's the former then Python seems like definitely
> an excellent choice - when you want to try
> something new it will be much faster trying it
> out in Python,

I second this. Hence my previous statement that "In scientific
research, CPU time is cheap and time spent programming is expensive."
If it was not clear what I meant, your post can serve as a
clarification. But whether Giner is 'developing' or 'using'
algorithms, he should value his own labour more than the CPU's. CPU
labour (i.e. computation) is very cheap. Manual labour (i.e.
programming) is very expensive. He may in any case benefit from using
Python. Today, the preferred computer languages among scientists are
not Fortran77, but various high-level languages like Matlab, S, IDL,
Perl and Python.

A related question is: How much 'speed' is really needed? If Giner is
analyzing datasets using conventional statistics (ANOVA, multiple
regression, etc.), when will Python (with NumPy) cease to be
sufficient? In my experience, conventional statistics on a dataset of
100,000 or 1,000,000 samples can be regarded as child's play on a modern
desktop computer. One really needs HUGE amounts of data before it's
worthwhile to use anything else. If one can save a couple of seconds
CPU time by spending several hours programming, then the effort is not
just futile, it's downright wasteful and silly.
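
To check this on your own machine, here is a minimal sketch, assuming
NumPy is installed. It fits an ordinary least-squares regression to
1,000,000 synthetic samples and reports how long the fit took. The
function name fit_linear_model, the synthetic data and the choice of
five predictors are my own assumptions for illustration, not anything
from Giner's problem.

```python
import time

import numpy as np


def fit_linear_model(n_samples=1_000_000, n_predictors=5, seed=0):
    """Fit y = b0 + X @ b by least squares on synthetic data.

    Returns the fitted coefficients (intercept first) and the
    wall-clock time spent in the solver itself.
    """
    rng = np.random.default_rng(seed)

    # Synthetic predictors and a response with known coefficients 1, 2, 3, ...
    X = rng.normal(size=(n_samples, n_predictors))
    true_coef = np.arange(1, n_predictors + 1, dtype=float)
    y = X @ true_coef + rng.normal(scale=0.1, size=n_samples)

    # Prepend a column of ones so the model includes an intercept.
    A = np.column_stack([np.ones(n_samples), X])

    start = time.perf_counter()
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    elapsed = time.perf_counter() - start
    return coef, elapsed


if __name__ == "__main__":
    coef, elapsed = fit_linear_model()
    print("coefficients:", np.round(coef, 3))
    print(f"least-squares fit took {elapsed:.3f} s")
```

How long it takes depends on your hardware, so run it rather than
trusting anyone's figure. Only if the time is unacceptable is it worth
thinking about another language.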

Something else that should be mentioned:

The complexity of the algorithm (the big-O notation) is much more
important for runtime performance than the choice of language. If you
can replace an O(N*N) algorithm with an O(N log N), O(N) or O(1) one,
it is always advisable to do so. An O(N*N) algorithm implemented in C
is never preferred over an O(N) algorithm written in Python. The only
time when C is preferred over Python is when N is large, but this is
also when O(N*N) is most painful. Pay attention to the algorithm if
things are running unbearably slow.
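
As an illustration, here is a small sketch of the same task (does a
list contain a duplicate?) written first as an O(N*N) pairwise
comparison and then as an O(N) pass that uses a set. Both function
names are made up for this example, and the input size in the timing
part is arbitrary.

```python
import timeit


def has_duplicates_quadratic(items):
    """O(N*N): compare every pair of elements."""
    n = len(items)
    for i in range(n):
        for j in range(i + 1, n):
            if items[i] == items[j]:
                return True
    return False


def has_duplicates_linear(items):
    """O(N) on average: remember what has been seen in a set."""
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False


if __name__ == "__main__":
    # Worst case for both versions: no duplicates at all.
    data = list(range(2000))
    for func in (has_duplicates_quadratic, has_duplicates_linear):
        seconds = timeit.timeit(lambda: func(data), number=3)
        print(f"{func.__name__}: {seconds:.4f} s for 3 runs")
```

Doubling the length of the input roughly quadruples the work of the
first version and roughly doubles the work of the second, whatever
language they are written in.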

Python has highly tuned datatypes like lists, dicts and sets, which a
C programmer will have a hard time duplicating. This also applies to
built-in algorithms like 'timsort'. qsort in the C standard library or
anything a C programmer can whip up within a reasonable amount of time
simply doesn't compare. C vs. Python benchmarks that don't take this
into account will falsely put Python in a bad light.
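
One reason the built-in sort is hard to match is that it is adaptive:
it detects runs in the input that are already in order and exploits
them. The sketch below makes that visible by counting how many
comparisons sorted() performs on ordered versus shuffled input. The
helper count_comparisons is a hypothetical name of my own, and the
counting is done through functools.cmp_to_key, which slows the sort
down. It is meant to show the behaviour, not to serve as a benchmark.

```python
import random
from functools import cmp_to_key


def count_comparisons(data):
    """Return how many element comparisons sorted() makes on data."""
    calls = 0

    def compare(a, b):
        nonlocal calls
        calls += 1
        # Classic three-way comparison: negative, zero or positive.
        return (a > b) - (a < b)

    result = sorted(data, key=cmp_to_key(compare))
    assert result == sorted(data)  # sanity check: same ordering as usual
    return calls


if __name__ == "__main__":
    n = 10_000
    ordered = list(range(n))
    shuffled = ordered[:]
    random.Random(0).shuffle(shuffled)

    print("already sorted:", count_comparisons(ordered), "comparisons")
    print("shuffled:      ", count_comparisons(shuffled), "comparisons")
```

Real data is often partly ordered already, and a benchmark that feeds
only random numbers to both languages will not show this advantage.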