Using Python for processing of large datasets (convincing management)

Thomas Jensen spam at ob_scure.dk
Mon Jul 8 17:43:38 EDT 2002


Cameron Laird wrote:
> Me, too.  While I know quite well how difficult
> it is to describe any program that's worth wri-
> ting, what we've heard of this one puzzles me.
> I'll summarize by saying simply that I'm with
> Paul:  I *strongly* suspect that database opera-
> tions swamp arithmetic operations in elapsed
> time, and that attention to the former will be
> most rewarding.

I have, on purpose, not described the workings of this program in very 
great detail, since my original post was more about the general idea of 
using Python for this kind of job. Having easy access to distributed 
computations is merely a bonus and, if nothing else, a buzz-word to 
mention to management *hint*.
Furthermore, the ability to scale the application simply gives a good 
feeling, even if it is *never* needed.

> You've mentioned once already that you might do
> more with your SQL.  I can imagine that much the
> greatest returns in performance will come from
> writing more of your algorithms in SQL.  That's
> likely to be a more scalable and satisfying ap-
> proach than the multi-processing complexities at
> which you've hinted.

Satisfying, perhaps, but could you elaborate on scalable?

I simply fail to see why distributed computing is so bad. 
Everybody seems to think that once you make something distributed, every 
other optimization possibility simply disappears.
I never said distributed computing was a priority, or even that it would 
be part of the first version. It *is* a design goal, however, that should 
we one day, after all other optimizations in the world, using SQL, need 
more speed, we can get it by adding machines/CPUs (be they DB servers or 
application servers).

-- 
Best Regards
Thomas Jensen
(remove underscore in email address to mail me)



