Using Python for processing of large datasets (convincing management)

Thomas Jensen spam at
Sat Jul 6 17:18:02 EDT 2002

William Park wrote:


> If your cronjob can tackle 1MB but not 1GB, then I don't think this is
> programming language issue.  Rather, you should look at your algorithm and
> data structure.

I am inclined to agree. The current implementation is very inefficient
in its database access: lots of small queries with no supporting
indexes, and the same data is often read multiple times.
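
A minimal sketch of that access pattern and one way to fix it. Everything here is assumed for illustration: the `orders` table and its columns are hypothetical, and sqlite3 only stands in for whatever database the cron job really talks to.

```python
import sqlite3

def build_db(n_customers=200, orders_per_customer=5):
    """Create an in-memory database with a hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    rows = [(c, float(i + 1))
            for c in range(n_customers)
            for i in range(orders_per_customer)]
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return conn

def totals_many_queries(conn, customer_ids):
    """Inefficient pattern: one small query per customer, no index."""
    totals = {}
    for cid in customer_ids:
        row = conn.execute(
            "SELECT SUM(amount) FROM orders WHERE customer_id = ?", (cid,)
        ).fetchone()
        totals[cid] = row[0]
    return totals

def totals_one_query(conn):
    """Read the data once: a single grouped query, with a supporting index."""
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_orders_customer ON orders (customer_id)"
    )
    return dict(conn.execute(
        "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"
    ))

conn = build_db()
slow = totals_many_queries(conn, range(200))
fast = totals_one_query(conn)
```

Both functions return the same totals. The difference is that the first issues one query per customer, while the second makes a single pass over the data.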

> If your company is private for-profit company, then use money argument:
It is.

>     - Anyone who knows Python or Unix shell will have the necessary
>       analytical skills.  And, there can easily be found on
>       <comp.lang.python> or <>.

The company is based in Denmark, and I believe the number of Danish
people in the group is rather small?
However, I recently heard that some Danish universities use Python as
the primary language in CS.

>     - Scaling to multiple CPU is OS issue.  Much easier with Linux (no
>       comment on Windows :-)

I've heard that Python threads don't scale (well?) to multiple CPUs?
Maybe that's only on Windows?
If it is Python, I was planning on using XML-RPC for
inter-process/inter-machine communication.
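
A rough sketch of what XML-RPC between a worker and a caller could look like. It uses the module names from current Python (`xmlrpc.server` and `xmlrpc.client`; in 2002 these were `SimpleXMLRPCServer` and `xmlrpclib`). The `summarize` function is a made-up placeholder for the real processing step, and the server runs in a background thread only so the example fits in one file.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

def summarize(values):
    """Hypothetical worker function: return count and sum of a batch."""
    return {"count": len(values), "total": sum(values)}

# Port 0 asks the OS for any free port; logRequests=False keeps output quiet.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(summarize, "summarize")
host, port = server.server_address

# Serve in a background thread so the sketch is self-contained;
# in practice the worker would be a separate process or machine.
thread = threading.Thread(target=server.serve_forever, daemon=True)
thread.start()

# The caller invokes the remote function as if it were a local method.
with ServerProxy(f"http://{host}:{port}") as proxy:
    result = proxy.summarize([1, 2, 3, 4])

server.shutdown()
server.server_close()
```

Because each worker is a separate process reached over HTTP, spreading the work over several CPUs or machines becomes a matter of starting more workers, independent of how well threads scale.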

>     - Scaling to GB is algorithm issue.  Python makes development easier,
>       because it's easy to write and read.
> Mostly, he saves money because he will be able to find right people.  The
> fact that they happen to know the right language is just bonus.

Well said.

Best Regards
Thomas Jensen
(remove underscore in email address to mail me)
