Using Python for processing of large datasets (convincing management)

Thomas Jensen spam at ob_scure.dk
Sat Jul 6 17:18:02 EDT 2002


William Park wrote:

[snip]

> If your cronjob can tackle 1MB but not 1GB, then I don't think this is
> programming language issue.  Rather, you should look at your algorithm and
> data structure.

I am inclined to agree. The current implementation is very inefficient 
in its database access (lots of small queries with no supporting 
indexes; furthermore, the same data is often read multiple times).
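
A minimal sketch of the difference, assuming a hypothetical `orders` table
in an in-memory SQLite database (the table, column and function names are
made up for illustration and are not the real schema). It contrasts one
small query per key with a single grouped query backed by an index:

```python
import sqlite3


def build_db():
    """Create a throwaway in-memory database with hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    rows = [(i % 100, float(i)) for i in range(10_000)]
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    # A supporting index, so lookups by customer_id need not scan the table.
    conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    conn.commit()
    return conn


def totals_many_queries(conn, customer_ids):
    """The inefficient pattern: one small query per customer."""
    totals = {}
    for cid in customer_ids:
        row = conn.execute(
            "SELECT SUM(amount) FROM orders WHERE customer_id = ?", (cid,)
        ).fetchone()
        totals[cid] = row[0]
    return totals


def totals_single_query(conn):
    """One pass over the data: the database does the grouping."""
    cur = conn.execute(
        "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"
    )
    return dict(cur.fetchall())
```

Both functions return the same totals; the second reads each row once
instead of issuing a separate query for every customer.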

> If your company is private for-profit company, then use money argument:
It is.

> 
>     - Anyone who knows Python or Unix shell will have the necessary
>       analytical skills.  And, there can easily be found on
>       <comp.lang.python> or <comp.unix.shell>.

The company is based in Denmark, and I believe the number of Danish 
people in the group is rather small?
However, I recently heard that some Danish universities use Python as 
the primary language in CS.

>     - Scaling to multiple CPU is OS issue.  Much easier with Linux (no
>       comment on Windows :-)

I've heard that Python threads don't scale (well?) to multiple CPUs?
Maybe that's only on Windows?
If it is Python, I was planning to use XML-RPC for 
inter-process and inter-machine communication.
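
A rough sketch of the XML-RPC idea, using the standard-library modules as
they are named in current Python 3 (`xmlrpc.server` and `xmlrpc.client`).
The `summarize` task and the helper names are hypothetical. The worker runs
in a background thread here only to keep the example self-contained; the
intent is that each worker would be a separate process or machine, which
also avoids relying on threads for CPU-bound work:

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def summarize(values):
    """Hypothetical worker task: count and total of a list of numbers."""
    return {"count": len(values), "total": sum(values)}


def start_worker(host="127.0.0.1", port=0):
    """Start an XML-RPC worker; port 0 lets the OS pick a free port."""
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_function(summarize, "summarize")
    # Background thread for demonstration only; a real worker would be
    # its own process, possibly on another machine.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_worker(server, values):
    """Send one chunk of data to the worker and return its result."""
    host, port = server.server_address
    with ServerProxy(f"http://{host}:{port}/") as proxy:
        return proxy.summarize(values)
```

A coordinating process could split the dataset into chunks and send each
chunk to a different worker with `call_worker`, then combine the results.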

>     - Scaling to GB is algorithm issue.  Python makes development easier,
>       because it's easy to write and read.
> 
> Mostly, he saves money because he will be able to find right people.  The
> fact that they happen to know the right language is just bonus.

Well said.

-- 
Best Regards
Thomas Jensen
(remove underscore in email address to mail me)



