Using Python for processing of large datasets (convincing management)

Paul Rubin phr-n2002b at NOSPAMnightsong.com
Sun Jul 7 07:19:20 EDT 2002


Thomas Jensen <spam at ob_scure.dk> writes:
> Yes and no. In my estimate, the primary reason for the slowness of the
> current job is inefficient data access and data handling.
> A few examples:
> * Using "SELECT TOP 1 value FROM T_MyTable ORDER BY value DESC" to get
> maximum value.
> * linear searches
> * LOTS of SQL calls returning only one row
> 
> When the job was originally written a lot of factors were not known
> and, well, it did its job.
> However now the amount of data requires better performance.
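For the first two quoted examples, the usual fix is to push the work into the database engine rather than sorting or looping in application code. A minimal sketch of both rewrites, using sqlite3 purely as a stand-in for the real database (the `T_MyTable` and `value` names are taken from the example query above; the sample data is made up):

```python
import sqlite3

# sqlite3 stands in for the real database here; T_MyTable and "value"
# come from the example query quoted above, the data is illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T_MyTable (id INTEGER PRIMARY KEY, value INTEGER)")
conn.executemany("INSERT INTO T_MyTable (value) VALUES (?)",
                 [(v,) for v in (3, 17, 9, 42, 5)])

# Let the engine compute the aggregate instead of sorting the whole
# table with "SELECT TOP 1 value ... ORDER BY value DESC":
(maximum,) = conn.execute("SELECT MAX(value) FROM T_MyTable").fetchone()

# Fetch many rows in one round trip instead of one SQL call per row:
rows = conn.execute("SELECT id, value FROM T_MyTable").fetchall()
```

The aggregate and the batched fetch each replace a pattern that grows more expensive as the table grows.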

It sounds like your application would speed up a lot by judiciously
adding some indices to your tables.  Talk to your DBA.
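As a concrete illustration of what an index buys for the "linear searches" above (again a sketch with sqlite3 as a stand-in; `idx_value` is a made-up index name):

```python
import sqlite3

# sqlite3 as a stand-in for the real database; idx_value is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T_MyTable (id INTEGER PRIMARY KEY, value INTEGER)")
conn.executemany("INSERT INTO T_MyTable (value) VALUES (?)",
                 [(i % 100,) for i in range(1000)])

# Without an index, "WHERE value = ?" scans every row.  With one,
# the planner can do a direct lookup instead:
conn.execute("CREATE INDEX idx_value ON T_MyTable (value)")
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM T_MyTable WHERE value = 42"
).fetchall()
# The plan should now report a search using idx_value.
```

The syntax for inspecting a query plan differs per engine, but every mainstream database offers an equivalent, and checking it is the quickest way to confirm an index is actually being used.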

> >> * Improved scalability - parallel processing on multiple
> >> machines/CPUs
> >
> > This might be more easily accomplished with Java, depending on
> > exactly how you intend to implement it.  Java is probably the best
> > tool for distributed processing; in particular JINI is ideal for
> > this kind of thing.
> 
> I don't know much Java, I must admit. However, for my needs I believe
> XMLRPC will do just fine.
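For reference, the XML-RPC route is only a few lines of Python with the standard library (shown here with the modern module names; `process_chunk` is a made-up example function, and the OS-assigned port stands in for a real, fixed address):

```python
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy
import threading

# Server side: expose a function over HTTP.  Port 0 lets the OS pick
# a free port; a real deployment would use a fixed, known address.
server = SimpleXMLRPCServer(("localhost", 0), logRequests=False)
server.register_function(lambda chunk: sum(chunk), "process_chunk")
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: a worker on another machine would point at the server.
port = server.server_address[1]
proxy = ServerProxy("http://localhost:%d" % port)
result = proxy.process_chunk([1, 2, 3, 4])

server.shutdown()
```

Whether this helps at all, though, depends on the database no longer being the bottleneck.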

It doesn't sound to me like you need anything like this.  Reorganizing
your database may completely solve your problem.


