Using Python for processing of large datasets (convincing management)
Paul Rubin
phr-n2002b at NOSPAMnightsong.com
Sun Jul 7 07:19:20 EDT 2002
Thomas Jensen <spam at ob_scure.dk> writes:
> Yes and no. In my estimate, the primary reason for the slowness of the
> current job is inefficient data access and data handling.
> A few examples:
> * Using "SELECT TOP 1 value FROM T_MyTable ORDER BY value DESC" to get
> the maximum value.
> * Linear searches.
> * LOTS of SQL calls returning only one row.
>
> When the job was originally written a lot of factors were not known
> and, well, it did its job.
> However, the amount of data now requires better performance.
It sounds like your application would speed up a lot if you judiciously
added some indices to your tables. Talk to your DBA.
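To make the point concrete, here is a minimal sketch using Python's sqlite3 module (the table name `T_MyTable` is taken from the examples above; the data and index name are made up). It shows the two fixes in question: asking for MAX(value) directly instead of sorting the whole table, and pulling a result set in one query instead of many one-row calls.

```python
import sqlite3

# Hypothetical stand-in for T_MyTable from the post.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T_MyTable (id INTEGER PRIMARY KEY, value INTEGER)")
conn.executemany("INSERT INTO T_MyTable (value) VALUES (?)",
                 [(v,) for v in (3, 17, 5, 42, 8)])

# An index on the column lets the database answer MAX(value) (or an
# ORDER BY on value) without scanning every row.
conn.execute("CREATE INDEX idx_value ON T_MyTable (value)")

# Instead of "SELECT TOP 1 value ... ORDER BY value DESC":
(maximum,) = conn.execute("SELECT MAX(value) FROM T_MyTable").fetchone()

# Instead of lots of SQL calls returning one row each, fetch the set at once:
rows = conn.execute("SELECT id, value FROM T_MyTable").fetchall()
```

The same idea carries over to any SQL backend; only the index-creation syntax and driver module change.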
> >>* Improved scalability - parallel processing on multiple
> > machines/CPUs This might be more easily accomplished with Java,
> > depending on exactly how you intend to implement it. Java is
> > probably the best tool for distributed processing; in particular
> > JINI is ideal for this kind of thing.
>
> I don't know much Java, I must admit. However, for my needs I believe
> XMLRPC will do just fine.
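For what it's worth, XML-RPC is easy to try from the standard library. Below is a small sketch using the modern module names (`xmlrpc.server` and `xmlrpc.client`; in 2002-era Python these were `SimpleXMLRPCServer` and `xmlrpclib`). The `process_chunk` function is a made-up placeholder for whatever per-chunk work the job does.

```python
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

# Hypothetical worker function; real code would process a chunk of the dataset.
def process_chunk(numbers):
    return sum(numbers)

# Serve on localhost; port 0 lets the OS pick a free port.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(process_chunk)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# A coordinating process can farm out chunks to several such workers.
proxy = ServerProxy("http://127.0.0.1:%d" % port)
result = proxy.process_chunk([1, 2, 3, 4])
server.shutdown()
```

Each machine runs one such server, and the coordinator simply calls the proxies in turn (or from a thread pool) to spread the load.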
It doesn't sound to me like you need anything like this. Reorganizing
your database may completely solve your problem.