[Tutor] Performance of Python loops vs multiple MySQL queries

Kent Johnson kent37 at tds.net
Thu Dec 22 01:28:18 CET 2005


Bernard Lebel wrote:
> Hello,
> 
> Finally, after a year and a half of learning and messing around with
> Python, I'm writing THE code that made learn Python in the first
> place: a render farm client management software. I may have several
> questions regarding this, but for now I only have one.

Congratulations Bernard, you have come a long way!
> 
> The script I'm writing is the client script that runs on render nodes.
> It checks a database for a job to do, then based on some factors like
> pooling, priority, age and stuff, will determine what job it can work
> on. The client connects, performs evaluation of jobs, get a job,
> update the database, and starts the actual job.
> 
> Now, there might be up to 80 clients connecting to the database at any
> moment to get a job to do. So understandably, I want the "evaluation"
> step to be as fast as possible.
> 
> Right now, my script works this way: it selects a bunch of rows based
> on a few factors. Then when this is done, everything else is done in
> the script, that is, doesn't rely on a MySQL query. The script builds
> a variety of sorted lists and dictionaries, to ultimately end up with
> a single job.
> 
> So I am wondering this: in case there are only a handful of jobs to
> evaluate, then I understand the script may run fast. But if it has to
> build lists and dictionary on thousands of jobs, then I'm affrait that
> it might become slower than simply running a series of queries to the
> database using various ordering schemes.
> 
> 
> Any advice on this?

I haven't looked at your code closely so I will just offer some general advice.

- Don't assume there is going to be a problem. Python dicts are very fast - they are the 
data structure underlying namespaces and they have been heavily optimized for years.

- Measure! The only way to truly answer your question is to try it both ways and time it.

My guess is that the dictionary approach will be faster. I assume the database is on a 
remote host since it is serving multiple clients. So at a minimum you will have the 
network round-trip delay for each query.

- Your getJob() code seems to use some variables before they are assigned, such as 
tPoolIDs and aJob. Is this working code? Also it would be easier to read if you broke it 
up into smaller functions that each do a small piece of the problem.

Kent
> 
> 
> Here is a link to the current code (that build lists and
> dictionaries). Keep in mind it's in early alpha stage. The section to
> look for is the function getJob(), that starts at line 776. I have
> changed the extention to txt for security purposes.
> http://www.bernardlebel.com/scripts/nonxsi/farmclient_2.0_beta03.txt
> 
> 
> 
> Thanks in advance
> Bernard
> _______________________________________________
> Tutor maillist  -  Tutor at python.org
> http://mail.python.org/mailman/listinfo/tutor
> 
> 




More information about the Tutor mailing list