[Tutor] Performance of Python loops vs multiple MySQL queries

Bernard Lebel 3dbernard at gmail.com
Thu Dec 22 15:53:24 CET 2005


On 12/21/05, Kent Johnson <kent37 at tds.net> wrote:
> Bernard Lebel wrote:
> > Hello,
> >
> > Finally, after a year and a half of learning and messing around with
> > Python, I'm writing THE code that made learn Python in the first
> > place: a render farm client management software. I may have several
> > questions regarding this, but for now I only have one.
>
> Congratulations Bernard, you have come a long way!

[Bernard] Thanks a lot Kent, without and all the other gurus of this
list, I wouldn't made it that far!

> >
> > The script I'm writing is the client script that runs on render nodes.
> > It checks a database for a job to do, then based on some factors like
> > pooling, priority, age and stuff, will determine what job it can work
> > on. The client connects, performs evaluation of jobs, get a job,
> > update the database, and starts the actual job.
> >
> > Now, there might be up to 80 clients connecting to the database at any
> > moment to get a job to do. So understandably, I want the "evaluation"
> > step to be as fast as possible.
> >
> > Right now, my script works this way: it selects a bunch of rows based
> > on a few factors. Then when this is done, everything else is done in
> > the script, that is, doesn't rely on a MySQL query. The script builds
> > a variety of sorted lists and dictionaries, to ultimately end up with
> > a single job.
> >
> > So I am wondering this: in case there are only a handful of jobs to
> > evaluate, then I understand the script may run fast. But if it has to
> > build lists and dictionary on thousands of jobs, then I'm affrait that
> > it might become slower than simply running a series of queries to the
> > database using various ordering schemes.
> >
> >
> > Any advice on this?
>
> I haven't looked at your code closely so I will just offer some general advice.
>
> - Don't assume there is going to be a problem.

[Bernard] Okay perhaps by "problem" I have not been very accurate. I
meant "sync" problems. You see, when the script finds a job, it makes
updates in the database, that is, it adds an entry into another table,
and updates a certain field in the main jobs table. Other clients then
testing if there is something to do rely on information that must be
totally up-to-date. I just wanted to make sure I would not run into
the case of multiple clients getting incorrect results because of not
so up-to-date informations. Perhaps I should investigate table locks?


> Python dicts are very fast - they are the data structure underlying namespaces and they
> have been heavily optimized for years.

[Bernard] Okay, good to know!


> - Measure! The only way to truly answer your question is to try it both ways and time it.

[Bernard] You are right.

>
> My guess is that the dictionary approach will be faster. I assume the database is on a
> remote host since it is serving multiple clients. So at a minimum you will have the
> network round-trip delay for each query.
>
> - Your getJob() code seems to use some variables before they are assigned, such as
> tPoolIDs and aJob. Is this working code? Also it would be easier to read if you broke it
> up into smaller functions that each do a small piece of the problem.

[Bernard] This is not working code. tPoolIDs is bound after the first
query of the function, but aJob is an error of mine.

Indeed I could break down the getJob() function into smaller
functions. It's just that since the class is already having a fair
amount of methods and this is becoming some long code, I wanted to
keep everything into a single function.

Also there was a consideration of performance. I have one question on
the topic breaking code into small functions and performance. I have
read somewhere that *any* call whatoever, that is, methods, functions
and such, involve a performance cost. Is that right?
In the case it is true, the performance deterioration would be
proportional with the number of calls being made, so the larger the
number of iterations and the more function calls, the slower the code
would run, is that correct?


Thanks
Bernard

>
> Kent
> >
> >
> > Here is a link to the current code (that build lists and
> > dictionaries). Keep in mind it's in early alpha stage. The section to
> > look for is the function getJob(), that starts at line 776. I have
> > changed the extention to txt for security purposes.
> > http://www.bernardlebel.com/scripts/nonxsi/farmclient_2.0_beta03.txt
> >
> >
> >
> > Thanks in advance
> > Bernard


More information about the Tutor mailing list