Performance Issues please help

John Machin sjmachin at lexicon.net
Wed Jun 1 20:58:42 EDT 2005


PyPK wrote:
> I was testing this piece of code and it takes about 24-30 seconds to do
> a look up on a list(m) of size 1000x1000
> m -> list of size 1000x1000
> import time
> print time.ctime()
> box = {}
> for r,row in enumerate(m):
>     for c,e in enumerate(row):
>          if box.has_key(e):
>                params = box[e]
>                box[e] = ( min(c, params[0]), min(r, params[1]),
>                           max(c, params[2]), max(r, params[3] ) )
>           else:
>                 box[e] = [c, r, c, r]
> print time.ctime()
> 
> Can some one tell me what exactly is taking more time here. Is it
> because I am using dictionaries or what is the problem. Can some one
> help me improve this .Is there a better way to write this.
> 

Without gross changes to the algorithm, and in no particular order:
0. Stop worrying. Find something else to do during the 30 seconds.
1. Use psyco, if your [unspecified] platform supports it.
2. Why has_key()? Try "if e in box:" instead, if your [unspecified] 
version of Python supports it. If it doesn't, (a) consider upgrading (b) 
try doing box_has_key = box.has_key outside the loops and using the 
result inside the loops.
3. Ensure this code is inside a function, not at global level in a 
script [not apparent from your post].
4. Outside the loops, put "local_min = min", ditto for max. Then use 
box[e] = (local_min(c, etc etc
5. Upgrade your [unspecified] hardware.
6. Use try/except, if indicated by the [unspecified] percentage of times 
"e in box" is true. Hint: printing len(box) at the end might be useful.
7. Consider using pyrex.
8. Consider using numarray/mumeric.

Info req'd for discussing algorithm changes:
1. How much free memory do you have?
2. What can you tell us about the distribution of "e"?
3. Is "m" a rectangle, or are the rows of variable length?
4. Do you have control over "m" i.e. are you stuck with that data 
structure, or can we design a different one?



More information about the Python-list mailing list