My script is taking 12 hours+ any suggestions?

Mon Sep 1 16:29:11 EDT 2003

On Sat, 30 Aug 2003 13:08:24 -0400, Ideasman <cpbarton at pacific.net.au> wrote:

>Hi, I have made a script that processes normals for flat-shaded 3D meshes.
>It compares every vert with every other vert to look for verts that can
>share normals, and it takes ages.

Comparing every vert with every other vert means n*(n-1)/2 comparisons, which can be big.
How big is n in your case?
Others have suggested using a dict to avoid that kind of search. Take the advice,
unless the whole problem is bogus somehow, in which case find that out first.
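
To make the dict suggestion concrete, here is a minimal sketch. I don't know what your actual test for "can share normals" is, so this assumes (hypothetically) that verts qualify when they sit at the same position after rounding. The names group_by_position and ndigits are mine, not from your script.

```python
def group_by_position(verts, ndigits=6):
    """Map each rounded (x, y, z) position to the indices of the verts found there."""
    groups = {}
    for index, (x, y, z) in enumerate(verts):
        # Assumption: "same position" means equal after rounding to ndigits places.
        key = (round(x, ndigits), round(y, ndigits), round(z, ndigits))
        groups.setdefault(key, []).append(index)
    return groups

# Made-up sample data, only to show the shape of the result.
verts = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
]

groups = group_by_position(verts)
# Only positions used by more than one vert are candidates for sharing a normal.
shared = [indices for indices in groups.values() if len(indices) > 1]
```

This is one pass over the verts plus one pass over the groups, instead of n*(n-1)/2 comparisons. Rounding is a crude notion of "close": two nearly equal coordinates can still land in different keys, so treat it as a starting point only.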

>
>I'm not asking anyone to rewrite the script- just have a look for any 
>stupid errors that might be sucking up time.
>
Some thoughts:

Check your memory usage. If your data set is huge, you could be paging and killing all
semblance of performance.

Upgrade to Python 2.3, so you can benefit from its speedups and neato features.

Perhaps go back to whoever produced the source data and see if this apparent editing
of a text file is really the best way to accomplish your end goal.

For more detailed help with the file, post enough of an example and explanation to
spec it exactly. There may be ways to avoid stuff you are doing.

Further thoughts, if you really have to do what you are doing:

Try timing it, so you know where the time is going. (See the time, profile, and hotshot modules.)
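
A rough sketch of what timing and profiling might look like. It is written for a current Python, where cProfile fills the profiler role, so adjust for whatever version you are running. compare_all_pairs is a made-up stand-in for your hot loop, not your code.

```python
import cProfile
import io
import pstats
import time

def compare_all_pairs(values):
    """Stand-in for the slow part: count equal pairs the n*(n-1)/2 way."""
    matches = 0
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            if values[i] == values[j]:
                matches += 1
    return matches

data = [i % 50 for i in range(400)]

# Coarse wall-clock timing of one call.
start = time.perf_counter()
result = compare_all_pairs(data)
elapsed = time.perf_counter() - start

# Function-level profile of the same call, collected into a string.
profiler = cProfile.Profile()
profiler.enable()
compare_all_pairs(data)
profiler.disable()

report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
```

elapsed tells you how long the call took; report.getvalue() holds a table showing which functions the time went to. Measure on a small slice of the real data first, so you are not waiting 12 hours for the answer.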

Don't use eval to convert strings to numeric values (use int(s) or float(s)).
Don't re-evaluate expressions (especially complex ones) even twice in a hot loop.
Don't evaluate constant (with respect to current loop, not necessarily globally constant)
      expressions within a loop (hoist them out to outer loops, and break them into
      subexpressions that can be hoisted out as far as possible).
Don't create and store redundant information.
Don't index through a list when you can iterate through it. Use enumerate if you need an index in parallel.
Don't walk through a sequence to find something if you can use a dict to look it up directly.
Don't leave stuff like str(saneFloat(NormX)) in code you expect to go fast.
      I.e., clean out desperate hacks whose effect you don't really understand,
      and properly fix any problems that reemerge ;-)
Don't use builtin names like float, str, int, etc. for your own variable or parameter names.
      It will bite you eventually.
If your input text is well formatted, perhaps sorting chunks as text can accomplish some useful grouping.
Consider decorate-sort to group things according to some computed feature value.
Or use a dict with list values as suggested by others.
Look at zip and/or list comprehensions for creating/manipulating sequences.
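
A few of the points above, sketched on made-up data. The line format (three whitespace-separated floats per line) and every name here are assumptions for illustration, since I haven't seen your file spec.

```python
import math

# Hypothetical input: one normal per line, three whitespace-separated numbers.
lines = ["1.0 0.0 0.0", "0.0 2.0 0.0", "0.0 0.0 0.5"]

# float(s) instead of eval(s), with a list comprehension building the rows.
normals = [[float(field) for field in line.split()] for line in lines]

# Hoisting: factor does not change inside the loop, so compute it once outside.
scale = 2.0
factor = scale / math.sqrt(3.0)
scaled = [[component * factor for component in normal] for normal in normals]

# Iterate instead of indexing; enumerate supplies the index in parallel.
labelled = ["%d: %s" % (i, normal) for i, normal in enumerate(normals)]

# Decorate-sort-undecorate: tag each normal with its length, sort, strip the tag.
# The index i breaks ties, so equal lengths never compare the lists themselves.
decorated = [
    (math.sqrt(sum([c * c for c in normal])), i, normal)
    for i, normal in enumerate(normals)
]
decorated.sort()
by_length = [normal for (length, i, normal) in decorated]

# zip(*rows) regroups the data per axis: all x values, all y values, all z values.
summed = [sum(axis) for axis in zip(*normals)]
```

After this runs, by_length holds the normals ordered shortest first and summed holds the per-axis totals. Sorting on a computed key puts items with equal keys next to each other, which is the grouping effect mentioned above.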

HTH

Regards,
Bengt Richter