Strange Execution Times

John Machin sjmachin at lexicon.net
Thu May 26 05:28:16 EDT 2005


curi42 at gmail.com wrote:
> I am running two functions in a row that do the same thing.

1. I see no functions here.

You should set out a script like this:

def main():
     your_code_goes_here()

if __name__ == '__main__':
     main()

for two reasons (a) your code will be referring to locals instead of 
globals; this is faster, which might appeal to you (b) if somebody 
accidentally imports the script, nothing happens.

2. The two loops to which you refer do *not* do the same thing; see later.

> One runs
> in .14 seconds, the other 56.  I'm confused.  I wrote another version
> of the program and couldn't get the slow behavior again, only the fast.
>  I'm not sure what is causing it.  Can anyone figure it out?
> 
> Here is my code (sorry it's a bit of a mess, but my cleaned up version
> isn't slow!).  Just skim to the bottom where the timing is.  The first
> time printed out is .14, the seond is 56.56.
> 
> 

[snip]


[following has extraneous blank lines and comments removed]
> t1 = time.clock()
> for j in r:
>     digest = m(jo(s[starts[j]+1:ends[j]])).hexdigest()
>     a[starts[j]+1:ends[j]] = digest
> t2 = time.clock()
> print "time is", round(t2-t1, 5)
> 
> t1 = time.clock()
> for i in r:
>     data[starts[i]:ends[i]] = \
>         md5(join(s[starts[i]:ends[i]])).hexdigest()
> t2 = time.clock()
> print "second time is", round(t2-t1, 5)

General questions: what platform? what version of Python? how large is 
the file? how much free memory do you have? how many passwords are 
there? what is the average length of a password?

Ignoring the superficial-but-meaningless differences (i vs j, md5 
[aarrgghh!!] vs m), jo vs join), these two loops differ in the following 
respects:

(1) 'data' is a copy of 'a'
(2) the first loop's body is effectively: digest = RHS; LHS = digest 
whereas the 2nd loop's body is: LHS = RHS
(3) the first loop uses starts[j]+1 whereas the second loop uses starts[j]

Item (1) may affect the timing if file is large compared with available 
memory -- could be 'a' has to be swapped out, and 'data' swapped in.

Item (2) should make the 2nd loop very slightly faster, so we'll ignore 
that :-)

Item (3) means you are not comparing like with like. It means that the 
1st loop has less work to do. So this could make an observable 
difference for very short passwords -- but still nothing like 0.14 
compared with 56.

So, some more questions:

The 56.56 is suspiciously precise -- you ran it a few times and it 
printed exactly 56.56 each time?

Did you try putting the 2nd loop first [refer to Item (1) above]?
Did you try putting in a switch so that your script runs either 1st loop 
or 2nd loop but not both? Note that each loop is making its target list 
expand in situ; this may after a while (like inside loop 2) cause the 
memory arena to become so fragmented that swapping will occur. This of 
course can vary wildly depending on the platform; Win95 used to be the 
most usual suspect but you're obviously not running on that.

Some observations:

(1) 's' is already a string, so ''.join(s[x:y]) is a slow way of doing 
s[x:y]

(2) 'a' ends up as a list of one-byte strings, via a very circuitous 
process: a = array.array('c', s).tolist()

A shorter route would be: a = list(s)

However what's wrong with what you presumably tried out first i.e. a = 
array.array('c', s) ?? It doesn't need the final ''.join() before 
writing to disk, and it takes up less memory. NOTE: the array variety 
takes up 1 byte per character. The list variety takes up at least 4 
bytes per character (on a machine where sizeof(PyObject *) == 4); to the 
extent that the file contains characters that are not interned (i.e. not 
   [A-Za-z_] AFAIK), much more memory is required as a separate object 
will be created for each such character. Was it consistently slower?

(3) If memory is your problem, you could rewrite the whole thing to 
simply do one write per password; that way you only need 1.x copy of the 
   file contents in memory, not 2.x.

Hoping some of this helps,
John



More information about the Python-list mailing list