Strange Execution Times

Thu May 26 18:22:32 EDT 2005

Elliot Temple wrote:
[copying Elliot's e-mail reply back to the list because it's educational 
  and scarcely private]

> 
> 
> 
> On 5/26/05, John Machin <sjmachin at lexicon.net> wrote:
>  > curi42 at gmail.com wrote:
>  >
>  > > I am running two functions in a row that do the same thing.
>  >
>  > 1. I see no functions here.
>  >
>  > You should set out a script like this:
>  >
>  > def main():
>  >      your_code_goes_here()
>  >
>  > if __name__ == '__main__':
>  >      main()
>  >
>  > for two reasons (a) your code will be referring to locals instead of
>  > globals; this is faster, which might appeal to you (b) if somebody
>  > accidentally imports the script, nothing happens.
> 
> Oops, I meant code blocks not functions.  Good advice, thanks.
> 
>  > 2. The two loops to which you refer do *not* do the same thing;  see 
> later.
> 
> <snip>
> 
>  > General questions: what platform? what version of Python? how  large is
>  > the file? how much free memory do you have? how many passwords are
>  > there? what is the average length of a password?
> 
> OS X 10.4.1    Python 2.3.5  (I wonder why they bundled an old  
> version..)  The file is 4 megs, about 8000 passwords.  I have 375  megs 
> of RAM free.  the passwords are mostly about 5-6 chars long.

Huh-uh -- evidently (from what you said later) a *GUESS* on the password 
size; measurement on the actual file that you were using would have 
given the answer "Oops, mean = 32, standard dev = 0".

> 
> 
>  > Ignoring the superficial-but-meaningless differences (i vs j, md5
>  > [aarrgghh!!] vs m), jo vs join), these two loops differ in the  
> following
>  > respects:
> 
> Sorry, I wrote a nicer version of the program with things named well,  
> but it was only getting the fast time, so I copied it into the old  
> version of the program and then I had to write join=jo etc to avoid  
> changing it.

Avoid changing what? And did you get the message that doing (in effect)

import md5
m = md5.new
md5 = m

is a horrifyingly dangerous disgusting and ugly stunt?

> 
>  >
>  > (1) 'data' is a copy of 'a'
>  > (2) the first loop's body is effectively: digest = RHS; LHS = digest
>  > whereas the 2nd loop's body is: LHS = RHS
>  > (3) the first loop uses starts[j]+1 whereas the second loop uses  
> starts[j]
> 
> oops, 3 is because the nicer version created a slightly different  index 
> list.  Hey, turns out that matters (see end)
> 
>  > Item (1) may affect the timing if file is large compared with  available
>  > memory -- could be 'a' has to be swapped out, and 'data' swapped in.
>  >
>  > Item (2) should make the 2nd loop very slightly faster, so we'll  ignore
>  > that :-)
> 
> yeah
> 
>  > Item (3) means you are not comparing like with like. It means that  the
>  > 1st loop has less work to do. So this could make an observable
>  > difference for very short passwords -- but still nothing like 0.14
>  > compared with 56.
>  >
>  > So, some more questions:
>  >
>  > The 56.56 is suspiciously precise -- you ran it a few times and it
>  > printed exactly 56.56 each time?
> 
> No, it got 55 or 56 something.
> 
>  >
>  > Did you try putting the 2nd loop first [refer to Item (1) above]?
> 
> Yes, that didn't change which was fast.
> 
>  > Did you try putting in a switch so that your script runs either  1st 
> loop
>  > or 2nd loop but not both?
> 
> No, good idea.     ....  OK tried it, and it didn't change how fast  
> each loop ran.  I also changed it so they both work on the same list  in 
> the version with a switch, and that didn't matter.
> 
>  > Note that each loop is making its target list
>  > expand in situ; this may after a while (like inside loop 2) cause the
>  > memory arena to become so fragmented that swapping will occur.  This of
>  > course can vary wildly depending on the platform; Win95 used to be  the
>  > most usual suspect but you're obviously not running on that.
> 
> Nod
> 
>  > Some observations:
>  >
>  > (1) 's' is already a string, so ''.join(s[x:y]) is a slow way of  doing
>  > s[x:y]
> 
> Oops!  That happened because it used to be ''.join(the_list[x:y]) but  
> then i realised i could just grab sections of the original string but  
> didn't fully change it.
> 
>  > (2) 'a' ends up as a list of one-byte strings, via a very circuitous
>  > process: a = array.array('c', s).tolist()
>  >
>  > A shorter route would be: a = list(s)
> 
> Oh cool.  I looked for a string-to-list function a little, but didn't  
> find that.  I thought I tried that exact one too, but I guess not.

Be aware of list comprehensions; when list(s) escaped your scan of the 
manuals, you could have done this: a = [x for x in s]

NOTE: a string is an iterable! (see later)

> 
>  > However what's wrong with what you presumably tried out first i.e.  a =
>  > array.array('c', s) ?? It doesn't need the final ''.join() before
>  > writing to disk, and it takes up less memory.
> 
> The problem was I couldn't put the new passwords in as a single  
> element.

Indeed. It's annoying enough that you can't write a[x:y] = 'foo' where a 
  is of type array.array('c') that it's reaching the level where I might 
get motivated to submit a patch :-)

>  I guess I could have turned them into arrays, though.
> 
>  > NOTE: the array variety
>  > takes up 1 byte per character. The list variety takes up at least 4
>  > bytes per character (on a machine where sizeof(PyObject *) == 4);  to 
> the
>  > extent that the file contains characters that are not interned  (i.e. 
> not
>  >    [A-Za-z_] AFAIK), much more memory is required as a separate  object
>  > will be created for each such character. Was it consistently slower?
> 
> The second loop is consistently slower in this exact version of the  
> program, even if put first or only one runs.
> 
>  > (3) If memory is your problem, you could rewrite the whole thing to
>  > simply do one write per password; that way you only need 1.x copy  of 
> the
>  >    file contents in memory, not 2.x.
> 
> I actually got my project finished (only 3 files, at a minute each).   
> Then I started to write a faster version of the program for fun.   When 
> it got about 400 times faster, that was a surprise!
> 
> BTW when it takes a minute it keeps a CPU at full use the whole  time.  
> (I have dual 2 ghz G5s).
> 
> ....
> 
> Hey, I just found the difference!  Somehow adding the +1's to the  
> second version makes it fast, and removing them makes it slow again.
> 
> slow:
> data[starts[i]:ends[i]] = md5(join(s[starts[i]:ends[i]])).hexdigest()
> 
> fast:
> data[starts[i]+1:ends[i]] = md5(join(s[starts[i]+1:ends [i]])).hexdigest()
> 
> OK I figured it out:  I accidentally copied an output file to use for  
> my test data.  So all the passwords were already MD5ed, and thus were  
> all exactly 32 chars long.  So setting a slice equal to something of  
> the same length seems to be very fast (even though I'm replacing  
> multiple list elements with a single string).

You are (in this case) replacing multiple list elements with the *same* 
number of elements; not a single string -- see below.

 >>> alist = list('fubar')
 >>> alist
['f', 'u', 'b', 'a', 'r']
 >>> alist[1:4] = 'izz'
 >>> alist
['f', 'i', 'z', 'z', 'r']
 >>> alist[1:4] = 42
Traceback (most recent call last):
   File "<stdin>", line 1, in ?
TypeError: can only assign an iterable
 >>> alist[1:4] = [42]
 >>> alist
['f', 42, 'r']
 >>>

It's very fast because you are not changing the size of the list. A 
Python list is not a linked list; it is implemented as a contiguous 
array of pointers to objects. Your file is about 4MB. Thus on a 32-bit 
machine  your list is taking up about 16MB. Reducing the size means 
copying part of the list. Increasing the size means allocating more 
memory and may mean copying all of the list.

> 
> Thanks again for your help.
> 

Well we ain't finished yet. Let's summarise where we are at:

Your "fast" loop 1 was fast because it was (correctly, given the 
atypical input data) replacing a 32-element slice by a 32-element slice. 
Your loop 2 was slow because it was buggy and replacing a 33-element 
slice by a 32-element slice.

However with *typical* input data and correct code you will be replacing 
a 5-or-6-element slice with a 32-element slice, slowly. So your 
fantastic speed-up was just a fantasy.

You should consider, for both fun and education, writing a version along 
the lines I suggested i.e. have only one (single string) copy of your 
input file in memory.

Then make the small change to use the mmap module and see what effect 
that has.

For extra points (and a useful increment to your education), try using 
the re module, after reading the section in the docs about how the 
'repl' argument to the 'sub' function/method can be a function.

Then post your summarised results back to the newsgroup for the benefit 
of all -- there's this vague hope that folk actually read other peoples' 
posts before firing off questions :-)

Cheers,
John