Any suggestions?

Mon Sep 25 11:18:32 EDT 2000

Hi,

I just ran your code on my machine (Athlon 650MHz, SuSE 6.3, Python 1.5.2) and
found that map was faster.

The figures are below (test1 is the append version and test2 is the map version):

garyp at python:/tmp > time python test1.py < /usr/share/dict/words

real    0m0.436s
user    0m0.410s
sys     0m0.030s

garyp at python:/tmp > time python test2.py < /usr/share/dict/words

real    0m0.298s
user    0m0.290s
sys     0m0.010s

Did you repeat the test a number of times to ensure the disk cache was warm?
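
One way to take the disk cache out of the picture is to build the input once and then time each variant several times in-process. The following is only a minimal sketch, assuming a modern Python 3 interpreter (so `str.upper` stands in for `string.upper`). The names `with_append`, `with_map` and `best_time`, the repeat counts, and the synthetic word list are illustrative assumptions, not the original test1.py and test2.py.

```python
import timeit


def with_append(lines):
    """Loop variant: build the result with pre-bound append and upper."""
    newlist = []
    append = newlist.append
    upper = str.upper
    for word in lines:
        append(upper(word))
    return newlist


def with_map(lines):
    """map variant: list() is needed on Python 3, where map is lazy."""
    return list(map(str.upper, lines))


def best_time(func, lines, repeat=3, number=5):
    """Return the fastest of `repeat` runs, each calling func `number` times."""
    timer = timeit.Timer(lambda: func(lines))
    return min(timer.repeat(repeat=repeat, number=number))


if __name__ == "__main__":
    # Synthetic stand-in for /usr/share/dict/words (an assumption), held in
    # memory so that file I/O is not part of the measurement.
    lines = ["word%d\n" % i for i in range(50000)]
    for func in (with_append, with_map):
        print(func.__name__, best_time(func, lines))
```

Taking the minimum of several repeats is a common way to reduce noise from other activity on the machine. It does not, of course, capture the process start-up and file-reading costs that time(1) includes.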

I used strace to see what the differences were and found that test1
made 25607 system calls during testing whilst test2 made 13643, i.e. test2
required the system to perform about half the work. The crucial difference is
the mremap call: test1 made 24712 mremap calls and test2 only 12406.
This probably accounts for the difference and is (likely) an artifact of the
way the C library's memory allocator is implemented on Linux.
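
The list growth behind those reallocations can be watched from inside Python, at least roughly. This is a sketch under assumptions: it needs a modern CPython, where `sys.getsizeof` reports a list's currently allocated size, and the helper name `count_resizes` is made up for illustration. It counts changes in the list's allocation, not mremap system calls, and the growth policy in Python 1.5.2 may well differ from what a current interpreter shows.

```python
import sys


def count_resizes(n):
    """Count how often a list's allocated size changes during n appends."""
    items = []
    last_size = sys.getsizeof(items)
    resizes = 0
    for i in range(n):
        items.append(i)
        size = sys.getsizeof(items)
        if size != last_size:
            # The list outgrew its spare capacity and was reallocated.
            resizes += 1
            last_size = size
    return resizes


if __name__ == "__main__":
    for n in (1000, 10000, 100000):
        print(n, count_resizes(n))
```

Whether each resize turns into an mremap call depends on the C library and the size of the block, so the numbers printed here are not directly comparable with the strace counts.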

In fact even 12406 is far from ideal; it would be nicer still to reduce this
number to as close to 1 as possible. However, this would require a way to give
Python a hint about the list's eventual size at creation time (as you can with
the Java collection classes), which AFAIK isn't possible.
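
When the final length is known up front, as it is here once readlines() has returned, one possible workaround is to create the list at full size and fill it by index. This is a minimal sketch written for a modern Python 3; the function name `upper_preallocated` is hypothetical, and whether it actually beats map or append on a given platform would need to be measured.

```python
def upper_preallocated(lines):
    """Upper-case each line into a list allocated at its final size."""
    # Allocate all slots in one go instead of growing the list on append.
    result = [None] * len(lines)
    for i, word in enumerate(lines):
        result[i] = word.upper()
    return result


if __name__ == "__main__":
    print(upper_preallocated(["alpha\n", "beta\n"]))
```

This is not a size hint in the Java sense, since the list really does contain that many placeholder entries until they are overwritten.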

I suppose this shows that different platforms will yield different results. I'd
stick to using map, as it *SHOULD* be faster on most computers and is cleaner
code. I'd be interested to know what environment you ran your tests in.

Gary

Jon Ribbens wrote:

> Simon Brunning <SBrunning at trisystems.co.uk> wrote:
> > See <http://www.musi-cal.com/~skip/python/fastpython.html> (IIRC) for some
> > hints on this.
>
> I took a look at this, and it all looks very sensible. I decided to try
> out the map-versus-loops thing to see the difference, however, and I am
> completely bemused by the results. Using Python 2.0b1, I tried the
> following code:
>
> import string
> import sys
>
> def func():
>   list = sys.stdin.readlines()
>   newlist = []
>   upper = string.upper
>   append = newlist.append
>   for word in list:
>     append(upper(word))
>
> func()
>
> When run over /usr/share/dict/words, time(1) says:
>
> real    0m5.010s
> user    0m3.992s
> sys     0m0.951s
>
> The map version:
>
> import string
> import sys
>
> list = sys.stdin.readlines()
> newlist = map(string.upper, list)
>
> takes:
>
> real    0m5.775s
> user    0m2.893s
> sys     0m2.823s
>
> The map version is *slower*! I had expected it to be considerably faster.
> I am also somewhat bemused by the increase in the 'sys' time.
> What's going on?
>
> Cheers
>
> Jon