Counting iterations

Sat Apr 9 14:17:27 EDT 2005

runes wrote:
> You should avoid the "a" + "b" + "c" -kind of concatenation. As strings
> at immutable in Python you actually makes copies all the time and it's
> slow!

The OP wrote

    print "pet" + "#" + num_pets

(properly str(num_pets) )

You recommended the "alternative used in Steven Bethard's example"

     print 'pet#%i' % (i + 1)

because concatenation is "slow".  I disagree; it isn't slow for this code.
It's comparable in performance to interpolation and most
of the time is spent in converting int -> string.  Indeed
if the object to be merged is a string then the addition
version is faster than interpolation.

Here are the details.

The string concatenation performance that you're
talking about doesn't hit until there are multiple
appends to the same string, where "multiple" is rather
more than 2.  The advice usually applies to things like

  text = ""
  for line in open(filename, "U"):
    text += line

which is much slower than, say
  lines = []
  for line in open(filename, "U"):
    lines.append(line)
  text = "".join(lines)

or the more modern
  text = "".join(open(filename, "U"))

to say nothing of
  text = open(filename, "U").read()  :)
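
For anyone who wants to try the loop-versus-join comparison
without a file on hand, here is a rough sketch.  It assumes
Python 3, and the list of 2,000 short strings and the two
function names are made up for illustration.  The gap you see
depends heavily on the interpreter version, so treat the
numbers as a local measurement and nothing more.

```python
import timeit

# Hypothetical stand-in for the lines of a file: 2,000 short strings.
lines = ["line %d\n" % n for n in range(2000)]

def concat_in_loop(chunks):
    # Repeated += on one string; each step may copy what was built so far.
    text = ""
    for chunk in chunks:
        text += chunk
    return text

def join_once(chunks):
    # Hand all the pieces to join and build the result in one call.
    return "".join(chunks)

# Both approaches must produce the same text.
assert concat_in_loop(lines) == join_once(lines)

for func in (concat_in_loop, join_once):
    best = min(timeit.repeat(lambda: func(lines), number=200, repeat=3))
    print("%-15s %.1f usec per call" % (func.__name__, best / 200 * 1e6))
```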

Anyway, to get back to the example at hand,
consider what happens in 

  "pet#%i" % (i+1)

(NOTE: most times that's written %d instead of %i)

The run-time needs to parse the format string
and construct a new string from the components.
Internally it does the same thing as 

   "pet#" + str(i+1)

except that it's done at the C level instead of
in Python, and the implementation overallocates
100 bytes so there isn't an extra allocation
in cases like this.
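
A quick check that the two spellings at least produce the
same string, written for Python 3.  This only compares
results; it says nothing about what happens internally.

```python
# Build the label both ways for a few values and compare.
for i in range(3):
    interpolated = "pet#%i" % (i + 1)
    concatenated = "pet#" + str(i + 1)
    assert interpolated == concatenated
    print(interpolated)

# %d and %i give the same result for integers.
assert "pet#%d" % 7 == "pet#%i" % 7 == "pet#7"
```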

Personally I would expect the "%" code to be
about the same performance as the "+" code.

Of course the real test is in the timing. 
Here's what I tried.  NOTE: I reformatted by
hand to make it more readable.  Line breaks and
the \ continuation character may have introduced
bugs.

First, the original code along with the 'str()'
correction.

% python /usr/local/lib/python2.3/timeit.py -s \
'pets = ["cat", "dog", "bird"]' \
'num_pets=0' 'for pet in pets:' \
'  num_pets += 1' \
'  s="pet" + "#" + str(num_pets)'
100000 loops, best of 3: 14.5 usec per loop

There's no need for the "pet" + "#" so I'll
turn that into "pet#"

% python /usr/local/lib/python2.3/timeit.py -s \
'pets = ["cat", "dog", "bird"]' \
'num_pets=0' \
'for pet in pets:' \
'  num_pets += 1' \
'  s="pet#" + str(num_pets)' 
100000 loops, best of 3: 12.8 usec per loop 

So the extra "pet" + "#" concatenation cost about 1.7 usec.

By comparison here's the "%" version.

% python /usr/local/lib/python2.3/timeit.py -s \
'pets = ["cat", "dog", "bird"]' \
'num_pets=0' \
'for pet in pets:' \
'  num_pets += 1' \
'  s="pet#%s" % num_pets' 
100000 loops, best of 3: 10.8 usec per loop

I'm surprised that it's that much faster - a
good 2 microseconds and that isn't the only
code in that loop.

But both the "%" and "+" solutions need to
convert the number into a string.  If I
use an existing string I find

% python /usr/local/lib/python2.3/timeit.py -s \
'pets = ["cat", "dog", "bird"]' \
'num_pets=0' \
'for pet in pets:' \
'  num_pets += 1' \
'  s="pet#" + pet'
100000 loops, best of 3: 4.62 usec per loop

So really most of the time - about 8 usec - is
spent in converting int -> string and the
hit for string concatenation or interpolation
isn't as big a problem.

Compare with the string interpolation form
of the last version

% python /usr/local/lib/python2.3/timeit.py -s \
'pets = ["cat", "dog", "bird"]' \
'num_pets=0' 'for pet in pets:' \
'  num_pets += 1' \
'  s="pet#%s" % pet'
100000 loops, best of 3: 7.55 usec per loop

In this case you can see that the % version is
slower (by about 3 usec) than the + version.

I therefore disagree with the idea that simple
string concatenation is always to be eschewed
in favor of string interpolation.
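
If you'd rather rerun the four loop bodies from a script
than from the shell, something like the sketch below should
work.  It assumes Python 3 and the standard timeit module;
SETUP, VARIANTS and best_usec are names I made up for this
example.  Expect different absolute numbers from the ones
above, since those came from Python 2.3 on one machine.

```python
import timeit

SETUP = 'pets = ["cat", "dog", "bird"]'

# Loop bodies to compare; only the assignment to s differs.
VARIANTS = {
    "+ with str(int)": 's = "pet#" + str(num_pets)',
    "% with int": 's = "pet#%s" % num_pets',
    "+ with str": 's = "pet#" + pet',
    "% with str": 's = "pet#%s" % pet',
}

def best_usec(body, number=100000, repeat=3):
    # Wrap the body in the same counting loop used in the shell commands.
    stmt = "num_pets = 0\nfor pet in pets:\n    num_pets += 1\n    " + body
    timer = timeit.Timer(stmt, setup=SETUP)
    # Best of `repeat` runs, converted to microseconds per loop.
    return min(timer.repeat(repeat=repeat, number=number)) / number * 1e6

for label, body in VARIANTS.items():
    print("%-16s %.2f usec per loop" % (label, best_usec(body)))
```

Taking the minimum of the repeats matches the "best of 3"
that timeit reports on the command line.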

				Andrew
				dalke at dalkescientific.com